Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Not covered regions between bam and bed.

    Hello,

    My input files are:
    • input.bam (paired end) Illumina alignment obtained with BWA.
    • regions.bed file which contains the intervals of the current design


    Here's a sample of regions.bed

    Code:
    chr1	10292367	10292512
    chr1	10316285	10316401
    chr1	10318531	10318750
    chr1	10321943	10322048
    chr1	10327418	10327636
    chr1	10328190	10328341
    The objective is to obtain which intervals of "regions.bed" design are not being covered in bam.

    I think my problem is that i need to generate a BED file from the original BAM representing the regions that are being covered. And I've no idea how to do that.

    I've tried converting the bam input into a bed file with bedtools. Then intersecting both bed files with -v option with bedtools intersect.
    The problem comes when I get this weird output in resulting bed file:

    That 0 value is really weird...

    chr1 10833 10931 M01167:3:000000000-A37T8:1:1110:11018:11790/1 0 + 10833 10931 0,0,0 1 98, 0,
    chr1 10977 11027 M01167:3:000000000-A37T8:1:1101:10241:4209/2 0 + 10977 11027 0,0,0 1 50, 0,

    And lines being repeated even if it's not the same read:

    chr1 10292578 10292679 M01167:3:000000000-A37T8:1:2104:17089:16810/1 60 - 10292578 10292679 0,0,0 1 101, 0,
    chr1 10292579 10292680 M01167:3:000000000-A37T8:1:2105:8403:22087/1 60 - 10292579 10292680 0,0,0 1 101, 0,
    chr1 10292579 10292680 M01167:3:000000000-A37T8:1:2112:16677:2861/1 60 - 10292579 10292680 0,0,0 1 101, 0,


    Thank you.

    Best regards,
    gmarco.
    Last edited by gmarco; 05-06-2013, 08:05 AM.

  • #2
    Give bedops a try. http://code.google.com/p/bedops/

    Comment


    • #3
      Your intersectBed output suggests that you are using the Bam file converted to bed as "-a" and regions.bed as "-b". However, if you switch -a and -b files around, and then use -v, you'll get regions from your "regions.bed" file which don't have any reads aligned to them, i.e.

      intersectBed -a regions.bed -b alignment.bed -v > regions_not_covered.bed

      Comment


      • #4
        Originally posted by aggp11 View Post
        Your intersectBed output suggests that you are using the Bam file converted to bed as "-a" and regions.bed as "-b". However, if you switch -a and -b files around, and then use -v, you'll get regions from your "regions.bed" file which don't have any reads aligned to them, i.e.

        intersectBed -a regions.bed -b alignment.bed -v > regions_not_covered.bed
        The alignment.bed comming from BAM conversion is the problem.

        This is is the head for the actual BAM (being visualized in sam format with samtools), note that i skipped headlines to reduce text in post:

        Code:
        M01167:3:000000000-A37T8:1:1110:11018:11790     65      chr1    10834   0       98M3S   chr15   102520130       0       CAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCTCAGAGACACATGCTACCGCGTCCAGGGGTGGAGGCGTGGCGCAGGCGCAGAGAGGCGCACCGCGCC   CDFF?B@DEBEDJIH?CCHF@G1FCCJG1C>EI@CF@GA5DDCCCAIJ@AFECC4>DIBHGBH<IHCGDB;@?E>?GBJGGCA?JAIBDBA?/?@FAB;>%   X0:i:2  X1:i:0  XC:i:98 BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      MD:Z:34G63      RG:Z:S6.lane1   XG:i:0  BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      AM:i:0  NM:i:1  SM:i:0  XM:i:1  XO:i:0  MQ:i:0  XT:A:R
        M01167:3:000000000-A37T8:1:1101:10241:4209      163     chr1    10978   0       50M51S  =       11063   177     GCGTGGCGCAGGCGCAGAGACGCAAGCCTACGGGCGGGGGTTGGGGGGGCGGGTGTTGCAGGAGCAAAGTCGCACGGCGCCGGGCTGGGGCGGGGGGGGGG   @?B@FC@@=C6CG>FB><6?E@[email protected];622>3GG=4%%2)&;FD?BF+*&9".&')8@@>>%,*=)7&&>?7@B(.,&&*9%%;,0   X0:i:2  X1:i:2  XA:Z:chr15,-102520143,101M,0;chr12,-94585,101M,1;chr16,+60657,101M,1;   XC:i:50 BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      MD:Z:50 RG:Z:S6.lane1   XG:i:0  BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      AM:i:0  NM:i:0  SM:i:0  XM:i:0  XO:i:0  MQ:i:0  XT:A:R
        M01167:3:000000000-A37T8:1:1101:10241:4209      83      chr1    11063   0       9S92M   =       10978   -177    GGCGGGGGGGGGGGGGGGGGGGGGGTGGCGCCGTGCACGCGCAGAAACTCACGTCACGGTGGCGCGGCGCAGAGACGGGTAGAACCTCAGTAATCCGAAAA   %-;8">$8(4?<<7=%>7GHAE?6CCD@4FHDECDHC<GBIICEBC@KDJ@BEBI@CDECG6BFAEGBIIDFEEAADEEACGA@IJBJCD@@>BFBACBAD   X0:i:2  X1:i:2  XA:Z:chr15,+102520016,101M,3;chr12,+94455,101M,4;chr9,-11176,101M,4;    XC:i:92 BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      MD:Z:0T4C6A79   RG:Z:S6.lane1   XG:i:0  BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      AM:i:0  NM:i:3  SM:i:0  XM:i:3  XO:i:0  MQ:i:0  XT:A:R
        M01167:3:000000000-A37T8:1:1113:11938:2404      129     chr1    16452   16      100M1S  chr2    114354342       0       AAACCACTATTTTATGAACCAAGTAGAACAAGATACTTGAAATGGAAACTATTCAAAAAATTGAGAATTTCTGACCACTTAACAAACCCACAGAAAATCCA   8?4?>CBB>A>>@@?ACEDEBEJ<?JBE/BBHB@?>C?ICFH@KHCGGEC>BDECD2GEAABJEIBGCBBFDKDEDCEEABGEDGGFFD4CCBD9@C@AC5   X0:i:1  X1:i:5  XC:i:100        BD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      MD:Z:35T64      RG:Z:S6.lane1   XG:i:0  BI:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      AM:i:0  NM:i:1  SM:i:16 XM:i:1  XO:i:0  MQ:i:0  XT:A:U
        And this are the first lines of BAM after being converted to BED with bedtools:

        Code:
        chr1	10833	10931	M01167:3:000000000-A37T8:1:1110:11018:11790/1	0	+
        chr1	10977	11027	M01167:3:000000000-A37T8:1:1101:10241:4209/2	0	+
        chr1	11062	11154	M01167:3:000000000-A37T8:1:1101:10241:4209/1	0	-
        chr1	16451	16551	M01167:3:000000000-A37T8:1:1113:11938:2404/2	16	+
        chr1	17425	17526	M01167:3:000000000-A37T8:1:2109:28144:20710/1	37	+
        chr1	17542	17643	M01167:3:000000000-A37T8:1:2109:28144:20710/2	36	-
        chr1	17600	17701	M01167:3:000000000-A37T8:1:1108:13628:12197/2	0	-
        If by default the 5th column is the mapping quality value why are reads with 0 score being reported?

        I'm a bit skeptical about "bamToBead" because it converts each read from bam into a bed read. And what I want are the intervals (in bed format) that are being aligned on original alignment.bam. That's why i'm not sure if the result is correct and how can I check it.

        Following your steps I've achieved this regions_not_covered.bed, now I would like to know can I check if that result is correct and those are really the non covered regions:

        intersectBed -a target_regions.bed -b alignment_converted.bed -v > regions_not_covered.bed:

        Code:
        chr1    161296421       161296578
        chr2    48011607        48012135
        chr2    190671129       190671248
        chr3    10073963        10074055
        chr3    10087191        10087448
        chr3    10103815        10103915
        chr3    37036301        37036464
        chr5    112145803       112145872
        chr5    131928436       131928662
        chr5    176563854       176563977
        chr5    176656746       176656895
        chr6    35420303        35420590
        chr7    116335791       116335873
        chr7    116364180       116364323
        chr11   94163057        94163172
        chr11   94216916        94217025
        chr11   111963784       111963941
        chr11   111978249       111978310
        chr11   111989995       111990035
        chr13   32949460        32949563
        chr13   32970108        32970249
        chr14   45650018        45650112
        chr16   14021287        14021532
        chr16   89829134        89829221
        chr16   89844745        89844890
        chr17   7565237 7565352
        chr17   41262532        41262617
        chr17   56783830        56783989
        chr17   59861227        59861288
        chr19   45862098        45862190
        chr22   29117568        29117639
        chr22   29125289        29125384
        chr22   29126388        29126556
        chr22   29133228        29133291
        Last edited by gmarco; 05-06-2013, 11:22 PM. Reason: missing content

        Comment


        • #5
          Originally posted by GenoMax View Post
          I've been trying to use them and after copying all bedops bin into my /usr/bin as sudo when I try to execute any bedops tool i get this error message:

          Code:
          bedops
          bash: /usr/bin/bedops: cannot execute binary file
          I'm using ubuntu 12.04 x86 (32bit)

          Comment


          • #6
            After realizing that I cannot really ensure if what I obtain is correct or not I made a small test setup:

            1) short_alignment.bed (few lines converted from original BAM align):

            Code:
            chr1	10833	10931	M01167:3:000000000-A37T8:1:1110:11018:11790/1	0	+
            chr1	10833	10883	M01167:3:000000000-A37T8:1:1101:10241:4209/2	0	+
            chr1	10833	10933	M01167:3:000000000-A37T8:1:1113:11938:2404/2	16	+
            chr1	10836	10937	M01167:3:000000000-A37T8:1:2109:28144:20710/1	37	+
            chr1	11062	11154	M01167:3:000000000-A37T8:1:1101:10241:4209/1	0	-
            2) I manually edited my target_regions.bed for test purpose:
            Code:
            chr1    10130   11154
            3) Launched the following commands:

            intersectBed -a target_regions.bed -b short_alignment.bed -v
            Code:
            No output.
            intersectBed -a target_regions.bed -b short_alignment.bed -v -f 1
            Code:
            chr1	10130	11154
            With -f 1 option I can obtain intervals not totally being covered by 1 bp gap.
            Is great however I want to obtain the part of the intervals not being covered.

            Conclusion if my target region is from 1 to 10 and I map from 3 to 7 and would like to obtain intervals 1-2, 8-10.

            I don't know if i'm explaining myself clearly.
            Last edited by gmarco; 05-07-2013, 02:51 AM.

            Comment


            • #7
              Originally posted by gmarco View Post
              I've been trying to use them and after copying all bedops bin into my /usr/bin as sudo when I try to execute any bedops tool i get this error message:

              Code:
              bedops
              bash: /usr/bin/bedops: cannot execute binary file
              I'm using ubuntu 12.04 x86 (32bit)
              The pre-compiled binaries at the bedops site are for 64-bit linux. Since you are using 32-bit linux you will need to compile the program from the source code.

              Comment


              • #8
                Ok I think I solved it !

                The tool i was looking for is subtractBed !

                Code:
                ::::::::::::::
                1.bed
                ::::::::::::::
                chr1	0	10
                chr1	11	20
                chr2	15	30
                Code:
                ::::::::::::::
                test.bed
                ::::::::::::::
                chr1	5	8
                chr1	18	30
                chr2	16	20
                subtractBed -a 1.bed -b test.bed

                Code:
                chr1	0	5
                chr1	8	10
                chr1	11	18
                chr2	15	16
                chr2	20	30

                Comment


                • #9
                  Originally posted by gmarco View Post
                  I've been trying to use them and after copying all bedops bin into my /usr/bin as sudo when I try to execute any bedops tool i get this error message:

                  Code:
                  bedops
                  bash: /usr/bin/bedops: cannot execute binary file
                  I'm using ubuntu 12.04 x86 (32bit)
                  I published 32-bit Linux binaries on the BEDOPS site earlier today. These are static binaries built on an Ubuntu 13.04 host with gcc 4.7.3.

                  $ file ./bin/bedops
                  ./bin/bedops: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.24, BuildID[sha1]=0x406078819a64405ac8ca15038260c6e482798f28, stripped


                  If you download them and try them out, I'd welcome any feedback, bug reports, etc.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Essential Discoveries and Tools in Epitranscriptomics
                    by seqadmin




                    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                    04-22-2024, 07:01 AM
                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 08:47 AM
                  0 responses
                  12 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  60 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  59 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 09:21 AM
                  0 responses
                  54 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X