  • Fastest way to extract differing positions from each alignment in a BAM file

    Hi,

    What would be the fastest way (I have to do this hundreds of millions of times) to extract, for each aligned read in a BAM file:
    1) The positions where the read bases differ from a reference sequence.
    2) The PHRED base quality values of these bases. If the difference is an indel, the quality value will, of course, be skipped.

    As far as I know, I cannot use mpileup or any other tool I am aware of because of memory limitations: this is a highly customized amplicon analysis, with >500 million X coverage per base position on the reference amplicon.

    In short, I need to apply an efficient approach to extract all differing positions for each aligned read.

    Thanks.
    Last edited by CHRYSES; 12-14-2011, 06:52 AM.

  • #2
    samtools and bamtools both provide very fast APIs you can use; however, this requires at least minimal experience with a programming or scripting language...


    • #3
      Originally posted by genericforms
      samtools and bamtools both provide very fast APIs you can use; however, this requires at least minimal experience with a programming or scripting language...

      Yeah, I tried to get into that, but I am not good at C and could not follow it. I hope someone else has created or thought of something...

      I could go directly to the text format (i.e. SAM) and parse it with Perl, but that's really very slow.
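A minimal sketch of the text-parsing route in Python, assuming the reference amplicon is available as a plain string and base qualities are Phred+33 encoded. The function name `read_differences` and the tuple layout are illustrative choices, not part of samtools or any library. The function walks the CIGAR string of one alignment and reports substitutions with their Phred quality, plus indels with the quality left as `None`. The POS, CIGAR, SEQ and QUAL arguments correspond to columns 4, 6, 10 and 11 of a SAM line, for example from `samtools view` output.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def read_differences(pos, cigar, seq, qual, ref, phred_offset=33):
    """Return (ref_pos_1based, ref_bases, read_bases, phred_or_None) per difference.

    pos is the 1-based leftmost reference position of the alignment.
    Substitutions carry a Phred score; indels carry None.
    """
    out = []
    r = pos - 1  # 0-based index of the next reference base
    q = 0        # 0-based index of the next read base
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":  # aligned bases: compare one by one
            for k in range(length):
                ref_base, read_base = ref[r + k], seq[q + k]
                if ref_base != read_base:
                    phred = ord(qual[q + k]) - phred_offset
                    out.append((r + k + 1, ref_base, read_base, phred))
            r += length
            q += length
        elif op == "I":  # insertion, reported at the preceding reference base
            out.append((r, "-", seq[q:q + length], None))
            q += length
        elif op == "S":  # soft clip consumes read bases only
            q += length
        elif op in "DN":  # deletion or skip consumes reference bases only
            if op == "D":
                out.append((r + 1, ref[r:r + length], "-", None))
            r += length
        # H and P consume neither read nor reference bases
    return out

diffs = read_differences(2, "4M1I2M1D2M", "CGAAGCGAT", "IIDIIIII5", "ACGTACGTAC")
print(diffs)
```

The comparison is case-sensitive, so a soft-masked (lowercase) reference would need to be upper-cased first.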


      • #4
        If you are going to examine every read in order, then I suppose you could parse a giant text file. This sort of sequential analysis is hard to speed up unless you parallelize it. If you are parsing a text file, you could break it into many parts and then run your Perl script on those parts in parallel (if you have access to that kind of hardware).

        I am not aware of an off the shelf tool. Sorry!

        Personally I opt for pthreads and C/C++...
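The split-and-parallelize idea can be sketched in Python rather than Perl or pthreads. This is only an illustration under assumptions: the SAM text sits in a single uncompressed file, and the helper names `line_aligned_chunks`, `count_alignments` and `run_parallel` are made up for this example. The worker here just counts alignment lines as a stand-in for whatever per-read analysis is needed.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def line_aligned_chunks(path, n_chunks):
    """Split a file into (start, end) byte ranges that fall on line boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as fh:
        for i in range(1, n_chunks):
            fh.seek(size * i // n_chunks)
            fh.readline()  # move forward to the start of the next line
            pos = fh.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def count_alignments(job):
    """Placeholder worker: count non-header lines in one byte range."""
    path, start, end = job
    n = 0
    with open(path, "rb") as fh:
        fh.seek(start)
        while fh.tell() < end:
            line = fh.readline()
            if line.strip() and not line.startswith(b"@"):
                n += 1
    return n

def run_parallel(path, n_workers=4):
    """Process each chunk in its own process and combine the results."""
    jobs = [(path, s, e) for s, e in line_aligned_chunks(path, n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(count_alignments, jobs))
```

On platforms that start worker processes by spawning, `run_parallel` should be called from under an `if __name__ == "__main__":` guard.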


        • #5
          Illumina reads are error prone. If you pull every single read with a discrepancy from reference, you are going to pull a lot of noise.

          I don't think that a pileup can be generated with only variant positions, but you could grep the pileup to get only the lines with alternate letters. The pileup will have the position, all the letters called by all the reads that cross the position, and all the qualities for all the reads that cross the position.
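A plain grep for letters in the read-bases column can misfire, because that column also contains read-start markers (`^` followed by a mapping-quality character, which may itself be a letter) and indel strings such as `+2AG`. A hedged Python sketch of a stricter filter follows, assuming the standard tab-separated pileup layout with the read bases in the fifth column. The function names are illustrative only.

```python
import re

def strip_pileup_markup(bases):
    """Remove read-start/end markers and indel strings from a pileup base column."""
    # "^" is followed by one mapping-quality character; "$" marks a read end.
    bases = re.sub(r"\^.", "", bases).replace("$", "")
    out, i = [], 0
    while i < len(bases):
        ch = bases[i]
        if ch in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            if j > i + 1:
                # Skip the length digits and that many inserted/deleted bases.
                i = j + int(bases[i + 1:j])
                continue
        out.append(ch)
        i += 1
    return "".join(out)

def has_alternate_base(pileup_line):
    """True if at least one read shows a substituted base at this position."""
    fields = pileup_line.rstrip("\n").split("\t")
    return bool(re.search(r"[ACGTNacgtn]", strip_pileup_markup(fields[4])))
```

For example, `has_alternate_base("amp\t5\tA\t3\t.,T\tIII")` is true, while a line whose base column is `.^A,` or `.+2AG,` is not reported, since the letters there belong to a mapping-quality marker and an insertion string respectively.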


          • #6
            Originally posted by swbarnes2
            Illumina reads are error prone. If you pull every single read with a discrepancy from reference, you are going to pull a lot of noise.

            I don't think that a pileup can be generated with only variant positions, but you could grep the pileup to get only the lines with alternate letters. The pileup will have the position, all the letters called by all the reads that cross the position, and all the qualities for all the reads that cross the position.

            Yes, but how can I run a pileup on a single position with 500 million X coverage? I think I will need to do this read by read...
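One way to go read by read without holding a 500-million-deep column in memory is to stream the alignments and keep only running counts. The sketch below is an assumption-laden illustration, not an existing tool: it takes SAM text lines from any iterable, expects the reference amplicon as a string and Phred+33 qualities, and handles only ungapped alignments (CIGAR of the form `<read length>M`). Other reads are counted as skipped and would need a full CIGAR walk. The name `tally_mismatches` is hypothetical.

```python
from collections import Counter

def tally_mismatches(sam_lines, ref, phred_offset=33):
    """Stream SAM lines and count (ref_pos, ref_base, read_base, phred) events.

    Memory use depends on the number of distinct events, not on the
    number of reads. Returns (counts, number_of_skipped_reads).
    """
    counts = Counter()
    skipped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        pos, cigar, seq, qual = int(f[3]), f[5], f[9], f[10]
        # Only ungapped, unclipped reads with stored qualities are handled here.
        if cigar != f"{len(seq)}M" or len(qual) != len(seq):
            skipped += 1
            continue
        start = pos - 1
        for i, (ref_base, read_base) in enumerate(zip(ref[start:start + len(seq)], seq)):
            if ref_base != read_base:
                counts[(pos + i, ref_base, read_base, ord(qual[i]) - phred_offset)] += 1
    return counts, skipped
```

Passing `sys.stdin` as `sam_lines` would let the function consume piped SAM text line by line.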
