  • Fastest way to extract differing positions from each alignment in a BAM file

    Hi,

    What would be the fastest way (I have to do this hundreds of millions of times) to extract, for each aligned read in a BAM file:
    1) The positions where the read bases differ from a reference sequence.
    2) The PHRED base quality values of these bases. If the difference is an indel, the quality value will, of course, be skipped.

    As far as I know, I cannot use mpileup or any other tool I know of because of memory limitations: this is a very custom amplicon analysis, with >500 million X coverage per base position on the reference amplicon.

    In short, I need to apply an efficient approach to extract all differing positions for each aligned read.

    Thanks.
    Last edited by CHRYSES; 12-14-2011, 06:52 AM.

  • #2
    samtools and bamtools both provide very fast APIs you can use; however, this requires at least minimal experience with a programming or scripting language...

    • #3
      Originally posted by genericforms
      samtools and bamtools both provide very fast APIs you can use; however, this requires at least minimal experience with a programming or scripting language...
      Yeah, I tried to get into that, but I am not good at C and could not follow it. I hope someone else has created or thought of something...

      I could go directly to the text format (i.e. SAM) and parse it with Perl, but that's really very slooooooow.
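For what it's worth, here is a minimal sketch of the SAM-text route in Python rather than Perl. It assumes every record carries an MD tag (some aligners write it, and `samtools calmd` can add it), because MD plus CIGAR is enough to locate mismatches without loading the reference. The function name `mismatches_from_sam_line` is made up for illustration, qualities are assumed to be Phred+33, and nothing here has been tuned for the scale discussed in this thread.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, a deletion (^ plus reference bases), or one mismatch.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def mismatches_from_sam_line(line):
    """Return [(ref_pos_1based, read_base, phred_quality), ...] for one SAM record.

    Only substitutions are reported; insertions and deletions are skipped.
    """
    fields = line.rstrip("\n").split("\t")
    flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
    seq, qual = fields[9], fields[10]
    if flag & 0x4 or cigar == "*":
        return []  # unmapped read or no alignment information
    md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
    if md is None:
        raise ValueError("record has no MD tag (assumption of this sketch)")

    # Map every aligned (M/=/X) base to its reference position and read offset.
    aligned = []
    ref_pos, read_idx = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            aligned.extend((ref_pos + k, read_idx + k) for k in range(length))
            ref_pos += length
            read_idx += length
        elif op in "IS":      # insertion / soft clip: consumes read only
            read_idx += length
        elif op in "DN":      # deletion / skip: consumes reference only
            ref_pos += length
        # H and P consume neither read nor reference

    # Walk the MD string over the aligned bases.
    out = []
    i = 0
    for matches, deletion, mismatch in MD_RE.findall(md):
        if matches:
            i += int(matches)
        elif mismatch:
            rpos, ridx = aligned[i]
            q = ord(qual[ridx]) - 33 if qual != "*" else None
            out.append((rpos, seq[ridx], q))
            i += 1
        # deletion tokens have no read base and no quality, so nothing to record
    return out
```

Records could be streamed in with something like `samtools view in.bam` piped into a small wrapper that calls the function on each stdin line, so the whole file never has to sit in memory.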

      • #4
        If you are going to examine every read in order, then I suppose you could parse a giant text file. This sort of sequential analysis is hard to speed up unless you parallelize it. If you are parsing a text file, you could break it into many parts and then run your Perl script on those parts in parallel (if you have access to that kind of equipment).

        I am not aware of an off-the-shelf tool. Sorry!

        Personally I opt for pthreads and C/C++...
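The splitting step could be sketched like this in Python, assuming the alignments are already in a SAM text file. `split_sam_records` and the `part_NNNN.sam` file naming are hypothetical. Records are dealt out round-robin, and header lines (which start with `@`) are dropped, since each read is analyzed on its own.

```python
import os


def split_sam_records(sam_path, n_parts, out_dir):
    """Distribute alignment records round-robin into n_parts files.

    Header lines (starting with '@') are skipped. Returns the part file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"part_{i:04d}.sam") for i in range(n_parts)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(sam_path) as src:
            n = 0
            for line in src:
                if line.startswith("@"):
                    continue  # header line, not a read
                outs[n % n_parts].write(line)
                n += 1
    finally:
        for fh in outs:
            fh.close()
    return paths
```

Each part file can then be handed to its own process or cluster job running the per-read script. The sketch keeps one open file handle per part, so it suits tens or hundreds of parts rather than many thousands.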

        • #5
          Illumina reads are error-prone. If you pull every single read with a discrepancy from the reference, you are going to pull a lot of noise.

          I don't think that a pileup can be generated with only variant positions, but you could grep the pileup to get only the lines with alternate letters. The pileup will have the position, all the letters called by all the reads that cross the position, and all the qualities for all the reads that cross the position.
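A rough Python sketch of that filter, assuming the usual samtools pileup text layout (tab-separated, read bases in the fifth column, with `.` and `,` meaning a match to the reference). The name `has_alternate_base` is hypothetical. It is a little stricter than a plain grep because it skips the `^` read-start marker (whose next character is a mapping quality) and the bases inside `+n`/`-n` indel strings, either of which could otherwise look like an alternate letter.

```python
import re

_INDEL = re.compile(r"[+-](\d+)")


def has_alternate_base(pileup_line):
    """Return True if the read-bases column contains at least one substitution."""
    fields = pileup_line.rstrip("\n").split("\t")
    bases = fields[4]
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read-start marker plus its mapping-quality character
        elif c in "+-":
            m = _INDEL.match(bases, i)
            if m is None:
                i += 1  # malformed indel token, step over it
            else:
                i = m.end() + int(m.group(1))  # skip the length and the indel bases
        elif c in "ACGTNacgtn":
            return True  # a base that differs from the reference
        else:
            i += 1  # match ('.' or ','), '$', '*', and similar symbols
    return False
```

Filtering a pileup then amounts to keeping the lines for which the function returns True. Lines whose only differences are indels are not kept by this sketch.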

          • #6
            Originally posted by swbarnes2
            Illumina reads are error-prone. If you pull every single read with a discrepancy from the reference, you are going to pull a lot of noise.

            I don't think that a pileup can be generated with only variant positions, but you could grep the pileup to get only the lines with alternate letters. The pileup will have the position, all the letters called by all the reads that cross the position, and all the qualities for all the reads that cross the position.
            Yes, but how can I run a pileup on a single position with 500 million X coverage? I think I will need to do this read by read...
