  • SAMtools pileup of millions of reads from a single amplicon

    Hi all,


    We would like to pile up millions of reads from a single amplicon for ultra-sensitive mutation detection.

    Since SAMtools pileup is limited to several thousand reads at a given position, I am wondering whether you could suggest an alternative approach or workaround.


    Any feedback is highly appreciated!

  • #2
    Is that limit documented somewhere or based on personal experience?

    Heng Li has previously said that pileup can work on 200GB BAMs (albeit not for a single amplicon): http://seqanswers.com/forums/showthread.php?t=6680



    • #3
      I use
      samtools mpileup -BQ60 -d500000 -D -f

      for our low-frequency variant detection. The "-d" option ("-d INT At a position, read maximally INT reads per input BAM. [250]") limits the depth of the pileup, so raising it lifts the default cap. I turn off the BAQ calculation (-B) because I find it depresses the scores of any variant. We only accept base qualities of 60 or more (-Q60) because our method greatly improves the quality scores; if you are looking at normal reads, you might skip that or set -Q to 30.
      Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
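With a raised -d, the useful information sits in the read-bases column of each mpileup line. Below is a minimal sketch in plain Python for tallying that column into allele fractions. It assumes the standard samtools pileup encoding: '.'/',' for a reference match, '^' plus one mapping-quality character at a read start, '$' at a read end, '+N'/'-N' followed by N bases for an indel, '*' for a deleted base, and '<'/'>' for a reference skip. The names count_bases and allele_fractions are hypothetical, base qualities (column 6) are ignored, and deletions are tallied under '-'.

```python
import re
from collections import Counter

# Matches the sign and length of an indel annotation such as "+2AG" or "-3ttt".
_INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref_base: str, bases: str) -> Counter:
    """Tally A/C/G/T/N and deletions ('-') from one mpileup read-bases column.

    Assumes well-formed input. Indel annotations (+N.../-N...) are skipped rather
    than counted; read-start (^ + MAPQ char) and read-end ($) markers are ignored.
    """
    counts: Counter = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":            # start of a read: the next char is its mapping quality
            i += 2
            continue
        if c == "$":            # end of a read
            i += 1
            continue
        if c in "+-":           # indel after this position: skip sign, length and bases
            m = _INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":           # reference match on the forward / reverse strand
            counts[ref] += 1
        elif c in "*#":         # deleted base ('#' only with samtools --reverse-del)
            counts["-"] += 1
        elif c in "<>":         # reference skip (e.g. spliced read): not a base call
            pass
        else:                   # mismatch; upper/lower case only encodes strand
            counts[c.upper()] += 1
        i += 1
    return counts


def allele_fractions(ref_base: str, bases: str) -> dict:
    """Fraction of counted bases supporting each allele at this position."""
    counts = count_bases(ref_base, bases)
    depth = sum(counts.values())
    return {b: n / depth for b, n in counts.items()} if depth else {}
```

For example, count_bases("A", "..,,T^Ft$+2AG.*") gives 5 A, 2 T and 1 deletion. The loop keeps only a handful of counters per position, so it is unaffected by how deep the column is.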



      • #4
        Given the error-prone nature of Illumina sequencing, there is a limit to how ultra-sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.
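To put rough numbers on that skepticism, here is a sketch under a deliberately naive model. Each read independently shows a given wrong base with probability error_rate. A variant counts as detectable once its alt-read count k would arise from errors alone with probability at most alpha. Both the model and the example values (error_rate = 1e-3, alpha = 1e-6) are illustrative assumptions, not measured Illumina figures. Real PCR and sequencing errors are correlated and context-dependent, which only makes things worse.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); intended for the upper tail (k >= n*p).

    The first term is computed in log space so that n in the millions does not
    overflow; later terms use pmf(i+1) = pmf(i) * (n-i)/(i+1) * p/(1-p).
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    term = math.exp(log_pmf)
    total = term
    ratio = p / (1.0 - p)
    i = k
    while i < n and term > total * 1e-17:   # stop once further terms are negligible
        term *= (n - i) / (i + 1) * ratio
        total += term
        i += 1
    return total


def min_detectable_vaf(n: int, error_rate: float, alpha: float = 1e-6) -> float:
    """Smallest alt-read fraction k/n that errors alone reach with probability <= alpha.

    Naive model (an assumption): every read independently shows the same wrong base
    with probability error_rate. Real PCR/sequencing errors are not that well behaved.
    """
    k = math.ceil(n * error_rate)           # start at the expected number of error reads
    while binom_sf(k, n, error_rate) > alpha:
        k += 1
    return k / n
```

Under this model, min_detectable_vaf(1_000_000, 1e-3) comes out lower than min_detectable_vaf(100_000, 1e-3), but both stay above the 1e-3 error rate. Extra depth only shrinks the statistical margin above the error floor; it never lowers the floor itself.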



        • #5
          Originally posted by swbarnes2
          Given the error-prone nature of Illumina sequencing, there is a limit to how ultra-sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.
          Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false-positive rate right now, and that is quite disconcerting. Combined with your PCR-induced errors, you're asking for trouble.
          Last edited by Bukowski; 02-19-2014, 04:19 PM.



          • #6
            Originally posted by Bukowski
            Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false-positive rate right now, and that is quite disconcerting. Combined with your PCR-induced errors, you're asking for trouble.
            Of course, you're right! We are also thinking about these problems and are trying to address them with corresponding control samples.
            But that is a separate question; I just wanted to know whether it is possible to map millions of reads to one and the same location, process them with (m)pileup, and call variants on them.



            • #7
              Originally posted by svos
              Of course, you're right! We are also thinking about these problems and are trying to address them with corresponding control samples.
              But that is a separate question; I just wanted to know whether it is possible to map millions of reads to one and the same location, process them with (m)pileup, and call variants on them.
              It's hard to say without knowing exactly how low you are trying to go, but I would NOT believe mpileup calls below a few percent variant allele frequency unless I had very solid spike-in data proving that the false-positive and false-negative rates were acceptable.
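The spike-in validation described here comes down to two numbers: how many spiked variants are recovered at each allele frequency, and how often calls appear where nothing was spiked. Below is a minimal sketch. It assumes you have a set of called positions, a hypothetical mapping spiked_vaf from each deliberately introduced variant position to its expected allele frequency, and the total number of positions examined. The function and argument names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def spike_in_rates(called: Set[int], spiked_vaf: Dict[int, float],
                   n_positions: int) -> Tuple[Dict[float, float], float]:
    """Sensitivity per spiked allele frequency, plus the overall false-positive rate.

    called:      positions where the caller reported a variant
    spiked_vaf:  position -> allele frequency of the variant spiked in there
    n_positions: all positions examined (spiked and unspiked)
    """
    hits: Dict[float, int] = defaultdict(int)
    totals: Dict[float, int] = defaultdict(int)
    for pos, vaf in spiked_vaf.items():
        totals[vaf] += 1
        if pos in called:
            hits[vaf] += 1
    # sensitivity (1 - false-negative rate), reported separately for each spike level
    sensitivity = {vaf: hits[vaf] / totals[vaf] for vaf in sorted(totals)}
    # false positives: calls at positions where nothing was spiked in
    false_pos = len(called - set(spiked_vaf))
    n_negative = n_positions - len(spiked_vaf)
    fp_rate = false_pos / n_negative if n_negative else 0.0
    return sensitivity, fp_rate
```

Reading sensitivity at each spike level against the false-positive rate is one way to decide how low an allele frequency is still worth reporting.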



              • #8
                Originally posted by swbarnes2
                It's hard to say without knowing exactly how low you are trying to go, but I would NOT believe mpileup calls below a few percent variant allele frequency unless I had very solid spike-in data proving that the false-positive and false-negative rates were acceptable.
                Again, you're right, but that's another problem... Hopefully we will have controls that allow us to perform such an analysis.

                The simple question is: is this kind of variant detection technically feasible on the bioinformatics side, using e.g. (m)pileup or an alternative tool? Or will we already run into problems at this stage, before even considering the biological and sequencing issues?



                • #9
                  Perhaps one solution is to compute it in sections (say 1000 reads at a time), computing a vector over A/C/G/T/- (deletion) at each position along with confidences, and then combining those vectors in a second round of mpileup.

                  It's not possible with the current code, but in principle a "reduced-reads"-style notation (done formally) could offer a way to compute extreme-depth pileups in a memory-tractable manner.
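The chunk-and-merge idea can be sketched as follows. The sketch assumes reads are already placed on amplicon coordinates as gapless strings (a start offset plus aligned bases, with '-' for a deleted base). BAM/CIGAR parsing is left out, and only plain counts are merged, not the per-base confidences mentioned above. Because counts are additive, summing the per-chunk vectors gives exactly the same result as one full-depth pass, while only one chunk of reads is held in memory at a time. All names here are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

SYMBOLS = "ACGT-"                      # order of the count vector at each position
INDEX = {s: i for i, s in enumerate(SYMBOLS)}

Read = Tuple[int, str]                 # (0-based start on the amplicon, aligned bases)


def chunk_counts(reads: Iterable[Read], length: int) -> List[List[int]]:
    """A/C/G/T/- counts at every amplicon position for one batch of reads."""
    counts = [[0] * len(SYMBOLS) for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            # skip N and anything hanging off the ends of the amplicon
            if 0 <= pos < length and base in INDEX:
                counts[pos][INDEX[base]] += 1
    return counts


def batches(reads: Iterable[Read], size: int) -> Iterator[List[Read]]:
    """Yield successive lists of at most `size` reads without materialising the rest."""
    it = iter(reads)
    while batch := list(islice(it, size)):
        yield batch


def chunked_pileup(reads: Iterable[Read], length: int,
                   chunk_size: int = 1000) -> List[List[int]]:
    """Sum per-chunk count vectors; identical to counting all reads in one pass."""
    total = [[0] * len(SYMBOLS) for _ in range(length)]
    for batch in batches(reads, chunk_size):
        partial = chunk_counts(batch, length)
        for pos in range(length):
            for j, n in enumerate(partial[pos]):
                total[pos][j] += n
    return total
```

For example, with reads = [(0, "ACGT"), (1, "CG-T"), (2, "GT")] * 500, chunked_pileup(reads, 5)[3] is [0, 0, 0, 1000, 500] (1000 T, 500 deletions). Extending this to confidences would mean carrying something additive per allele (for instance summed log-likelihoods) alongside the counts; that part is not sketched here.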
