  • Targeted amplicon sequencing, is full overlap necessary?

    Hi,

    Has anyone got experience running targeted amplicon libraries at less than full overlap?

    We have been asked to run ThunderBolts libraries, which recommend 2x250 bp to achieve full overlap, but we only have access to 2x150.

    Is this a total waste of time or just not ideal?

    Ideally we'd want to detect variants at frequencies well below 10%.

    Thanks

  • #2
    A related question: the ability to identify and remove chimeric amplicons when the reads don't overlap has come up for a recent data set. Does anyone have a feel for how this may impact data analysis?


    • #3
      You don't need full overlap for amplicon libraries; you just need enough overlap to merge the reads. Depending on your read quality, aiming for a 30-50 bp overlap should be fine.
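A minimal sketch of what "enough overlap to merge" means in practice, assuming the amplicon is at least as long as each read (no read-through into adapter) and that the overlap can be compared without gaps: reverse-complement R2, slide it against the 3' end of R1, and accept the overlap of at least `min_overlap` bases with the lowest mismatch fraction. The function names and the 30 bp / 10% thresholds are illustrative choices, not the defaults of any particular merging tool.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1: str, r2: str, min_overlap: int = 30, max_mismatch_frac: float = 0.1):
    """Merge a read pair into one amplicon sequence (hypothetical helper).

    Returns None if no overlap of at least min_overlap bases has a
    mismatch fraction <= max_mismatch_frac. Assumes R1's 3' end overlaps
    the 5' end of reverse-complemented R2 without gaps.
    """
    r2_rc = reverse_complement(r2)
    best = None  # (mismatch fraction, -overlap length): lower is better
    for olen in range(min_overlap, min(len(r1), len(r2_rc)) + 1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2_rc[:olen]))
        frac = mismatches / olen
        if frac <= max_mismatch_frac:
            key = (frac, -olen)  # prefer fewest mismatches, then longest overlap
            if best is None or key < best:
                best = key
    if best is None:
        return None  # overlap too short or too mismatched to merge
    olen = -best[1]
    # Inside the overlap this simply keeps R1's bases; a quality-aware
    # merger would resolve disagreements using the base qualities instead.
    return r1 + r2_rc[olen:]
```

With a 250 bp amplicon read as 2x150, the reads share 50 bp and merge back to the full amplicon; stretch the amplicon to 290 bp and the 10 bp overlap falls below `min_overlap`, so the pair is left unmerged.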

      Chimeric amplicons are hard to detect when the reads don't overlap. If you cluster your pairs and then find pairs in which the two reads map to different clusters, you could assume that those pairs are chimeric. But the sensitivity and specificity depend entirely on the quality of the clustering.
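A minimal sketch of the cluster-discordance check described above, assuming the pairs have already been clustered and each cluster is summarised by a representative R1 and R2 sequence (the data layout and function names are hypothetical). Each read is assigned to the cluster whose representative it matches most closely by a naive position-by-position mismatch count, and pairs whose two reads land in different clusters are flagged as candidate chimeras. A real pipeline would use proper alignment, and as the post says, the result is only as good as the clustering.

```python
def hamming(a: str, b: str) -> int:
    """Mismatch count; any length difference is counted as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def nearest(read: str, reps: dict) -> str:
    """ID of the cluster whose representative is closest to `read`."""
    return min(reps, key=lambda cid: hamming(read, reps[cid]))

def flag_discordant_pairs(pairs, clusters):
    """pairs: iterable of (pair_id, r1, r2).
    clusters: {cluster_id: (representative_r1, representative_r2)}.
    Returns IDs of pairs whose R1 and R2 are nearest to different clusters.
    """
    r1_reps = {cid: rep[0] for cid, rep in clusters.items()}
    r2_reps = {cid: rep[1] for cid, rep in clusters.items()}
    flagged = []
    for pair_id, r1, r2 in pairs:
        if nearest(r1, r1_reps) != nearest(r2, r2_reps):
            flagged.append(pair_id)  # candidate chimera
    return flagged
```

For example, with clusters `{"A": ("AAAACCCC", "GGGGTTTT"), "B": ("CCCCAAAA", "TTTTGGGG")}`, a pair whose R1 resembles cluster A's R1 but whose R2 matches cluster B's R2 is flagged, while a pair matching cluster A on both ends is not.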


      • #4
        Thank you Brian, that makes sense.


        • #5
          The base quality will be significantly improved if you fully overlap your reads for amplicon sequencing. See http://aem.asm.org/content/79/17/511...7-5d313d15b9a5 for a comparison of sequence quality for different lengths of 16S sequencing.

          If you're interested in a very small number of base differences, you absolutely need full overlap. If you need 5-10% differences, maybe you could get away without it. But the cost difference between the two kits is only a couple of hundred dollars, while your downstream computation time with marginal sequences will be much, much greater and will likely cost thousands rather than hundreds.
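The quality argument above can be made concrete with a simple posterior calculation. This is a sketch under stated assumptions, not how any particular merger computes merged quality scores: errors in R1 and R2 are independent, the prior is uniform over A/C/G/T, and a sequencing error is equally likely to produce any of the three wrong bases. When both reads report the same base in the overlap, the chance that base is wrong drops far below either read's own error rate.

```python
import math

def phred_to_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)

def agreeing_posterior_error(p1: float, p2: float) -> float:
    """Posterior probability that a base is wrong when R1 and R2 agree.

    Assumes independent read errors, a uniform prior over the four bases,
    and errors spread evenly over the three wrong bases.
    """
    right = (1 - p1) * (1 - p2)  # both reads called the true base
    wrong = p1 * p2 / 3          # 3 wrong true bases x (p1/3) * (p2/3)
    return wrong / (wrong + right)
```

Under these assumptions, two agreeing Q30 calls come out at roughly Q65. Treat that as showing the trend only: the model ignores correlated errors, and errors introduced during PCR appear in both reads, so overlapping cannot correct them.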

          ETA: my cost estimate for computational time is based on 16S. I've never dealt with any of the cancer panels, so I don't know how significantly poor-quality bases would impact your results.
          Last edited by thermophile; 10-16-2015, 08:57 AM.
          Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.


          • #6
            Originally posted by thermophile
            The base quality will be significantly improved if you fully overlap your reads for amplicon sequencing. See http://aem.asm.org/content/79/17/511...7-5d313d15b9a5 for a comparison of sequence quality for different lengths of 16S sequencing.

            If you're interested in a very small number of base differences, you absolutely need full overlap. If you need 5-10% differences, maybe you could get away without it. But the cost difference between the two kits is only a couple of hundred dollars, while your downstream computation time with marginal sequences will be much, much greater and will likely cost thousands rather than hundreds.

            ETA: my cost estimate for computational time is based on 16S. I've never dealt with any of the cancer panels, so I don't know how significantly poor-quality bases would impact your results.
            I disagree. First, I'm not sure what computation time you are talking about. How would you incur thousands of dollars in compute costs from this kind of data?

            Second, the data that paper used were low quality and not indicative of what I would expect from a properly designed MiSeq 2x250 amplicon run using staggered primers, an appropriate amount of spike-in, etc.

            Third, errors due to incorrect merges and errors in the reads themselves are conflated. Since the former depend on the specific software used for overlapping, and are also a function of the overlap length, you can't really draw a conclusion about the error rates of overlapping reads using any methodology but the one described in the paper. Unfortunately, it's not described in the paper; rather, they sort of hint that it's described here, where I guess it occurs in the make.contigs command. I have not tested that, but would be very surprised if it were the best available tool for the purpose.

            Fourth, 2x150 reads have a much lower error rate than 2x250 reads. If they overlap by 50 bp, then the only non-overlapping portions are the first and last 100 bp of the amplicon (the first 100 bp of R1 and of R2), which have a peak error rate of around 0.2% for R1 and 0.5% for R2 (the average is lower), including all reads with no quality filtering. Those figures are from HiSeq; MiSeq error rates are generally lower.

            Longer reads and longer overlaps are better, of course. But 2x150 is viable as long as there is sufficient overlap to merge, and you can tolerate a fraction of a percent error rate in the non-double-sequenced portion.
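The layout arithmetic behind the fourth point can be checked in a few lines. This sketch assumes the peak per-base error rates quoted above (0.2% for R1, 0.5% for R2) apply to every single-covered base, which overstates the expected errors since the average rate is lower; the function name is hypothetical.

```python
def single_covered_error_bound(read_len: int, overlap: int,
                               r1_peak: float, r2_peak: float):
    """Return (amplicon length, single-covered bases per end,
    upper bound on expected errors in the single-covered bases).

    Uses the peak per-base error rate for every base, so the real
    expected error count is lower.
    """
    amplicon = 2 * read_len - overlap  # merged amplicon length
    single = read_len - overlap        # bases at each end seen by only one read
    expected = single * r1_peak + single * r2_peak
    return amplicon, single, expected

amplicon, single, expected = single_covered_error_bound(150, 50, 0.002, 0.005)
```

For 2x150 with a 50 bp overlap this gives a 250 bp amplicon with 100 single-covered bases at each end and at most about 0.7 expected errors per merged read in those 200 bases, i.e. no more than 0.35% of them, consistent with "a fraction of a percent".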


            • #7
              If you are sequencing 16S amplicons, you need to cluster the sequences into OTUs. The more sequencing errors you have, the more spurious OTUs you generate, which massively increases the memory required to cluster them (assuming you are doing de novo clustering). Once they are clustered, you will either waste a lot of time trying to find meaning in the sequencing noise, or you can throw out all of the rare OTUs, which means throwing out good data along with the bad because you can't tell good rare OTUs from bad ones. Ecologically this matters; for a cancer panel, maybe it doesn't.
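One way to see why noisier reads inflate de novo OTU counts is to ask what fraction of reads carry more errors than an identity threshold tolerates, since such reads can no longer be absorbed into their true sequence's OTU. This is a rough binomial sketch under loudly stated assumptions: errors are independent at a uniform per-base rate, distance is measured from the error-free sequence (real clustering compares noisy reads to possibly noisy centroids, so this understates the problem), and the commonly used 97% threshold and a 250 bp read length are illustrative choices, not values from the thread.

```python
from math import comb, floor

def frac_reads_beyond_threshold(read_len: int, per_base_error: float,
                                identity: float = 0.97) -> float:
    """Fraction of reads with more errors than `identity` allows.

    Assumes independent errors at a uniform per-base rate and measures
    distance from the true (error-free) sequence only.
    """
    k = floor((1 - identity) * read_len)  # max errors still within threshold
    p = per_base_error
    # Binomial probability of at most k errors across the read.
    p_within = sum(comb(read_len, i) * p**i * (1 - p) ** (read_len - i)
                   for i in range(k + 1))
    return 1 - p_within
```

At 250 bp a 97% threshold allows 7 mismatches, so at a 0.5% per-base error rate almost no reads cross it, while at 2% roughly one read in eight does, each a potential seed for a spurious OTU.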