  • #16
    My caution is because they have reported results for 3 primer pairs.

    For your region of interest, 300PE is essential even though the current 600-cycle reagents give inconsistent results. Overall Q scores at the 3' end of reads are low, but a subset of reads still has good quality. Phasing reads increases yield by enabling higher cluster density and a lower PhiX spike-in (at least 1.5x), and therefore increases the number of reads with a good-quality 3' end.
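
    For illustration, a minimal sketch (file name and thresholds are placeholders, not from any particular run) of how one could profile mean Phred quality per cycle in an R2 FASTQ and count how many reads still have a decent 3' end:

    import gzip
    import statistics

    def per_cycle_quality(fastq_gz, tail_len=50, tail_q=30):
        """Mean Phred score per cycle, plus the number of reads whose
        last `tail_len` cycles still average >= `tail_q`."""
        totals, counts = [], []
        good_tail = n_reads = 0
        with gzip.open(fastq_gz, "rt") as fh:
            while fh.readline():              # header line; empty string = EOF
                fh.readline()                 # sequence (unused here)
                fh.readline()                 # '+' separator
                quals = [ord(c) - 33 for c in fh.readline().strip()]  # Phred+33
                n_reads += 1
                for i, q in enumerate(quals):
                    if i >= len(totals):
                        totals.append(0)
                        counts.append(0)
                    totals[i] += q
                    counts[i] += 1
                if len(quals) >= tail_len and statistics.mean(quals[-tail_len:]) >= tail_q:
                    good_tail += 1
        mean_per_cycle = [t / c for t, c in zip(totals, counts)]
        return mean_per_cycle, good_tail, n_reads

    # mean_q, n_good, n_total = per_cycle_quality("sample_R2.fastq.gz")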

    I think Wu's method has set a good benchmark, if their reported lack of bias can be shown in other experiments as well. Addition of a 7-base spacer would have minimal effect on the number of assembled paired reads.



    • #17
      Their benchmarks are just for V4 (~275 bp) on 500-cycle v2 chemistry. You're going to have to do your own experiments to see whether that can be extrapolated to V3-V4 (~550 bp) on v3 chemistry (which seems to cause everyone issues for amplicon sequencing).

      In my opinion, staggered primers aren't worth the effort. Two-step PCR introduces errors on its own. I'm happy multiplexing a couple hundred samples per run, which gives me plenty of sequences/sample without trying to max out the cluster density.
      Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



      • #18
        @thermophile

        Staggered primers can increase output 1.5-2x, resulting in decreased sequencing cost.

        To my knowledge, no current community profiling method has been shown to be bias-free, or even been tested for bias, except in Wu's work.

        I wonder what number of reads per sample you consider adequate.



        • #19
          For MiSeq data, I rarefy to 10k seqs for analysis. Unless you are searching for very rare members of the community, even that is likely overkill. A decade ago we were using 100 clones/sample, then 1,000-2,500 seqs/sample for 454 data, and finding similar patterns.

          People who ask me for more than 10k seqs/sample and aren't searching for rares, I urge to analyze more samples instead: either more time points or more technical replicates - extract more than one 1/4 g per sample rather than just sequencing the same 1/4 g over and over.

          This advice is especially true for anyone whose bioinformatic processing includes a step that removes all sequences below a certain % of the community (which is a pretty common practice). If you're only interested in the most abundant organisms (anything over 1% or 0.5% of the community is abundant when talking about bacterial communities), why spend money on sequencing the rares?
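
          As a concrete illustration (toy counts, and assuming numpy is available), rarefying a count vector to an even depth is just sampling reads without replacement:

          import numpy as np

          rng = np.random.default_rng(42)

          def rarefy(counts, depth):
              """Subsample an OTU count vector to `depth` reads without replacement."""
              counts = np.asarray(counts)
              if counts.sum() < depth:
                  raise ValueError("sample has fewer reads than the requested depth")
              return rng.multivariate_hypergeometric(counts, depth)

          # Two samples sequenced to very different depths, both rarefied to 10k reads.
          sample_a = [200_000, 150_000, 100_000, 50_000]   # 500k reads total
          sample_b = [4_000, 3_000, 2_000, 1_500]          # 10.5k reads total
          print(rarefy(sample_a, 10_000))
          print(rarefy(sample_b, 10_000))
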
          Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



          • #20
            I have seen absolutely terrible quality from 2x300bp Illumina kits in amplicon sequencing, to the point that the data was completely unusable with (IIRC) >15% error rate. I have not seen any Illumina 2x300bp amplicon data of good quality. That doesn't mean it doesn't exist, because I don't look at all of our data, but I have not seen it. I have, however, seen good quality 2x250bp amplicon data.

            At JGI we do use staggered primers, which I always thought was an excellent decision because it greatly increases the color diversity during sequencing. I was unaware that they might increase bias; could you explain that?

            As for rarifying to 10k sequences (I would call it subsampling), that will clearly reduce your statistical power. To achieve cost savings... how do you even multiplex to the point that you get 10k reads per sample?

            And removing anything under 1%... yes, certainly, that makes deep sequencing unnecessary (even if you define "deep" as a few thousand reads), but whether or not it is common practice, that seems like an incredibly bad idea to me. I think that if you have enough reads to form a cluster that is statistically distinct from clusters from coincidental matches of high-error or incorrectly-merged reads, and therefore indicates an active species, you can gain knowledge from it.

            As you noted, amplification causes bias. Potentially, a species present at 0.9% in your amplified sequencing could be 20% of your actual community that was outcompeted at the amplification stage, right? In which case, it would in fact be very important to compare at different time-points, etc. Or it could actually represent 0.9% of the community that performs a crucial role of, say, mobilizing a specific element like iron rather than doing the bulk work of eating a specific carbon-containing molecule. Which one is interesting depends on your research, of course, but a community is a community because it is diverse, and presumably the low-abundance members of the community are essential to the function of the community, or else they would not exist.
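
            A rough sketch of the "statistically distinct from error clusters" point, with assumed numbers throughout (total depth, cluster size, and the rate at which erroneous or mis-merged reads would land in this particular cluster):

            from scipy.stats import binom

            total_reads = 1_000_000   # reads in the sample (assumed)
            cluster_reads = 900       # reads in a candidate 0.09% cluster (assumed)
            p_err = 1e-4              # assumed chance an error/chimeric read hits this cluster

            # One-sided tail: P(X >= cluster_reads) if the cluster came from errors alone.
            p_value = binom.sf(cluster_reads - 1, total_reads, p_err)
            print(f"P(>= {cluster_reads} reads from errors alone) = {p_value:.3g}")
            # At 10k total reads the same taxon would contribute only ~9 reads,
            # so its abundance estimate (and the separation from noise) is far less precise.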



            • #21
              I haven't worked with community data out of JGI since the 454 days, so I was unaware they'd adopted staggered primers. To get a *minimum* of 10k seqs/sample (not an average), I multiplex 150ish samples (plus either genomes, another amplicon, or PhiX) and aim to cluster around 750. Most samples get way more than 10k seqs, but very few end up with fewer.

              I don't see how rarefying/subsampling to 10k would change statistical power for community analysis, because the unit being compared is the sample. Statistical power comes from the number of samples rather than the number of sequences. This is certainly true for any analysis that is distance-matrix based. Even if you are doing some type of modeling, the power limitation is still the number of samples rather than the number of observations within each OTU.
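
              To make that concrete, a minimal sketch (toy counts, scipy assumed available): a distance-matrix analysis compares samples pairwise, so the size of the matrix is set by the number of samples, not by the number of reads per sample.

              import numpy as np
              from scipy.spatial.distance import pdist, squareform

              # Toy OTU table: rows = samples, columns = OTUs (already rarefied to equal depth).
              otu_table = np.array([
                  [500, 300, 150,  50],
                  [480, 320, 140,  60],
                  [ 50, 100, 600, 250],
              ])

              # 3 samples -> a 3x3 Bray-Curtis matrix, regardless of sequencing depth.
              dist = squareform(pdist(otu_table, metric="braycurtis"))
              print(dist.round(3))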

              What kind of community analyses draw power from number of seqs?
              Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



              • #22
                If you are interested in how a specific species' fraction of a community changes over time, or what that fraction is in a single sample, the quantity of data will improve your resolution. If you subsample down to 1 read, obviously, the results will be garbage. At 10 reads, they will be slightly better, but still garbage. At 1 billion reads, you will have extremely fine-grained resolution that allows very precise calculations. If you take your hypothetical 1 billion reads and subsample down to 10k... you lose all of that precision, and of course you also lose the ability to separate error clusters from real clusters in low-abundance species. What do you gain? Nothing, as far as I can tell, except that things will run faster. So if you are compute-limited, this kind of makes sense, even though the results of the research will be ... "limited". But since compute time is so much cheaper than sequencing, I really cannot imagine a scenario where it's a good idea.
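
                To put a rough number on the resolution argument (binomial sampling only, ignoring PCR bias; the 0.9% figure is just the example above):

                import math

                p = 0.009  # taxon at 0.9% of the amplified community
                for depth in (10_000, 100_000, 1_000_000):
                    se = math.sqrt(p * (1 - p) / depth)
                    half_ci = 1.96 * se * 100
                    print(f"{depth:>9} reads: 0.90% +/- {half_ci:.3f}% (approx. 95% CI)")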

                Oh... to answer your last question, ALL analyses draw power from the number of sequences. That's kind of how statistics works, not to mention that bioinformatics in general is reliant on high redundancy to compensate for errors in sequencing.



                • #23
                  Originally posted by Brian Bushnell View Post
                  I was unaware that they might increase bias; could you explain that?
                  Any primer with a non-template sequence overhang at the 5' end (fusion primers, 5' variable-length spacers or barcodes, primers with a common overhang) can bias amplification of some templates. This is due to the overhang being partially or completely complementary to some templates, which increases their binding strength and amplification, and also to possible interactions between the primer pairs used in the PCR.
                  Community profiling with these primers usually involves using a different primer pair for each sample, where the template-specific sequences are shared while the overhang is not. So it is essential to test that all primer pairs produce similar results when used with one input DNA (preferably a control community with known composition). This becomes even more crucial for time series, where community structure is followed over time and different primer pairs might be used for the same sample at different time points.
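
                  As a minimal sketch of the kind of check that can help here (all sequences below are made up; a real screen would use the actual spacer/overhang variants and reference templates), one can flag overhangs whose 3'-most bases also occur in, or are complementary to, a template:

                  COMP = str.maketrans("ACGT", "TGCA")

                  def revcomp(seq):
                      return seq.translate(COMP)[::-1]

                  def risky_overhangs(overhangs, templates, k=8):
                      """Overhangs whose last k bases match a template on either strand."""
                      flagged = []
                      for ov in overhangs:
                          tail = ov[-k:]
                          for name, tpl in templates.items():
                              if tail in tpl or revcomp(tail) in tpl:
                                  flagged.append((ov, name))
                      return flagged

                  # Hypothetical 0-7 base spacers in front of a common adapter tail.
                  adapter_tail = "TCGTCGGCAGCGTC"
                  spacers = ["", "A", "TG", "CAT", "ACGT", "GTACA", "TAGCTA", "CGATCGA"]
                  overhangs = [s + adapter_tail for s in spacers]
                  templates = {"mock_16S_fragment": "ACGTTCGTCGGCAGCGTCAATTGGCCAA"}  # toy
                  print(risky_overhangs(overhangs, templates))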



                  • #24
                    That makes sense. Thanks!



                    • #25
                      Rarefaction/subsampling isn't done to speed up computer processing but rather to compare communities at the same level of sampling effort. With these data there is no biological reason that one sample has 500k sequences and another has 10k; it's mostly down to how evenly the libraries are pooled and a little to PCR efficiency. Since we rarely sample the community to completion, the sample with more seqs will appear more diverse just based on sequencing effort.
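
                      To illustrate the "more seqs looks more diverse" effect with toy numbers, classic rarefaction gives the expected number of OTUs observed at a given depth (sampling without replacement; the counts below are made up):

                      import math

                      def expected_richness(counts, depth):
                          """Expected number of OTUs seen when `depth` reads are drawn
                          without replacement from a sample with the given OTU counts."""
                          total = sum(counts)

                          def log_comb(n, k):
                              return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

                          expected = 0.0
                          for n_i in counts:
                              if total - n_i < depth:
                                  expected += 1.0          # this OTU cannot be missed at this depth
                              else:
                                  p_missed = math.exp(log_comb(total - n_i, depth) - log_comb(total, depth))
                                  expected += 1.0 - p_missed
                          return expected

                      # Same community, two depths: the deeper "sample" appears richer.
                      counts = [5000, 3000, 1000, 500, 200, 100, 50, 20, 10, 5] * 10  # 100 OTUs
                      print(round(expected_richness(counts, 10_000), 1))
                      print(round(expected_richness(counts, 90_000), 1))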

                      Your example of the statistical analysis is typical change detection; that is not the basis of most community analyses.
                      Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



                      • #26
                        Ummm, so what should I actually do????

                        Thank you all for your contributions to this discussion. I was away and only just got to read the follow-up comments - some of which I can't really follow yet, since I have yet to complete a library and get data back. Luckily we will be getting help with the analysis.

                        Should I consider using 2 separate primer sets so I can use PE250, one for V3 and another for V4? If so, could I split up the amplicons evenly and get good data from one library? If I can afford to do 2 separate libraries, one for V3 and one for V4, should I just do that instead? How much more information - or perhaps I mean power for meaningful analysis - would this provide? We may have money for that, but I won't know for sure till next month. In a perfect world I'm sure this is the best answer to the problem, but we know the world isn't perfect.

                        It seems I should definitely avoid using PE300 since the chemistry is not working consistently yet! I'm just trying to figure out how to get the most out of what we can currently do! I'm going with the staggered primers and I need to order them very soon.



                        • #27
                          Originally posted by urchin View Post
                          Should I consider using 2 separate primer sets so I can use PE250, one for V3 and another for V4? If so, could I split up the amplicons evenly and get good data from one library? If I can afford to do 2 separate libraries, one for V3 and one for V4, should I just do that instead? How much more information - or perhaps I mean power for meaningful analysis - would this provide? We may have money for that, but I won't know for sure till next month. In a perfect world I'm sure this is the best answer to the problem, but we know the world isn't perfect.

                          It seems I should definitely avoid using PE300 since the chemistry is not working consistently yet! I'm just trying to figure out how to get the most out of what we can currently do! I'm going with the staggered primers and I need to order them very soon.

                          Choice of variable region depends on the study, and it is best based on the current literature (even though published comparisons have some issues). You will not get identical results (OTUs, taxonomic resolution) using different regions, but an amplicon covering two variable regions (V3-V4) will give higher resolution and more accurate results than either single region on its own. If you are going to use the V3-V4 region then you have to use 300PE. This chemistry has shown improved output recently, and there is the option of doing more sequencing if the results are poor.

