  • Undetermined rate in 16S sequencing

    Hi all,

    We ran 16S metagenomic sequencing on a MiSeq (PE300, v3 kit). The fraction of undetermined read pairs (read pairs not assigned to any sample) is quite high, at around 9.5%. We would like to know what percentage is typical for 16S metagenomic sequencing. Would you share your experience with us?

    We used Illumina's 16S Metagenomics Sequencing Library Preparation protocol. The target region is V3 and V4, as specified in the protocol. The PhiX spike-in was 20%, and 96 samples were pooled in one MiSeq run.

    Thanks

    Exon
    Last edited by exon; 08-09-2016, 01:06 AM.

  • #2
    How did you actually do your data processing (command lines, etc)? Also, Illumina's 300bp kits have substantially inferior quality in my tests compared to their 250bp kits, so it's not surprising to have a lot of leftovers.

    But it's impossible to answer this question unless you give more details about your data-processing methodology.



    • #3
      We got the numbers from BaseSpace: % Reads Identified (PF) is 66.77% (all samples combined) and % Aligned (PhiX) is 23.73%. The remaining 9.5% is undetermined read pairs.
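
The arithmetic behind the 9.5% figure can be checked with a short sketch. The two input percentages are the ones quoted from BaseSpace in this post; the variable names and the extra "share of non-PhiX reads" figure are illustrative additions, not something BaseSpace reports.

```python
# Figures quoted from the BaseSpace run summary in the post above.
reads_identified_pf = 66.77  # % Reads Identified (PF), all samples combined
aligned_phix = 23.73         # % Aligned (PhiX)

# Whatever is neither assigned to a sample nor aligned to PhiX is undetermined.
undetermined = round(100.0 - reads_identified_pf - aligned_phix, 2)

# Illustrative extra: the same reads as a share of the non-PhiX reads only.
non_phix_share = round(undetermined / (100.0 - aligned_phix) * 100.0, 1)

print(f"Undetermined read pairs: {undetermined}% of all PF reads")
print(f"Share of non-PhiX reads: {non_phix_share}%")
```

Expressed against the library reads alone (PhiX excluded), the undetermined share is somewhat higher than 9.5%, which is worth keeping in mind when comparing runs with different spike-in levels.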



      • #4
        In our lab we regularly see a high percentage of undetermined reads with dual-indexed 16S amplicon libraries. Can't explain it, but it is common.



        • #5
          Agreed, ~10% undetermined is fairly standard for our highly multiplexed runs as well (>100 samples/run).
          Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



          • #6
            Originally posted by kmcarr
            In our lab we regularly see a high percentage of undetermined reads with dual-indexed 16S amplicon libraries. Can't explain it, but it is common.
            I'd expect this: errors in the INDEX1 and INDEX2 reads cause the index pairs to be broken.



            • #7
              Originally posted by cement_head
              I'd expect this: errors in the INDEX1 and INDEX2 reads cause the index pairs to be broken.
              If that is the cause of the high undetermined rate, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks



              • #8
                Originally posted by exon
                If that is the cause of the high undetermined rate, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks
                How many mismatches can be permitted depends on the minimum edit distance between any two indexes within index set 1 or within index set 2. As your index sets get larger and more complex, it becomes likely that at most 1 mismatch can be tolerated; often none can be.
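
The relationship between minimum distance and tolerable mismatches can be sketched as follows. This is a minimal illustration, assuming substitution-only errors (Hamming distance rather than full edit distance) and a made-up index set; `index1_set` and the helper names are hypothetical and not taken from any Illumina kit or tool.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(indexes: list[str]) -> int:
    """Smallest pairwise Hamming distance within one index set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))

def max_safe_mismatches(indexes: list[str]) -> int:
    """Largest m such that no observed index lies within m mismatches
    of two different expected indexes (requires 2*m < min distance)."""
    return (min_distance(indexes) - 1) // 2

# Hypothetical index set, for illustration only.
index1_set = ["ACGTACGT", "TGCATGCA", "ACGTTGCA", "GGGGCCCC"]

print(min_distance(index1_set))         # 4
print(max_safe_mismatches(index1_set))  # 1
```

With a minimum distance of 4, one mismatch is unambiguous but two are not, since a read two substitutions away from one index can also be two substitutions away from another.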

                Even if you had a set of dual 8bp indexes with large enough edit distances to permit up to 2 mismatches per index, the combinatorial explosion of permitted mismatched indexes would extend the time required to calculate them to unreasonable levels. (I tried it once with bcl2fastq demultiplexing: 96 samples, 2x8bp indexes, 2 mismatches in each.)
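
A rough count gives a feel for the size of that explosion. The sketch below assumes substitutions over A/C/G/T only and assumes every acceptable variant is enumerated explicitly; whether bcl2fastq works this way internally is not something the post states, so treat the numbers as an order-of-magnitude illustration.

```python
from math import comb

def variants_within(length: int, max_mismatches: int, alphabet_size: int = 4) -> int:
    """Count sequences within max_mismatches substitutions of one index
    (the exact index itself is included as the k = 0 term)."""
    return sum(
        comb(length, k) * (alphabet_size - 1) ** k
        for k in range(max_mismatches + 1)
    )

per_index = variants_within(8, 2)   # one 8bp index, up to 2 mismatches
per_sample = per_index ** 2         # dual indexes: every i7 variant with every i5 variant
total = 96 * per_sample             # 96 samples, as in the anecdote above

print(per_index, per_sample, total)  # 277 76729 7365984
```

Counting N calls as a fifth symbol would push the totals higher still.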



                • #9
                  I use mothur, where you can specify the number of mismatches allowed. I require strict matching because that seems to be one of the easier ways to control for sequencing errors (the logic being that if there are errors in the index, there may be more errors in the reads) and because I'm not trying to squeeze the maximum number of reads out of each run; I'd rather have higher quality, even if that means fewer reads.


                  • #10
                    Originally posted by exon
                    If that is the cause of the high undetermined rate, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks
                    Yes, you can specify this in QIIME (and in mothur), and in just about every other program. I think most people just use the defaults, but you can relax this parameter and potentially rescue more reads.
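
What "relaxing this parameter" means can be illustrated with a toy demultiplexer. This is a generic sketch, not the QIIME or mothur implementation; the `assign_sample` function, its `max_mismatches` parameter, and the sample table are all hypothetical.

```python
from typing import Optional

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length indexes."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def assign_sample(
    read_i1: str,
    read_i2: str,
    samples: dict[str, tuple[str, str]],
    max_mismatches: int = 0,
) -> Optional[str]:
    """Return the sample whose index pair matches the read's index pair,
    allowing up to max_mismatches per index. Reads with no match, or with
    more than one candidate sample, are left undetermined (None)."""
    hits = [
        name
        for name, (i1, i2) in samples.items()
        if hamming(read_i1, i1) <= max_mismatches
        and hamming(read_i2, i2) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None

# Hypothetical two-sample sheet, for illustration only.
samples = {
    "S1": ("ACGTACGT", "TTGGCCAA"),
    "S2": ("TGCATGCA", "AACCGGTT"),
}

# A read with one error in index 1 is undetermined under strict matching
# and rescued when one mismatch per index is allowed.
print(assign_sample("ACGTACGA", "TTGGCCAA", samples, max_mismatches=0))  # None
print(assign_sample("ACGTACGA", "TTGGCCAA", samples, max_mismatches=1))  # S1
```

The trade-off raised elsewhere in this thread applies: each extra mismatch allowed rescues some reads but also widens the window for misassignment.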



                    • #11
                      Originally posted by exon
                      If that is the cause of the high undetermined rate, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks
                      One way is to modify the sample sheet to use only a subset of each index's bases (at the cost of possibly more false assignments). Typically the 8th base of Index 1 and the 1st base of Index 2 tend to have lower Q-scores; presumably this means more errors at those positions (although this may not be the case, since quality scores may not be all that reliable). Once, when I put N in those two positions in the sample sheet and repeated the demultiplexing (MiSeq Reporter), I managed to reduce the "Undetermined" fraction from ~5% to 1%. I never looked closely into whether it caused problems such as false assignments. Of course, all this assumes that all indexes are still unique with one base masked. I am not sure it is worth it for a few percent more reads.
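
The uniqueness caveat in this post is easy to check before editing a sample sheet. The sketch below masks chosen positions with N and tests whether the indexes remain distinct; the index sequences and helper names are hypothetical, and positions are 0-based (so the 8th base is position 7).

```python
def mask(index: str, positions: set[int]) -> str:
    """Replace the bases at the given 0-based positions with N."""
    return "".join("N" if i in positions else base for i, base in enumerate(index))

def still_unique(indexes: list[str], positions: set[int]) -> bool:
    """True if no two distinct indexes collapse to the same masked sequence."""
    masked = {mask(ix, positions) for ix in indexes}
    return len(masked) == len(set(indexes))

# Hypothetical index sets, for illustration only.
index1_set = ["ACGTACGT", "ACGTACGA", "TGCATGCA"]
index2_set = ["TTGGCCAA", "AACCGGTT"]

print(mask("ACGTACGT", {7}))          # ACGTACGN
print(still_unique(index1_set, {7}))  # False: the first two differ only at base 8
print(still_unique(index2_set, {0}))  # True
```

Passing this check only rules out outright collisions; masking a base also lowers the distance between the remaining indexes, so misassignment can still become more likely.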



                      • #12
                        I highly recommend allowing zero mismatches in barcode reads, unless you didn't do any multiplexing. 10% less data will diminish the quality of your analysis... slightly. Cross-contamination can completely destroy it, even at very low levels. Cross-contamination is really hard to get rid of, so every step helps; and allowing zero mismatches does help, even with dual-indexed reads. However, I think we may cut our 8bp barcodes to 7 for this purpose because the last base is unreliable (which is true of normal reads as well).



                        • #13
                          Originally posted by Brian Bushnell
                          However, I think we may cut our 8bp barcodes to 7 for this purpose because the last base is unreliable (which is true of normal reads as well).
                          It would be nice if the MiSeq software permitted adding an extra cycle to the index read(s), as it does for the sequence reads and as the HiSeq does for 6bp indexes. I suppose as a workaround one could add an "N" to the end of all the indexes in the sample sheet to get the MiSeq control software to extend the reads.



                          • #14
                            Originally posted by kmcarr
                            It would be nice if the MiSeq software permitted adding an extra cycle to the index read(s), as it does for the sequence reads and as the HiSeq does for 6bp indexes. I suppose as a workaround one could add an "N" to the end of all the indexes in the sample sheet to get the MiSeq control software to extend the reads.

                            Interesting idea, do you know if this would drop the reads to 250 instead of 251?


                            • #15
                              Originally posted by thermophile
                              Interesting idea, do you know if this would drop the reads to 250 instead of 251?
                              Hmmm… Interesting question. I hadn't considered the other limitation that may be coded into the MiSeq control software: no more than 525 cycles for v2 500-cycle reagent cartridges, which is already maxed out with PE250 and dual 8bp indexes:

                              251 + 8 + (7)* + 8 + 251 = 525

                              (* 7 dark cycles before index 2 read.)

                              I'd rather keep the extra cycle at the end of the sequence reads in that case.
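
The cycle budget in this post can be laid out as a small calculation. The 525-cycle cap and the 7 dark cycles are taken from the post as stated (the author says the limit "may be" coded into the software), so they are assumptions here rather than verified instrument limits; the function name is illustrative.

```python
MAX_CYCLES = 525  # cap for v2 500-cycle cartridges, as recalled in the post (unverified)

def total_cycles(read1: int, index1: int, dark: int, index2: int, read2: int) -> int:
    """Sum the cycles consumed by a paired-end, dual-indexed run."""
    return read1 + index1 + dark + index2 + read2

current = total_cycles(251, 8, 7, 8, 251)   # standard PE250 with dual 8bp indexes
extended = total_cycles(251, 9, 7, 9, 251)  # one extra cycle on each index read
traded = total_cycles(250, 9, 7, 9, 250)    # pay for them with the last cycle of each read

for label, cycles in [("current", current), ("extended", extended), ("traded", traded)]:
    status = "fits" if cycles <= MAX_CYCLES else "exceeds the cap"
    print(f"{label}: {cycles} cycles, {status}")
```

Under those assumptions, extending both index reads by one cycle only fits if each sequence read gives up its extra 251st cycle, which is the trade-off being weighed above.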
