**Brian Bushnell** · 08-09-2016, 10:07 AM

How did you actually do your data processing (command lines, etc)? Also, Illumina's 300bp kits have substantially inferior quality in my tests compared to their 250bp kits, so it's not surprising to have a lot of leftovers.

But it's impossible to answer this question unless you give more details about your data-processing methodology.

**exon** · 08-10-2016, 12:17 AM

We got the numbers from BaseSpace: % Reads Identified (PF) was 66.77% (for all samples combined) and % Aligned (PhiX) was 23.73%. The remaining 9.5% are undetermined read pairs.

**kmcarr** · 08-10-2016, 04:20 AM

In our lab we regularly see a high percentage of undetermined reads with dual-indexed 16S amplicon libraries. Can't explain it, but it is common.

**thermophile** · 08-10-2016, 08:11 AM

Agreed, ~10% undetermined is fairly standard for our highly multiplexed runs as well (>100 samples/run).

**cement_head** · 08-19-2016, 05:58 AM

Originally posted by kmcarr:

> In our lab we regularly see a high percentage of undetermined reads with dual-indexed 16S amplicon libraries. Can't explain it, but it is common.

I'd expect this: errors in the INDEX1 and INDEX2 reads cause the pairs to be broken.

**exon** · 08-29-2016, 12:21 AM

Originally posted by cement_head:

> I'd expect this: errors in the INDEX1 and INDEX2 reads cause the pairs to be broken.

If that is the cause of the high undetermined reads, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks

**kmcarr** · 08-29-2016, 06:31 AM

Originally posted by exon:

> If that is the cause of the high undetermined reads, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks

How many mismatches can be permitted is dependent upon the minimum edit distance between any two indexes within index set 1 or 2. As your index sets get larger and more complex the likelihood is that at most 1 mismatch can be tolerated; often none can be.

Even if you had a set of dual 8bp indexes with large enough edit distances to permit up to 2 mismatches per index, the combinatorial explosion of permitted mismatched indexes would extend the time required to calculate them to unreasonable levels. (Tried it once, with 96 2x8bp indexes, 2 mismatches in each, bcl2fastq demultiplexing.)
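As a rough illustration of both points above, here is a minimal Python sketch. It assumes mismatches are counted as substitutions only (Hamming distance rather than a full edit distance) and that an index set tolerates `m` mismatches unambiguously only when every pair of indexes differs in at least `2*m + 1` positions. The function names are hypothetical and are not part of bcl2fastq; whether bcl2fastq also counts `N` as a possible mismatch base is not covered here.

```python
from itertools import combinations
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def max_safe_mismatches(indexes):
    """Largest m with 2*m + 1 <= minimum distance (assumed rule, see lead-in)."""
    return (min_pairwise_distance(indexes) - 1) // 2


def variants_within(length, mismatches, alphabet_size=4):
    """Count sequences within `mismatches` substitutions of one index (A/C/G/T only)."""
    return sum(
        comb(length, k) * (alphabet_size - 1) ** k
        for k in range(mismatches + 1)
    )


# One 8bp index with up to 2 mismatches expands to 277 accepted sequences,
# so one dual-index pair expands to 277 * 277 accepted combinations.
per_index = variants_within(8, 2)
per_pair = per_index ** 2
```

Under these assumptions, every sample in a dual 8bp design with 2 mismatches per index expands to tens of thousands of accepted index pairs, which is consistent with the slowdown described above.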

**thermophile** · 08-29-2016, 06:48 AM

I use mothur, where you can specify the number of mismatches allowed. I require strict matching because that seems to be one of the easier ways to control for sequencing errors (the logic being that if there are errors in the index, there may be more errors in the reads) and because I'm not trying to squeeze the maximum number of reads out of each run. I want higher quality, which means fewer reads.

**cement_head** · 08-29-2016, 06:20 PM

Originally posted by exon:

> If that is the cause of the high undetermined reads, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks

Yes, you can specify this in QIIME (and in mothur) and in just about every other program. I think most people just use the defaults, but you can relax this parameter and potentially rescue more reads.
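To make the trade-off concrete, here is a small Python sketch of dual-index assignment with an adjustable mismatch threshold. It is a generic illustration, not how QIIME or mothur implement it: the function names, the sample names, and the index sequences are all hypothetical, and the threshold is applied to each index separately.

```python
def count_mismatches(a: str, b: str) -> int:
    """Substitution count between two equal-length index sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed, samples, max_mismatches=0):
    """Return the sample whose (index1, index2) pair matches the observed pair.

    `observed` is an (index1, index2) tuple read from the sequencer.
    `samples` maps sample name -> expected (index1, index2) tuple.
    Returns None ("undetermined") when no sample, or more than one, matches.
    """
    obs1, obs2 = observed
    hits = [
        name
        for name, (idx1, idx2) in samples.items()
        if len(idx1) == len(obs1)
        and len(idx2) == len(obs2)
        and count_mismatches(obs1, idx1) <= max_mismatches
        and count_mismatches(obs2, idx2) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet with two dual-indexed samples.
samples = {
    "sampleA": ("AAAAAAAA", "CCCCCCCC"),
    "sampleB": ("GGGGGGGG", "TTTTTTTT"),
}

# One error in index 1: undetermined with strict matching, rescued with 1 mismatch.
strict = assign_sample(("AAAAAAAT", "CCCCCCCC"), samples, max_mismatches=0)
relaxed = assign_sample(("AAAAAAAT", "CCCCCCCC"), samples, max_mismatches=1)
```

Treating ambiguous matches as undetermined is a design choice in this sketch; a looser policy would rescue more reads at a higher risk of misassignment.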

**JBKri** · 08-30-2016, 08:21 AM

Originally posted by exon:

> If that is the cause of the high undetermined reads, is it possible to allow more mismatches during demultiplexing to rescue the reads? And where do the errors come from? Thanks

One way is to modify the sample sheet to use only a subset of each index's length (at the cost of possibly increased false assignments). Typically the 8th base of Index1 and the 1st base of Index2 tend to have lower Q-scores; presumably this means they have more errors at those positions (although this may not be the case; quality scores may not be all that reliable). Once, when I put N in those two positions in the sample sheet and repeated the demultiplexing (MiSeq Reporter), I managed to reduce the "Undetermined" fraction from ~5% down to 1%. I never looked closely into whether it caused problems like false assignments. Of course, all this assumes that all indexes are still unique with one base missing. I am not sure it is worth it for a few percent more reads.
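The uniqueness assumption mentioned above is easy to check before editing the sample sheet. The following Python sketch masks chosen positions with N and reports any indexes that become indistinguishable. The helper names and the example indexes are hypothetical, and positions are 0-based, so the 8th base is position 7.

```python
def mask_index(index: str, masked_positions) -> str:
    """Replace the bases at the given 0-based positions with N."""
    return "".join(
        "N" if i in masked_positions else base
        for i, base in enumerate(index)
    )


def collisions_after_masking(indexes, masked_positions):
    """Map each masked sequence shared by two or more indexes to those indexes.

    An empty result means every index is still unique after masking.
    """
    groups = {}
    for idx in indexes:
        groups.setdefault(mask_index(idx, masked_positions), []).append(idx)
    return {masked: members for masked, members in groups.items() if len(members) > 1}


# Hypothetical Index1 set: the first two differ only at the 8th base,
# so masking that base makes them collide.
index1_set = ["ACGTACGT", "ACGTACGA", "TTGGCCAA"]
clashes = collisions_after_masking(index1_set, {7})
```

Any non-empty result means the masked sample sheet can no longer separate those samples on that index alone.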

**Brian Bushnell** · 08-30-2016, 06:50 PM

I highly recommend allowing zero mismatches in barcode reads, unless you didn't do any multiplexing. 10% less data will diminish the quality of your analysis... slightly. Cross-contamination can completely destroy it, even at very low levels. Cross-contamination is really hard to get rid of, so every step helps; and allowing zero mismatches does help, even with dual-indexed reads. However, I think we may cut our 8bp barcodes to 7 for this purpose because the last base is unreliable (which is true of normal reads as well).

**kmcarr** · 08-31-2016, 05:39 AM

Originally posted by Brian Bushnell:

> However, I think we may cut our 8bp barcodes to 7 for this purpose because the last base is unreliable (which is true of normal reads as well).

It would be nice if the MiSeq software permitted adding an extra cycle to the index read(s), as it does for the sequence reads and as the HiSeq does for 6bp indexes. I suppose as a workaround one could add an "N" to the end of all the indexes in the sample sheet to get the MiSeq control software to extend the reads.

**thermophile** · 08-31-2016, 06:22 AM

Originally posted by kmcarr:

> It would be nice if the MiSeq software permitted adding an extra cycle to the index read(s), as it does for the sequence reads and as the HiSeq does for 6bp indexes. I suppose as a workaround one could add an "N" to the end of all the indexes in the sample sheet to get the MiSeq control software to extend the reads.

Interesting idea. Do you know if this would drop the reads to 250 instead of 251?

**kmcarr** · 08-31-2016, 06:31 AM

Originally posted by thermophile:

> Interesting idea. Do you know if this would drop the reads to 250 instead of 251?

Hmmm… Interesting question. I hadn't considered the other limitation that may be coded into the MiSeq control software: no more than 525 cycles for v2 500-cycle reagent cartridges, which is maxed out by PE250 with dual 8bp indexes:

251 + 8 + (7)* + 8 + 251 = 525

(* 7 dark cycles before index 2 read.)

I'd rather keep the extra cycle at the end of the sequence reads in that case.
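The cycle arithmetic above can be sketched in Python as follows. The 525-cycle cap and the 7 dark cycles are taken from the post as stated, not from Illumina documentation, and the function name is hypothetical.

```python
MAX_CYCLES_V2_500 = 525  # cap suggested in the post above (an assumption)


def total_cycles(read_len, index1_len, index2_len, dark_cycles=7):
    """Total cycles for a paired-end, dual-indexed run.

    Read 1 + Index 1 + dark cycles before Index 2 + Index 2 + Read 2.
    """
    return read_len + index1_len + dark_cycles + index2_len + read_len


standard = total_cycles(251, 8, 8)        # PE250 plus the extra cycle, dual 8bp indexes
extended_index = total_cycles(251, 9, 9)  # one extra cycle on each index read
shortened_reads = total_cycles(250, 9, 9) # give up the extra cycle on each sequence read

fits_standard = standard <= MAX_CYCLES_V2_500
fits_extended = extended_index <= MAX_CYCLES_V2_500
fits_shortened = shortened_reads <= MAX_CYCLES_V2_500
```

If the cap really is 525, extending both index reads by one cycle would only fit by shortening each sequence read from 251 to 250 cycles, which is the trade-off being weighed here.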


Undetermined rate in 16S sequencing
