**Bukowski** · 12-18-2014, 11:42 AM

I think it's just the reads where it has not been possible to demultiplex on the barcode with sufficient accuracy. There's always some data here even if your sample sheet is set up properly.

**Brian Bushnell** · 12-18-2014, 12:03 PM

That's correct. Undetermined is also where PhiX reads are supposed to end up if PhiX was spiked in.

**GenoMax** · 12-18-2014, 02:38 PM

There are some special circumstances when I deliberately want the reads to go into the "undetermined" file (when using CASAVA or bcl2fastq to demultiplex). This preserves the tags in the read IDs. We have built a demultiplexer for QIIME that can use this undetermined file to produce sample files in the QIIME format. (Sending all reads to the "undetermined" file is achieved by including a dummy tag sequence like YYYY in the sample sheet.)
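
As a rough illustration of why preserved tags in the read IDs are useful, here is a minimal Python sketch that tallies the index sequences found in the headers of an undetermined FASTQ file. It assumes CASAVA 1.8+ style headers, where the index is the last colon-separated field of the header comment (e.g. `1:N:0:ACGTACGT`), and plain 4-line FASTQ records. The function name `count_index_tags` is hypothetical, and this is not the QIIME demultiplexer mentioned above.

```python
import gzip
from collections import Counter


def count_index_tags(fastq_path):
    """Tally index sequences from the read headers of a FASTQ file.

    Assumption: CASAVA 1.8+ style headers such as
    '@M01234:55:000000000-ABCDE:1:1101:15343:1973 1:N:0:ACGTACGT',
    where the index is the last colon-separated field.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:  # header is the first of every 4 lines
                continue
            header = line.rstrip("\n")
            if " " not in header:
                continue  # no comment field, so no index to report
            comment = header.split(" ", 1)[1]
            counts[comment.rsplit(":", 1)[-1]] += 1
    return counts
```

Calling `count_index_tags(path).most_common(10)` on an undetermined file would list the most frequent tags, which can help spot an index that was mistyped in the sample sheet.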

**ScottC** · 12-18-2014, 05:45 PM

Yes, people use it for a variety of different purposes, but its actual intended purpose is simply as a catch-all for any read that wasn't assignable to a sample for any reason (poor quality, incorrect indexes specified in the sample sheet, missing index sequences (i.e. PhiX reads, which have no index), sequencing error in the index read for that sequence, etc.).

**bvanga** · 03-01-2017, 09:27 PM

Undetermined sequence

Hi,

Can anyone please explain how the MiSeq (300bp paired-end sequencing) decides which sequences are undetermined? I am studying the microbiome in plant tissue collected from a fruit plant in the environment. The sequence provider mentioned that I got about 20GB of undetermined reads, but when I did OTU analysis and BLAST, I was able to differentiate different species and different undetermined OTUs (it is OK for me to expect some undetermined OTUs from an environmental sample). I got a bit confused: are undetermined OTUs different from the contents of the MiSeq "undetermined" folder?

Thank you
Vanga

**Brian Bushnell** · 03-01-2017, 09:51 PM

MiSeq does not determine anything. It is a sequencing platform; all it does is produce sequences - it is up to the user to determine what they are.

Illumina sequencing platforms support multiplexing, in which multiple libraries are sequenced together. They have different indexes (or bar codes) which indicate the library they came from. During demultiplexing, the reads are split into different libraries based on the bar code (typically, 8bp sequences within the adapters of the molecule being sequenced). If the bar code sequence is low quality, the read will be sent to the "undetermined" bin, meaning that it is not clear which library it came from. It may be possible for the user to BLAST the undetermined bin and decide with high confidence which organism it came from, in situations where the multiplexed organisms are very different. But, I don't recommend that, as it will increase noise. Instead, if you are getting a large volume in your undetermined bin, you should complain to Illumina (or whoever provides your adapters) about wasted sequence due to the low quality of the index reads, or insufficient length and edit distance of indexes to distinguish between libraries.
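
To make the demultiplexing step concrete, here is a minimal Python sketch of assigning a read to a library by its index, with a mismatch tolerance. It is a simplification under stated assumptions, not how any Illumina tool is implemented: it counts mismatches rather than using base qualities, and the barcodes, sample names, and the functions `hamming` and `assign_sample` are all made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches index_read.

    sample_barcodes maps sample name -> barcode sequence.
    Reads with no match, or with more than one match, go to 'Undetermined'.
    """
    hits = [
        sample
        for sample, barcode in sample_barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "Undetermined"


barcodes = {"sampleA": "ACGTACGT", "sampleB": "TGCATGCA"}  # made-up barcodes
print(assign_sample("ACGTACGT", barcodes))  # sampleA (exact match)
print(assign_sample("ACGTACGA", barcodes))  # sampleA (one mismatch)
print(assign_sample("NNNNNNNN", barcodes))  # Undetermined (unreadable index)
```

The sketch also shows why edit distance between indexes matters: if two barcodes were within `max_mismatches` of the same index read, the read would be ambiguous and land in the undetermined bin.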

**bvanga** · 03-01-2017, 10:18 PM

Thank you

In this case, though about 2.0GB was sent into an undetermined bin, I still obtained about 900 OTUs with good length (350 to 380bp) and sequence depth (about 50,000 reads per sample). BTW, what is a good sequence depth? Is there any rough figure to judge the sequence depth, or is it highly variable based on the sample?

**Brian Bushnell** · 03-01-2017, 10:52 PM

"50,000 reads per sample" is not a depth. A depth would be something like "300x", which would be the result of, for example, sequencing 10 million 2x150bp pairs (3Gbp) for a 10Mbp organism.
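
The arithmetic behind that example can be written out as a short Python sketch. The function name `expected_depth` is hypothetical; it computes average fold coverage as total sequenced bases divided by genome size, and ignores anything lost to filtering or to the undetermined bin.

```python
def expected_depth(read_pairs, read_length_bp, genome_size_bp):
    """Average fold coverage = total sequenced bases / genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


# 10 million 2x150bp pairs (3 Gbp) over a 10 Mbp genome
depth = expected_depth(
    read_pairs=10_000_000, read_length_bp=150, genome_size_bp=10_000_000
)
print(f"{depth:.0f}x")  # 300x
```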

It would be helpful if you could clarify your experiment and goal. Also, I suggest you repost the question in a new thread as it is unrelated to the current thread. By that, I mean, take some time to think about the optimal phrasing of the question, and then create a new thread explaining everything you know about the situation, and what you want to accomplish.

**GenoMax** · 03-02-2017, 04:22 AM

Originally posted by bvanga

In this case, though about 2.0GB was sent into an undetermined bin, I still obtained about 900 OTUs with good length (350 to 380bp) and sequence depth (about 50,000 reads per sample). BTW, what is a good sequence depth? Is there any rough figure to judge the sequence depth, or is it highly variable based on the sample?

Using reads from the "undetermined" pool (if they ended up there after allowing for 1 or more errors in tag reads) is questionable. There are always some reads that can't be explained by the observed "tags" in multiplex sequencing. Even if you were able to obtain OTUs from them, you can't be sure which of your samples they belong to.

Undetermined.fastq file
