#1
Member
Location: Philadelphia Join Date: Apr 2013
Posts: 17
Hello,
This may be a very elementary question, but since what I have found so far on the internet has not entirely clarified this for me, I figured I'd ask here. When a sequencing experiment is run on an Illumina platform, after demultiplexing, there are always *_Undetermined.fastq.gz files. I am lost as to why exactly some reads end up in there, and what the purpose of this file is. I've read that one may sometimes use this file to observe index frequencies or for other troubleshooting, but again, I am not entirely clear on this. Is this file strictly for troubleshooting (i.e. will the reads in it never be used in any downstream analysis)? Thanks in advance for any help on this.

#2
Senior Member
Location: Aberdeen, Scotland Join Date: Jan 2010
Posts: 347
I think it's just the reads where it has not been possible to demultiplex on the barcode with sufficient accuracy. There's always some data here even if your sample sheet is set up properly.

#3
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,706
That's correct. Undetermined is also where PhiX reads are supposed to end up if it was spiked in.

#4
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 6,697
There are some special circumstances in which I deliberately want the reads to go into the "undetermined" file (when using CASAVA or bcl2fastq to demultiplex). This preserves the tags in the read IDs. We have built a demultiplexer for QIIME that can use this undetermined file to produce sample files in the QIIME format. (Sending all reads to the "undetermined" file is achieved by including a dummy tag sequence like YYYY in the sample sheet.)
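
Because the tags stay in the read IDs, the index frequencies in the undetermined file can be tallied directly from the headers. Here is a minimal Python sketch of that idea. It assumes CASAVA 1.8+ style headers, where the index is the fourth colon-separated field after the space; the function names are made up for illustration and this is not the QIIME demultiplexer mentioned above.

```python
import gzip
from collections import Counter
from itertools import islice


def index_from_header(header: str) -> str:
    """Return the index field of a CASAVA 1.8+ style FASTQ header.

    Assumed layout:
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    comment = header.rstrip().split(" ", 1)[1]
    return comment.split(":")[3]


def count_indexes(lines):
    """Count index sequences over an iterable of FASTQ lines.

    Every 4th line, starting with the first, is taken to be a header.
    """
    headers = islice(lines, 0, None, 4)
    return Counter(index_from_header(h) for h in headers)


def top_undetermined_indexes(path, n=10):
    """Return the n most frequent indexes in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return count_indexes(handle).most_common(n)
```

If one index dominates the list and it looks like a sample that should have been in the sample sheet (for example, its reverse complement is listed there), that usually points at a sample sheet problem rather than bad data.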

#5
Senior Member
Location: Monash University, Melbourne, Australia. Join Date: Jan 2008
Posts: 246
Yes, people use it for a variety of different purposes, but its actual intended purpose is simply as a catch-all for any read that wasn't assignable to a sample for any reason: poor quality, incorrect indexes specified in the sample sheet, missing index sequences (e.g. PhiX reads, which have no index), sequencing error in the index read for that sequence, etc.
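
To make the catch-all behaviour concrete, here is a toy Python sketch of the assignment step. It is an illustration only, not how CASAVA or bcl2fastq is implemented: the sample names, the index sequences and the one-mismatch tolerance are all assumptions made for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_indexes: dict, max_mismatches: int = 1) -> str:
    """Return the sample whose index matches, or 'Undetermined'.

    A read is assigned only if exactly one sample index lies within
    max_mismatches of the observed index read; anything else (no match,
    or an ambiguous match) falls through to the catch-all bin.
    """
    hits = [
        name
        for name, index in sample_indexes.items()
        if hamming(index_read, index) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "Undetermined"


# Hypothetical sample sheet: sample name -> expected index sequence.
SAMPLE_SHEET = {"sampleA": "ACGTACGT", "sampleB": "TGCATGCA"}
```

With this sheet, `assign_sample("ACGTACGA", SAMPLE_SHEET)` still resolves to `sampleA` (one sequencing error in the index), while an index read of all `N` or an index that is simply not in the sheet ends up as `Undetermined`.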

#6
Junior Member
Location: New Zealand Join Date: Sep 2016
Posts: 2
Hi,
Can anyone please explain how the MiSeq (300bp paired-end sequencing) determines undetermined sequences? I am studying the microbiome in plant tissue collected from a fruit plant in the environment. The sequence provider mentioned that I got about 20GB of undetermined reads, but when I did OTU analysis and BLAST, I was able to differentiate different species and different undetermined OTUs (it is OK for me to expect some undetermined OTUs from an environmental sample). I got a bit confused... are undetermined OTUs different from the reads in the MiSeq undetermined folder? Thank you, Vanga

#7
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,706
The MiSeq does not determine anything. It is a sequencing platform; all it does is produce sequences - it is up to the user to determine what they are.
Illumina sequencing platforms support multiplexing, in which multiple libraries are sequenced together. They have different indexes (or bar codes) which indicate the library they came from. During demultiplexing, the reads are split into different libraries based on the bar code (typically, 8bp sequences within the adapters of the molecule being sequenced). If the bar code sequence is low quality, the read will be sent to the "undetermined" bin, meaning that it is not clear which library it came from. It may be possible for the user to BLAST the undetermined bin and decide with high confidence which organism a read came from, in situations where the multiplexed organisms are very different. But I don't recommend that, as it will increase noise. Instead, if you are getting a large volume in your undetermined bin, you should complain to Illumina (or whoever provides your adapters) about wasted sequence due to the low quality of the index reads, or insufficient length and edit distance of indexes to distinguish between libraries.
Last edited by Brian Bushnell; 03-01-2017 at 09:06 PM.
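
The point about index distance can be checked with a few lines of Python. The sketch below uses Hamming distance (substitutions only) as a simplified stand-in for the edit distance mentioned in the post, and the index sequences are invented for the example. It relies on the general rule that a set with minimum pairwise distance d can tolerate at most (d - 1) // 2 mismatches before two indexes become confusable.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length indexes."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_index_distance(indexes) -> int:
    """Smallest pairwise Hamming distance within a set of indexes."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def max_safe_mismatches(indexes) -> int:
    """Largest mismatch tolerance that cannot confuse two indexes."""
    return (min_index_distance(indexes) - 1) // 2


# Hypothetical 8bp index set, not taken from any real kit.
INDEXES = ["ACGTACGT", "TGCATGCA", "ACGTTGCA"]
```

For this set the closest pair differs at 4 positions, so one mismatch can be tolerated safely but two cannot.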

#8
Junior Member
Location: New Zealand Join Date: Sep 2016
Posts: 2
In this case, though about 2.0GB was sent into an undetermined bin, I still have obtained about 900 OTUs with good length (350 to 380bp) and sequence depth (about 50,000 reads per sample). BTW, what is a good sequence depth? Is there any rough figure to judge the sequence depth, or is it highly variable depending on the sample?

#9
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,706
"50,000 reads per sample" is not a depth. A depth would be something like "300x", which would be the result of, for example, sequencing 10 million 2x150bp pairs (3Gbp) for a 10Mbp organism.
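
The arithmetic behind that figure is total sequenced bases divided by genome size. As a small Python sketch (the function name is just for illustration):

```python
def expected_depth(n_fragments: int, read_length: int, genome_size: int,
                   paired: bool = True) -> float:
    """Average depth: total bases sequenced divided by genome size.

    n_fragments counts pairs when paired=True, single reads otherwise.
    """
    reads_per_fragment = 2 if paired else 1
    total_bases = n_fragments * read_length * reads_per_fragment
    return total_bases / genome_size


# 10 million 2x150bp pairs over a 10Mbp genome
depth = expected_depth(10_000_000, 150, 10_000_000)
```

Here `depth` comes out as 300.0, matching the "300x" example: 10 million pairs times 300 bases per pair is 3Gbp, spread over 10Mbp.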
It would be helpful if you could clarify your experiment and goal. Also, I suggest you repost the question in a new thread as it is unrelated to the current thread. By that, I mean, take some time to think about the optimal phrasing of the question, and then create a new thread explaining everything you know about the situation, and what you want to accomplish.