SEQanswers

Old 04-07-2009, 05:32 PM   #1
oleg
Junior Member
 
Location: Berkeley, CA

Join Date: Apr 2009
Posts: 2
export.txt files / quality filtering

Hello, everyone!
I have a question: field 11 in export.txt files (Illumina pipeline output) contains either the name of the matching chromosome or a code indicating why there was no match. Does anyone know more about these codes? NM means no match and QC means too many Ns (I'm told), but what about codes like 0:1:0?
I want to remap the reads using bowtie but am not sure which ones to retain. Are there codes that indicate 'definitely do not use these reads' (QC comes to mind), or can I use ALL reads and let the quality scores take care of this (given appropriate bowtie settings)?
I ask because my impression so far is that reads 'passed filtering' only because they satisfied a single criterion (FAILED_CHASTITY<=1.00), and I don't feel that failing this one criterion is enough to disqualify a read. Are there other criteria that also determine whether a read passes the filter? (This seems to be the default one, used by the person who ran the machine I got the data from.)
Thanks a lot!!!!
Old 04-08-2009, 03:26 AM   #2
stuart.horswell
Junior Member
 
Location: London

Join Date: Feb 2009
Posts: 2

The x:y:z codes:

x = number of exact matches found
y = number of single error (in the seed sequence) matches
z = number of two error (in the seed sequence) matches found
(see p121 of the pipeline with CASAVA documentation, p82 in the previous version)
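
As a rough illustration of the codes described above, here is a minimal Python sketch that sorts the value of field 11 into the cases mentioned in this thread (NM, QC, x:y:z counts, or a chromosome name). The function name and the returned labels are made up for this example and are not part of any Illumina tool.

```python
import re

# Matches seed-match counts such as "0:1:0" (exact : 1-error : 2-error).
_COUNTS = re.compile(r"^(\d+):(\d+):(\d+)$")


def classify_match_field(value):
    """Classify the contents of field 11 of an export.txt row.

    Returns a (kind, detail) tuple. The labels are hypothetical names
    chosen for this sketch.
    """
    if value == "":
        return ("empty", None)
    if value == "NM":
        return ("no_match", None)
    if value == "QC":
        return ("qc_fail", None)
    m = _COUNTS.match(value)
    if m:
        # x, y, z = number of exact, 1-error and 2-error seed matches
        return ("seed_counts", tuple(int(g) for g in m.groups()))
    # Anything else is assumed to be a chromosome (reference) name.
    return ("aligned", value)
```

For example, `classify_match_field("0:1:0")` gives `("seed_counts", (0, 1, 0))`, while a value such as `chr1.fa` falls through to the `"aligned"` case.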

Unless you're searching for SNPs (when, for obvious reasons, you need a high level of certainty about every base call and every alignment mismatch), it's probably safe to just feed all of the reads into bowtie, particularly since you can set your own stringency thresholds at run time.

Filtering does default to FAILED_CHASTITY<=1.00 but there are other options; see p72 of the CASAVA manual, or p31 of the previous version, for more details.
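
If you do want to feed the reads to another aligner, one way is to turn the export rows back into FASTQ. The sketch below assumes the usual 22-column, tab-separated export layout, with the read in field 9, the quality string in field 10 and the pass-filter flag (Y/N) in field 22. Only field 11 is confirmed in this thread, so please check the other column positions against your own files. The function name and the read-name format are invented for this example.

```python
def export_to_fastq(lines, keep_failed=True):
    """Yield FASTQ records (as strings) from export.txt rows.

    Assumed layout (1-based): fields 1-6 = machine, run, lane, tile, x, y;
    field 8 = read number; field 9 = sequence; field 10 = quality string;
    field 22 = filter flag ("Y" means the read passed filtering).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 22:
            continue  # skip malformed or truncated rows
        machine, run, lane, tile, x, y = fields[0:6]
        read_no, seq, qual = fields[7], fields[8], fields[9]
        passed = fields[21] == "Y"
        if not (passed or keep_failed):
            continue  # drop reads that failed filtering, if requested
        name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}/{read_no}"
        # The quality string is copied unchanged, so the aligner must be
        # told which quality encoding the pipeline used.
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

With `keep_failed=True` every well-formed row is written out, which corresponds to "feed all of the reads" above; `keep_failed=False` keeps only the reads flagged as having passed the filter.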
Old 04-08-2009, 10:49 AM   #3
oleg
Junior Member
 
Location: Berkeley, CA

Join Date: Apr 2009
Posts: 2

Thanks, Stuart!
That still leaves me wondering: if the code is 0:0:1 (which I do see), shouldn't it have given me the chromosome corresponding to the unique two-error match? Oleg.
Old 04-09-2009, 12:50 AM   #4
stuart.horswell
Junior Member
 
Location: London

Join Date: Feb 2009
Posts: 2

I can't find the definition in the docs right now but judging from our data, export.txt defaults to only reporting unique perfect matches. However, if you look in the eland_multi.txt file you should see the multiple alignments - three caveats:

1) If there are perfect and 1-mismatch alignments, both are listed and there isn't any way of determining which is which just using the multi.txt file as far as I can see.

2) There's a (user-definable) threshold on how many matches it will report.

3) In situations like 1:0:2 it will only report the perfect match. But with 0:0:2 you'll get both two-mismatch alignments...

Hope this helps.
Old 04-28-2009, 07:54 AM   #5
gaoja
Junior Member
 
Location: MD

Join Date: Apr 2009
Posts: 1

We also noticed similar data in the export files generated by v1.3.2, but not in those generated by v1.1. Many reads with a value of 1:0:0 in this field did not report a chromosomal position. Comparing with the data in the .eland file, some 1:0:0 reads are reported with a unique chromosome in the export file and some are not. That makes the data in this field very confusing, because there is no consistency. Has anyone contacted Illumina about this?
Thanks,
James
Old 06-29-2009, 09:38 AM   #6
sjackman
Member
 
Location: Vancouver, Canada

Join Date: Mar 2009
Posts: 15

I haven't looked into this at all, but is it possible that the seed aligned with one unique hit and no mismatches, but the rest of the read did not align at the position, and so the error message is 1:0:0?
Old 08-17-2010, 07:15 AM   #7
Bioinfo
Member
 
Location: canada

Join Date: Jul 2010
Posts: 15

Hi,
Does anyone know how to create eland intermediate files like eland.results.txt / eland.extended.txt from the eland sorted/export files (version: CASAVA 1.7)?
Thanks in advance
Old 08-17-2010, 10:23 PM   #8
Manu
Junior Member
 
Location: Freiburg, Germany

Join Date: May 2010
Posts: 4

I asked the Illumina techsupport once. This is what I got:

Quote:
The temporary _eland_extended.txt files contain information on ALL hits generated by the ELAND algorithm, irrespective of the quality or uniqueness of the hit.
You will see hits that do not appear in s_N_export.txt because they are not unique, or because the read has low base-quality scores (which affects the alignment score).
You will also see reads whose hit is described only in the x:y:z format; these do not have a sufficiently high alignment score, unlike reads for which the full details of the alignment are shown.

Another thing to bear in mind is that the x:y:z format only refers to the SEED alignment, not the full extended alignment generated by ELAND. For a longer read this can be significantly different given the default seed length of 32 bases. Calculation of the alignment score is described on page 142 of the CASAVA-1.6 user guide.
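
To make the seed versus full-read point concrete, here is a small Python sketch that counts mismatches in the first 32 bases and in the whole read. It is a plain ungapped comparison for illustration only and is not how ELAND itself scores alignments; the function name is made up.

```python
SEED_LENGTH = 32  # default seed length quoted above


def mismatch_counts(read, reference, seed_length=SEED_LENGTH):
    """Return (seed_mismatches, total_mismatches) for an ungapped alignment.

    `read` and `reference` must be the same length; `reference` is the
    stretch of genome the read is placed against.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    # Mismatches within the seed (first `seed_length` bases) only.
    seed = sum(1 for r, g in zip(read[:seed_length], reference[:seed_length]) if r != g)
    # Mismatches over the full read.
    total = sum(1 for r, g in zip(read, reference) if r != g)
    return seed, total
```

A 40-base read whose only mismatch is at position 36 has zero mismatches in the seed but one in the full read, so a seed-level count of 1:0:0 does not by itself say the whole read matches perfectly.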
Old 08-19-2010, 05:20 AM   #9
Bioinfo
Member
 
Location: canada

Join Date: Jul 2010
Posts: 15

Hi,
Thanks for your reply!

I am wondering whether there are any options for generating the codes (QC, NM, U0, U1, U2, R0, R1, R2) and x, y, z (number of exact, single-error, and 2-error matches) from the eland_export.txt/eland_sorted.txt files, as intermediate files like eland_results.txt and eland_extended.txt are no longer available in the GERALD folder (CASAVA 1.7).
Any help would be appreciated
Regards.
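
One possible starting point, for rows where the export file still carries x:y:z counts: the sketch below derives a U/R-style code from the three counts. It assumes the convention used in older ELAND output (U = exactly one match at the best error level, R = more than one, NM = none), which is not stated in this thread, so treat it as an assumption. QC cannot be recovered from the counts, and the counts refer to the seed only. The function name is hypothetical.

```python
def eland_code(x, y, z):
    """Map seed-match counts to a U0/U1/U2/R0/R1/R2/NM style code.

    x, y, z = number of exact, 1-error and 2-error matches.
    The best (lowest-error) level with any match decides the code.
    """
    for errors, count in enumerate((x, y, z)):
        if count == 1:
            return f"U{errors}"  # unique match at this error level
        if count > 1:
            return f"R{errors}"  # repeated match at this error level
    return "NM"  # no matches at any level
```

Under this assumed convention, 1:0:2 maps to U0, 0:0:2 maps to R2 and 0:0:0 maps to NM.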