
**stuart.horswell** · 04-08-2009, 03:26 AM

The x:y:z codes:

x = number of exact matches found
y = number of single-error matches found (errors counted in the seed sequence)
z = number of two-error matches found (errors counted in the seed sequence)
(see p121 of the pipeline with CASAVA documentation, p82 in the previous version)
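For anyone scripting against these codes, here is a minimal Python sketch of splitting an x:y:z string into its three counts. The function name `parse_match_code` is made up for illustration and is not part of the pipeline; the only assumption is that the code is three non-negative integers separated by colons.

```python
import re

# Assumed shape of the code: three non-negative integers separated by colons.
_CODE_RE = re.compile(r"^(\d+):(\d+):(\d+)$")


def parse_match_code(code):
    """Return (exact, one_error, two_error) counts, or None if not an x:y:z code.

    Hypothetical helper for illustration; not part of the Illumina pipeline.
    """
    m = _CODE_RE.match(code.strip())
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


print(parse_match_code("0:1:0"))  # one single-error match, nothing else
print(parse_match_code("NM"))     # not an x:y:z code
```

Anything that is not an x:y:z code (NM, QC, a chromosome name) comes back as `None`, so the caller can handle it separately.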

Unless you're searching for SNPs (where, for obvious reasons, you need a high level of certainty about every base call and every alignment mismatch), it's probably safe to just feed all of the reads into bowtie, particularly since you can set your own stringency thresholds at run time.

Filtering does default to FAILED_CHASTITY<=1.00, but there are other options; see p72 of the CASAVA manual, or p31 of the previous version, for more details.

**oleg** · 04-08-2009, 10:49 AM

Thanks, Stuart!
That still leaves me wondering: if the code is 0:0:1 (which I do see), shouldn't it have given me the chromosome corresponding to the unique two-error match? Oleg.

**stuart.horswell** · 04-09-2009, 12:50 AM

I can't find the definition in the docs right now, but judging from our data, export.txt defaults to only reporting unique perfect matches. However, if you look in the eland_multi.txt file you should see the multiple alignments, with three caveats:

1) If there are perfect and 1-mismatch alignments, both are listed and there isn't any way of determining which is which just using the multi.txt file as far as I can see.

2) There's a (user-definable) threshold on how many matches it will report.

3) In situations like 1:0:2 it will only report the perfect match. But with 0:0:2 you'll get the two mismatches...

Hope this helps.

**gaoja** · 04-28-2009, 07:54 AM

Originally posted by oleg

Hello, everyone!
I have a question: field 11 in export.txt files (illumina pipeline output) contains either the name of the matched chromosome OR a code indicating why there was no match. I wonder if someone knows more about these codes: NM means no match, QC means too many Ns (I'm told) but what about codes like 0:1:0??
I want to remap the reads using bowtie but am not sure which ones to retain. Are there some codes that indicate 'definitely do not use these reads' (QC comes to mind) or can I use ALL reads and the information in quality scores will take care of this (given appropriate bowtie settings)?
I ask because my impression so far with regards to reads 'passed filtering' is that they only passed filtering because the criterion (FAILED_CHASTITY<=1.00) was satisfied and I don't feel that failure to pass this one criterion is enough to disqualify a read... Are there other criteria that also determine failure to pass filter (this seems to be the default one, used by person who ran machine I got data from)?
Thanks a lot!!!!

We also noticed similar data in the export files generated by v1.3.2, but not in the export files generated by v1.1. Many reads with a value of 1:0:0 in this field did not report a chromosomal position. Comparing with the data in the .eland file, some 1:0:0 reads are reported with a unique chromosome in the export file and some are not. That makes the data in this field very confusing, because there is no consistency. Has anyone contacted Illumina about this?
Thanks,
James
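To see how often each kind of value shows up in field 11, a rough Python sketch like the one below could be used. It assumes export.txt is tab-delimited and that field 11 is the eleventh column counting from 1, as described in the quoted post; the function names and category labels are invented for illustration, and the demo lines are made up rather than real pipeline output.

```python
import re
from collections import Counter

CODE_RE = re.compile(r"^\d+:\d+:\d+$")


def classify_field11(value):
    """Sort a field-11 value into a rough category (labels are illustrative)."""
    value = value.strip()
    if value == "NM":
        return "no_match"
    if value == "QC":
        return "qc_fail"
    if CODE_RE.match(value):
        return "match_counts"      # an x:y:z code with no chromosome reported
    if value == "":
        return "empty"
    return "reference_name"        # anything else is taken to be a chromosome name


def tally_export_lines(lines):
    """Count field-11 categories over an iterable of export.txt lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:       # assumption: tab-delimited, field 11 at index 10
            counts["malformed"] += 1
            continue
        counts[classify_field11(fields[10])] += 1
    return counts


# Made-up demo lines: ten placeholder columns followed by a field-11 value.
demo = ["\t".join(["."] * 10 + [v]) for v in ("chr1.fa", "NM", "QC", "1:0:0")]
print(tally_export_lines(demo))
```

On a real file, passing an open file handle to `tally_export_lines` would give a quick count of how many reads carry an x:y:z code instead of a chromosome name.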

**sjackman** · 06-29-2009, 09:38 AM

I haven't looked into this at all, but is it possible that the seed aligned with one unique hit and no mismatches, but the rest of the read did not align at that position, and so the error message is 1:0:0?

**Bioinfo** · 08-17-2010, 07:15 AM

Hi
Does anyone know how to create eland intermediate files like eland.results.txt / eland.extended.txt from eland sorted/export files (version: CASAVA 1.7)?
Thanks in advance

**Manu** · 08-17-2010, 10:23 PM

I asked the Illumina techsupport once. This is what I got:

The temporary _eland_extended.txt files contain information on ALL hits generated by the ELAND algorithm, irrespective of the quality or uniqueness of the hit.
You will see hits that do not appear in s_N_export.txt because they are not unique, or because the read has low base-quality scores (this impacts the alignment score).
You will see that reads can give a hit described in the x:y:z format but these do not have a sufficiently high alignment score compared with a read showing the full details of the alignment.

Another thing to bear in mind is that the x:y:z format only refers to the SEED alignment, not the full extended alignment generated by ELAND. For a longer read this can be significantly different given the default seed length of 32 bases. Calculation of the alignment score is described on page 142 of the CASAVA-1.6 user guide.
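To make the seed versus full-read distinction concrete, here is a small Python sketch that counts mismatches in the seed and over the whole read. It is only an illustration: it assumes the seed is the first 32 bases of the read, uses a plain ungapped base-by-base comparison, and the sequences are made up. It is not how ELAND itself computes anything.

```python
SEED_LENGTH = 32  # default seed length mentioned above


def count_mismatches(read, reference, seed_length=SEED_LENGTH):
    """Return (mismatches_in_seed, mismatches_in_full_read).

    Illustrative only: ungapped comparison, seed assumed to be the
    first `seed_length` bases of the read.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    in_seed = sum(
        1 for a, b in zip(read[:seed_length], reference[:seed_length]) if a != b
    )
    total = sum(1 for a, b in zip(read, reference) if a != b)
    return in_seed, total


# Made-up 40-base example with two mismatches placed after the seed.
reference = "ACGT" * 10
bases = list(reference)
bases[35] = "A"   # reference has T here
bases[38] = "C"   # reference has G here
read = "".join(bases)

print(count_mismatches(read, reference))  # (0, 2)
```

In this toy case the seed is a perfect match even though the full 40-base read carries two mismatches, which is the kind of gap between the x:y:z code and the extended alignment described above.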

**Bioinfo** · 08-19-2010, 05:20 AM

Originally posted by Manu

I asked the Illumina techsupport once. This is what I got:

Hi,
Thanks for your reply!

I am wondering whether there are any options for generating the scores (QC, NM, U0, U1, U2, R0, R1, R2) and x, y, z (number of exact, single-error, 2-error matches) from eland_export.txt/eland_sorted.txt files, as the intermediate files like eland_results.txt and eland_extended.txt are no longer available in the GERALD folder (CASAVA 1.7).
Any help would be appreciated
Regards.
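One possible way to approximate the U/R codes from an x:y:z string is sketched below in Python. This rests on an assumption about what the codes mean: U0/U1/U2 are taken to mean that the best match is unique with 0, 1 or 2 errors, and R0/R1/R2 that the best error level has more than one hit. The function name `eland_match_type` is hypothetical. QC cannot be derived from x:y:z at all, and the counts refer to the seed only, so the result may differ from what the old intermediate files contained.

```python
def eland_match_type(code):
    """Map an 'x:y:z' string to U0/U1/U2, R0/R1/R2 or NM.

    Sketch under the assumption that the code is decided by the best
    (lowest-error) level with at least one hit: one hit gives U,
    several give R. Raises ValueError if `code` is not three integers.
    """
    counts = [int(n) for n in code.strip().split(":")]
    if len(counts) != 3:
        raise ValueError("expected three colon-separated counts: %r" % code)
    for errors, count in enumerate(counts):
        if count == 1:
            return "U%d" % errors
        if count > 1:
            return "R%d" % errors
    return "NM"  # no hits at any error level


for example in ("1:0:0", "0:0:1", "0:3:1", "0:0:0"):
    print(example, eland_match_type(example))
```

Rows whose field is NM or QC already carry their code and would need to be passed through unchanged before calling this.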


export.txt files/ quality filtering
