  • oleg
    Junior Member
    • Apr 2009
    • 2

    export.txt files/ quality filtering

    Hello, everyone!
    I have a question: field 11 in export.txt files (Illumina pipeline output) contains either the name of the matched chromosome OR a code indicating why there was no match. Does anyone know more about these codes? NM means no match and QC means too many Ns (so I'm told), but what about codes like 0:1:0?
    I want to remap the reads using bowtie but am not sure which ones to retain. Are there codes that mean 'definitely do not use these reads' (QC comes to mind), or can I use ALL reads and let the quality scores take care of this (given appropriate bowtie settings)?
    I ask because, as far as I can tell, whether a read 'passes filter' is decided by a single criterion (FAILED_CHASTITY<=1.00), and I don't feel that failing this one criterion is enough to disqualify a read. Are there other criteria that also determine whether a read passes filter? (This one seems to be the default, used by the person who ran the machine I got the data from.)
    Thanks a lot!
  • stuart.horswell
    Junior Member
    • Feb 2009
    • 2

    #2
    The x:y:z codes:

    x = number of exact matches found
    y = number of single-error matches found (in the seed sequence)
    z = number of two-error matches found (in the seed sequence)
    (see p. 121 of the Pipeline with CASAVA documentation, or p. 82 in the previous version)
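    A minimal Python sketch of reading these codes, assuming field 11 has already been split out of a tab-separated export.txt row. The names `parse_xyz` and `SeedCounts` are hypothetical; anything that isn't three colon-separated integers (a chromosome name, NM, QC) comes back as None.

```python
import re
from typing import NamedTuple, Optional

# An x:y:z code: three non-negative integers separated by colons.
XYZ_PATTERN = re.compile(r"^(\d+):(\d+):(\d+)$")


class SeedCounts(NamedTuple):
    exact: int      # x: exact seed matches
    one_error: int  # y: single-error seed matches
    two_error: int  # z: two-error seed matches


def parse_xyz(field11: str) -> Optional[SeedCounts]:
    """Return the seed-match counts if field 11 holds an x:y:z code,
    or None if it holds something else (a chromosome name, NM, QC)."""
    m = XYZ_PATTERN.match(field11.strip())
    if m is None:
        return None
    x, y, z = (int(g) for g in m.groups())
    return SeedCounts(x, y, z)


if __name__ == "__main__":
    for value in ["0:1:0", "1:0:2", "NM", "QC", "chr1.fa"]:
        print(value, "->", parse_xyz(value))
```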

    Unless you're searching for SNPs (where, for obvious reasons, you need a high level of certainty about every base call and every alignment mismatch), it's probably safe to feed all of the reads into bowtie, particularly since you can set your own stringency thresholds at run time.
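    If you do feed everything into bowtie, one way is to convert export.txt to FASTQ first. The sketch below assumes the usual export.txt column layout (sequence in column 9, quality string in column 10, match code in column 11) and a made-up read-name scheme; check both against your pipeline documentation. By default it drops only QC reads. Quality strings are copied unchanged, so if yours use the Phred+64 encoding, bowtie 1.x needs `--phred64-quals`.

```python
import io
from typing import Iterable, Iterator, TextIO

# Assumed export.txt layout (1-based): columns 1-6 = machine, run, lane,
# tile, x, y; 8 = read number; 9 = sequence; 10 = quality string;
# 11 = matched chromosome or NM/QC/x:y:z code. Check your pipeline docs.
SEQ, QUAL, MATCH = 8, 9, 10  # 0-based indices


def export_to_fastq(lines: Iterable[str], drop_qc: bool = True) -> Iterator[str]:
    """Yield one FASTQ record per export.txt line.

    Every read is kept except, optionally, those whose field 11 is "QC".
    Quality strings are copied unchanged, so they keep the pipeline's
    original ASCII offset."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MATCH:
            continue  # skip truncated or malformed lines
        if drop_qc and fields[MATCH] == "QC":
            continue
        # Hypothetical read-name scheme: machine_run:lane:tile:x:y/read
        name = "{}_{}:{}:{}:{}:{}/{}".format(*fields[0:6], fields[7])
        yield f"@{name}\n{fields[SEQ]}\n+\n{fields[QUAL]}\n"


def convert(src: Iterable[str], dst: TextIO, drop_qc: bool = True) -> int:
    """Write FASTQ records to dst and return how many were written."""
    n = 0
    for record in export_to_fastq(src, drop_qc):
        dst.write(record)
        n += 1
    return n


if __name__ == "__main__":
    demo = [
        "M1\t1\t3\t10\t100\t200\t\t1\tACGT\thhhh\tchr1.fa",
        "M1\t1\t3\t10\t101\t201\t\t1\tNNNN\tBBBB\tQC",
    ]
    out = io.StringIO()
    print(convert(demo, out), "record(s) written")
    print(out.getvalue(), end="")
```

    Keeping QC reads as well is just `drop_qc=False`; the stringency decisions are then left entirely to bowtie.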

    Filtering does default to FAILED_CHASTITY<=1.00, but there are other options; see p. 72 of the CASAVA manual (p. 31 of the previous version) for more details.
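    To make the chastity criterion concrete: chastity for one cycle is commonly defined as the brightest channel intensity divided by the sum of the two brightest intensities, and FAILED_CHASTITY<=1.00 tolerates at most one low-chastity call. The sketch below assumes the commonly quoted defaults (threshold 0.6, checked over the first 25 cycles) and per-cycle four-channel intensities as input; `chastity` and `passes_filter` are hypothetical helpers, not pipeline code.

```python
from typing import Sequence

# Assumed defaults for a GERALD-style chastity filter (verify in your manual).
CHASTITY_THRESHOLD = 0.6
CYCLES_CHECKED = 25
MAX_FAILED = 1  # corresponds to FAILED_CHASTITY<=1.00


def chastity(intensities: Sequence[float]) -> float:
    """Brightest channel / (brightest + second brightest) for one cycle.
    A cycle with no signal at all is given chastity 0.0 (i.e. it fails)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles: Sequence[Sequence[float]],
                  threshold: float = CHASTITY_THRESHOLD,
                  n_cycles: int = CYCLES_CHECKED,
                  max_failed: int = MAX_FAILED) -> bool:
    """True if at most `max_failed` of the first `n_cycles` cycles have
    chastity below `threshold`; later cycles are ignored."""
    failed = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failed <= max_failed
```

    This also shows why a single criterion can feel harsh: two ambiguous calls early in the read are enough to fail it, however good the remaining cycles are.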


    • oleg
      Junior Member
      • Apr 2009
      • 2

      #3
      Thanks, Stuart!
      That still leaves me wondering: if the code is 0:0:1 (which I do see), shouldn't it have given me the chromosome corresponding to that unique two-error match? Oleg.


      • stuart.horswell
        Junior Member
        • Feb 2009
        • 2

        #4
        I can't find the definition in the docs right now, but judging from our data, export.txt defaults to reporting only unique perfect matches. However, if you look in the eland_multi.txt file you should see the multiple alignments, with three caveats:

        1) If there are both perfect and 1-mismatch alignments, both are listed, and as far as I can see there's no way of telling which is which from the eland_multi.txt file alone.

        2) There's a (user-definable) limit on how many matches it will report.

        3) In cases like 1:0:2 it will only report the perfect match, but with 0:0:2 you'll get both two-mismatch alignments.
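        The reporting rule in caveat 3 amounts to "keep only the best stratum". A small sketch, where each hit is a hypothetical (chromosome, position, seed mismatches) tuple rather than anything parsed from eland_multi.txt:

```python
from typing import List, Tuple

# (chromosome, position, number of mismatches in the seed alignment)
Hit = Tuple[str, int, int]


def best_stratum(hits: List[Hit]) -> List[Hit]:
    """Keep only the hits with the fewest seed mismatches.

    With hits in the 1:0:2 pattern this returns just the perfect hit;
    with 0:0:2 it returns both two-mismatch hits. The user-definable
    limit on the number of reported matches is not modelled here."""
    if not hits:
        return []
    fewest = min(mm for _, _, mm in hits)
    return [h for h in hits if h[2] == fewest]
```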

        Hope this helps.


        • gaoja
          Junior Member
          • Apr 2009
          • 1

          #5
          We also noticed similar data in the export files generated by v1.3.2, but not in those generated by v1.1. Many reads with 1:0:0 in this field did not report a chromosomal position. Comparing with the data in the .eland file, some 1:0:0 reads were reported with a unique chromosome in the export file and some were not. That makes the data in this field very confusing, because there is no consistency. Has anyone contacted Illumina about this?
          Thanks,
          James
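          One way to quantify this is to tally the field-11 values in each export file. A sketch, assuming tab-separated rows with the match code in the 11th column; `tally_field11` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable

XYZ_PATTERN = re.compile(r"^\d+:\d+:\d+$")
MATCH = 10  # assumed 0-based index of field 11 in a tab-separated row


def tally_field11(lines: Iterable[str]) -> Counter:
    """Count rows by field-11 value: NM, QC and each x:y:z code are kept
    verbatim; an empty field is counted as '(empty)'; anything else is
    assumed to be a reference name and counted as 'aligned'."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MATCH:
            continue  # skip truncated or malformed lines
        value = fields[MATCH]
        if not value:
            counts["(empty)"] += 1
        elif value in ("NM", "QC") or XYZ_PATTERN.match(value):
            counts[value] += 1
        else:
            counts["aligned"] += 1
    return counts
```

          Running it over the v1.1 and v1.3.2 exports of the same lane would show how often 1:0:0 turns up in field 11 in each.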


          • sjackman
            Member
            • Mar 2009
            • 15

            #6
            I haven't looked into this at all, but is it possible that the seed aligned to one unique position with no mismatches, while the rest of the read did not align there, so the code reported is 1:0:0?


            • Bioinfo
              Member
              • Jul 2010
              • 15

              #7
              Hi
              Does anyone know how to create the ELAND intermediate files, like eland.results.txt / eland.extended.txt, from the ELAND sorted/export files (CASAVA 1.7)?
              Thanks in advance


              • Manu
                Junior Member
                • May 2010
                • 4

                #8
                I asked Illumina tech support once. This is what I got:

                The temporary _eland_extended.txt files contain information on ALL hits generated by the ELAND algorithm, irrespective of the quality or uniqueness of the hit.
                You will see hits that do not appear in s_N_export.txt because they are not unique, or because the read has low base-quality scores (which impact the alignment score).
                You will also see reads that give a hit described in the x:y:z format; these do not have a sufficiently high alignment score compared with a read for which the full details of the alignment are shown.

                Another thing to bear in mind is that the x:y:z format refers only to the SEED alignment, not the full extended alignment generated by ELAND. For a longer read the two can differ significantly, given the default seed length of 32 bases. Calculation of the alignment score is described on page 142 of the CASAVA 1.6 user guide.
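                To see how the seed and full-read pictures can differ, here is a sketch that counts the mismatches of an ungapped alignment inside the first 32 bases and over the whole read. It ignores gaps and ambiguous bases and is only an illustration, not ELAND's alignment scoring; `mismatches` is a hypothetical helper:

```python
from typing import Tuple

SEED_LENGTH = 32  # default seed length quoted above


def mismatches(read: str, ref: str, seed_length: int = SEED_LENGTH) -> Tuple[int, int]:
    """Return (mismatches within the seed, mismatches over the whole read)
    for an ungapped alignment of `read` against the reference window `ref`
    starting at the same offset."""
    if len(ref) < len(read):
        raise ValueError("reference window shorter than read")
    diffs = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]
    seed = sum(1 for i in diffs if i < seed_length)
    return seed, len(diffs)
```

                A 50-base read whose first 32 bases match perfectly but whose last 18 bases all mismatch gives (0, 18): a perfect seed hit that still aligns poorly over its full length.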


                • Bioinfo
                  Member
                  • Jul 2010
                  • 15

                  #9
                  Hi,
                  Thanks for your reply!

                  I'm wondering whether there are any options for generating the match codes (QC, NM, U0, U1, U2, R0, R1, R2) and the x:y:z counts (number of exact, single-error and two-error matches) from the eland_export.txt/eland_sorted.txt files, since intermediate files like eland_results.txt and eland_extended.txt are no longer available in the GERALD folder (CASAVA 1.7).
                  Any help would be appreciated.
                  Regards.
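                  For reads whose field 11 still carries an x:y:z code, the U/R codes can be derived from the counts alone, assuming the commonly documented eland_result definitions: the lowest-error non-empty stratum decides, a single hit there gives U0/U1/U2 and several give R0/R1/R2. The helper names below are hypothetical, and when field 11 holds a chromosome name the counts are simply not available from that field:

```python
from typing import Optional


def match_code(x: int, y: int, z: int) -> str:
    """Map x:y:z seed-match counts to U0/U1/U2/R0/R1/R2/NM.

    The best (lowest-error) non-empty stratum decides the code: one hit
    there gives U<errors>, more than one gives R<errors>."""
    for errors, count in enumerate((x, y, z)):
        if count == 1:
            return f"U{errors}"
        if count > 1:
            return f"R{errors}"
    return "NM"


def code_from_field11(value: str) -> Optional[str]:
    """Derive a match code from an export.txt field-11 value.

    NM and QC pass through unchanged. For a chromosome name the x:y:z
    counts are not in this field, so None is returned."""
    if value in ("NM", "QC"):
        return value
    parts = value.split(":")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        return match_code(*(int(p) for p in parts))
    return None
```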

