  • export.txt files/ quality filtering

    Hello, everyone!
    I have a question: field 11 in export.txt files (Illumina pipeline output) contains either the name of the matched chromosome or a code giving the reason for no match. Does anyone know more about these codes? NM means no match and QC means too many Ns (I'm told), but what about codes like 0:1:0?
    I want to remap the reads using bowtie but am not sure which ones to retain. Are there codes that indicate 'definitely do not use these reads' (QC comes to mind), or can I use ALL reads and let the quality scores take care of this (given appropriate bowtie settings)?
    I ask because my impression is that reads marked 'passed filtering' passed only because one criterion (FAILED_CHASTITY<=1.00) was satisfied, and I don't feel that failing this one criterion is enough to disqualify a read. Are there other criteria that also determine failure to pass the filter? (This seems to be the default one, and it was used by the person who ran the machine I got the data from.)
    Thanks a lot!

  • #2
    The x:y:z codes:

    x = number of exact matches found
    y = number of single-error matches (in the seed sequence) found
    z = number of two-error matches (in the seed sequence) found
    (see p121 of the pipeline with CASAVA documentation, p82 in the previous version)
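
As a rough illustration of the codes described so far, here is a minimal Python sketch that sorts a field 11 value into one of four groups. The function name and the group labels are made up for this example; the only facts taken from the thread are that NM means no match, QC means a quality failure, x:y:z gives seed-hit counts, and anything else is a reference name.

```python
import re

# Matches seed-hit counts such as "0:1:0" (exact : one-error : two-error).
SEED_COUNTS = re.compile(r"^(\d+):(\d+):(\d+)$")


def classify_match_field(value):
    """Classify field 11 of an export.txt line.

    Returns a (kind, detail) tuple, where kind is one of
    "no_match", "qc_fail", "seed_counts" or "reference".
    """
    value = value.strip()
    if value == "NM":
        return ("no_match", None)
    if value == "QC":
        return ("qc_fail", None)
    m = SEED_COUNTS.match(value)
    if m:
        exact, one_err, two_err = (int(g) for g in m.groups())
        return ("seed_counts", (exact, one_err, two_err))
    # Assumption: anything else is a chromosome/reference name.
    return ("reference", value)
```

For example, `classify_match_field("0:1:0")` gives `("seed_counts", (0, 1, 0))`, i.e. no exact seed hits, one single-error hit and no two-error hits.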

    Unless you're searching for SNPs (when, for obvious reasons, you need a high level of certainty about every base call and every alignment mismatch), it's probably safe to just feed all of the reads into bowtie, particularly since you can set your own stringency thresholds at run time.

    Filtering does default to FAILED_CHASTITY<=1.00 but there are other options; see p72 of the CASAVA manual, or p31 of the previous version, for more details.
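
One way to act on this advice is to turn export.txt back into FASTQ and let the aligner decide. The sketch below is only that, a sketch: it assumes a tab-separated file with 22 fields per line, with the read in field 9, the quality string in field 10, the pass-filter flag (Y/N) in field 22 and the read identifiers in fields 1-8. Only field 11 is confirmed in this thread, so check the other positions against the CASAVA documentation. The function name and its parameters are hypothetical.

```python
def export_to_fastq(lines, keep_failed_filter=True, drop_codes=("QC",)):
    """Yield FASTQ records (as strings) built from export.txt lines.

    Column positions are assumptions; see the note above.
    """
    for line in lines:
        f = line.rstrip("\r\n").split("\t")
        if len(f) < 22:
            continue  # skip malformed or truncated lines
        read, qual, match, passed = f[8], f[9], f[10], f[21]
        if match in drop_codes:
            continue  # e.g. drop reads flagged QC in field 11
        if not keep_failed_filter and passed != "Y":
            continue  # optionally drop reads that failed the chastity filter
        name = ":".join(f[0:8])  # fields 1-8 joined into a read name
        yield "@%s\n%s\n+\n%s\n" % (name, read, qual)
```

The quality strings are copied through unchanged, so the aligner has to be told which quality encoding the pipeline version used. With `keep_failed_filter=True` (the default here) every read except those marked QC is passed on, which matches the "feed everything in" suggestion above.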

    • #3
      Thanks, Stuart!
      That still leaves me wondering: if the code is 0:0:1 (which I do see), shouldn't it have given me the chromosome corresponding to the unique two-error match?
      Oleg

      • #4
        I can't find the definition in the docs right now, but judging from our data, export.txt by default reports only unique perfect matches. However, if you look in the eland_multi.txt file you should see the multiple alignments, with three caveats:

        1) If there are perfect and 1-mismatch alignments, both are listed, and as far as I can see there isn't any way of telling which is which using the multi.txt file alone.

        2) There's a user-definable threshold on how many matches it will report.

        3) In situations like 1:0:2 it will only report the perfect match. But with 0:0:2 you'll get the two mismatches...

        Hope this helps.

        • #5
          We also noticed similar data in the export files generated by v1.3.2, but not in those generated by v1.1. Many reads with a value of 1:0:0 in this field did not report a chromosomal position. Comparing with the data in the .eland file, some 1:0:0 reads are reported with a unique chromosome in the export file and some are not. That makes this field very confusing, because it is not consistent. Has anyone contacted Illumina about this?
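
To put a number on how often this happens in a given lane, a quick tally of the x:y:z codes that appear in field 11 in place of a reference name may help. This is a minimal sketch, assuming a tab-separated export file; the function name is made up for illustration.

```python
from collections import Counter
import re

# An x:y:z code such as "1:0:0" sitting where a reference name would be.
SEED_COUNTS = re.compile(r"^\d+:\d+:\d+$")


def tally_unplaced_codes(lines):
    """Count x:y:z codes that appear in field 11 instead of a reference name."""
    counts = Counter()
    for line in lines:
        f = line.rstrip("\r\n").split("\t")
        if len(f) > 10 and SEED_COUNTS.match(f[10]):
            counts[f[10]] += 1
    return counts
```

Passing an open file handle for the export file and then calling `.most_common(10)` on the result lists the most frequent codes, which shows at a glance how many reads carry 1:0:0 without a reported position.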
          Thanks,
          James

          • #6
            I haven't looked into this at all, but is it possible that the seed aligned with one unique hit and no mismatches, but the rest of the read did not align at the position, and so the error message is 1:0:0?

            • #7
              Hi,
              Does anyone know how to create ELAND intermediate files such as eland.results.txt/ eland.extended.txt from the eland sorted/export files (version: CASAVA 1.7)?
              Thanks in advance.

              • #8
                I asked the Illumina techsupport once. This is what I got:

                The temporary _eland_extended.txt files contain information on ALL hits generated by the ELAND algorithm, irrespective of the quality or uniqueness of the hit.
                You will see hits that do not appear in s_N_export.txt because they are not unique, or because the read has low base-quality scores (this impacts the alignment score).
                You will also see reads whose hit is described only in the x:y:z format; these do not have a sufficiently high alignment score, unlike reads that show the full details of the alignment.

                Another thing to bear in mind is that the x:y:z format only refers to the SEED alignment, not the full extended alignment generated by ELAND. For a longer read this can be significantly different given the default seed length of 32 bases. Calculation of the alignment score is described on page 142 of the CASAVA-1.6 user guide.
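
A toy example makes the seed-versus-full-read point concrete. The sketch below is not ELAND's algorithm or its scoring; it simply counts mismatches in an ungapped comparison, once over the first 32 bases and once over the whole read. The function name is hypothetical, and the 32-base default is the seed length quoted above.

```python
def count_mismatches(read, ref, seed_length=32):
    """Return (seed_mismatches, total_mismatches) for an ungapped comparison.

    `ref` is the reference segment the read is compared against; it must
    be the same length as the read.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference segment must be the same length")
    diffs = [a != b for a, b in zip(read.upper(), ref.upper())]
    return sum(diffs[:seed_length]), sum(diffs)
```

A 50-base read whose first 32 bases match perfectly but which has four mismatches further along gives `(0, 4)`: the seed looks like an exact hit, while the extended alignment is considerably worse.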

                • #9
                  Originally posted by Manu
                  I asked the Illumina techsupport once. This is what I got:
                  Hi,
                  Thanks for your reply!

                  I am wondering whether there are any options for generating the scores (QC, NM, U0, U1, U2, R0, R1, R2) and x, y, z (number of exact, single-error and two-error matches) from eland_export.txt/eland_sorted.txt files, since intermediate files like eland_results.txt and eland_extended.txt are no longer available in the GERALD folder (CASAVA 1.7).
                  Any help would be appreciated.
                  Regards.
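
For reads that carry an x:y:z code in field 11, a U/R-style code can be derived from the counts. The sketch below assumes the usual reading of those codes (U = unique hit, R = repeated hit, digit = number of errors in the best hit), which is not spelled out in this thread and should be checked against the CASAVA documentation. The function name is hypothetical. Reads that were placed show a reference name in field 11 rather than counts, so nothing can be derived for them this way and the function returns None.

```python
def eland_code(match_field):
    """Derive a U0/U1/U2/R0/R1/R2-style code from an x:y:z value.

    Assumption: the code is set by the lowest error level that has any
    hits; one hit at that level gives U, more than one gives R.
    """
    value = match_field.strip()
    if value in ("NM", "QC"):
        return value
    parts = value.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None  # a reference name or something unexpected
    for n_errors, count in enumerate(int(p) for p in parts):
        if count == 1:
            return "U%d" % n_errors
        if count > 1:
            return "R%d" % n_errors
    return "NM"  # 0:0:0, no seed hits at any error level
```

Under these assumptions `eland_code("0:0:1")` gives `"U2"` and `eland_code("1:0:2")` gives `"U0"`, in line with the earlier remark that a 1:0:2 read is reported at its perfect match.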
