  • Filtering SOLiD reads before mapping?? Conflicting advice

    We have just received our first paired-end library of SOLiD reads (x3 libraries). As an example, one library contains 75 million reads. However some of these have particularly low quality scores (below 20). We have read in a previous thread here about filtering reads before mapping based on a mean / median QV of 20 and have downloaded the paper "Analysis of quality raw data of second generation sequences with Quality Assessment Software" http://www.biomedcentral.com/1756-0500/4/130 as discussed in the previous thread (sorry, don't know how to link a previous thread into this message).

    When we run the Quality Assessment Software with a mean QV20 threshold, 75 million reads decrease to 31 million reads. However, with a QV30 threshold (we think a minimum of QV30 is required for SNP calling), our 75 million reads become 0.5 million reads.

    The bioinformaticians within the facility which carried out the SOLiD run for us have advised against pre-filtering reads, and recommend allowing the mapping software (we plan to use Bowtie/TopHat/Cufflinks) to make the call on whether the read quality is good enough.

    We just wondered what everyone else would do in this situation? Our instincts tell us to filter out the reads below QV20, however we are being advised to leave them in.

    We are also concerned that >50% of our reads are low quality (mean QV below 20), and only 0.6% are good enough for SNP calling. Is this what other people have experienced?

    Your advice will be much appreciated and thank you for taking the time to read this post.

    Helen

  • #2
    I would probably filter out the reads with mean QV<20. While Bowtie/TopHat does take the quality values into account, it's not trivial to set the relevant parameters, especially for color space data, where a mismatch does not quite mean the same thing as for base space data. To make matters worse, TopHat does not allow you to adjust this parameter from the command line, so you have to manually edit the part of the TopHat script that calls Bowtie (which is what actually does the mapping).

    So in short, the difficulties in controlling how TopHat handles quality values suggest that it would be easier to pre-filter.
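A minimal sketch of the kind of mean-QV pre-filter discussed here, in Python. This is not the Quality Assessment Software from the paper; the function names, the `(read_id, quals)` record layout, and the toy quality values are assumptions made only for illustration.

```python
from statistics import mean


def parse_qual_line(line):
    """Parse one space-separated line of color quality values, e.g. '25 18 7 30'."""
    return [int(tok) for tok in line.split()]


def passes_mean_qv(quals, threshold=20):
    """Return True if the read's mean quality value is at least `threshold`."""
    # An empty read has no defined mean, so it is rejected.
    return bool(quals) and mean(quals) >= threshold


def filter_reads(records, threshold=20):
    """Keep only (read_id, quals) records whose mean QV passes the threshold."""
    return [(rid, q) for rid, q in records if passes_mean_qv(q, threshold)]


# Hypothetical toy reads, not real SOLiD data.
records = [
    ("read_1", parse_qual_line("30 28 25 22 20")),  # mean 25
    ("read_2", parse_qual_line("25 10 8 5 4")),     # mean 10.4
    ("read_3", parse_qual_line("20 20 20 20 20")),  # mean 20
]

kept = filter_reads(records, threshold=20)
print([rid for rid, _ in kept])  # ['read_1', 'read_3']
```

Raising `threshold` to 30 discards all three toy reads, which mirrors how sharply the read count drops between the QV20 and QV30 cut-offs described in the original post.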


    • #3
      It seems you are confusing the colorspace QV with the QV of the called bases. The dual encoding will give you lots of QV30 bases from lower-QV reads. Having 50% of reads above QV20 does not seem bad, but I would try QV15 or so to get more mappable reads. Bowtie will not be able to align many low-QV reads, since there will be too many errors in the seed.
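To make the point about errors in the seed concrete, here is a rough Python sketch based on the standard Phred relation, where a quality value QV corresponds to an error probability of 10^(-QV/10). The seed length of 28 and the assumption that every color call in the seed has the same QV and errs independently are simplifications chosen for illustration, not a description of what Bowtie computes.

```python
def error_probability(qv):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def expected_errors(quals):
    """Expected number of miscalled positions, assuming independent errors."""
    return sum(error_probability(q) for q in quals)


SEED_LEN = 28  # assumed seed length, for illustration only

for qv in (10, 15, 20, 30):
    # Pretend every color call in the seed has the same quality value.
    seed_quals = [qv] * SEED_LEN
    print(f"QV{qv}: expected errors in seed = {expected_errors(seed_quals):.2f}")
```

Under these assumptions, a seed made of QV10 calls is expected to contain nearly three errors, while a seed of QV20 calls is expected to contain well under one.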


      • #4
        Reply to Chipper

        Thank you for your response. We may indeed be getting confused with the colourspace QV. In fact we have been trying to find an explanation of how the colourspace QV relate to basespace QV. We have managed to map ~11% of unfiltered reads via Bowtie and the quality scores for the aligned bases in the SAM file do not appear to correspond with the QV in the colourspace data files. Are these mapping qualities as opposed to base-call quality scores?

        Are you able to point us in the direction of a clear explanation of how the colourspace QV relate to base quality?

        Thank you so much for your help.


        • #5
          The conversion from color QV to base QV is done by the aligner, so it would be specific to Bowtie. For example, here is BFAST's method: http://sourceforge.net/apps/mediawik...apping_Quality


          • #6
            Thank you nilshomer, that link was very helpful.

            I have since found this paragraph in the Bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml):

            Quality values are also "decoded" so that each reported quality value is a function of the two color qualities overlapping it. Bowtie again adopts the scheme described in the BWA paper, i.e., the decoded nucleotide quality is either the sum of the overlapping color qualities (when both overlapping colors correspond to bases that match in the alignment), the quality of the matching color minus the quality of the mismatching color, or 0 (when both overlapping colors correspond to mismatches).

            Thank you everyone for your help, it is much appreciated.
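The decoding rule quoted from the Bowtie manual can be sketched in Python as follows. This is only an illustration of the three cases in that paragraph, not Bowtie's actual code; the function name and arguments are hypothetical, and clamping a negative difference to 0 is an assumption that the quoted text does not spell out.

```python
def decode_base_quality(q_left, q_right, left_matches, right_matches):
    """Decode one nucleotide quality from the two color qualities overlapping it.

    q_left, q_right: qualities of the two overlapping colors.
    left_matches, right_matches: whether each color agrees with the alignment.
    """
    if left_matches and right_matches:
        # Both colors support the base: the qualities add up.
        return q_left + q_right
    if left_matches:
        # One color disagrees: matching quality minus mismatching quality.
        return max(0, q_left - q_right)  # clamp at 0 (assumption)
    if right_matches:
        return max(0, q_right - q_left)  # clamp at 0 (assumption)
    # Both overlapping colors are mismatches.
    return 0


print(decode_base_quality(15, 15, True, True))    # 30
print(decode_base_quality(25, 10, True, False))   # 15
print(decode_base_quality(20, 20, False, False))  # 0
```

The first call shows why reads with modest color qualities can still produce QV30 bases after decoding: two matching colors of QV15 each sum to a decoded base quality of 30.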
