  • ericguo
    Junior Member
    • Sep 2011
    • 9

    Quality Score: FastQC vs Illumina

    Hello,

    I have a question regarding Illumina quality scores. Which quality control is more reliable: FastQC or the Illumina Sample Summary Information from the Illumina pipeline?

    Here is why I ask:

    I just got my sequencing data back (from a HiSeq 2000 machine, 50-base run). Based on the Illumina Sample Summary/Report, the quality of my dataset is decent. The Illumina Sample Summary Information tells me that the Mean Quality Score (PF, i.e. passing filter) is 28.43 and %>Q30 bases (PF) is 69.53.
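Both of these summary numbers can be recomputed from the FASTQ quality strings. The sketch below is a minimal illustration, assuming Phred+33 encoding and counting bases at or above Q30; Illumina's report may define the threshold slightly differently and computes its figures only over PF (passing-filter) reads. The usage line shows how a population of very low-quality reads drags the overall mean down:

```python
from typing import Iterable, Tuple


def quality_summary(qual_strings: Iterable[str], offset: int = 33,
                    threshold: int = 30) -> Tuple[float, float]:
    """Return (mean base quality, percent of bases with quality >= threshold).

    Assumes Phred+33 encoding by default; pass offset=64 for
    pre-CASAVA-1.8 Illumina files.
    """
    total = count = high = 0
    for qual in qual_strings:
        for ch in qual:
            q = ord(ch) - offset  # decode one ASCII character to a Phred score
            total += q
            count += 1
            if q >= threshold:
                high += 1
    if count == 0:
        raise ValueError("no quality values supplied")
    return total / count, 100.0 * high / count


# One good read (all Q40, 'I') and one junk read (all Q2, '#'):
print(quality_summary(["IIII", "####"]))  # (21.0, 50.0)
```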

    However, when I run my data through FastQC, it tells me that the quality of my data is really bad (please see the attached images). In the two attached plots, the mean quality score is much worse than 28.43.

    Why is there a discrepancy between the two quality reports? Which one should I believe?

    Also, this is the first time our high-throughput sequencing facility has used the new Illumina pipeline, CASAVA v1.8. I know the quality scores in the new pipeline are different from the old one. Could this change explain why FastQC (version 0.10.0, run on Galaxy) thinks my data is poor quality?
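One concrete change in CASAVA 1.8 is the quality encoding, which moved from Phred+64 (earlier Illumina pipelines) to Phred+33. A tool that assumed the wrong offset would shift every score by 31. As a rough heuristic sketch (an assumption-laden check, not FastQC's own detection logic), the encoding can be guessed from the range of quality characters:

```python
from typing import Iterable, Optional


def guess_phred_offset(qual_strings: Iterable[str]) -> Optional[int]:
    """Heuristically guess whether qualities are Phred+33 or Phred+64.

    - Any character below ';' (ASCII 59) cannot occur in Phred+64 or
      Solexa+64 files, so the offset must be 33.
    - Any character above 'J' (ASCII 74, i.e. Q41 in Phred+33) is taken
      to mean Phred+64, ASSUMING Phred+33 scores never exceed Q41.
    - Otherwise the data are ambiguous and None is returned.
    """
    codes = [ord(c) for q in qual_strings for c in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None
```

Real libraries almost always contain some low-quality bases, so the first rule usually decides the question; the ambiguous case mainly arises for tiny or uniformly high-quality samples.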

    Thank you in advance for your help!

    -Eric
  • BAMseek
    Senior Member
    • Apr 2011
    • 124

    #2
    Hi Eric,

    The FASTQ files might contain all reads, not just the ones that passed quality filtering. In the description (header) line of each read there should be an N or a Y indicating whether the read has been filtered (Y means it was filtered out). There seems to be a large number of reads with a mean quality score of 2; those probably don't pass filtering and so aren't included in the stats Illumina reports. You might try filtering down to the reads that pass filtering and see if you get similar results.

    Justin
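Justin's suggestion can be sketched in a few lines of Python. This assumes CASAVA 1.8-style headers, where the second whitespace-separated field reads `<read>:<is filtered>:<control number>:<index>` and `Y` marks a read that failed the filter; the example header in the docstring is illustrative, and the helper names (`read_fastq`, `passed_filter`, `keep_passing`) are hypothetical:

```python
from typing import Iterable, Iterator, List


def read_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records as [header, sequence, '+', quality]."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def passed_filter(header: str) -> bool:
    """True if a CASAVA 1.8-style header has its is-filtered flag set to 'N'.

    Example header: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
    """
    fields = header.split()
    parts = fields[1].split(":") if len(fields) > 1 else []
    if len(parts) < 2:
        raise ValueError(f"not a CASAVA 1.8 header: {header!r}")
    return parts[1] == "N"


def keep_passing(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield only the records whose header says they passed filtering."""
    for record in read_fastq(lines):
        if passed_filter(record[0]):
            yield record
```

For a real file, `keep_passing` can be given an open file handle directly, since iterating a text file yields its lines.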


    • simonandrews
      Simon Andrews
      • May 2009
      • 870

      #3
      If you run FastQC with the --casava option set, it will remove any reads that were flagged as failing the Illumina QC filter. If you're using the latest version of CASAVA (1.8.2), these reads are no longer reported in the FASTQ output.


      • NextGenSeq
        Senior Member
        • Apr 2009
        • 482

        #4
        Both are bad: either your library is poor or their sequencing was.


        • ericguo
          Junior Member
          • Sep 2011
          • 9

          #5
          Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.
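Eric does not say exactly how the "Quality score > 3" filter was applied; one plausible reading is a per-read mean-quality cutoff. A sketch under that assumption (Phred+33 encoding, hypothetical function names), which also reports the retained fraction, the figure that came out at about 0.65% here:

```python
from typing import Iterable, List, Tuple


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read (Phred+33 assumed by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_by_mean_quality(
    reads: Iterable[Tuple[str, str]], min_mean: float = 3.0, offset: int = 33
) -> Tuple[List[Tuple[str, str]], float]:
    """Keep (sequence, quality) pairs whose mean quality exceeds min_mean.

    Returns the kept reads and the fraction of the input they represent.
    """
    read_list = list(reads)
    kept = [(s, q) for s, q in read_list if mean_quality(q, offset) > min_mean]
    fraction = len(kept) / len(read_list) if read_list else 0.0
    return kept, fraction
```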

          I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?


          • simonandrews
            Simon Andrews
            • May 2009
            • 870

            #6
            Originally posted by ericguo
            Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.

            I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?
            If you only got decent results from less than 1% of your library then I'd not have huge confidence in those sequences. You could try mapping them and seeing if you get sensible results. We've had libraries which were 95% adapter where we got useful results from the remaining 5%.

            One other possibility exists. If your library has biased composition, the Illumina base caller can sometimes get confused and produce poor base calls and quality assignments from what is actually good primary data. You'd be able to see this in the composition plots from FastQC. If this is the case, you can normally rescue these libraries by reanalysing with a fixed calibration matrix and fixed phasing parameters. It may be a long shot, but we've seen it happen a few times.
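The composition check Simon mentions amounts to tabulating base frequencies at each read position, which is what FastQC's per-base sequence content plot displays. A minimal sketch (it only tabulates composition and does nothing about recalibrating the base caller); a strongly skewed profile, such as one base dominating the early cycles, is the kind of bias that would prompt the reanalysis he describes:

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_position_composition(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G, T and N at each read position."""
    counts: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):  # first read long enough to reach position i
                counts.append(Counter())
            counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: c[b] / total for b in "ACGTN"})
    return result
```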


            • pmiguel
              Senior Member
              • Aug 2008
              • 2328

              #7
              Originally posted by ericguo
              Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.

              I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?
              I call this the "Bennetzen Dictum":

              Don't waste clean thoughts on dirty data.
              It doesn't necessarily answer your question because you will want to calibrate what constitutes "dirty" for yourself. But I think it is worthwhile to consider whenever you have come to the point where you are considering investing some effort analyzing a questionable data set.

              Anyone who has worked in science for a period of time has been there. You have some data -- usually you have invested some effort in obtaining it. But the results are marginal. Do you abandon this data (invoke the Bennetzen dictum), or persevere?

              There is no correct answer. That isn't the point. The point is you are making a choice. Do that consciously. Don't let hours become days, days weeks, and weeks years without deliberation. Yeah, that will come across as officious and trite. But I have seen it happen many times.

              --
              Phillip


              • cproby
                Junior Member
                • Oct 2015
                • 2

                #8
                Hi Everyone

                I want to do RNA-seq from FFPE material. I know this is a big ask. If I can get FastQC scores of 38 across all sequences, with a nice tight peak, is that sufficient?

                kind regards
                Charlotte


                • simonandrews
                  Simon Andrews
                  • May 2009
                  • 870

                  #9
                  Originally posted by cproby
                  Hi Everyone

                  I want to do RNA-seq from FFPE material. I know this is a big ask. If I can get FastQC scores of 38 across all sequences, with a nice tight peak, is that sufficient?

                  kind regards
                  Charlotte
                  Yes, Phred scores of 38 are plenty good enough. However, the problems you're likely to hit with FFPE material are unlikely to show up as poor sequencing scores; they are more likely to appear as high duplication levels or contamination, so there will be other bits of QC you're going to need to do.
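Two of the quantities in this exchange can be made concrete. A Phred score Q corresponds to an error probability P = 10^(-Q/10), so Q38 is roughly a 1.6 in 10,000 chance of a wrong base call. Duplication can be estimated, very crudely, as the fraction of reads whose exact sequence has already been seen; this is a simplified sketch, not FastQC's duplication algorithm:

```python
from collections import Counter
from typing import Iterable


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def duplicate_fraction(seqs: Iterable[str]) -> float:
    """Fraction of reads that repeat an exact sequence seen earlier."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total  # reads minus distinct sequences
```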
