#1 |
Junior Member
Location: new haven Join Date: Sep 2011
Posts: 9
Hello,
I have a question regarding Illumina quality scores. Which quality control is more reliable: FastQC or the Illumina Sample Summary Information from the Illumina pipeline?
Here is why I ask: I just got my sequencing data back (from a HiSeq 2000 machine, 50-base run). Based on the Illumina Sample Summary/Report, the quality of my dataset is decent: the Mean Quality Score (PF) is 28.43, and %>Q30 bases (PF) is 69.53. However, when I run my data through FastQC, it tells me that the quality of my data is very bad (please see the attached images). In the two attached plots, the mean quality score is much worse than 28.43. Why is there a discrepancy between the two quality reports? Which one should I believe?
Also, this is the first time our high-throughput sequencing facility has used the new Illumina pipeline, CASAVA v1.8. I know that the quality scores in the new pipeline are different from those in the old one. Could this change explain why FastQC (on Galaxy, version 0.10.0) thinks my data is poor quality? Thank you in advance for your help! -Eric |
#2 |
Senior Member
Location: St. Louis, MO, USA Join Date: Apr 2011
Posts: 124
Hi Eric,
The FASTQ files might contain all reads, not just the ones that passed quality filtering. In each description line of a read, there should be an N or a Y, indicating whether the read has been filtered (Y means it failed the filter). There seems to be a large number of reads with a mean quality score of 2, and those probably don't pass filtering and so aren't included in the stats Illumina reports. You might try restricting the data to the reads that pass filtering and see if you get similar results. Justin |
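A minimal sketch of that check, assuming CASAVA 1.8 style headers (`@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index`) and Phred+33 quality encoding. The function names and the two demo reads are made up for illustration; this is not how the Illumina report itself is computed. It compares the mean quality and the share of bases at or above Q30 for all reads against the pass-filter reads only.

```python
import io

def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def passed_filter(header):
    """True unless the CASAVA 1.8 header flags the read as filtered ('Y')."""
    fields = header.split(" ", 1)
    if len(fields) < 2:
        return True  # assumption: no flag present, so keep the read
    parts = fields[1].split(":")
    return len(parts) < 2 or parts[1] != "Y"

def summarize(records, offset=33):
    """Return (mean quality, percent of bases >= Q30) over all bases."""
    total = n = q30 = 0
    for _, _, qual in records:
        for ch in qual:
            q = ord(ch) - offset  # Phred+33 decoding
            total += q
            n += 1
            q30 += q >= 30
    if n == 0:
        return 0.0, 0.0
    return total / n, 100.0 * q30 / n

# Hypothetical two-read example: one passing read (Q40), one filtered read (Q2).
demo = io.StringIO(
    "@HWI:1:FC:1:1:100:200 1:N:0:ACGT\nACGT\n+\nIIII\n"
    "@HWI:1:FC:1:1:100:201 1:Y:0:ACGT\nACGT\n+\n####\n"
)
records = list(parse_fastq(demo))
pf = [r for r in records if passed_filter(r[0])]
print("all reads:  ", summarize(records))
print("pass filter:", summarize(pf))
```

On real data, the same comparison would show whether the gap between the two reports comes from reads that failed the filter.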
#3 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
If you run FastQC with the --casava option set, it will remove any reads that were flagged as failing the Illumina QC filter. If you're using the latest version of CASAVA (1.8.2), these reads are no longer reported in the FASTQ output.
#4 |
Senior Member
Location: USA Join Date: Apr 2009
Posts: 482
Both are bad: either your library is poor or their sequencing is.
#5 |
Junior Member
Location: new haven Join Date: Sep 2011
Posts: 9
Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data), keeping those with a quality score > 3. This filtered dataset is about 0.65% of the input file and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.
I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has a poor quality score, it is OK to use as long as it is a perfect match to the genome. Is this true? What's your take on this? |
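For reference, the filter described in this post can be sketched as follows. This assumes the threshold applies to the per-read mean quality and that the file uses Phred+33 encoding (as CASAVA 1.8 does); the function names and sample quality strings are hypothetical.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of one read's quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)

def keep_read(qual, min_mean=3.0):
    """Keep a read only if its mean quality is strictly above the threshold."""
    return mean_quality(qual) > min_mean

# Hypothetical quality strings: '#' decodes to Q2, 'I' decodes to Q40.
quals = ["####", "IIII", "II##"]
kept = [q for q in quals if keep_read(q)]
print(kept)
```

A read made up entirely of Q2 bases is dropped, while a read with a mix of good and bad bases survives, so a mean-based threshold does not guarantee that every base in a kept read is reliable.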
#6 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
Quote:
One other possibility exists. If your library has biased composition then the Illumina base caller can sometimes get confused and produce poor base calls and quality assignments from what is actually good primary data. You'd be able to see this in the composition plots from FastQC. If this is the case then you can normally rescue these libraries by reanalysing with a fixed calibration matrix and fixed phasing parameters. May be a long shot, but we've seen it happen a few times. |
#7 | ||
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Quote:
Quote:
Anyone who has worked in science for a period of time has been there. You have some data -- usually you have invested some effort in obtaining it. But the results are marginal. Do you abandon this data (invoke the Bennetzen dictum), or persevere? There is no correct answer. That isn't the point. The point is you are making a choice. Do that consciously. Don't let hours become days, days weeks, and weeks years without deliberation. Yeah, that will come across as officious and trite. But I have seen it happen many times. -- Phillip |
#8 |
Junior Member
Location: Dundee, Scotland Join Date: Oct 2015
Posts: 2
Hi Everyone
I want to do RNA-seq from FFPE material. I know this is a big ask. If I can get FastQC scores of 38 across all sequences with a nice tight peak, is that sufficient? Kind regards, Charlotte |
#9 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
Yes, Phred scores of 38 are plenty good enough. However, the problems you're likely to hit with FFPE material are unlikely to show up as poor sequencing scores. They are more likely to show up as high duplication levels or contamination, so there will be other bits of QC you're going to need to do.
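To put the scores mentioned in this thread in context: a Phred score Q corresponds to an estimated per-base error probability of 10^(-Q/10). A small sketch of that conversion (the function name is just for illustration):

```python
def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

# Scores that come up in this thread: 2, ~28, 30 and 38.
for q in (2, 28, 30, 38):
    print(f"Q{q}: error probability ~ {phred_to_error_prob(q):.2e}")
```

Q30 works out to roughly one error in 1,000 bases and Q38 to well under one in 5,000, whereas Q2 carries essentially no information about the base.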