**BAMseek** · 12-06-2011, 08:09 PM

Hi Eric,

The FASTQ files might contain all reads, not just the ones that passed quality filtering. In the description line of each read there should be an N or a Y flag indicating whether the read was filtered (Y means it was filtered out, N means it passed). There seem to be a large number of reads with a mean quality score of 2; those probably don't pass filtering and so aren't included in the stats Illumina reports. You might try restricting the analysis to the reads that pass filtering and see whether you get similar results.

Justin
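For anyone who wants to try the flag-based filtering suggested above, here is a minimal sketch in Python. It assumes CASAVA 1.8-style headers, where the second space-separated field looks like `<read>:<is_filtered>:<control>:<index>`, and plain four-line FASTQ records. The function names `passes_filter` and `keep_passing` and the example reads are made up for illustration.

```python
import io

def passes_filter(header: str) -> bool:
    """Return True when the header carries the 'N' (not filtered) flag."""
    # Assumed layout: "@<instrument>:...:<y> <read>:<is_filtered>:<control>:<index>"
    parts = header.rstrip().split(" ")
    if len(parts) < 2:
        raise ValueError("header has no CASAVA 1.8 description field")
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        raise ValueError("filter flag is missing or not Y/N")
    return fields[1] == "N"

def keep_passing(handle):
    """Yield four-line FASTQ records whose filter flag is 'N'."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of input
            break
        if passes_filter(record[0]):
            yield record

# Two invented reads: the first passed the filter, the second did not.
example = io.StringIO(
    "@M1:1:FC1:1:1101:10:20 1:N:0:ACGT\nACGT\n+\nIIII\n"
    "@M1:1:FC1:1:1101:11:21 1:Y:0:ACGT\nACGT\n+\n####\n"
)
kept = list(keep_passing(example))
print(len(kept))  # prints 1
```

Running the QC report on only the kept records should give numbers that are easier to compare with a pass-filter-only summary.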

**simonandrews** · 12-07-2011, 01:09 AM

If you run FastQC with the --casava option set then it will remove any reads which were flagged as failing the Illumina QC filter. If you're using the latest version of Casava (1.8.2) then these reads are no longer reported in the FASTQ output.

**NextGenSeq** · 12-07-2011, 07:10 AM

Both are bad: either your library is poor or their sequencing is.

**ericguo** · 12-07-2011, 11:02 AM

Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.

I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?
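The quality filter described in this post can be sketched roughly as follows. This assumes the cutoff is applied to the mean quality per read and that the qualities are Phred+33 encoded; older Illumina pipelines used an offset of 64, so the `offset` parameter would need changing for those files. `mean_phred` and `fraction_above` are hypothetical helper names, not part of FastQC or any Illumina tool.

```python
def mean_phred(quality_line: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores encoded in one FASTQ quality line."""
    scores = [ord(ch) - offset for ch in quality_line.strip()]
    if not scores:
        raise ValueError("empty quality line")
    return sum(scores) / len(scores)

def fraction_above(quality_lines, threshold: float = 3.0, offset: int = 33) -> float:
    """Fraction of reads whose mean quality is strictly above the threshold."""
    lines = list(quality_lines)
    if not lines:
        raise ValueError("no reads supplied")
    kept = sum(1 for q in lines if mean_phred(q, offset) > threshold)
    return kept / len(lines)

# Invented example: '#' encodes Q2 and 'I' encodes Q40 under Phred+33.
qualities = ["####", "IIII", "####", "####"]
print(fraction_above(qualities))  # prints 0.25
```

If only a tiny fraction of reads survives a cutoff this low, that fraction is worth reporting alongside the mean quality of the survivors.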

**simonandrews** · 12-08-2011, 01:20 AM

Originally posted by ericguo:

> Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.
>
> I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?

If you only got decent results from less than 1% of your library then I'd not have huge confidence in those sequences. You could try mapping them and seeing if you get sensible results. We've had libraries which were 95% adapter where we got useful results from the remaining 5%.

One other possibility exists. If your library has biased composition then the Illumina base caller can sometimes get confused and produce poor base calls and quality assignments from what is actually good primary data. You'd be able to see this in the composition plots from FastQC. If this is the case then you can normally rescue these libraries by reanalysing with a fixed calibration matrix and fixed phasing parameters. May be a long shot, but we've seen it happen a few times.
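To check for biased composition without opening the FastQC plots, a per-cycle base tally is enough to spot the problem. The following is only a rough sketch of that idea, not FastQC's own implementation; `base_composition` and the example reads are invented for illustration.

```python
from collections import Counter

def base_composition(reads):
    """Return one dict per cycle mapping each base to its fraction of reads."""
    counts = []
    for read in reads:
        for i, base in enumerate(read.strip().upper()):
            if i == len(counts):  # first read long enough to reach this cycle
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {base: n / sum(cycle.values()) for base, n in cycle.items()}
        for cycle in counts
    ]

# Invented reads: every read starts with A, so cycle 1 is completely biased.
reads = ["ACGT", "ACGA", "AGGT", "ATGC"]
composition = base_composition(reads)
print(composition[0])  # prints {'A': 1.0}
```

Cycles where a single base dominates, especially the early ones, are where the base caller is most likely to have struggled.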

**pmiguel** · 12-08-2011, 05:24 AM

Originally posted by ericguo:

> Thank you very much for your reply. I filtered my reads (I did this with 2% of my total data) with Quality score > 3. This filtered dataset is about 0.65% of the input file, and has a mean quality score of ~28 (see attachment), which is consistent with the Illumina report.
>
> I realize that my data is poor. I am just wondering if it is usable. Some people I talk to say that even if a read has poor quality score, it is ok to use as long as it is a perfect match to the genome. Is this true? What's your take on this?

I call this the "Bennetzen Dictum":

Don't waste clean thoughts on dirty data.

It doesn't necessarily answer your question because you will want to calibrate what constitutes "dirty" for yourself. But I think it is worthwhile to consider whenever you have come to the point where you are considering investing some effort analyzing a questionable data set.

Anyone who has worked in science for a period of time has been there. You have some data -- usually you have invested some effort in obtaining it. But the results are marginal. Do you abandon this data (invoke the Bennetzen dictum), or persevere?

There is no correct answer. That isn't the point. The point is you are making a choice. Do that consciously. Don't let hours become days, days weeks, and weeks years without deliberation. Yeah, that will come across as officious and trite. But I have seen it happen many times.

--
Phillip

**cproby** · 10-22-2015, 03:15 AM

Hi Everyone

I want to do RNAseq from FFPE material. I know this is a big ask. If I can get FASTQC scores across all sequences of 38 with a nice tight peak, is that sufficient?

kind regards
Charlotte

**simonandrews** · 10-22-2015, 04:08 AM

Originally posted by cproby:

> Hi Everyone
>
> I want to do RNAseq from FFPE material. I know this is a big ask. If I can get FASTQC scores across all sequences of 38 with a nice tight peak, is that sufficient?
>
> kind regards
> Charlotte

Yes, Phred scores of 38 are plenty good enough. However, the problems you're likely to hit with FFPE material are unlikely to show up as poor sequencing scores; they are more likely to show up as high duplication levels or contamination, so there will be other bits of QC you're going to need to do.
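As a rough first look at duplication, one can count how many reads repeat an earlier sequence exactly. This is a simplified sketch and is not how FastQC estimates duplication; it only catches exact matches and holds every distinct sequence in memory. `duplication_rate` is a hypothetical helper name.

```python
from collections import Counter

def duplication_rate(reads) -> float:
    """Fraction of reads that are exact copies of another read already seen."""
    counts = Counter(read.strip() for read in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    # Each distinct sequence counts once as an original; the rest are copies.
    return 1 - len(counts) / total

# Invented example: three copies of one read plus one unique read.
print(duplication_rate(["ACGT", "ACGT", "ACGT", "TTTT"]))  # prints 0.5
```

For RNA-seq, highly expressed transcripts produce genuine duplicates, so a high value here is a prompt for closer inspection and not proof of a bad library.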

Quality Score: FastQC vs Illumina
