SEQanswers

Old 04-08-2009, 12:54 PM   #1
d17
Member
 
Location: United States

Join Date: Sep 2008
Posts: 27
Illumina/Solexa quality values

Hi everyone,

I have some Illumina GA fastq files with base quality values that don't span the full range that I expect.

The quality values for each of five lanes have the following ranges:
lane 1: 2 to 27
lane 2: -1 to 26
lane 3: 1 to 24
lane 4: 1 to 27
lane 5: 0 to 30
with the majority of bases in all lanes having quality values 22 or 23.

I got the values above by subtracting the offset 64 ('@') from the ASCII codes of the characters in the quality lines of the fastq files.
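
For anyone who wants to reproduce that calculation, here is a minimal Python sketch of the offset-64 decoding described above. The function names and the example quality strings are made up for illustration; they are not taken from the actual files.

```python
def decode_qualities(quality_string, offset=64):
    """Convert one FASTQ quality line to integer scores by subtracting the ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


def quality_range(quality_strings, offset=64):
    """Return (min, max) over all decoded scores in a collection of quality lines."""
    scores = [q for s in quality_strings for q in decode_qualities(s, offset)]
    return min(scores), max(scores)


# Hypothetical quality lines, not real data from the run.
example_lines = ["?VWWY", "VVWW@"]
print(decode_qualities("?@VWY"))
print(quality_range(example_lines))
```

The offset is a parameter because 64 is only right for Illumina/Solexa-style files; Sanger-style FASTQ uses an offset of 33.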

These ranges don't seem to be consistent with anything I've seen elsewhere. For example, with Solexa quality values I think the range should go from -5 to 40, and for Phred quality values 0 to 40.
[ Side note: I am not certain whether my files contain Solexa or Phred-based quality values. I see that the quality value output in GERALD fastq files has changed since Illumina pipeline 1.3 (http://seqanswers.com/forums/showthread.php?t=1110). Since lane 2 contains some -1's, I assume my quality values are Solexa ]
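
As a rough sketch of how the two scales relate: a Solexa-scaled score can be mapped to the Phred scale with the standard conversion Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The negative-value check below is only a heuristic (Phred scores cannot be negative, Solexa scores can), and the function names are hypothetical.

```python
import math


def solexa_to_phred(q_solexa):
    """Map a Solexa-scaled quality score onto the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def looks_like_solexa(scores):
    """Heuristic only: a negative score cannot be Phred, so it suggests Solexa scaling."""
    return min(scores) < 0


# The two scales differ noticeably at the low end and converge for high scores.
for q in (-5, -1, 0, 10, 30, 40):
    print(q, round(solexa_to_phred(q), 2))
```

Note that the absence of negative values proves nothing either way, since a Solexa-scaled file may simply contain no very poor bases.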

Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?

Thanks!
Dan

Last edited by d17; 01-19-2011 at 01:56 AM.
Old 04-08-2009, 02:34 PM   #2
TylerBackman
Member
 
Location: Riverside, CA

Join Date: Oct 2008
Posts: 13

Quote:
Originally Posted by d17
Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
Old 04-08-2009, 02:48 PM   #3
d17
Member
 
Location: United States

Join Date: Sep 2008
Posts: 27

Quote:
Originally Posted by TylerBackman
Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
Hmm, we have had high quality runs in the past (i.e. quality values from -5 to 40, most bases called as 40). I'll definitely have to check into whether anything has changed with the machine's hardware or software (it's actually not our machine, and these files are a couple of months old now, so that may be hard to track down). I wonder if anyone else has come across quality values that look remotely like these?

Last edited by d17; 01-19-2011 at 01:56 AM.
Old 04-10-2009, 06:11 PM   #4
dlepp
Junior Member
 
Location: Canada

Join Date: Mar 2009
Posts: 5

Hi Dan,

I was just about to post on the very same problem. Most of my quality scores are "V"s, which convert to Q22 on the Illumina scale (ASCII 86 minus the offset 64), if I have that correct (I'm new to this). I'd be interested to know if you find an explanation.

Thanks,

Dion
Old 04-14-2009, 09:13 PM   #5
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by d17
Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
For Solexa, the estimated probability of a base call error for Q30 is 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
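
To make those numbers concrete, here is a small sketch of the usual score-to-error-probability formulas: p = 10^(-Q/10) on the Phred scale and p = 1 / (1 + 10^(Q/10)) on the Solexa scale. The function names are just for illustration.

```python
def phred_error_probability(q):
    """Error probability implied by a Phred-scaled score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def solexa_error_probability(q):
    """Error probability implied by a Solexa-scaled score: p = 1 / (1 + 10^(Q/10))."""
    return 1 / (1 + 10 ** (q / 10))


for q in (-1, 0, 22, 30, 40):
    print(q, phred_error_probability(q), solexa_error_probability(q))
```

At Q30 the two scales agree to within about 0.1% of each other (0.001 versus 1/1001), so the 99.9% figure holds for either. They only diverge for low scores: a Solexa score of 0 corresponds to an error probability of 0.5, and negative Solexa scores correspond to bases more likely wrong than right.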

In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.

As I said, the quality isn't that bad. The reason you aren't seeing higher is almost certainly due to the prep and/or instrument. For example, if you generate too many clusters on the flowcell (high density), you just won't get high confidence in base calls. It's a touchy tradeoff between density/yield and quality/ability to discern clusters.
Old 04-15-2009, 02:48 PM   #6
d17
Member
 
Location: United States

Join Date: Sep 2008
Posts: 27

Torst, thanks for your input:

Quote:
Originally Posted by Torst
For Solexa, the estimated probability of a base call error for Q30 is 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
Yes, you're absolutely right ... but I would be happier if we had Q40's that were correct with 99.99% probability!

Quote:
Originally Posted by Torst
In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.
One strange thing we see is that bases called as "N" don't always have the same quality value: in the five lanes I posted about, the quality values of "N" bases range from -1 to +3. Of course the -1 doesn't make any sense whatsoever, but at least the others are consistent with the base having a low probability of being correct.

Quote:
Originally Posted by Torst
The reason you aren't seeing higher is almost certainly due to the prep and/or instrument.
Does anyone know how much variation in the prep is stochastic? (i.e. Is there a definite problem that I need to hunt down here, or did we just get unlucky compared with previous runs that had higher quality values?)

Last edited by d17; 01-19-2011 at 01:56 AM.
Old 04-15-2009, 03:54 PM   #7
TylerBackman
Member
 
Location: Riverside, CA

Join Date: Oct 2008
Posts: 13

Is your image analysis done with IPAR, or with the Illumina pipeline? The first time we used our IPAR unit, it needed "calibration" and produced reads with very low quality scores. Re-running the image analysis with Firecrest gave higher quality reads.
Old 04-27-2009, 03:35 PM   #8
sjackman
Member
 
Location: Vancouver, Canada

Join Date: Mar 2009
Posts: 15

I'm seeing the exact same thing. I'm seeing quality values from -1 (? or ASCII 63) to 25 (Y or ASCII 89), with most of the calls being 23 (W or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?
Old 04-27-2009, 03:48 PM   #9
TylerBackman
Member
 
Location: Riverside, CA

Join Date: Oct 2008
Posts: 13

Quote:
Originally Posted by sjackman
I'm seeing the exact same thing. I'm seeing quality values from -1 (? or ASCII 63) to 25 (Y or ASCII 89), with most of the calls being 23 (W or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?
The scores were only incorrect for the first run with the IPAR unit, and were then correct for all subsequent runs.
__________________
@1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
""""""""""""""""""""""""""""""""""""
All times are GMT -8.