SEQanswers

Old 04-18-2013, 06:09 PM   #1
dphansti
Member
 
Location: Bay Area

Join Date: May 2011
Posts: 28
quality scores discrepancy on wikipedia

I am modifying code that automatically guesses the scoring method used in a fastq file. I am using the wikipedia entry for guidance (http://en.wikipedia.org/wiki/FASTQ_format).

But the example they show has some discrepancies. In the top part of the image, the Illumina 1.5+ scores start at position 66, but at the bottom of the image they start at 67.

The bigger problem I am having is that some of the fastqs I am looking at contain the quality character 'i', which is not consistent with any of the scoring methods listed on wikipedia.

Can anyone clear this up for me? Is there a better source than wikipedia for this information?

Thanks.
__________________
Doug
www.sharedproteomics.com

Last edited by dphansti; 04-18-2013 at 06:12 PM.
Old 04-18-2013, 07:09 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Read the note in the wiki. B doesn't really mean 2, it means crappier than 2, or it means the end of the read is poor.

Quote:
Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning. The values 0 and 1 are no longer used and the value 2, encoded by ASCII 66 "B", is used also at the end of reads as a Read Segment Quality Control Indicator.[6] The Illumina manual[7] (page 30) states the following: If a read ends with a segment of mostly low quality (Q15 or below), then all of the quality values in the segment are replaced with a value of 2 (encoded as the letter B in Illumina's text-based encoding of quality scores)... This Q2 indicator does not predict a specific error rate, but rather indicates that a specific final portion of the read should not be used in further analyses. Also, the quality score encoded as "B" letter may occur internally within reads at least as late as pipeline version 1.6.

So C, or 3, is the lowest true quality value you can get in that encoding system.
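
Since the quoted note says the trailing "B" segment should not be used in further analyses, one way to act on it is to cut that tail off before doing anything else. This is only a minimal sketch: it assumes the read really is in the Illumina 1.5 to pre-1.8 (offset 64) encoding, and `trim_b_tail` is a made-up helper name, not part of any library.

```python
def trim_b_tail(seq, qual):
    """Drop the trailing run of 'B' quality characters and the matching bases.

    Assumes Illumina 1.5 to pre-1.8 encoding, where a trailing run of 'B'
    marks a read segment that should not be used.
    """
    kept = len(qual.rstrip("B"))  # length of the read without the B tail
    return seq[:kept], qual[:kept]


seq, qual = trim_b_tail("ACGTACGT", "hhgfeBBB")
print(seq, qual)  # ACGTA hhgfe
```

A "B" inside the read is left alone here, because only the final segment carries the indicator meaning.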

If you have an "i", maybe you have a new read generated with new improved chemistry, with some quality scores of 41 ("i" is ASCII 105, and 105 - 64 = 41), that has for some reason been converted to the old quality scoring.
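
The arithmetic behind those numbers is just the ASCII code minus the offset. A small sketch (the helper name `phred_score` is made up for illustration) shows what "B", "C" and "i" decode to under the old offset of 64 and under the Sanger offset of 33:

```python
def phred_score(char, offset):
    """Decode one quality character: ASCII code minus the encoding offset."""
    return ord(char) - offset


for char in "BCi":
    # character, ASCII code, score at offset 64, score at offset 33
    print(char, ord(char), phred_score(char, 64), phred_score(char, 33))
# B 66 2 33
# C 67 3 34
# i 105 41 72
```

Read at offset 33, "i" would be a score of 72, which is why an "i" is much more plausibly a 41 in the offset-64 scheme.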
Old 04-22-2013, 01:17 AM   #3
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

In fact, Solexa and Illumina scores are not capped at 40. Like Phred (Sanger) scores, which can cover the whole printable ASCII range (0-93 with an offset of 33), they can run all the way up to ASCII 126, which is a score of 62 with an offset of 64. Mostly the scores are from 0-40, but with the new instruments you will often find scores above 40. If you read the wiki entry carefully it actually says so. Guessing the correct encoding is a pest.
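
To show why it is a pest, here is a rough sketch of the usual min/max heuristic. It is not a definitive detector: the function name `guess_encoding` and the returned labels are made up, and the cutoff at "J" (ASCII 74, i.e. Q41 at offset 33) is an assumption that offset-33 data rarely goes above Q41, which newer instruments can break.

```python
def guess_encoding(quality_strings):
    """Guess a FASTQ quality encoding from the range of characters seen.

    Heuristic only. Returns 'phred+33', 'solexa+64', 'phred+64' or 'ambiguous'.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return "ambiguous"  # nothing to look at
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Below ';' no offset-64 scheme can encode anything.
        return "phred+33"
    if lowest < 64:
        # ';' to '?' are the negative Solexa scores (-5 to -1).
        return "solexa+64"
    if highest > 74:
        # Assumption: above 'J' would mean > Q41 at offset 33, so prefer offset 64.
        return "phred+64"
    # Everything sits in '@'..'J', which both offsets can produce.
    return "ambiguous"


print(guess_encoding(["IIIIHHH#"]))  # phred+33
print(guess_encoding(["hhhhiBBB"]))  # phred+64
print(guess_encoding(["FFFF"]))      # ambiguous
```

The more reads you feed it, the less often it lands on "ambiguous", but a file of uniformly high-quality reads can still fool it.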
All times are GMT -8.