SEQanswers

11-18-2010, 12:40 PM #1
whereisshe
Junior Member
 
Location: Illinois, US

Join Date: Nov 2010
Posts: 2
Illumina quality score

Hi everyone,

I recently started studying DNA assembly, and I found something I could not understand in the following paper:

http://www.sciencedirect.com/science...1&searchtype=a

Figure 1 in the paper shows the relationship between the quality score and the number of errors in reads. The authors said that a quality score of 40 means an error probability of 0.01%, which is true if the following equation is used:

Q = -10 log10(p / (1 - p))
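For reference, the conversion between score and error probability can be sketched in Python. This is a minimal sketch, assuming base-10 logarithms as in the Solexa-style equation above; it also shows the Phred-style definition Q = -10 log10(p) for comparison. The function names are hypothetical. At Q = 40 both definitions give an error probability of about 0.01%; they only diverge noticeably at low scores.

```python
import math

def phred_q_to_p(q: float) -> float:
    """Error probability for a Phred-style score, Q = -10 * log10(p)."""
    return 10 ** (-q / 10)

def solexa_q_to_p(q: float) -> float:
    """Error probability for a Solexa-style score, Q = -10 * log10(p / (1 - p))."""
    odds = 10 ** (-q / 10)  # this is p / (1 - p)
    return odds / (1 + odds)

def p_to_phred_q(p: float) -> float:
    """Inverse of phred_q_to_p."""
    return -10 * math.log10(p)

def p_to_solexa_q(p: float) -> float:
    """Inverse of solexa_q_to_p."""
    return -10 * math.log10(p / (1 - p))

# Compare the two scales at a few scores.
for q in (0, 10, 20, 30, 40):
    print(f"Q={q:2d}  Phred p={phred_q_to_p(q):.6%}  Solexa p={solexa_q_to_p(q):.6%}")
```

The two scales are practically identical at Q = 40, but at Q = 0 they give 100% and 50% respectively.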

However, according to the graph, only about 65% of the bases with a score of 40 are correct. Moreover, the percentage of correct bases is almost 0 for every score below 40. I wonder whether this trend is normal.
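One way to judge whether such a trend is plausible is to recompute an empirical score for each reported quality value from the observed error rate. The following Python sketch assumes the observations are available as (reported score, base is correct) pairs, for example from comparing aligned bases against a reference; the function name and input format are hypothetical. Using the roughly 65% figure read off the graph as toy input, bases labelled Q40 would correspond to an empirical Phred score of only about 4.6.

```python
import math
from collections import defaultdict

def empirical_quality(observations):
    """Map each reported quality score to an empirical Phred score.

    observations: iterable of (reported_q, is_correct) pairs (assumed format).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for q, correct in observations:
        totals[q] += 1
        if not correct:
            errors[q] += 1
    result = {}
    for q in sorted(totals):
        err = errors[q] / totals[q]
        # A bin with no observed errors has no finite empirical score.
        result[q] = -10 * math.log10(err) if err > 0 else math.inf
    return result

# Toy illustration, not real data: 100 bases reported at Q40, 65 of them correct.
toy = [(40, True)] * 65 + [(40, False)] * 35
print(round(empirical_quality(toy)[40], 2))  # prints 4.56
```

A well-calibrated base caller would give empirical scores close to the reported ones in every bin.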
Thank you.
11-18-2010, 02:55 PM #2
obig
Member
 
Location: Berkeley

Join Date: Nov 2010
Posts: 12

I can see your confusion. There is something very strange about Figure 1 in their paper. Panel (b) might make sense if what they were plotting was the error rate as a function of base position for a 40 bp read. But how can panel (a) even go up to 41 if the maximum Illumina quality score is 40, as they themselves state in the text?

Look at these papers for better treatment of the question of quality scores vs error rates:
http://genome.cshlp.org/content/18/5/763.long
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2577856/
http://genomebiology.com/2009/10/8/R83
11-18-2010, 04:00 PM #3
whereisshe
Junior Member
 
Location: Illinois, US

Join Date: Nov 2010
Posts: 2

Hi obig,

The largest score in graph (a) may actually be 40, and 41 may be displayed because of an incorrect setting in Excel. In any case, they did not explain it.
Thank you for recommending those papers. I appreciate it.

Yun
11-26-2010, 06:45 AM #4
Gerard te Meerman
Junior Member
 
Location: Groningen, Netherlands

Join Date: Nov 2010
Posts: 1
Illumina quality scores and deviations from ref sequence

I have collected some data on the distribution of Illumina quality scores as a function of base position, and on the actual number of deviations from the reference sequence, for an exome-enriched sample (101 bp reads). Illumina quality scoring assigns the lowest score to base 2 in 0.04% of cases; at base position 75 this is already 19%, and at base position 100 it is 49%. This correlates poorly with the observed rates of differences from the human ref37 genome, with 80% of reads mapped under an exact model allowing at most four single-base errors: there are 0.5% deviations at base position 2, 1% at position 75, and 3.5% at position 100. You may interpolate the intermediate positions for a reasonable fit.

My conclusion is that the Illumina quality score has only a very limited relation to the observed deviations from the reference sequence. Most deviations are actually errors, because the mutational load in the human exome is much lower than the deviation rate observed in exome sequencing. To be useful for base calling, a quality score should discriminate much better in the low-quality range.
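The interpolation suggested above can be sketched in Python using only the percentages quoted in this post. This is a minimal sketch, assuming simple linear interpolation between positions 2, 75 and 100 (the post does not say which method gives the "reasonable fit"); the names are hypothetical. It also converts each deviation rate to a Phred-style score, Q = -10 log10(p), treating every deviation as a sequencing error, which the post argues is approximately the case.

```python
import math

# Percentages quoted in the post, keyed by base position.
LOWEST_SCORE_PCT = {2: 0.04, 75: 19.0, 100: 49.0}  # bases given the lowest score
DEVIATION_PCT = {2: 0.5, 75: 1.0, 100: 3.5}        # bases differing from ref37

def interpolate(points, pos):
    """Linear interpolation between the quoted positions (assumed method)."""
    if pos in points:
        return points[pos]
    xs = sorted(points)
    for lo, hi in zip(xs, xs[1:]):
        if lo < pos < hi:
            frac = (pos - lo) / (hi - lo)
            return points[lo] + frac * (points[hi] - points[lo])
    raise ValueError(f"position {pos} is outside the quoted range {xs[0]}-{xs[-1]}")

def empirical_phred(pct):
    """Phred-style score for an error rate given in percent."""
    return -10 * math.log10(pct / 100)

for pos in (2, 25, 50, 75, 90, 100):
    low = interpolate(LOWEST_SCORE_PCT, pos)
    dev = interpolate(DEVIATION_PCT, pos)
    print(f"position {pos:3d}: lowest score {low:5.1f}%, "
          f"deviations {dev:.2f}%, empirical Q {empirical_phred(dev):.1f}")
```

Under these assumptions the deviation rates correspond to empirical scores of roughly Q23 at position 2, Q20 at position 75 and Q14.6 at position 100, even though 49% of the bases at position 100 receive the lowest score.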
All times are GMT -8.