Old 11-26-2010, 06:45 AM   #4
Gerard te Meerman
Junior Member
 
Location: Groningen, Netherlands

Join Date: Nov 2010
Posts: 1
Illumina quality scores and deviations from ref sequence

I have collected some data on how the Illumina quality scores are distributed as a function of base position, and on the actual number of deviations from the reference sequence, for an exome-enriched sample (read length 101). At base position 2, the Illumina quality score takes its lowest value in 0.04% of cases. At base position 75 this is already 19%, and at base position 100 it is 49%. This does not correlate well with the observed rate of differences from the Human ref37 genome, with 80% of reads mapped using an exact model that allows a maximum of four single-base errors. At base position 2 there are 0.5% deviations, at base position 75 1%, and at base position 100 3.5%. You may interpolate the intermediate positions for a reasonable fit. My conclusion is that the Illumina quality score has only a very limited relation to the observed deviations from the reference sequence. Most deviations are actually errors, because the mutational load in the human exome is much lower than the deviation rate observed in exome sequencing. To be useful for base calling, a quality score should discriminate much better in the low-quality range.
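To make the interpolation concrete, here is a minimal sketch in Python. It uses only the three deviation rates quoted above. Two things in it are assumptions and not part of the measurements: the interpolation is piecewise linear (no particular fit was specified), and the conversion to a quality value uses the standard Phred definition Q = -10 log10(p), added here only to put the observed rates on the same scale as a quality score. The function names `interpolated_rate` and `phred` are made up for this illustration.

```python
import math

# Observed deviation rates quoted in the post:
# (base position, fraction of bases that differ from the reference)
OBSERVED = [(2, 0.005), (75, 0.01), (100, 0.035)]


def interpolated_rate(position, points=OBSERVED):
    """Piecewise-linear interpolation between the quoted positions.

    Linear interpolation is an assumption made for this sketch; the post
    only says that intermediate positions may be interpolated.
    """
    if position < points[0][0] or position > points[-1][0]:
        raise ValueError("position outside the range covered by the data")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= position <= x1:
            return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


def phred(p):
    """Standard Phred-scaled quality for an error probability p."""
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    for pos in (2, 25, 50, 75, 90, 100):
        rate = interpolated_rate(pos)
        print(f"position {pos:3d}: deviation rate {rate:.4f}, "
              f"Phred-equivalent Q {phred(rate):.1f}")
```

Read this way, the observed deviation rates of 0.5%, 1% and 3.5% correspond to roughly Q23, Q20 and Q14.6, which can be set against the scores actually assigned at those positions.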