05-01-2015, 04:34 PM
Brian Bushnell
CalcTrueQuality now has a new version that should be substantially more accurate. It tracks read 1 and read 2 independently, takes indel-type errors into account, and can internally run two-pass recalibration (though I still think that is generally overkill except for producing pretty graphs). Also, sam/bam files can now be recalibrated; previously they could be used to build the matrices, but only fastq files could be recalibrated. Handling indels may seem unimportant for Illumina, but in very low-quality reads a substitution error is sometimes aligned as an indel instead, when that gives a better alignment score. So ignoring indels, and the reads that contain them, can lead to inflated quality scores. It also means the tool can now be used with 454 and IonTorrent data. New results, for the same dataset:

The weighted average deviation from the correct quality score is now 0.25, down from 4.15 in the raw data. Even though this is NextSeq V1 data, the largest bin is Q41.
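To make the "weighted average deviation" figure concrete, here is a minimal sketch of one way such a metric could be computed. It is not CalcTrueQuality's actual code, and the exact definition it uses may differ. Assumptions: each aligned base is reduced to a hypothetical `(pair_num, reported_q, is_error)` tuple, where `is_error` covers both substitutions and indel-type errors. Bases are binned by read 1/read 2 and reported quality, as described above, and each bin's observed error rate is converted back to a Phred score (capped at an assumed Q41). The deviation is then averaged over bins, weighted by base count. The function names `build_table`, `weighted_avg_deviation`, and `recalibrated_quality` are illustrative only.

```python
import math
from collections import defaultdict


def phred(err_rate, cap=41):
    """Convert an observed error rate to a Phred score (cap of 41 is an assumption)."""
    if err_rate <= 0:
        return float(cap)  # no observed errors: report the cap
    return min(float(cap), -10.0 * math.log10(err_rate))


def build_table(observations):
    """Count bases and errors per (pair_num, reported_q) bin.

    observations: iterable of (pair_num, reported_q, is_error) tuples (hypothetical
    format); is_error is assumed to include substitutions and indel-type errors.
    """
    counts = defaultdict(lambda: [0, 0])  # (pair_num, q) -> [bases, errors]
    for pair_num, q, is_error in observations:
        cell = counts[(pair_num, q)]
        cell[0] += 1
        cell[1] += int(is_error)
    return counts


def weighted_avg_deviation(counts):
    """Mean |reported Q - observed Q| across bins, weighted by base count."""
    total_bases = 0
    weighted_sum = 0.0
    for (_pair_num, q), (bases, errors) in counts.items():
        observed_q = phred(errors / bases)
        weighted_sum += bases * abs(q - observed_q)
        total_bases += bases
    return weighted_sum / total_bases


def recalibrated_quality(counts, pair_num, q):
    """Replace a reported score with its bin's observed score (unseen bins keep q)."""
    cell = counts.get((pair_num, q))
    if cell is None:
        return q
    bases, errors = cell
    return round(phred(errors / bases))


# Toy data: read 1 at Q30 is accurate (1 error in 1000 bases), while read 2 at Q30
# really behaves like Q20 (10 errors in 1000 bases).
obs = [(1, 30, i < 1) for i in range(1000)] + [(2, 30, i < 10) for i in range(1000)]
table = build_table(obs)
print(weighted_avg_deviation(table))
print(recalibrated_quality(table, 2, 30))
```

With that toy data, read 1 contributes no deviation and read 2 contributes 10, so the weighted average comes out near 5, and read 2's Q30 bases would be recalibrated to Q20. Keeping read 1 and read 2 in separate bins is what allows them to be corrected differently.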
