**Brian Bushnell** · 11-30-2015, 05:33 PM

You can't estimate error rates with a linear average of log-transformed values. If you map to phiX with BBMap, like this:

bbmap.sh in=reads.fq ref=phix.fa mhist=mhist.txt qhist=qhist.txt qahist=qahist.txt

...then graph qhist.txt with something like Excel or R, and you will see the difference between the linear and logarithmic averages, as well as the actual error rate. The actual error rate will track the logarithmic averages much more closely than the linear ones.
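To see why the two averages differ, here is a minimal Python sketch. It assumes that the "logarithmic" average means converting each Phred score to an error probability, averaging the probabilities, and converting back; the function names and the toy quality values are made up for illustration and are not BBMap code.

```python
import math

def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)

def linear_average(quals):
    """Plain arithmetic mean of the quality scores (misleading for error rates)."""
    return sum(quals) / len(quals)

def probability_average(quals):
    """Average the error probabilities, then convert back to a Phred score."""
    mean_err = sum(phred_to_error(q) for q in quals) / len(quals)
    return error_to_phred(mean_err)

# Toy example (hypothetical values): three Q40 bases and one Q10 base.
quals = [40, 40, 40, 10]
print(linear_average(quals))        # 32.5
print(probability_average(quals))   # about 16.0
```

The single Q10 base dominates the expected number of errors, so the probability-space average (about Q16) is far below the linear average (Q32.5).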

qahist.txt (quality accuracy histogram), on the other hand, shows the actual measured quality for each stated quality score, which tells you how accurate the quality scores are. Normally they're not too far off, but it depends heavily on the specific platform and software version.

**ClemBuntu** · 12-08-2015, 08:23 AM

Sorry, but I don't understand.

What's the point of the qhist histogram?

And about qahist, am I supposed to see that most of my reads have a quality greater than or equal to 35?

**Brian Bushnell** · 12-08-2015, 02:09 PM

The qhist shows you, per position in the read, what the expected error rate is (that's read1_log) and what the actual average error rate is (that's read1_measured). As you can see, those completely disagree, so the quality scores are very inaccurate (almost meaningless) for this library, and the error rates are high: the average measured quality starts at Q20 (1% error) and drops to Q11 (8% error) by the end. The X axis is read position.

The qahist is plotted incorrectly - you need to plot the "Quality" column as the X-axis, and the "TrueQuality" column as the Y-axis; discard the other columns.
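As a rough sketch of that column selection in Python: the "Quality" and "TrueQuality" column names come from the post above, while the tab-delimited layout, the leading "#" on the header line, the other column names, and the sample rows are assumptions made up for illustration. Check them against your own qahist.txt.

```python
import csv

def qahist_points(text, x_col="Quality", y_col="TrueQuality"):
    """Return (claimed, measured) quality pairs, discarding all other columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is a tab-delimited header,
    # possibly prefixed with '#'.
    header = lines[0].lstrip("#").split("\t")
    xi, yi = header.index(x_col), header.index(y_col)
    points = []
    for row in csv.reader(lines[1:], delimiter="\t"):
        points.append((float(row[xi]), float(row[yi])))
    return points

# Hypothetical file contents; only the two named columns are used.
sample = (
    "#Quality\tMatch\tMismatch\tTrueQuality\n"
    "22\t950\t50\t13.0\n"
    "30\t990\t10\t20.0\n"
)
for claimed, measured in qahist_points(sample):
    print(claimed, measured)
```

The resulting pairs can be passed to any plotting tool, with the first value on the X axis and the second on the Y axis.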

**ClemBuntu** · 12-09-2015, 01:10 AM

Originally posted by Brian Bushnell

The qhist shows you, per position in the read, what the expected error rate is (that's read1_log) and what the actual average error rate is (that's read1_measured). As you can see, those completely disagree so the quality scores are very inaccurate (almost meaningless) for this library, and that the error rates are high -

What do you mean by 'expected'?

I'm not sure I understand the scale.

Originally posted by Brian Bushnell

the average error rate starts at Q20 (1%) and drops to Q11 (8%) by the end.

So I guess the Y axis is the log quality value, 20,000 is Q20, and you are referring to the orange curve, right?

Originally posted by Brian Bushnell

The qahist is plotted incorrectly - you need to plot the "Quality" column as the X-axis, and the "TrueQuality" column as the Y-axis; discard the other columns.

I have redrawn the qahist. Does it also show that the quality ranges from Q8 to Q20?

Attached Files

quality.JPG (27.3 KB)

**HESmith** · 12-09-2015, 06:34 AM

Originally posted by ClemBuntu

What do you mean by 'expected'?

The quality scores produced by the sequencer are expected (i.e., predicted or calculated) scores; see http://www.illumina.com/documents/pr...ity_scores.pdf for an explanation.

Originally posted by ClemBuntu

I'm not sure I understand the scale.

So I guess the Y axis is the log quality value, 20,000 is Q20, and you are referring to the orange curve, right?

Yes, 20,000 is Q20. The relevant curves are the orange one (expected scores) vs. the gray one (actual scores based on your phiX data). These two curves should overlap if the expected scores accurately reflect the true error rate. They do not. As Brian indicated, the measured quality begins at Q20 and declines to Q11 at the end of the read. These quality scores are very low.

**Brian Bushnell** · 12-09-2015, 11:04 AM

Originally posted by ClemBuntu

I have redrawn the qahist. Does it also show that the quality ranges from Q8 to Q20?

Each point on the qahist indicates the claimed quality (from the quality scores) versus the measured quality (based on the alignment match/mismatch rate). So, for example, you have a point at X=22, Y=13. That means that if you take all bases with a stated quality score of Q22 (roughly 99.4% accuracy), on average, they have an error rate indicating Q13 (roughly 95% accuracy).
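The arithmetic behind that example can be checked with a short Python sketch using the standard Phred relation (error probability = 10^(-Q/10)); the function names are made up for illustration.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred quality score."""
    return 1 - phred_to_error(q)

claimed_q, measured_q = 22, 13  # the point at X=22, Y=13

print(f"Q{claimed_q}: {phred_to_accuracy(claimed_q):.2%}")   # Q22: 99.37%
print(f"Q{measured_q}: {phred_to_accuracy(measured_q):.2%}") # Q13: 94.99%

# How many times more errors than the quality score claims.
excess = phred_to_error(measured_q) / phred_to_error(claimed_q)
print(f"{excess:.1f}x")  # 7.9x
```

In other words, bases labeled Q22 in this library are wrong about eight times as often as their score claims.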

**HESmith** · 12-09-2015, 01:29 PM

Hey Brian, what's the point at (0,60)? It looks like all the Q0s are 99.9999% accurate!

[ClemBuntu, ignore this message. It's a joke. The lowest quality produced by Illumina is Q2]

**Brian Bushnell** · 12-09-2015, 01:37 PM

That's actually a valid point! The quality scores vary by platform and software version (I'm guessing this is NextSeq V1 chemistry, or HiSeq 4000). Normally, for non-binned quality scores (like HiSeq 2000), there is 0 (for N), 2, then 5-41. Quite often Q2 bases are more accurate than Q5 bases, as 2 has a special meaning. But sometimes, called bases (A, C, G, T) are produced with Q0 assigned. It's normally very few, under 100. I suspect it's a bug in Casava. But because there are so few, it's not uncommon for them to all match the reference. To keep the axes finite, I cap all quality values at 60, but technically, in this case, the Q0 bases are 100% accurate (Q infinity). I wouldn't count on it in general, though.
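A minimal Python sketch of how a measured quality with such a cap could be computed from match and mismatch counts. This is an illustration under stated assumptions, not BBMap's actual code; the function name, the handling of empty bins, and the counts are hypothetical, and only the cap of 60 comes from the post above.

```python
import math

Q_CAP = 60  # cap mentioned above, used to keep the axis finite

def measured_quality(matches, mismatches, cap=Q_CAP):
    """Phred-scaled quality from alignment counts, capped at `cap`."""
    total = matches + mismatches
    if total == 0:
        return None  # no bases observed at this quality score
    if mismatches == 0:
        return float(cap)  # zero observed errors would be "Q infinity"
    q = -10 * math.log10(mismatches / total)
    return min(q, float(cap))

print(measured_quality(80, 0))    # 60.0 (a handful of bases, all matching)
print(measured_quality(950, 50))  # about 13.0 (5% mismatches)
```

With only a few dozen bases in a bin, zero mismatches says little about the true error rate, which is why the point at (0,60) should not be over-interpreted.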

**HESmith** · 12-09-2015, 01:52 PM

Thanks, Brian. As always, you're a fount of knowledge.


Illumina Error rate
