**tonybolger** · 07-27-2011, 03:52 AM

As a quick hack, if you can live with the error of 'double counting' multi-nt errors, you can just factor out the 'weakest' (lowest) quality. So, assuming errors at a and b can't both happen at once:

Perror(a) = 10**-(a/10)
Perror(b) = 10**-(b/10)

Perror (a or b) = Perror(a)+Perror(b)
(this double-counts if both happen at once)

Then just use:
m=min(a,b)
Perror (a or b) = Perror(m)*(Perror(a-m)+Perror(b-m))

If a and b are very different, the additional term will effectively be zero anyway, meaning that the combined error is almost exactly equal to the most likely error, which makes sense.

And you don't actually need to calculate Perror(m), you just keep it in Q space, and add it back in later (since multiplication of probabilities = addition of Q values).
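A minimal sketch of the Q-space version of this hack, assuming a and b are plain Phred scores. The function name `combine_q_approx` is made up for illustration and is not from any library:

```python
import math

def combine_q_approx(a, b):
    """Approximate Phred score for 'error at a or error at b'.

    Uses Perror(a or b) ~= Perror(a) + Perror(b), which double-counts
    the case where both errors happen at once.
    """
    m = min(a, b)  # the 'weakest' quality, kept in Q space
    # Remaining factor: Perror(a-m) + Perror(b-m), always between 1 and 2.
    rest = 10 ** (-(a - m) / 10) + 10 ** (-(b - m) / 10)
    # Multiplying probabilities is adding Q values, so subtract the
    # Q-space penalty of 'rest' from m instead of computing Perror(m).
    return m - 10 * math.log10(rest)

print(combine_q_approx(20, 20))  # two equal scores: m minus 10*log10(2)
print(combine_q_approx(40, 10))  # very different scores: close to the lower one
```

Because only the differences a-m and b-m are exponentiated, the powers of ten stay between 0 and 1 even for very high quality scores.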

You can also apply this approach selectively, for example only when the minimum quality score is at least 30, where double errors will be very rare.
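One way the selective version might look, as a sketch only. The function name `combine_q_selective` and the fallback to the exact union probability below the threshold are assumptions for illustration; the cutoff of 30 is the example value from the post:

```python
import math

def combine_q_selective(a, b, threshold=30):
    """Use the additive shortcut only when both scores are high."""
    m = min(a, b)
    if m >= threshold:
        # Double errors are very rare here, so double counting is harmless.
        rest = 10 ** (-(a - m) / 10) + 10 ** (-(b - m) / 10)
        return m - 10 * math.log10(rest)
    # Assumed fallback: exact probability that at least one error occurs.
    pa = 10 ** (-a / 10)
    pb = 10 ** (-b / 10)
    return -10 * math.log10(pa + pb - pa * pb)

print(combine_q_selective(40, 40))  # shortcut branch
print(combine_q_selective(10, 10))  # exact branch
```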

**sulicon** · 07-27-2011, 11:22 AM

Thanks a lot

The key is skipping double errors when the quality score is high enough, as you mentioned in your last sentence.

**srasdk** · 07-27-2011, 08:10 PM

If you painstakingly follow the math, the resulting formula would be:

c = 0.5*(a+b) - 10*log10( 10**(0.05*(a-b)) + 10**(0.05*(b-a)) - 10**(-0.05*(a+b)) )

As you can see, the first term is an average.
The second term reduces that average, since the log will always be positive (a and b are positive, making the sum under the log > 1).
In your example (a = b = 20), the value under the log reduces to:
2 - 10**(-2) ~= 2. The higher 'a' is, the closer you are to equality.
So yes, your observations are correct.
No, it will not be the same for a != b or when you start using low quality values.

This formula is safe for all phred values.
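As a sanity check, here is a small sketch that puts the formula above next to the direct calculation it is derived from, Perror(a or b) = Perror(a) + Perror(b) - Perror(a)*Perror(b). The function names are hypothetical:

```python
import math

def combine_q_exact(a, b):
    """Closed-form combined score: average minus a log correction."""
    under_log = (10 ** (0.05 * (a - b))
                 + 10 ** (0.05 * (b - a))
                 - 10 ** (-0.05 * (a + b)))
    return 0.5 * (a + b) - 10 * math.log10(under_log)

def combine_q_direct(a, b):
    """Same quantity computed straight from the error probabilities."""
    pa = 10 ** (-a / 10)
    pb = 10 ** (-b / 10)
    return -10 * math.log10(pa + pb - pa * pb)

for a, b in [(20, 20), (10, 30), (2, 40), (5, 5)]:
    print(a, b, combine_q_exact(a, b), combine_q_direct(a, b))
```

The two columns should agree to within floating-point rounding, including for unequal and low quality values.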


Any suggestion for calculating overall Phred scale quality score for a sequence?
