SEQanswers

Old 10-29-2009, 11:04 AM   #1
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default bwa mapping quality

BWA approximates the mapping quality as follows:

{
    ...
    if (p->c1 == 0) return 23;
    if (p->c1 > 1) return 0;
    if (p->n_mm == mm) return 25;
    if (p->c2 == 0) return 37;
    n = (p->c2 >= 255)? 255 : p->c2;
    return (23 < g_log_n[n])? 0 : 23 - g_log_n[n];
}

c1 and c2 are the numbers of top1 and top2 hits. The higher the mapQ, the lower the probability that the read alignment is wrong. I am a bit confused: according to the function above, if c1 is greater than 1, why is the returned mapQ 0?

Thanks for any comments and answers.
Old 10-29-2009, 08:08 PM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

It is ambiguous which hit is the best, since there is more than one.
Old 10-30-2009, 09:58 AM   #3
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

Quote:
Originally Posted by nilshomer View Post
It is ambiguous which hit is the best, since there is more than one.
Sorry, I still don't understand.
Old 10-30-2009, 11:18 AM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by totalnew View Post
Sorry, I still don't understand.
If a read aligns to two locations equally well, then you cannot say unambiguously which of the two is the correct location. Whenever there are two equally likely alignments, the mapping quality is zero.
Old 10-30-2009, 02:10 PM   #5
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

That is clear enough, thanks a lot!
Old 11-09-2009, 06:50 AM   #6
mingkunli
Member
 
Location: Germany

Join Date: Jan 2009
Posts: 41
Default

Quote:
Originally Posted by totalnew View Post
BWA approximates the mapping quality as follows:

{
    ...
    if (p->c1 == 0) return 23;
    if (p->c1 > 1) return 0;
    if (p->n_mm == mm) return 25;
    if (p->c2 == 0) return 37;
    n = (p->c2 >= 255)? 255 : p->c2;
    return (23 < g_log_n[n])? 0 : 23 - g_log_n[n];
}

c1 and c2 are the numbers of top1 and top2 hits. The higher the mapQ, the lower the probability that the read alignment is wrong. I am a bit confused: according to the function above, if c1 is greater than 1, why is the returned mapQ 0?

Thanks for any comments and answers.
Where does this code come from?
I don't understand the line if (p->n_mm == mm) return 25;

For my data,
XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 gives a quality score of 25,
XT:A:U NM:i:0 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 gives a quality score of 23.

For my data (-n 2 -o 1 -e 2):
37 means NM<=1, X0==1, X1==0;
25 means NM==2, X0==1, X1==0;
23 means X1==1.

Is this compatible with the rule above?
Old 05-21-2010, 05:50 AM   #7
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
Default What is the probability that the second best hit was NOT found due to heuristics?

What is the probability that the second best hit was NOT found due to heuristics?

Since all these algorithms use heuristics, there is a good chance that some hits will be missed. When I evaluated BWA, I saw that the mapping quality of some sequences changed depending on the sensitivity setting I used. The bad part is that in some cases the mapping quality was higher with a combination of parameters that was supposed to be more sensitive. This was with one of the first versions of bwa, so I don't know whether it was a bug. It was also seen in only a few reads, which is part of the error rate mentioned in:

"Simulation reveals that BWA may overestimate mapping quality due to this modification, but the deviation is relatively small. For example, BWA wrongly aligns 11 reads out of 1,569,108 simulated 70bp reads mapped with mapping quality 60." (BWA paper)

My question is the following: since the reference genome is "static", would it be possible to fix this small error by tracing back which regions it generally arises in? I'm guessing it's some kind of sequence in repeat regions that is prone to errors, and that can be tricky because it would bias the mapping toward finding certain types of regions more than others.
All times are GMT -8.

