10-29-2009, 10:04 AM   #1
totalnew
Member

Location: Canada

Join Date: Apr 2009
Posts: 46

bwa mapping quality

BWA approximates mapping quality in this way:

{
    . . .
    if (p->c1 == 0) return 23;
    if (p->c1 > 1) return 0;
    if (p->n_mm == mm) return 25;
    if (p->c2 == 0) return 37;
    n = (p->c2 >= 255)? 255 : p->c2;
    return (23 < g_log_n[n])? 0 : 23 - g_log_n[n];
}

c1 and c2 are the numbers of top1 (best) and top2 (second-best) hits. The higher the mapQ, the lower the probability that the read alignment is wrong. What confuses me is this: in the function above, if c1 is more than 1, why does it return a mapQ of 0? Thanks for any comments and answers.

10-29-2009, 07:08 PM   #2
nilshomer
Nils Homer

Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
 Originally Posted by totalnew [...] in the function above, if c1 is more than 1, why does it return a mapQ of 0?
It is ambiguous which hit is the best one, since there is more than one.

10-30-2009, 08:58 AM   #3
totalnew
Member

Join Date: Apr 2009
Posts: 46

Quote:
 Originally Posted by nilshomer It is ambiguous which hit is the best one, since there is more than one.
Sorry, I still don't understand.

10-30-2009, 10:18 AM   #4
nilshomer
Nils Homer

Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
 Originally Posted by totalnew Sorry, I still don't understand.
If a read aligns to two locations equally well, then you cannot unambiguously say which of the two places is the correct location. Whenever there are two equally likely alignments, the mapping quality is zero.

10-30-2009, 01:10 PM   #5
totalnew
Member

Location: Canada

Join Date: Apr 2009
Posts: 46

That is clear enough, thanks a lot!

11-09-2009, 05:50 AM   #6
mingkunli
Member

Location: Germany

Join Date: Jan 2009
Posts: 41

Quote:
 Originally Posted by totalnew BWA approximates mapping quality in this way: {. . . if (p->n_mm == mm) return 25; . . . } c1 and c2 are the numbers of top1 and top2 hits.
Where does this code come from?
I don't understand the line: if (p->n_mm == mm) return 25;

In my data,
XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 gives a quality score of 25,
XT:A:U NM:i:0 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 gives a quality score of 23.

For my data (-n 2 -o 1 -e 2):
37 means NM<=1, X0==1, X1==0;
25 means NM==2, X0==1, X1==0;
23 means X1==1.

Is this compatible with the rule above?

05-21-2010, 04:50 AM   #7
aleferna
Senior Member

Location: Sweden

Join Date: Sep 2009
Posts: 121

What is the probability that the second best hit was NOT found due to heuristics?

Since all these algorithms use heuristics, there is a good chance that some hits will be missed. When I evaluated BWA, I saw that the mapping quality of some sequences changed depending on the sensitivity setting I was using. The bad part is that in some cases the mapping quality was higher with a combination of parameters that was supposed to be more sensitive. This was with one of the first versions of bwa, so I don't know if it was a bug. It was also only seen in a few reads, which is part of the error rate mentioned in the BWA paper:

"Simulation reveals that BWA may overestimate mapping quality due to this modification, but the deviation is relatively small. For example, BWA wrongly aligns 11 reads out of 1,569,108 simulated 70bp reads mapped with mapping quality 60."

My question is the following: the reference genome is "static", so would it be possible to fix this small error by tracing back which regions it generally arises in? I'm guessing it's some sort of sequence in repeat areas that is prone to errors, and that can be tricky because it will bias the mapping so that it finds more of certain types of regions than others.