04-02-2010, 05:52 PM   #2
Nils Homer
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Maybe Heng will comment, but I will take a shot at the first part.

[QUOTE=jmartin;16446]I'm seeing some alignments coming out of BWA that don't make sense to me. The only parameter we are setting is '-n 20', and these are 100mer reads from metagenomic samples being mapped against a bacterial database.

Our understanding of the '-n' parameter is that it sets the max allowable edit distance between the query and the reference for a good alignment, so it's something like the max number of mismatches allowed. But in the SAM output we're seeing alignments where the NM:i field shows 70-75 (NM:i is supposed to show the number of mismatches).

How can BWA even be making an alignment of a 100mer query where there are 70-75 mismatches?[/QUOTE]

BWA uses the first 32 bases in its initial lookup, so your "20" mismatches can only occur in the first 32 bases (see the "-l" option). The rest of the bases are filled in later and can have any number of mismatches. Note that the running time of the algorithm is exponential with respect to the "-n" option, so I am quite amused that it was even possible for the program to complete with "-n 20" (20 mismatches in 32 bases is an error rate above 60%!).
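
If you want to see how many of your alignments are affected, here is a rough Python sketch that pulls the NM:i value out of plain-text SAM lines and lists the reads above a chosen limit. The function names (nm_tag, reads_over_limit) are made up for illustration and are not part of BWA or any library; it only assumes the standard SAM layout of 11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields. The last line just spells out the arithmetic behind the "greater than 60%" remark.

```python
def nm_tag(sam_line):
    """Return the NM:i value of one SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("NM:i:"):
            return int(field[len("NM:i:"):])
    return None


def reads_over_limit(sam_lines, max_edits):
    """Yield (read name, NM) for alignments whose NM exceeds max_edits."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        nm = nm_tag(line)
        if nm is not None and nm > max_edits:
            yield line.split("\t", 1)[0], nm


# "-n 20" confined to a 32-base seed, as described above.
seed_error_rate = 20 / 32  # 0.625
```

Something like list(reads_over_limit(open("aln.sam"), 20)) would then give the read names to look at more closely, where "aln.sam" is a placeholder for your own output file.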