Old 05-27-2014, 08:47 AM   #47
Brian Bushnell

1) 'intronlen' does not actually affect alignment, only the way splices are reported: deletions shorter than 'intronlen' will have a 'D' symbol in the cigar string, while longer ones will have an 'N' symbol. To prevent gapped alignments, you should set 'maxindel=0', which stops BBMap from looking for them, though some may still be found (or 'strictmaxindel=0' to absolutely ban all alignments with indels). But prokaryotes do have some self-splicing; the spliced-out segments are typically very short, like 20bp or so, I think. So you may want to set 'maxindel=100' or something like that, which will prevent long gaps while still allowing short ones.
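
To illustrate that 'intronlen' is purely a reporting threshold, here is a minimal Python sketch that relabels the gaps in an existing cigar string. It is not BBMap's code; the function name is hypothetical, and treating a gap of exactly 'intronlen' as an intron is my assumption, since only "shorter" and "longer" are specified above.

```python
import re

# One cigar element: a length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def relabel_deletions(cigar: str, intronlen: int) -> str:
    """Report each reference gap as 'D' or 'N' depending only on its length."""
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "DN":
            # Assumption: a gap of exactly intronlen is reported as an intron.
            op = "N" if int(length) >= intronlen else "D"
        out.append(f"{length}{op}")
    return "".join(out)

# The alignment itself is unchanged; only the symbol for the 25bp gap differs.
print(relabel_deletions("40M5D30M25D30M", intronlen=10))  # 40M5D30M25N30M
```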

2) Soft-clipping is applied by default if a read goes off the end of a scaffold. You can also force it on alignments like the one you describe with the flag 'local=t'. Also, note that if you use the flag 'sam=1.4' then cigar strings will be generated with '=' for match and 'X' for mismatch, instead of 'M' for both.
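
The difference between the two cigar styles can be sketched in a few lines of Python. This is only an illustration, not how BBMap builds its cigar strings; the function and its 'extended' parameter are hypothetical, and the input is assumed to be a pair of equal-length aligned strings with '-' marking gaps.

```python
from itertools import groupby

def cigar_from_alignment(ref: str, read: str, extended: bool = False) -> str:
    """Build a cigar string from two equal-length gapped strings ('-' = gap)."""
    if len(ref) != len(read):
        raise ValueError("aligned strings must have equal length")
    ops = []
    for r, q in zip(ref, read):
        if r == "-":
            ops.append("I")  # base present only in the read
        elif q == "-":
            ops.append("D")  # base present only in the reference
        elif not extended:
            ops.append("M")  # older style: 'M' covers match and mismatch
        else:
            ops.append("=" if r == q else "X")  # SAM 1.4 style
    # Collapse runs of the same operation into <length><op> pairs.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))

ref = "ACGTACGTAC"
read = "ACGTTCGTAC"
print(cigar_from_alignment(ref, read))                 # 10M
print(cigar_from_alignment(ref, read, extended=True))  # 4=1X5=
```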

3) You can use a flag like 'minid=98' to prevent alignments with lower than approximately 98% identity, which would be 2 mismatches for a 100bp read. This is not exact, because substitutions, deletions, and insertions are all scored differently, and whether they are contiguous or scattered also affects the score. BBMap does not have any way to ensure that the best alignment with at most X mismatches is returned, like Bowtie 1 can do, because the scoring is fundamentally different.
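
If you need an exact cutoff, one option is to filter the output afterwards. The sketch below computes a plain column-based identity from an '=' / 'X' style cigar string (so it assumes 'sam=1.4' output). This is deliberately not BBMap's scoring, which weights event types and their clustering differently; the function name and the choice to ignore soft-clipped bases and 'N' gaps are my assumptions.

```python
import re

def approx_identity(cigar: str) -> float:
    """Percent identity as matches / (matches + mismatches + indel bases)."""
    if "M" in cigar:
        raise ValueError("need an =/X cigar; 'M' hides mismatches")
    matches = columns = 0
    # Soft clips (S) and long gaps reported as N are ignored here (assumption).
    for length, op in re.findall(r"(\d+)([=XID])", cigar):
        n = int(length)
        columns += n
        if op == "=":
            matches += n
    if columns == 0:
        raise ValueError("no aligned columns in cigar")
    return 100.0 * matches / columns

# Two substitutions in a 100bp read sit right at the 98% boundary.
print(approx_identity("49=1X49=1X"))        # 98.0
print(approx_identity("49=1X49=1X") >= 98)  # True
```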

4) 'maxindel' actually controls the length of individual indels, not the number of them, which is not controlled. There's also a 'maxindel2' flag that controls the sum of the lengths of all indels, which by default is set to double 'maxindel'. So if you did this:
maxindel=10 maxindel2=15 strictmaxindel
...then individual indels could be up to 10bp; the sum of the lengths of all indels could be up to 15bp; and any alignment with a single indel longer than 10bp would be banned. But there is no way to limit the total number of indel events.
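
As a rough picture of what those two limits mean, here is a Python sketch that checks a finished cigar string against them. It is a post-hoc filter, not BBMap's search logic (BBMap applies the limits while aligning); the function name is hypothetical, and counting 'N' gaps as deletions is my assumption.

```python
import re

def passes_indel_limits(cigar: str, maxindel: int, maxindel2=None) -> bool:
    """True if no single indel exceeds maxindel and their total length
    does not exceed maxindel2. The number of indel events is not limited."""
    if maxindel2 is None:
        maxindel2 = 2 * maxindel  # default stated above: double maxindel
    indels = [
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "IDN"  # assumption: 'N' gaps count as deletions
    ]
    longest = max(indels, default=0)
    return longest <= maxindel and sum(indels) <= maxindel2

# Using the limits from the example: maxindel=10 maxindel2=15
print(passes_indel_limits("50M10D20M5I25M", 10, 15))            # True
print(passes_indel_limits("50M11D50M", 10, 15))                 # False
print(passes_indel_limits("30M8D30M8I30M", 10, 15))             # False
print(passes_indel_limits("10M1D10M1I10M1D10M1I10M", 10, 15))   # True
```

The last call has four separate indel events and still passes, since only lengths are checked.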

5) By default the best alignment is reported, and if there are multiple equally-scoring alignments, the first one (in genomic coordinates) is reported, but it is marked as ambiguous (XT:A:R tag). This corresponds to the setting 'ambig=best'. You can alternatively set 'ambig=random' or 'ambig=all' (which is what you are looking for). 'maxsites' will limit the total number of alignments displayed in 'ambig=all' mode, but won't have any effect in default mode.
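
The three modes can be summarized with a small Python sketch. This is only a model of the behavior described above, not BBMap's implementation; the function, the (scaffold, position, score) tuples, and ordering scaffolds by name are all assumptions made for illustration.

```python
import random

def select_alignments(sites, ambig="best", maxsites=None, rng=random):
    """Pick which top-scoring sites to report.

    sites: list of (scaffold, position, score) tuples.
    Returns (reported_sites, is_ambiguous).
    """
    if not sites:
        return [], False
    top = max(score for _, _, score in sites)
    # Assumption: "first in genomic coordinates" = sorted by scaffold, position.
    tied = sorted(s for s in sites if s[2] == top)
    ambiguous = len(tied) > 1
    if ambig == "best":
        return tied[:1], ambiguous
    if ambig == "random":
        return [rng.choice(tied)], ambiguous
    if ambig == "all":
        # maxsites only matters in this mode.
        return (tied if maxsites is None else tied[:maxsites]), ambiguous
    raise ValueError(f"unknown ambig mode: {ambig}")

sites = [("chr1", 500, 90), ("chr1", 100, 95), ("chr2", 50, 95)]
print(select_alignments(sites, "best"))
# ([('chr1', 100, 95)], True)
print(select_alignments(sites, "all"))
# ([('chr1', 100, 95), ('chr2', 50, 95)], True)
print(select_alignments(sites, "all", maxsites=1))
# ([('chr1', 100, 95)], True)
```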

Thanks for using BBMap!
