03-27-2017, 02:35 PM   #447
Brian Bushnell
@minja - I replied via email and still have not had a chance to look into that; sorry - I will when I have a chance.

@catagui - BBMap cannot handle query sequences over 600bp. When you feed it a fasta input file, by default, sequences longer than 500bp are broken into 500bp pieces that are mapped individually (and not recombined afterward). Some of these pieces could be quite short (say, 15bp), and such short pieces will usually map somewhere if you have a big enough reference. You can discard them with "minlen=40" or so. Or you could simply use MapPacBio for mapping, as it can accommodate much longer reads. In that case, though, you should increase "maxindel": its default differs between BBMap and MapPacBio, and a longer transcript could contain multiple long introns.
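
To illustrate where the very short pieces come from, here is a rough Python sketch of the shredding behavior described above. This is not BBMap's actual code; the function name `shred` and its parameters `piece_len` and `min_len` are made up for illustration, and it simply assumes fixed-size, non-overlapping pieces with a length filter applied afterward.

```python
def shred(seq, piece_len=500, min_len=0):
    """Split seq into consecutive pieces of at most piece_len bases,
    then drop any piece shorter than min_len.

    Hypothetical helper for illustration only; not BBMap's implementation.
    """
    pieces = [seq[i:i + piece_len] for i in range(0, len(seq), piece_len)]
    return [p for p in pieces if len(p) >= min_len]


# A 1015bp query: two full 500bp pieces plus a 15bp leftover.
read = "ACGT" * 253 + "ACG"

print([len(p) for p in shred(read)])              # [500, 500, 15]
print([len(p) for p in shred(read, min_len=40)])  # [500, 500]
```

The 15bp leftover is the kind of piece that will map almost anywhere in a large reference, which is why filtering on a minimum length is worthwhile.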

Try "k=13 maxindel=200000 maxlen=60000 minlen=40 (other arguments)" and see if that improves the mapping rate. Ultimately, though, longer query sequences will always have lower alignment rates than shorter ones when using a global or glocal aligner, so shredding the sequences (which BBMap does) artificially increases the alignment rate.