03-07-2014, 09:37 AM   #9
Brian Bushnell

By the way, BBMap's default mode is "ambig=best", meaning that each ambiguously-mapped read goes to the first top-scoring location (ambiguous hits are annotated "XT:A:R" rather than "XT:A:U", so you will know which ones are ambiguous). This is fine for variant-calling, but if you are doing RNA-seq expression quantification, it's usually better to set "ambig=random" or "ambig=all". Otherwise, if there are two almost-identical genes, all the reads that match both equally well will map to the first one.
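If you want to see how many reads were flagged as ambiguous, something like the following should work. It is only a rough sketch: it assumes plain-text SAM input and that the XT:A tag is present on the alignment lines; the function name and the example records are made up for illustration.

```python
from collections import Counter


def count_xt_tags(sam_lines):
    """Tally XT:A values (e.g. 'U' or 'R') over SAM alignment lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # In SAM, optional tags begin at the 12th column (index 11).
        for tag in fields[11:]:
            if tag.startswith("XT:A:"):
                counts[tag[5:]] += 1
                break
    return counts


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t40\t50M\t*\t0\t0\t*\t*\tXT:A:U",
    "r2\t0\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*\tXT:A:R",
    "r3\t16\tchr2\t900\t3\t50M\t*\t0\t0\t*\t*\tNM:i:0\tXT:A:R",
]
counts = count_xt_tags(example)
print("unique:", counts["U"], "ambiguous:", counts["R"])
```

For a real file you would pass an open file handle instead of the example list, e.g. `count_xt_tags(open("mapped.sam"))`, where the file name is just a placeholder.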

Also, for vertebrate RNA-seq, I suggest these settings:
XS tags: generate XS tags, which Cufflinks needs, according to the "unstranded" protocol; use this also if you don't know the protocol. The alternatives are "ss" (second strand) and "fs" (first strand). If you DO know the protocol, please set it to ss or fs.

Deletion length threshold for the cigar string: all deletions up to this length are annotated as 'D' in the cigar string; longer ones are annotated as 'N' (meaning skipped). This may or may not matter to downstream tools. For DNA mapping you don't need this flag.
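To make the 'D' versus 'N' rule concrete, here is a small sketch of what the threshold does to a cigar string. This is not BBMap's code, just an illustration of the rule as described above; the function name and the threshold of 10 are assumptions for the example.

```python
import re

# One cigar operation: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def relabel_long_deletions(cigar, threshold):
    """Rewrite deletions longer than `threshold` as 'N' (skipped region)."""
    parts = []
    for length, op in CIGAR_OP.findall(cigar):
        # Deletions up to the threshold stay 'D'; longer ones become 'N'.
        if op == "D" and int(length) > threshold:
            op = "N"
        parts.append(f"{length}{op}")
    return "".join(parts)


# A 3 bp deletion stays 'D'; a 5000 bp gap is treated as an intron.
print(relabel_long_deletions("40M3D20M5000D40M", 10))
```

With a very large threshold nothing changes, which matches the point that DNA mapping does not need this setting.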

Maximum deletion length: the longest deletion (i.e., intron) that BBMap will look for, though it may still find longer ones. Around 98% of human introns are shorter than 100kbp. If you do not set this, it defaults to 16kbp, which is too short for vertebrate RNA-seq (though lots of plants and fungi have tiny introns of around 200bp).
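Putting the settings together, a command line could be assembled roughly as below. Please treat this as a sketch under assumptions: the flag names "xstag", "intronlen" and "maxindel" and the value "us" do not appear in the text above and are my understanding of BBMap's options, so check them against the BBMap documentation for your version. The values 10 and 100000, the file names, and the helper function are illustrative only.

```python
import shlex


def bbmap_rnaseq_command(reads, ref, out, strand=None,
                         cigar_threshold=10, max_deletion=100000):
    """Build an argument list for a vertebrate RNA-seq BBMap run.

    strand: None for unstranded/unknown, "first" or "second" otherwise.
    Flag names are assumed, not taken from the post; verify before use.
    """
    xstag = {None: "us", "first": "fs", "second": "ss"}[strand]
    return [
        "bbmap.sh",
        f"in={reads}",
        f"ref={ref}",
        f"out={out}",
        "ambig=random",                 # avoid piling reads onto one paralog
        f"xstag={xstag}",               # XS tags for Cufflinks (assumed flag)
        f"intronlen={cigar_threshold}", # 'D' vs 'N' cutoff (assumed flag)
        f"maxindel={max_deletion}",     # longest deletion sought (assumed flag)
    ]


# Hypothetical file names; prints the command rather than running it.
cmd = bbmap_rnaseq_command("reads.fq", "ref.fa", "mapped.sam")
print(shlex.join(cmd))
```

The command is only printed here; to actually run it you could hand the list to `subprocess.run(cmd, check=True)` once BBMap is installed and the flags are confirmed.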

I'll add this recommendation to the readme for the next release.