View Single Post
Old 11-12-2014, 01:41 PM   #48
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Originally Posted by ysnapus View Post
Thanks for bringing out a great tool. Noob here so please don't mind some questions, and I hope the answers will be of use to others as well.

I noticed that there is a and in the BBMap code I downloaded from Sourceforge. Is bbduk2 a replacement? Should we switch to using Your example usage in this thread uses
BBDuk can operate in one of 4 kmer-matching modes:
Right-trimming (ktrim=r), left-trimming (ktrim=l), masking (ktrim=n), and filtering (default). But it can only do one at a time because all kmers are stored in a single table. It can still do non-kmer-based operations such as quality trimming at the same time.

BBDuk2 can do all 4 kmer operations at once and is designed for integration into automated pipelines where you do contaminant removal and adapter-trimming in a single pass to minimize filesystem I/O. Personally, I never use BBDuk2 from the command line. Both have identical capabilities and functionality otherwise, but the syntax is different.

I have paired-end DNA Seq data, and I know there has been quite a lot of adapter readthrough since my read lengths are high and some inserts were small. Trimmomatic has the palindrome mode for adapter trimming in PE reads. Does bbduk do the palindromic alignment automatically or do we need to set tbo=t and tpe=t in command line parameters? If this is not what tbo is, what does tbo do?

The usage help lists parameters like tbo=f, so to set it true you need tbo=t, but your example in a post lists it ... tpe tbo.
Is either usage correct?
"tbo" and "tbo=t" and "trimbyoverlap=true" are all equivalent. "tpe" (trim pairs equally) is related, but independent. If the mode is ktrim=r, and BBDuk find a kmer match at position 75 of read 1, but finds no matches for read 2, the if "tpe=t" it will trim both read 1 and read 2 at position 75 so they end up the same length, because the both have the same insert size, but presumably a kmer did not match in read 2 due to errors.

"tbo" tries to determine the insert size by reverse-complimenting read 2 and finding the best overlap such that bases of the two reads match (using the same algorithm as BBMerge, with conservative settings). If a valid overlap is found, and the calculated insert size is shorter than the read length, both reads will be trimmed to the insert size (regardless of the "tpe" flag).

So they are a little different but I recommend enabling them both if you have paired fragment libraries. They are not enabled by default because with long-mate-pair libraries you would not want that behavior.

Please let me know if that does not fully answer your question, and I'll look into Trimmomatic's palindrome mode to see exactly what it does.

Am I correct in assuming that we just need to do ktrim=r, even for paired-end data, because of the opposite orientations of the paired reads?

Thank you.
For fragment libraries, yes, that is correct. You only need to use "ktrim=l" in unusual situations with custom adapters, or long-mate-pair libraries.
Brian Bushnell is offline   Reply With Quote