Old 06-17-2014, 12:33 PM   #12
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Manuel,

A couple of comments. First, this is currently possible with Reformat but not with the released version of BBDuk. I'll update BBDuk later today or this week so that it will be possible to do both adapter and positional trimming in one command. Until then, you can do this:

reformat.sh in=reads.fq out=trimmed.fq ftl=10

...where "ftl" means "force trim left".
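To make the effect of "ftl=10" concrete, here is a rough awk sketch that does the same thing to a plain FASTQ file: it drops the first 10 characters of every sequence and quality line. This is only an illustration, not how Reformat is implemented; the function name trim_left is made up, and it assumes strict 4-line records with no wrapped sequence lines.

```shell
# Illustration only: mimics the effect of ftl=10 on a 4-line-per-record FASTQ file.
# trim_left is a hypothetical helper, not part of BBTools.
trim_left() {
    # $1 = FASTQ file, $2 = number of leading bases to drop (default 10)
    awk -v n="${2:-10}" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }  # sequence and quality lines
        { print }                                                      # header and "+" lines pass through
    ' "$1"
}
```

For example, "trim_left reads.fq 10 > trimmed.fq" should leave each read 10 bases shorter, with the quality string shortened to match. For real data, use the reformat.sh command above.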

Second, it's important to make sure these bases should really be trimmed. We have been generating some Nextera libraries recently with very erratic base frequency for the first 20 bases:

[Attached image: trimming_nextera.png]



The top is the base composition histogram before adapter trimming, and the bottom is after (read 1 occupies positions 0-150 and read 2 occupies 152-302); note how the right part of the read looks much better after adapter trimming. But the first 20 bases look terrible! However, I mapped the adapter-trimmed reads to the assembly with BBMap using the 'mhist' flag, which generates a histogram of match/substitution/insertion/deletion rates by read position:

[Attached image: nextera_trimmed_mhist.png]



The error rate is a little higher for the first few bases, but still well under 1%, so we are not going to trim the first 20 bp off of those reads, as was initially proposed. The base composition is highly biased because the fragmentation was not random (this uses some kind of enzyme), but the reads themselves are accurate. Generally, before you trim off bases because of a skewed base composition histogram, I suggest mapping to see whether there actually is a higher error rate there.

For reference, the command to generate those histograms:

bbmap.sh in=reads.fq ref=assembly.fa mhist=mhist.txt bhist=bhist.txt nodisk
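
If you would rather check the numbers than eyeball the plot, something like the following can list the read positions whose error rate exceeds a threshold. This is a sketch under assumptions: it assumes mhist.txt is tab-delimited with the read position in column 1 and the read 1 match fraction (a value between 0 and 1) in column 2, and that header lines start with "#". Check the header of your own file before relying on it; the function name flag_high_error_positions is made up.

```shell
# Assumed layout of mhist.txt (verify against the header line of your file):
#   column 1 = read position, column 2 = match fraction for read 1,
#   header lines begin with '#'.
flag_high_error_positions() {
    # $1 = mhist file, $2 = error-rate threshold (default 0.01, i.e. 1%)
    awk -v thresh="${2:-0.01}" '
        /^#/ { next }                          # skip header lines
        NF >= 2 {
            err = 1 - $2                       # everything that is not a match
            if (err > thresh) printf "%s\t%.4f\n", $1, err
        }
    ' "$1"
}
```

Running "flag_high_error_positions mhist.txt 0.01" prints one line per position above 1% error. If nothing in the first 20 positions shows up, that argues against trimming them.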

-Brian
Attached Images
File Type: png trimming_nextera.png (69.3 KB, 2230 views)
File Type: png nextera_trimmed_mhist.png (24.0 KB, 2203 views)