06-17-2014, 10:02 AM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,696

Woody,

If you run the shell script bbmerge.sh with no parameters, it will display the help information. To answer your specific questions:

By default it uses all available threads, but you can override that with the "t=" flag. You can get a higher merge rate, at the expense of more false positives, with the "loose" or "vloose" (very loose) flag. It's also possible to tune it more finely with other flags such as "minoverlap", "margin", "entropy", and "mismatches", but those are more complicated to adjust. Use the "mininsert=" flag to ban merges with an insert size shorter than the given value.
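To make the overlap, mismatch, and minimum-insert ideas concrete, here is a naive Python sketch of overlap-based insert size detection. It is only an illustration, not BBMerge's actual algorithm. `find_insert_size`, `min_overlap`, `max_mismatches`, and `min_insert` are hypothetical names loosely mirroring the "minoverlap", "mismatches", and "mininsert=" flags; "margin" and "entropy" have no counterpart here. The reads and the "adapter" are simulated from random sequence.

```python
import random

# Complement table for DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_insert_size(r1, r2, min_overlap=12, max_mismatches=1, min_insert=0):
    """Naive insert-size search for a read pair; returns an int or None.

    Hypothetical helper, NOT BBMerge's algorithm. Every candidate insert size is
    tried, the bases both reads cover are compared, and among candidates with at
    most max_mismatches the longest overlap (then fewest mismatches) wins.
    """
    r2rc = revcomp(r2)  # put read 2 on the same strand as read 1
    best = None  # (overlap, -mismatches, insert)
    first = max(min_overlap, min_insert)  # min_insert bans shorter merges
    last = len(r1) + len(r2) - min_overlap
    for insert in range(first, last + 1):
        offset = insert - len(r2)  # molecule position of r2rc[0]; < 0 means adapter
        start = max(0, offset)  # first molecule position covered by both reads
        end = min(len(r1), insert)  # one past the last such position
        if end - start < min_overlap:
            continue
        mismatches = sum(r1[p] != r2rc[p - offset] for p in range(start, end))
        if mismatches > max_mismatches:
            continue
        candidate = (end - start, -mismatches, insert)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    return None if best is None else best[2]


def random_dna(n, rng):
    """Random ACGT string, used here to simulate reads."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(7)

# 188bp molecule read 2x100bp: the reads overlap by 12bp.
molecule = random_dna(188, rng)
long_r1, long_r2 = molecule[:100], revcomp(molecule)[:100]

# 74bp molecule: each read runs 26bp past it into a random stand-in for adapter.
short = random_dna(74, rng)
short_r1 = short + random_dna(26, rng)
short_r2 = revcomp(short) + random_dna(26, rng)

print(find_insert_size(long_r1, long_r2))
print(find_insert_size(short_r1, short_r2))
print(find_insert_size(short_r1, short_r2, min_insert=100))  # 74bp merge is banned
```

Lowering `min_overlap` or raising `max_mismatches` in this toy lets more pairs merge but also lets short random matches through, which is the same tradeoff as the loose/vloose flags.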

2x100bp reads can have either a 74bp insert size or a 188bp insert size. A 74bp insert means that the molecule being sequenced was shorter than the read length, so data collection continued off the end of the genomic sequence and into the adapter; before merging, each read therefore contained 26bp (100 - 74) of junk. A 188bp insert size means that the reads overlapped by 12 base pairs (2 x 100 - 188). BBMerge does not look for overlaps shorter than 12bp in the default mode; the shorter the overlap, the more likely it is to be a false positive.
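The arithmetic above generalizes to any read length. A minimal Python sketch, assuming both reads have the same length and using the 12bp default minimum overlap stated above; `pair_geometry` is a hypothetical helper name:

```python
# Geometry of a 2 x read_len pair for a given insert size (equal-length reads assumed).
MIN_DEFAULT_OVERLAP = 12  # default minimum overlap BBMerge searches for, per the text above


def pair_geometry(read_len, insert):
    """Return (overlap_bp, adapter_bp_per_read) for a 2 x read_len read pair."""
    # Bases each read runs past the end of the molecule into adapter.
    adapter = max(0, read_len - insert)
    # Read 1 covers molecule positions [0, read_len), read 2 covers
    # [insert - read_len, insert); the overlap is their intersection within the molecule.
    overlap = max(0, min(read_len, insert) - max(0, insert - read_len))
    return overlap, adapter


for insert in (74, 188, 195):
    overlap, adapter = pair_geometry(100, insert)
    searched = overlap >= MIN_DEFAULT_OVERLAP
    print(f"insert={insert}bp overlap={overlap}bp adapter/read={adapter}bp "
          f"searched_by_default={searched}")
```

For a 195bp insert the reads overlap by only 5bp, below the 12bp default, so such a pair would not be merged in the default mode.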

The higher the error rate of the reads, the fewer successful merges there will be. If you have sufficient coverage (at least 30x average), you can try error-correcting the reads first with my error correction tool, ecc.sh; that should increase the merge rate.

ecc.sh in1=r1.fq in2=r2.fq out1=corrected1.fq out2=corrected2.fq
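Whether a dataset meets the ~30x guideline can be roughly estimated as total sequenced bases divided by genome size. A small Python sketch; `average_coverage` is a hypothetical helper, the read count and genome size are placeholder numbers, and the estimate ignores duplicates and uneven coverage:

```python
def average_coverage(read_pairs, read_len, genome_size):
    """Average depth = total sequenced bases / genome size (2 reads per pair)."""
    return 2 * read_pairs * read_len / genome_size


# Hypothetical run: 5 million 2x100bp pairs on a 10 Mbp genome.
depth = average_coverage(5_000_000, 100, 10_000_000)
print(f"{depth:.0f}x average coverage; enough for error correction: {depth >= 30}")
```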

-Brian