Old 06-17-2014, 11:48 PM   #12

Join Date: Jan 2010
Posts: 52

Originally Posted by Brian Bushnell

2x100bp reads can have a 74bp insert or a 188bp insert size. 74bp insert means that the molecule being sequenced was shorter than read length, and as a result the data collection continued off the end of the genomic sequence and into the adapter. So, before merging, the reads each contained 26bp of junk. And a 188bp insert size means that the reads overlapped by 12 base pairs. BBMerge does not look for overlaps shorter than 12bp in the default mode; the shorter the overlap, the more likely that it's a false positive.

Thanks for the reply. It turns out that I confused your "insert size" with the "inner distance" of paired-end reads. Since the ~25 million reads are exome-seq data, the 44.594% merge rate is a bit lower than I expected. If the merging is perfect, it means that more than half of the fragments (about 55%) were actually longer than 188bp. I hope you can improve merging for fragments whose "insert size" is longer than 188bp in the future.
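To make the arithmetic in the quoted explanation concrete, here is a rough Python sketch. The function names are my own for illustration and are not part of BBMerge; it assumes both reads have the same length, and that "inner distance" means insert size minus the two read lengths (so it is negative when the reads overlap).

```python
READ_LEN = 100  # 2x100bp reads, as in the quoted post


def adapter_bases(insert_size, read_len=READ_LEN):
    """Bases of each read that run past the insert into the adapter."""
    return max(0, read_len - insert_size)


def overlap_bases(insert_size, read_len=READ_LEN):
    """Genomic bases covered by both reads (0 if the reads do not overlap)."""
    return max(0, min(insert_size, 2 * read_len - insert_size))


def inner_distance(insert_size, read_len=READ_LEN):
    """Gap between the two reads; negative when they overlap."""
    return insert_size - 2 * read_len


for insert in (74, 188):
    print(insert, adapter_bases(insert), overlap_bases(insert), inner_distance(insert))
```

For a 74bp insert this gives 26 adapter bases per read, and for a 188bp insert it gives a 12bp overlap, i.e. an inner distance of -12, which is where my confusion between the two terms came from.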
woodydon