Old 07-24-2014, 01:03 PM   #22
scottdson
Junior Member
 
Location: Canada

Join Date: Jul 2011
Posts: 5

Hi Brian,

Thanks for releasing this tool; it's a great help. I have fastq files for paired-end reads: 9 pairs of files with around 46 million reads in each. Your tool has been great for getting rid of left-hand adapter contamination. I do have one odd experience to report, though: one pair of files behaves strangely.

For this pair of files, bbduk finishes without incident, but roughly 12 million reads into the out1 file there is an error, while the out2 file is fine. Looking into the error, it seems that around 400 sequences were lost from the out1 file but remained in the out2 file.

out1 looked like this:
Quote:
@HWI-D00423:52:H8TMKADXX:1:1116:16453:62330 1:N:0:GCCAAT
GCTGAGTCTGAAGAGTTTATTGCCTTCTCTGCTTCGTAGAGAACGTAGCATGTTTCTTCAGCACACCTTTTGAAGCGCATCGAGCATGGCTGAGTGTGTTGTATCTCTACTGTTAGTCCACTTCTCAGGACCAAATCATCAACAGATCGGA
+
CCCFFFFFHHHHHJJHHIJJJJJJJJJJJJJJJJJJHJJJIJJJJHIJIJIJJJJJJJJJJJJJJJJJJIJIHHEHFFDDDDDDDDDDDDDDDDCDDDDDDDCDEEDEEDEDEDDDDDDDDDDDDDEDDDDDDDDDDDDDDDDDDDDDDDD
@HWI-D00423:52:H8TMKADXX:1:1116:16262:62363 1:N:0:GCCAAT
GCCTGCATTAAACATTGAGTGAACTTTCCAGAAACACTCTTTCAGAGATCTTCAATGCGTGGGAAGAGTTTTTCACCTGCAGTATAGCTTGCCTCTACCAACTGTTTGCCCTTAGAAATGTTCAATTTCACAAAGCTATGTAJJJJJJJJHGIJJIJJJJJJIJJJHHHHHHFFFFFEEEEFFFEDDDDDDDDDECDEDDDDDDDDDDEDDEDDCDDDDDDDDDDDDDCCEEDDDEDDDDEEEEE
@HWI-D00423:52:H8TMKADXX:1:1116:20066:62469 1:N:0:GCCAAT
GTTCCAAGCTCCGGCGAGGGAGGCATCCGCCCCGACTCGGGGCTTCTCCTGCCCAGTCTGCCCAAGCGTAGAGCCCTGCTCTCTGGGAACTCACCTCCCCGCTCGGGGAGAGCCCGGTTAGGGCCGCGGGAGGCCCCAGTCTCAGACCTTC
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJHHFFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBBBDDDDDDDDDDDD>BBDDDDDDDDDDDDDDDDDDDDDDDECDDDDDD
At first I thought the quality line of the "@HWI-D00423:52:H8TMKADXX:1:1116:16262:62363 1:N:0:GCCAAT" block had simply been concatenated to the sequence, but I soon saw that the fastq blocks below this one were out of step with the corresponding blocks in the second-end file. That is how I found that over 400 fastq blocks had disappeared.
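
A check like the one described above can be sketched in Python. This is only an illustration and not part of bbduk; the function names `read_ids` and `first_mismatch` are hypothetical, and it assumes uncompressed fastq with exactly four lines per record and read IDs that end at the first space of the header.

```python
import itertools


def read_ids(lines):
    """Yield the read ID of each 4-line fastq record.

    Raises ValueError at the first record whose layout is broken
    (bad header, missing '+' line, or sequence/quality length mismatch).
    """
    it = iter(lines)
    record_no = 0
    while True:
        header = next(it, None)
        if header is None:
            return  # clean end of input
        record_no += 1
        seq, plus, qual = next(it, ""), next(it, ""), next(it, "")
        header, seq, plus, qual = (s.rstrip("\n") for s in (header, seq, plus, qual))
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed record {record_no}: {header}")
        # The ID is the header text between '@' and the first space.
        yield header[1:].split(" ")[0]


def first_mismatch(lines1, lines2):
    """Return (record number, id1, id2) for the first pair out of step, else None."""
    pairs = itertools.zip_longest(read_ids(lines1), read_ids(lines2))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return n, id1, id2
    return None
```

Passing two open file handles (for example the out1 and out2 files) to `first_mismatch` would report either the first record where the pair goes out of step or the first malformed record, such as a sequence line with the quality string appended to it.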

I've run this a couple of times and it happens in the same place each time, which suggests a programmatic cause rather than a memory leak.

I also selected 10 fastq blocks spanning the "@HWI-D00423:52:H8TMKADXX:1:1116:16262:62363 1:N:0:GCCAAT" block, and bbduk works perfectly on these, so it doesn't seem to be anything to do with the reads or the fastq format in that region.

As a quick workaround, I'm going to try splitting the fastq files in half and running the two smaller sets through bbduk.
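
One way to do that split while keeping the two ends paired is to cut both files at the same record count. The sketch below is only an illustration: `split_fastq` is a hypothetical helper, the file names in the usage note are placeholders, and it assumes uncompressed input with exactly four lines per record.

```python
import itertools


def split_fastq(src, first_out, second_out, n_records):
    """Copy the first n_records 4-line records of src to first_out
    and all remaining lines to second_out, streaming line by line."""
    with open(src) as fin, open(first_out, "w") as a, open(second_out, "w") as b:
        # islice consumes exactly 4 * n_records lines from the file iterator...
        a.writelines(itertools.islice(fin, 4 * n_records))
        # ...so the same iterator then continues with the remainder.
        b.writelines(fin)
```

Calling it once per end with the same `n_records`, for example `split_fastq("reads_1.fq", "a_1.fq", "b_1.fq", 23_000_000)` and again for the second-end file, keeps each half properly paired.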

Thanks again for the great tool; I just thought I would provide this feedback here.

Scott