05-08-2017, 03:50 PM   #13
confurious
Junior Member
 
Location: California

Join Date: Apr 2017
Posts: 9

Quote:
Originally Posted by Brian Bushnell
The difference between binning and normalization would be that binning seeks to divide the reads into different organisms prior to assembly, so they can be assembled independently, using less time and memory per job. Normalization simply attempts to reduce the coverage of high-depth organisms/sequences, but still keeps the dataset intact. With no high-depth component, normalization will basically do nothing (unless you configure it to throw away low-depth reads, which is BBNorm's default behavior), but binning should still do something.

Working with huge datasets is tough when you have compute time limitations. But, BBNorm should process data at ~20Mbp/s or so (with "passes=1 prefilter", on a 20-core machine), which would be around 1.7Tbp/day, so it should be possible to normalize or generate a kmer-depth histogram from a several-Tbp sample in 7 days...

But, another option is to assemble independently, deduplicate those assemblies together, then co-assemble only the reads that don't map to the deduplicated combined assembly. The results won't be as good as a full co-assembly, but it is more computationally tractable.
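
Thanks, that throughput estimate helps. To sanity-check my plan before committing a week of compute, I'm assuming the normalization / kmer-depth histogram run would look roughly like this (a sketch only: passes=1 and prefilter are taken from your post and threads=20 from the 20-core example; the file names, target depth, and the hist= output are my own guesses):

Code:
# normalize paired reads to a moderate target depth and also write
# the kmer-depth histogram in the same pass (file names made up)
bbnorm.sh in=sample_R1.fq.gz in2=sample_R2.fq.gz \
    out=norm_R1.fq.gz out2=norm_R2.fq.gz \
    target=100 passes=1 prefilter hist=khist.txt threads=20

Does that look about right, or is it better to generate the histogram in a separate run?
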
Regarding the alternative strategy of assembling the samples independently and deduplicating the assemblies with dedupe.sh: when mapping back to the deduplicated contigs, should I map the normalized reads or the original reads (roughly 10x bigger)? It makes sense to me to just use the normalized reads, since I want to co-assemble them afterwards anyway.
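
Concretely, I had something like this in mind (just a sketch of my understanding, with made-up file names; I'm assuming BBMap's outu=/outu2= is the right way to collect the unmapped reads, please correct me if not):

Code:
# collapse redundant contigs across the independent assemblies
dedupe.sh in=asm1.fa,asm2.fa,asm3.fa out=deduped_contigs.fa

# map the normalized reads to the combined assembly and keep only the
# reads that do not map; those would go into the final co-assembly
bbmap.sh ref=deduped_contigs.fa nodisk \
    in=norm_R1.fq.gz in2=norm_R2.fq.gz \
    outu=unmapped_R1.fq.gz outu2=unmapped_R2.fq.gz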

Thanks.