View Single Post
Old 12-06-2016, 08:47 AM   #5
Junior Member
Location: Hong Kong

Join Date: Jun 2015
Posts: 4

Originally Posted by Brian Bushnell View Post
In my tests, assembly with Spades and Megahit have time reductions from using Clumpified input that more than pays for the time needed to run Clumpify, largely because both are multi-kmer assemblers which read the input file multiple times. Something purely CPU-limited like mapping would normally not benefit much in terms of speed (though still a bit due to improved cache locality).
In fact, Megahit does not read the input files multiple times. It converts the fastq/a files into a binary format and read the binary file multiple times. I guess that cache locality is the key. Imagine that the same group of k-mers are processed (in different components of Megahit -- graph construction: assign k-mers to different buckets then sorting; local assembly & extracting iterative k-mers: insert k-mers into a hash table)... In this regard, alignment tools may also benefit from it substantially.

Great work Brian.
vout is offline   Reply With Quote