If all data can fit in memory, Clumpify needs the amount of time it takes to read and write the file once. If the data cannot fit in memory, it takes around twice that long.
Is there a way to force Clumpify to use only memory (if enough is available) instead of writing to disk?

Edit: On second thought, that may not be practical/useful, but I will leave the question in for now to see if @Brian has any pointers.

For a 12G gzipped fastq input file, Clumpify made 28 temp files (each between 400M and 600M in size).

Edit 2: The final file size was 6.8G, a significant reduction from the 12G input.
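
For anyone who wants the numbers in one place, here is a quick back-of-the-envelope check of the sizes reported above. It is only a sketch: it assumes "G" and "M" mean gigabytes and megabytes, treats the reported sizes as approximate, and the variable names are just for illustration.

```python
# Sizes reported in this post, in gigabytes (approximate).
input_gb = 12.0      # gzipped fastq input
output_gb = 6.8      # final file after Clumpify
n_temp = 28          # number of temp files written
temp_min_gb = 0.4    # smallest temp file (~400M)
temp_max_gb = 0.6    # largest temp file (~600M)

# Relative size reduction of the final file versus the input.
reduction_pct = (input_gb - output_gb) / input_gb * 100

# Range of total data written to temp files.
temp_total_min = n_temp * temp_min_gb
temp_total_max = n_temp * temp_max_gb

print(f"Size reduction: {reduction_pct:.1f}%")
print(f"Total temp data: {temp_total_min:.1f}-{temp_total_max:.1f} GB")
```

So the final file is roughly 43% smaller, and the temp files add up to somewhere between about 11G and 17G, i.e. about the size of the input or a bit more.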

