I am happy to announce that my Genozip tool now has an optimization option specifically for Ion Torrent BAM files. The --optimize-ZM option modifies the ZM field (flow signal) as follows: negative flow signal values are changed to zero, and positive values are rounded to the nearest 10. For example, -20,212,427 becomes 0,210,430. This improves compression significantly: in the example below, the compressed file shrinks from 17G to 12G.
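To make the rule concrete, here is a minimal sketch of the value transform described above, applied to a plain list of integers. It is an illustration only, not Genozip's implementation. The function name optimize_zm is hypothetical, and the handling of exact ties (e.g. 215) is an assumption, since the post does not say which way they round.

```python
def optimize_zm(values):
    """Hypothetical sketch of the --optimize-ZM rule as described in the post:
    negative values become 0, positive values are rounded to the nearest 10.
    Assumption: exact ties such as 215 round up (Genozip's behavior is not stated)."""
    out = []
    for v in values:
        if v < 0:
            out.append(0)  # negative flow signal -> zero
        else:
            out.append((v + 5) // 10 * 10)  # round to nearest multiple of 10
    return out


print(optimize_zm([-20, 212, 427]))  # [0, 210, 430]
```

Collapsing the values onto fewer distinct symbols (all negatives to one value, positives to multiples of 10) gives the compressor less entropy to encode, which is the intuition behind the size reduction.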
Example:
> wget ftp://ftp-trace.ncbi.nih.gov/Referen...rawlib.b37.bam
> genozip IonXpress_020_rawlib.b37.bam
> genozip IonXpress_020_rawlib.b37.bam --optimize-ZM -o IonXpress_020_rawlib.b37.optimized.bam.genozip
> ls -Ggh IonXpress_020_rawlib.b37*
-rw-rw-r--+ 1 26G Aug 13 23:53 IonXpress_020_rawlib.b37.bam
-rw-rw-r--+ 1 17G Aug 14 00:10 IonXpress_020_rawlib.b37.bam.genozip
-rw-rw-r--+ 1 12G Aug 14 00:17 IonXpress_020_rawlib.b37.optimized.bam.genozip
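The gain can be read off the listing above. A quick calculation, assuming the sizes shown by ls -h (which rounds to whole gigabytes, so the figures are approximate):

```python
# File sizes in GB as printed by `ls -Ggh` above (rounded by -h, so approximate)
sizes_gb = {"bam": 26, "genozip": 17, "genozip_optimized": 12}

ratio_plain = sizes_gb["bam"] / sizes_gb["genozip"]            # BAM vs default genozip
ratio_opt = sizes_gb["bam"] / sizes_gb["genozip_optimized"]    # BAM vs --optimize-ZM
saving = 1 - sizes_gb["genozip_optimized"] / sizes_gb["genozip"]  # extra saving from --optimize-ZM

print(f"{ratio_plain:.2f}x {ratio_opt:.2f}x {saving:.0%}")  # 1.53x 2.17x 29%
```

So on this file, --optimize-ZM takes the compression ratio from roughly 1.5x to roughly 2.2x, about 29% smaller than the default genozip output.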
Documentation: https://genozip.com
Paper: https://www.researchgate.net/publica...ata_Compressor