I decided to push my proof-of-concept code out of the door for people to experiment with, as I do not have time to take this further myself. However, I feel format-specific compression tools are well worth considering, given how staggeringly bioinformatics data has grown over the last few years.
I have two fastq compression tools (neither is "production quality" or supported, so beware); i.e. these are experimental only.
Code for both is on the Sanger Institute ftp site at:
ftp://ftp.sanger.ac.uk/pub/jkb/fqzcomp-1.0.tar.gz
ftp://ftp.sanger.ac.uk/pub/jkb/rngcod13+fqzcomp.tar.gz
I benchmarked them on a couple of data sets and compared them with other general-purpose tools. The first test is a 1Gb file to/from /dev/shm, with 54bp sequences.

Code:

Prog             Size        Encode time  Decode time
------------------------------------------------------
raw              1073741745  (2.0)        (2.0)
lzopack           499049563  11.7         5.3
quicklz           497987198  7.7          7.7
quicklz -3        424803464  65.1         5.5
gzip -1           375071650  30.1         12.8
lzopack -9        368383765  469.8        5.2
xz -1             318229712  134.3        33.5
gzip -6 (def.)    316890291  108.2        10.9
szip -o3          277408698  131.6        171.3
bsc -m0pTcpf      256937105  120.6        141.1
xz                253249104  1438.5       29.3
bzip2             249508099  414.9        118.6
fastq2fqz (-3)    244921604  22.5         12.9
fastq2fqz (-5)    238350173  27.8         13.0
bsc -m1pTcpf      233012984  111.3        152.4
szip -o6          232242005  295.8        233.0
fqzcomp           229624382  22.3         55.6
bsc -m2pTcpf      220238875  132.9        166.2

In the above, "raw" is a UNIX cat command, for comparison. Some of these tools you may well never have heard of, but see http://mattmahoney.net/dc/text.html for a comprehensive list.

On a smaller set of 250000 108bp sequences, which allowed me to go to town testing slower tools like paq8, we get this:

Code:

Prog            Size      Encode(s)  Decode(s)
---------------------------------------------------
simple_c        34445161  2.637      9.216
comp(0)         34165620  2.388      4.036
gzip -3         27822202  3.140      0.822
gzip            26441159  9.356      0.751
xz -3           22971956  67.62      2.400
comp1           22465448  2.335      3.364
xz              22450796  103.5      2.509
fastq2fqz       21595974  1.536      0.967
bzip2           21340457  10.99      5.813
szip            20540942  14.98      16.55
comp2           20287020  2.935      4.737
bsc -m2pTcpf    19365157  8.826      10.95
fqzcomp(1Mb)    19136330  1.589      2.500
bsc -m3pTcp     19063073  23.50      17.28
lpaq -9         18534618  178.6      (~encode)
paq8 -8         17730550  6043.8     (~encode)
It's impressive to see just how well the state-of-the-art general-purpose text compressors (paq) can do, albeit at *extreme* cost in CPU time. I tend to think of these as a baseline to try to approach. Although they can be beaten by code dedicated to specific formats, it's typically going to be very hard to do so while still being faster than, say, bzip2.
So the tools:
fastq2fqz/fqz2fastq:
These use LZ77 and Huffman encoding (via zlib, plus the interlaced Huffman encoder taken from the Staden Package's io_lib).
Hence it's particularly fast at decompression, as is usual with LZ+Huffman programs. It can be tweaked to be marginally faster than gzip at decompression if we ditch the interlaced Huffman encoding for the quality values and just call zlib again, but zlib's entropy encoder is far slower, so that slows down encoding and also gives poorer compression ratios.
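To make the "just call zlib again" option concrete, here is a minimal sketch (not the actual fastq2fqz code) of deflating a block of quality values through zlib's one-shot compress2() call. The quality string and compression level are made up for illustration; only the zlib calls themselves are real.

Code:

/* Sketch: compress a block of quality values with zlib alone.
 * Build with: cc qual_zlib_sketch.c -lz
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    /* Made-up phred qualities; in a real tool this would be the
     * concatenated quality lines pulled out of the fastq file. */
    const char *qual = "IIIIIIIIIIIIIHHHHHGGGGFFFEEDDCCBBBAAA@@@???>>>===<<<;;";
    uLong  in_len  = (uLong)strlen(qual);
    uLongf out_len = compressBound(in_len);
    Bytef *out     = malloc(out_len);

    /* Z_BEST_SPEED keeps the entropy-coding cost down, at the price of
     * a worse ratio -- the trade-off discussed above. */
    if (compress2(out, &out_len, (const Bytef *)qual, in_len, Z_BEST_SPEED) != Z_OK) {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    printf("%lu quality bytes -> %lu compressed bytes\n",
           (unsigned long)in_len, (unsigned long)out_len);

    free(out);
    return 0;
}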
Either way, it's an order of magnitude faster than bzip2 (both encoding and decoding) while giving comparable compression ratios.
Note that this tool MUST have fixed-size lines, and it only supports ACGTN.
fqzcomp/defqzcomp:
For this I experimented with probabilistic modelling and a range coder for entropy encoding. I chose to use Michael Schindler's example source from http://www.compressconsult.com/rangecoder/.
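As a rough illustration of the modelling half of this (the range coder itself is Michael Schindler's and is not reproduced here), the sketch below keeps adaptive counts for an order-1 model over ACGTN of the kind that would be fed to a range coder. It is NOT the fqzcomp model: the sequence data, the adaptation increment and the flat prior are all made up, and instead of driving a real coder it just sums the ideal -log2(p) cost that an exact entropy coder would approach.

Code:

/* Sketch: adaptive order-1 model over ACGTN, estimating coded size.
 * Build with: cc model_sketch.c -lm
 */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define NSYM 5                      /* A, C, G, T, N */

static int sym(char c) {
    switch (c) {
    case 'A': return 0; case 'C': return 1;
    case 'G': return 2; case 'T': return 3;
    default:  return 4;             /* N or anything else */
    }
}

int main(void) {
    /* Made-up sequence data for illustration. */
    const char *seq = "ACGTACGTACGTTTTTGGGGACACACACGTGTGTGTACGTACGTAAAACCCC";
    size_t len = strlen(seq);

    /* freq[prev][cur]: counts conditioned on the previous base. */
    unsigned freq[NSYM][NSYM];
    for (int i = 0; i < NSYM; i++)
        for (int j = 0; j < NSYM; j++)
            freq[i][j] = 1;          /* flat prior */

    double bits = 0;
    int prev = 0;
    for (size_t i = 0; i < len; i++) {
        int cur = sym(seq[i]);

        unsigned tot = 0;
        for (int j = 0; j < NSYM; j++)
            tot += freq[prev][j];

        /* A range coder would be handed (cumulative freq, freq[prev][cur],
         * tot) at this point; here we only accumulate the ideal cost. */
        bits += -log2((double)freq[prev][cur] / tot);

        freq[prev][cur] += 8;        /* adapt: recent symbols become cheaper */
        prev = cur;
    }

    printf("%zu bases -> ~%.1f bits (%.2f bits/base)\n", len, bits, bits / len);
    return 0;
}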
The compression performance is very good. Encoding speed is particularly good, even beating gzip -1, but decoding speed is unfortunately only about half that of encoding, so it's quite slow compared to many tools. I know there are faster entropy encoders out there, so I'm sure there is room for improvement on speed. Even so, it runs fast compared to tools with comparable compression ratios.
The fqzcomp program should support variable-length sequences, unlike fastq2fqz. I'm not sure which DNA letters it accepts, but probably anything.
James
edit: fixed link to the new version of Matt Mahoney's chart.