Unconfigured Ad

**adamdeluca** · 08-10-2010, 05:53 AM

Very interesting, I will give them a try.

Something to look at, there is a parallel implementation of bzip2

Xoilac TV chính thức - Trực tiếp bóng đá Xoilac HD - BLV Roy

Xoilac TV - Xem trực tiếp bóng đá xôi lạc HD, bình luận Tiếng Việt. Kênh XoiacTV cập nhật nhanh lịch thi đấu, nhận định, tỷ số bóng đá, BXH.

**krobison** · 08-10-2010, 06:17 AM

What would be really nice would be for some of these options to be available in the downstream tools themselves -- e.g. bwa & bowtie (as far as I know) need the input FASTQs decompressed. It would certainly be convenient if they could read the compressed formats (though bwa with short reads ends up reading them twice, so the overhead of decompressing twice might not be worth it).

**jkbonfield** · 08-10-2010, 06:23 AM

The bsc tool is also parallel, both multi-threaded and mpi capable. I disabled it for the purposes of benchmarking though, to be fair. See http://libbsc.com for more details. I've been quite impressed with it so far.

James

**lh3** · 08-10-2010, 11:37 AM

bwa has been supporting gzip'ed fastq for nearly two years. Minor modification can make it work with bzip'ed or bsc'ed fastq files, although by design bwa cannot support multiple compression algorithms at the same time. Maq's gzip support is later and is only available in SVN. Bowtie accepts piping, so supporting compression or not does not matter too much.

BTW, I did not know bsc before, but it looks very impressive to me, too.

EDIT: a lot of free compressors (e.g. quicklz, bsc and rangecoder) are licensed under GPL or LGPL. This becomes annoying when we want to release source code under a permissive open source license (e.g. BSD and MIT/X11) such that everyone can use the library/tool freely. Another similar practical issue is the availability of other language bindings. gzip is by far the most widely supported library.

**drio** · 08-10-2010, 12:45 PM

Bfast has been also supporting gzip and bzip2 for a long time.

**jkbonfield** · 08-10-2010, 03:12 PM

Yeah GPL can be a pain like that at times.

For what it's worth, I'm happy to release fastq2fqz and fqz2fastq under BSD. It's kind of trivial mix of zlib and staden io_lib anyway, both of which are already BSD.

The fqzcomp code was based on GPL code, although the basic design of what it does is trivial enough to rewrite using a more free library. (Hah! "more free" - that'll wind up the GPL crowd). I doubt I'd ever get the time though.

James

PS. I'm totally with you on gzip being ubiqitous in language bindings. It's also incredibly fast at decompression compared to most, so it's ideal for a lot of our use cases. It's good to see many tools using at least some sort of on-the-fly compression.

Topics	Statistics	Last Post
New Genomic Method Uncovers Ancient Hominin DNA by SEQadmin2 Started by SEQadmin2, Yesterday, 02:55 AM	0 responses 12 views 0 reactions	Last Post by SEQadmin2 Yesterday, 02:55 AM
Study Captures the First Moments of DNA Replication by SEQadmin2 Started by SEQadmin2, 07-24-2026, 12:17 PM	0 responses 12 views 0 reactions	Last Post by SEQadmin2 07-24-2026, 12:17 PM
Chemotherapy Leaves Detectable DNA Signatures in Childhood Tumors by SEQadmin2 Started by SEQadmin2, 07-23-2026, 11:41 AM	0 responses 13 views 0 reactions	Last Post by SEQadmin2 07-23-2026, 11:41 AM
Single-Cell Atlases Skew Toward European Ancestry, Analysis Finds by SEQadmin2 Started by SEQadmin2, 07-20-2026, 11:10 AM	0 responses 24 views 0 reactions	Last Post by SEQadmin2 07-20-2026, 11:10 AM

Unconfigured Ad

Fastq compression - proof of concept

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News