SEQanswers

12-23-2012, 03:49 PM   #1
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
gzip

Maybe I should ask this in a compression forum (too),
but the problem only happened here, when I downloaded 1000 Genomes files.

Apparently they don't decompress correctly on my system; the file lengths
are strange.


Downloading from:
http://ftp.1000genomes.ebi.ac.uk/vol...ni_haplotypes/

E.g. chromosome 11 is 52335487 bytes as .gz; decompressing it
gives a file of 107085824 bytes. That is a very poor compression ratio
compared with, e.g., chromosome 1, which is 80 MB as .gz
and ~1.5 GB when expanded.

Now, maybe my gzip is the wrong one?
I never had problems before, though, and I recently downloaded and gunzipped
lots of big files without trouble.

OK, I went to the gzip homepage and read about a recent bug
with big files > 2 GB (chr11 is only 50 MB). I downloaded
the recent version 1.2.4 Win32, downloaded chromosome 11
again and decompressed it:
347996160 bytes! More, but still not enough; e.g. it is much
smaller than chromosome 17.

There are similar problems with other chromosomes too,
although #17, which I had tried first, seems to be correct
(64160 lines).
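One way to check whether a download really decompressed completely, regardless of which gzip build is installed, is to stream-decompress it and count bytes and lines. You can then compare the counts with what you expect, e.g. the 64160 lines seen for chromosome 17. Below is a minimal sketch that assumes Python 3 is available; `gunzip_stats` is a hypothetical helper name. This thread does not confirm the cause, but one plausible one is a decompressor that stops after the first member of a multi-member (concatenated) gzip file. Python's `gzip` module reads all members.

```python
import gzip
import sys


def gunzip_stats(path, chunk_size=1 << 20):
    """Stream-decompress a .gz file; return (uncompressed_bytes, newline_count).

    Memory use stays at about one chunk, so multi-gigabyte files are fine.
    Python's gzip reader continues past the end of the first gzip member,
    so concatenated (multi-member) files are decompressed completely.
    """
    total_bytes = 0
    newlines = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
            newlines += chunk.count(b"\n")
    return total_bytes, newlines


if __name__ == "__main__":
    # Usage (hypothetical file names): python gunzip_stats.py chr11.gz chr17.gz
    for name in sys.argv[1:]:
        size, lines = gunzip_stats(name)
        print(f"{name}: {size} bytes, {lines} lines")
```

If this reports far more bytes than a given gunzip build produced for the same file, the file is fine and that decompressor is the problem.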

Has anyone else had similar problems?
Any idea how to resolve it?

----------------------------------------------
See also this thread:
http://seqanswers.com/forums/showthread.php?t=25635
New keyword for search engines:
README_omni_2123_samples_b37_SHAPEIT_haplotypes

Last edited by gsgs; 12-23-2012 at 04:08 PM.
12-27-2012, 07:44 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

It appears that you are doing this on a Windows machine. I would suggest trying the 7-Zip program (http://www.7-zip.org/). It is free and has worked reliably for me with tar, zip and rar (basically, you name it) compressed files.
12-27-2012, 07:56 AM   #3
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

Yes, thanks, that's exactly what I did in the meantime
(I could have posted an update).
It seems to work correctly.
I still have to figure out later what to do with files > 4 GB, though.

I did try 7-Zip earlier, but at first I was irritated that it displayed the
uncompressed file length as 0 (7zip l chr22.gz).
But later I figured out that
it still expands them (apparently) correctly.
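A listing that shows 0 as the uncompressed length is less alarming than it looks. A .gz file stores its uncompressed size only in a 4-byte trailer field (ISIZE, RFC 1952), and that value is taken modulo 2^32 and covers only the last member. So a listing that relies on it can show 0 or a too-small value while the data itself is intact. The same wrap-around affects files > 4 GB. The minimal Python sketch below reads that field and compares it with the true size. It assumes Python 3.8+, and `gzip_isize` and `true_size` are hypothetical helper names. Whether 7-Zip's listing uses exactly this field is an assumption, not something stated in the thread.

```python
import gzip
import os
import struct


def gzip_isize(path):
    """Return the ISIZE field stored in the last 4 bytes of a gzip file.

    Per RFC 1952, ISIZE is the uncompressed length of the *last* member,
    modulo 2**32, as a little-endian unsigned 32-bit integer. It is only a
    hint: it wraps for outputs of 4 GiB or more and ignores earlier members.
    """
    with open(path, "rb") as fh:
        fh.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", fh.read(4))
    return isize


def true_size(path, chunk_size=1 << 20):
    """Actual uncompressed length, found by decompressing the whole file."""
    total = 0
    with gzip.open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            total += len(chunk)
    return total
```

If `gzip_isize("chr22.gz")` and `true_size("chr22.gz")` disagree, the listing is simply unreliable for that file. It does not mean the archive is damaged.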

(That gzip thing cost me another ~5 hours :-( )

Last edited by gsgs; 12-27-2012 at 07:58 AM.
12-27-2012, 07:59 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Quote:
Originally Posted by gsgs
I must still figure out later what to do with files > 4GB, though.
The 64-bit version of 7-Zip on a machine with an NTFS-formatted drive should work.
02-26-2016, 11:07 PM   #5
dellmerca
Junior Member
 
Location: UAE

Join Date: Feb 2016
Posts: 1
gzip

7-Zip currently treats .tar and .gz extraction as separate operations. These should be combined by default.
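For comparison, some tools do the two steps in one pass. The minimal sketch below uses Python's standard `tarfile` module, whose `"r:gz"` mode reads the archive through a gzip decompressor, so no intermediate .tar file is written. It assumes Python 3. `extract_tar_gz` is a hypothetical helper name, and the `filter="data"` safety option is used only where the running Python provides it.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Decompress and unpack a .tar.gz in a single pass; return member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:gz" = read the tar stream through gzip decompression on the fly.
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", device files, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
        return tar.getnames()
```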

Dell
02-27-2016, 12:56 AM   #6
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

... or at least there should be an option to combine them easily.
Files > 4 GB could be expanded into multiple files < 4 GB.
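The splitting idea can be sketched without any archiver support. The sketch below streams the decompressed data and starts a new output file whenever the current one reaches a size limit. It assumes Python 3 and that the 4 GB ceiling comes from a FAT32 drive, which caps files at 4 GiB minus one byte. `gunzip_split` and the `.000`, `.001`, ... naming are hypothetical choices, not features of gzip or 7-Zip.

```python
import gzip

# Assumption: the 4 GB limit comes from a FAT32 drive, whose maximum file
# size is 4 GiB minus one byte.
PART_LIMIT = 4 * 1024**3 - 1


def gunzip_split(path, prefix, part_limit=PART_LIMIT, chunk_size=1 << 20):
    """Stream-decompress `path` into prefix.000, prefix.001, ...

    Each part holds at most `part_limit` bytes. Parts are cut at byte
    boundaries, so a text line may straddle two parts. Returns the list
    of part file names, in order.
    """
    parts = []
    out = None
    written = 0
    try:
        with gzip.open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                view = memoryview(chunk)
                while len(view) > 0:
                    if out is None or written == part_limit:
                        # No part open yet, or the current one is full.
                        if out is not None:
                            out.close()
                        name = f"{prefix}.{len(parts):03d}"
                        out = open(name, "wb")
                        parts.append(name)
                        written = 0
                    n = min(len(view), part_limit - written)
                    out.write(view[:n])
                    written += n
                    view = view[n:]
    finally:
        if out is not None:
            out.close()
    return parts
```

Concatenating the parts in order (e.g. `cat` on Unix or `copy /b` on Windows, on a drive that allows large files) restores the original byte stream.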

7-Zip often gives much better compression ratios than gzip, so why does GenBank use gzip?
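The compression-ratio point can be checked on your own data. By default, 7-Zip's native .7z format uses LZMA-family methods, while .gz files use DEFLATE. Python's standard `lzma` module writes .xz files with LZMA2, so it gives a rough comparison. This is only a sketch, assuming Python 3. It is not 7-Zip itself, ratios depend heavily on the input, and `compressed_sizes` is a hypothetical helper name.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Compress `data` with gzip (DEFLATE) and xz (LZMA2) at maximum settings."""
    gz = gzip.compress(data, compresslevel=9)
    xz = lzma.compress(data, preset=9)
    # Sizes only mean something if both streams round-trip exactly.
    if gzip.decompress(gz) != data or lzma.decompress(xz) != data:
        raise ValueError("round-trip check failed")
    return {"raw": len(data), "gzip": len(gz), "xz": len(xz)}
```

For example, `compressed_sizes(open("chr22", "rb").read())` (hypothetical file name) shows both sizes side by side. For very large files, read a representative slice instead of the whole file.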
All times are GMT -8.