  • gzip

    Maybe I should ask this in a compression forum too,
    but the problem has only shown up here, when I downloaded 1000 Genomes files.

    Apparently they don't decompress correctly on my system; the file lengths
    are strange.


    downloading from :


    E.g. chromosome 11 is 52335487 bytes as .gz; decompressing
    gives a file of 107085824 bytes, which is a very poor compression ratio
    (about 2:1) compared with e.g. chromosome 1, which is 80MB as .gz
    and ~1.5GB when expanded (roughly 19:1).

    Now, maybe my gzip is the wrong one?
    Although I have never had problems with it, and I have downloaded and
    ungzipped lots of big files recently without trouble.

    OK, I went to the gzip homepage, read about a recent bug
    with big files > 2GB (chr11 is only 50MB), downloaded
    the recent version 1.2.4 for Win32, downloaded chromosome 11
    again and decompressed it.
    347996160 bytes! More, but still not enough, e.g. much
    smaller than chromosome 17.

    There are similar problems with other chromosomes too,
    although #17, which I tried first, seems to be correct
    (64160 lines).

    Has anyone else had similar problems?
    Any idea how to resolve it?
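
One way to cross-check the decompressed length independently of the command-line gzip is to stream the file through a second implementation and count bytes and lines. The following is a minimal sketch using Python's standard `gzip` module, which reads in binary mode and keeps going across concatenated gzip members; the function name `count_decompressed` and the file name in the usage note are made up for illustration, and nothing here is confirmed as the cause of the short output described above.

```python
import gzip


def count_decompressed(path, chunk_size=1 << 20):
    """Stream through a .gz file and return (bytes, lines) of the decompressed data."""
    total_bytes = 0
    total_lines = 0
    # Binary mode avoids any newline translation; gzip.open also reads
    # through files made of several concatenated gzip members.
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
            total_lines += chunk.count(b"\n")
    return total_bytes, total_lines
```

Calling `count_decompressed("chr11.gz")` (a hypothetical file name) and comparing the result with the size of the file written by gzip or 7-Zip would show whether the two tools agree.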

    ----------------------------------------------
    see also this thread:

    new keyword for search engines:
    README_omni_2123_samples_b37_SHAPEIT_haplotypes
    Last edited by gsgs; 12-23-2012, 05:08 PM.

  • #2
    It appears that you are doing this on a Windows machine. I would suggest trying the 7-Zip program (http://www.7-zip.org/). It is free and has worked reliably for me with tar, zip, rar (basically, you name it) compressed files.



    • #3
      Yes, thanks, that's exactly what I did in the meantime
      (I should have posted an update).
      It seems to work correctly.
      I must still figure out later what to do with files > 4GB, though.

      I did try 7zip earlier, but was put off at first because it displayed the
      uncompressed file length as 0 (7zip l chr22.gz).
      Later I figured out that
      it still expands the files (apparently) correctly.
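
For background on where a listed "uncompressed length" can come from: the gzip format (RFC 1952) ends each member with a 4-byte little-endian ISIZE field holding the uncompressed size modulo 2^32. In a file made of several concatenated members, the last 4 bytes describe only the final member. The sketch below just reads that field; `gzip_isize` is a made-up helper name, and whether these particular files end in an empty member (which would make the field read 0) is an assumption, not something confirmed in this thread.

```python
import os
import struct


def gzip_isize(path):
    """Return the ISIZE field stored in the last 4 bytes of a gzip file.

    ISIZE is the uncompressed size of the final gzip member modulo 2**32,
    so it is not a reliable total for multi-member files or data >= 4 GiB.
    """
    with open(path, "rb") as fh:
        fh.seek(-4, os.SEEK_END)  # position 4 bytes before end of file
        (isize,) = struct.unpack("<I", fh.read(4))  # little-endian unsigned 32-bit
    return isize
```

A value that disagrees with the real decompressed length therefore does not by itself mean the archive is damaged.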

      (that gzip thing cost me another ~5 hours :-( )
      Last edited by gsgs; 12-27-2012, 08:58 AM.



      • #4
        Originally posted by gsgs
        I must still figure out later what to do with files > 4GB, though.
        The 64-bit version of 7-Zip on a machine with an NTFS-formatted drive should work.



        • #5
          gzip

          7zip currently treats .tar and .gz extraction as separate operations. These should be combined by default.

          Dell
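
As an illustration of handling both layers in one step outside of 7-Zip, Python's standard `tarfile` module can open a gzip-compressed tar archive directly with mode `"r:gz"`. This is only a sketch of that alternative; the helper names `list_tar_gz` and `read_member` are made up here, and it says nothing about how 7-Zip itself could be configured.

```python
import tarfile


def list_tar_gz(path):
    """Return (name, size) for every member of a .tar.gz archive."""
    # Mode "r:gz" decompresses the gzip layer and parses the tar layer together.
    with tarfile.open(path, "r:gz") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


def read_member(path, name):
    """Return the bytes of one regular file inside a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        extracted = archive.extractfile(name)
        if extracted is None:  # directories and special members have no data
            raise ValueError(f"{name!r} is not a regular file in {path!r}")
        return extracted.read()
```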



          • #6
            ... or at least have an option to combine them easily.
            Files > 4GB could be expanded into multiple files < 4GB.
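
The splitting idea can be sketched in a few lines of Python: decompress the stream and start a new output file whenever the current one reaches a size cap. The default cap below is 4 GiB minus one byte, the largest file FAT32 can hold; the function name `gunzip_split` and the `.000`, `.001`, ... naming scheme are made up for illustration. Note that the cut falls on a byte boundary, so a text line may be split across two parts.

```python
import gzip


def gunzip_split(src, dest_prefix, max_part_bytes=(4 << 30) - 1, chunk_size=1 << 20):
    """Decompress src into dest_prefix.000, .001, ... of at most max_part_bytes each.

    Returns the list of part file names in order.
    """
    parts = []
    part_file = None
    written = 0  # bytes written to the current part
    try:
        with gzip.open(src, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                while chunk:
                    # Open the first part, or roll over once the current one is full.
                    if part_file is None or written == max_part_bytes:
                        if part_file is not None:
                            part_file.close()
                        name = f"{dest_prefix}.{len(parts):03d}"
                        part_file = open(name, "wb")
                        parts.append(name)
                        written = 0
                    room = max_part_bytes - written
                    piece = chunk[:room]
                    part_file.write(piece)
                    written += len(piece)
                    chunk = chunk[room:]
    finally:
        if part_file is not None:
            part_file.close()
    return parts
```

Concatenating the parts in order gives back the full decompressed file.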

            7zip often gives much better compression ratios than gzip, so why does GenBank use gzip?

