  • unzipping human bowtie indexes

    Does anyone know how to unzip the human index files for Bowtie downloaded from http://bowtie-bio.sourceforge.net/md5s.shtml?

    unzip h_sapiens_asm.ebwt.1.zip
    Error: End-of-central-directory signature not found. Either this file is not a zipfile or it is a multi-part archive

    I also tried both parts, but still received the same error:
    h_sapiens_asm.ebwt.1.zip
    h_sapiens_asm.ebwt.2.zip

    Thanks for any help
    L
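unzip's message means it could not find the End of Central Directory (EOCD) record, which a valid ZIP file keeps at its very end: a fixed 22-byte record whose signature is the bytes PK\x05\x06, optionally followed by a comment of up to 65,535 bytes. A truncated download typically loses that record. The sketch below is a quick heuristic, not a full ZIP validator: it only searches the tail of the file for the signature. The function name has_eocd_signature is hypothetical; the standard library's zipfile.is_zipfile performs a similar check.

```python
import os
import sys

# The EOCD record is at least 22 bytes and may be followed by an archive
# comment of up to 65,535 bytes, so its signature must lie within the last
# 22 + 65,535 bytes of a well-formed ZIP file.
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22
MAX_COMMENT_SIZE = 65535


def has_eocd_signature(path: str) -> bool:
    """Return True if the EOCD signature appears near the end of the file.

    A missing signature is what unzip reports as
    "End-of-central-directory signature not found", usually because the
    download was cut short. Finding it does not prove the archive is intact.
    """
    size = os.path.getsize(path)
    window = min(size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE)
    with open(path, "rb") as fh:
        fh.seek(size - window)
        tail = fh.read(window)
    return EOCD_SIGNATURE in tail


if __name__ == "__main__":
    # Usage: python check_zip.py h_sapiens_asm.ebwt.1.zip [more.zip ...]
    for name in sys.argv[1:]:
        status = "EOCD found" if has_eocd_signature(name) else "EOCD missing (truncated?)"
        print(f"{name}: {status}")
```

If the signature is missing, comparing the file size with the size the server advertises is the next step; if it is present but unzip still fails, the MD5 is worth checking.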

  • #2
    I had the same problem, so I gave up and built the Bowtie indexes myself, but it took a long time (more than 4 hours).



    • #3
      Tried it just now and didn't have any problems:

      $ wget ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
      --10:32:47-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
      => `h_sapiens_asm.ebwt.1.zip'
      Resolving ftp.cbcb.umd.edu... 128.8.119.241
      (snip)
      10:37:13 (6.27 MB/s) - `h_sapiens_asm.ebwt.1.zip' saved [1749293837]

      $ unzip h_sapiens_asm.ebwt.1.zip
      Archive: h_sapiens_asm.ebwt.1.zip
      inflating: h_sapiens_asm.1.ebwt
      inflating: h_sapiens_asm.2.ebwt
      inflating: h_sapiens_asm.3.ebwt
      inflating: h_sapiens_asm.4.ebwt

      Did you check the MD5 to make sure you got the entire file without corruption? If so, can you post the output of 'unzip --version'? Here's mine:

      benjamin-langmeads-macbook-pro:tmp langmead$ unzip --version
      caution: both -n and -o specified; ignoring -o
      UnZip 5.52 of 28 February 2005, by Info-ZIP. Maintained by C. Spieler. Send
      bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
      Ben
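Checking the MD5, as Ben suggests, can be done with md5sum (md5 on Mac OS X) or with a short script. Below is a minimal sketch using Python's hashlib; the function names are hypothetical, and the expected checksum is whatever the md5s.shtml page lists for the file, which is not reproduced here. Reading in chunks keeps memory use flat even for a 1.7 GB archive.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so the
    whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch means the file differs from what was published, whether because the transfer was cut short or the bytes were altered; a match with a still-failing unzip would instead point at the unzip tool.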



      • #4
        Hi Ben,
        I was referring to clicking any of the links under the "pre-built indices" section of http://bowtie-bio.sourceforge.net/. The file downloads, but opening it throws an error. I'm not sure whether the FTP site and the web page's index links are served from the same place.



        • #5
          Hi guys,

          I still can't reproduce it. Whether I use wget or click the link in Safari, it works fine. Can you try again to make sure it wasn't a temporary connection problem? If it still doesn't work, can you send me the relevant OS and software details?

          Thanks,
          Ben



          • #6
            Hi Ben,

            I get the same errors whether I use curl/wget or click the h_sapiens_asm.ebwt.1.zip file from
            a) ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
            b) http://bowtie-bio.sourceforge.net/index.shtml

            bash-3.2$ curl ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip > h_sapiens_asm.ebwt.1.zip
            * About to connect() to ftp.cbcb.umd.edu port 21 (#0)
            * Trying 128.8.119.241... connected
            * Connected to ftp.cbcb.umd.edu (128.8.119.241) port 21 (#0)
            * Connecting to 128.8.119.241 (128.8.119.241) port 37876
            > SIZE h_sapiens_asm.ebwt.1.zip
            < 213 1749293837
            < 150 Opening BINARY mode data connection for h_sapiens_asm.ebwt.1.zip (1749293837 bytes).
            * Getting file with size: 1749293837
              % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                             Dload  Upload   Total   Spent    Left  Speed
              0 1668M    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
            { [data not shown]
              8 1668M    8  142M    0     0   348k      0  1:21:36  0:06:59  1:14:37  334k
            * transfer closed with 1599396621 bytes remaining to read
            * Received only partial file: 149897216 bytes
              8 1668M    8  142M    0     0   348k      0  1:21:36  0:06:59  1:14:37  324k
            * Closing connection #0

            curl: (18) transfer closed with 1599396621 bytes remaining to read

            I believe my version of unzip is the same as yours. I'm on Mac OS X 10.5.7 (32 GB):
            bash-3.2$ unzip -version
            caution: both -n and -o specified; ignoring -o
            UnZip 5.52 of 28 February 2005, by Info-ZIP. Maintained by C. Spieler. Send
            bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

            However, the problem seems to be in the download itself: curl received only 149,897,216 of the 1,749,293,837 bytes.

            Thanks in advance for your help.

            L
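The curl log already contains what is needed to tell a truncated transfer from a corrupt one: the FTP SIZE reply ("213 1749293837") gives the expected byte count, and curl reports how many bytes actually arrived. The sketch below does that comparison; size_report and check_download are hypothetical helper names. For a truncated file, curl's `-C -` option and wget's `-c` option can resume a partial download rather than starting over.

```python
import os


def size_report(actual: int, expected: int) -> str:
    """Compare a local byte count with the size the server advertised."""
    if actual == expected:
        return "size matches; verify the MD5 next"
    if actual < expected:
        pct = 100.0 * actual / expected
        return (f"truncated: {actual} of {expected} bytes ({pct:.1f}%), "
                f"{expected - actual} bytes missing")
    return f"larger than expected by {actual - expected} bytes"


def check_download(path: str, expected: int) -> str:
    """Convenience wrapper that reads the local file size from disk."""
    return size_report(os.path.getsize(path), expected)


# Figures from the curl transcript: bytes received vs. the FTP SIZE reply.
print(size_report(149897216, 1749293837))
```

With the numbers above, the missing byte count works out to 1,599,396,621, the same figure curl printed in "transfer closed with 1599396621 bytes remaining to read". A matching size alone does not guarantee an intact file, which is why the MD5 is still worth checking.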



            • #7
              OK, I see that now too. Looks like the UMD FTP server is having issues:

              --08:43:46-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
              (try: 5) => `h_sapiens_asm.ebwt.1.zip.1'
              Connecting to ftp.cbcb.umd.edu|128.8.119.241|:21... connected.
              Logging in as anonymous ... Logged in!
              ==> SYST ... done. ==> PWD ... done.
              ==> TYPE I ... done. ==> CWD /pub/data/bowtie_indexes ... done.
              ==> PASV ...
              Cannot initiate PASV transfer.
              ==> PORT ...
              Invalid PORT.
              Retrying.
              --08:43:51-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
              (try: 6) => `h_sapiens_asm.ebwt.1.zip.1'
              Connecting to ftp.cbcb.umd.edu|128.8.119.241|:21... connected.
              Logging in as anonymous ... Logged in!
              ==> SYST ... done. ==> PWD ... done.
              ==> TYPE I ... done. ==> CWD /pub/data/bowtie_indexes ... done.
              ==> PASV ... done. ==> RETR h_sapiens_asm.ebwt.1.zip ... done.
              Length: 1,749,293,837 (1.6G) (unauthoritative)

              100%[=======================================================================================================================================================================>] 1,749,293,837 7.70M/s ETA 00:00

              08:47:39 (7.32 MB/s) - `h_sapiens_asm.ebwt.1.zip.1' saved [1749293837]

              $ unzip h_sapiens_asm.ebwt.1.zip
              Archive: h_sapiens_asm.ebwt.1.zip
              End-of-central-directory signature not found. Either this file is not
              a zipfile, or it constitutes one disk of a multi-part archive. In the
              latter case the central directory and zipfile comment will be found on
              the last disk(s) of this archive.
              unzip: cannot find zipfile directory in one of h_sapiens_asm.ebwt.1.zip or
              h_sapiens_asm.ebwt.1.zip.zip, and cannot find h_sapiens_asm.ebwt.1.zip.ZIP, period.
              $ md5sum h_sapiens_asm.ebwt.1.zip
              375e2b7af3f0b0b5a4cd885b4adb91c8 h_sapiens_asm.ebwt.1.zip

              I'll email them.

              Ben



              • #8
                Hi all,

                I have downloaded the hg18 build from UCSC.

                Command used to build the index:
                ./bowtie-build -f chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa indexes/hg18

                The files chr2.fa.1.ebwt, chr2.fa.2.ebwt, chr2.fa.3.ebwt and chr2.fa.4.ebwt, plus the two rev.ebwt files, are created, but the loop starts from chr2, not chr1. Second, indexing stops once chr2.fa has been indexed. Separating the .fa files with commas does not help.

                Thanks to anyone who can help
                L
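bowtie-build's usage line is `bowtie-build [options]* <reference_in> <ebwt_base>`, where <reference_in> is a single comma-separated list of files. With the files separated by spaces, bowtie-build would likely read chr1.fa as the reference and chr2.fa as the index basename, which would be consistent with the output files being named chr2.fa.*.ebwt. One possible reason commas did not help is whitespace after the commas, which the shell splits back into separate arguments. The sketch below assembles the argument list so that exactly two positional arguments follow the options; the helper name bowtie_build_args is hypothetical, and "./bowtie-build" and "indexes/hg18" are taken from the post.

```python
from typing import List, Sequence


def bowtie_build_args(fasta_files: Sequence[str], index_base: str,
                      executable: str = "./bowtie-build",
                      fasta_input: bool = True) -> List[str]:
    """Return an argv list: options, ONE comma-joined reference list, basename."""
    if not fasta_files:
        raise ValueError("need at least one FASTA file")
    # A comma or whitespace inside a name would corrupt the comma-separated list.
    bad = [f for f in fasta_files if "," in f or any(c.isspace() for c in f)]
    if bad:
        raise ValueError(f"file names must not contain commas or whitespace: {bad}")
    args = [executable]
    if fasta_input:
        args.append("-f")  # reference files are FASTA
    args += [",".join(fasta_files), index_base]
    return args


chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
argv = bowtie_build_args([f"{c}.fa" for c in chroms], "indexes/hg18")
print(" ".join(argv))
```

Passing the list to subprocess.run(argv, check=True) avoids shell word-splitting entirely, since each list element reaches bowtie-build as one argument.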



                • #9
                  I downloaded the human genome index today and it worked fine. Earlier I used the .asm index, but its reference headers are very long, so the resulting alignments are not directly usable in UCSC unless you use sed to replace those names with chr1, chr2, and so on.
                  --
                  bioinfosm
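The sed approach can also be written as a small filter that rewrites only the reference-name field. This sketch assumes Bowtie's default tab-delimited output, where the third column is the reference name; it is not meant for SAM output, whose header lines also carry reference names. The mapping keys are hypothetical placeholders, since the actual .asm header strings are not given in the thread.

```python
from typing import Dict, Iterable, Iterator

# Assumption: 0-based index of the reference-name column in Bowtie's default output.
REF_COLUMN = 2


def rename_references(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield alignment lines with the reference name replaced via `mapping`.

    Names with no entry in `mapping` pass through unchanged, as do lines
    with too few columns.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > REF_COLUMN:
            ref = fields[REF_COLUMN]
            fields[REF_COLUMN] = mapping.get(ref, ref)
        yield "\t".join(fields) + "\n"


# Hypothetical mapping: fill in the real long header names from the index.
EXAMPLE_MAPPING = {"hypothetical_long_header_1": "chr1"}
```

Unlike a broad sed substitution, this touches only the third column, so a read name or sequence that happens to contain the same text is left alone. To use it, open the Bowtie output file and write each yielded line to a new file.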
