SEQanswers

Old 11-10-2009, 06:24 AM   #1
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
unzipping human bowtie indexes

Does anyone have a solution for unzipping the human index files needed for bowtie, taken from http://bowtie-bio.sourceforge.net/md5s.shtml?

unzip h_sapiens_asm.ebwt.1.zip
Error: End-of-central-directory signature not found. Either this file is not a zipfile or it is a multi-part archive

I also tried both parts, but still received the same error:
h_sapiens_asm.ebwt.1.zip
h_sapiens_asm.ebwt.2.zip

Thanks for any help
L
Old 11-10-2009, 06:57 AM   #2
bekkari
Member
 
Location: New Jersey

Join Date: Oct 2009
Posts: 10

I experienced the same problem, so I gave up and ended up building the bowtie indexes on my own, but it took a lot of time (more than 4 hrs).
Old 11-10-2009, 07:41 AM   #3
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Tried it just now and didn't have any problems:

Quote:
$ wget ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
--10:32:47-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
=> `h_sapiens_asm.ebwt.1.zip'
Resolving ftp.cbcb.umd.edu... 128.8.119.241
(snip)
10:37:13 (6.27 MB/s) - `h_sapiens_asm.ebwt.1.zip' saved [1749293837]

$ unzip h_sapiens_asm.ebwt.1.zip
Archive: h_sapiens_asm.ebwt.1.zip
inflating: h_sapiens_asm.1.ebwt
inflating: h_sapiens_asm.2.ebwt
inflating: h_sapiens_asm.3.ebwt
inflating: h_sapiens_asm.4.ebwt
Did you check the MD5 to be sure you got the entire file w/o corruption? If so, can you give the output of 'unzip --version'? Here's mine:

Quote:
benjamin-langmeads-macbook-pro:tmp langmead$ unzip --version
caution: both -n and -o specified; ignoring -o
UnZip 5.52 of 28 February 2005, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Ben
Old 11-10-2009, 08:49 AM   #4
bekkari
Member
 
Location: New Jersey

Join Date: Oct 2009
Posts: 10

Hi Ben,
I was talking about clicking on any of the links under the "pre built indices" section of the page http://bowtie-bio.sourceforge.net/. The file downloads, but an error is thrown when I try to open it. I am not sure whether the FTP and web page index sections are sourced from the same place.
Old 11-10-2009, 09:01 AM   #5
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Hi guys,

I still can't recreate it. Whether I wget it or click on it in Safari, it works fine. Can you try again to make sure it wasn't a temporary connection problem? And if it still doesn't work, can you give me the relevant OS/software details?

Thanks,
Ben
Old 11-12-2009, 05:17 AM   #6
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

Hi Ben,

I receive the same errors whether I use curl/wget or click on the h_sapiens_asm.ebwt.1.zip file from
a)ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
b)http://bowtie-bio.sourceforge.net/index.shtml

bash-3.2$ curl ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip > h_sapiens_asm.ebwt.1.zip
* About to connect() to ftp.cbcb.umd.edu port 21 (#0)
* Trying 128.8.119.241... connected
* Connected to ftp.cbcb.umd.edu (128.8.119.241) port 21 (#0)
* Connecting to 128.8.119.241 (128.8.119.241) port 37876
> SIZE h_sapiens_asm.ebwt.1.zip
< 213 1749293837
< 150 Opening BINARY mode data connection for h_sapiens_asm.ebwt.1.zip (1749293837 bytes).
* Getting file with size: 1749293837
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 1668M 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0{ [data not shown]
8 1668M 8 142M 0 0 348k 0 1:21:36 0:06:59 1:14:37 334k* transfer closed with 1599396621 bytes remaining to read
* Received only partial file: 149897216 bytes
8 1668M 8 142M 0 0 348k 0 1:21:36 0:06:59 1:14:37 324k* Closing connection #0

curl: (18) transfer closed with 1599396621 bytes remaining to read

I believe my version of unzip is the same as yours, on Mac OS X version 10.5.7 (32GB):
bash-3.2$ unzip -version
caution: both -n and -o specified; ignoring -o
UnZip 5.52 of 28 February 2005, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

However, the problem seems to be at the downloading stage.

Thanks for your help in advance

L
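Since the transfer above was cut off at 149897216 of 1749293837 bytes, one option is to resume it rather than start over. The sketch below is an assumption-laden illustration, not a fix for the server: `is_complete` and `fetch_with_resume` are hypothetical helper names, the URL must be supplied in full by the caller, and the retry limit of 5 is arbitrary. It relies on curl's `-C -` option, which continues a partial download (`wget -c` does the same for wget).

```shell
#!/bin/sh
# Size the FTP server reported for h_sapiens_asm.ebwt.1.zip in the transcript above.
EXPECTED_BYTES=1749293837

# Return 0 only if the file exists and has exactly the expected byte count.
# $1 = file, $2 = expected size in bytes
is_complete() {
    [ -f "$1" ] || return 1
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$2" ]
}

# Resume a download until the size matches or the retry budget runs out.
# $1 = full URL (caller-supplied), $2 = output file, $3 = expected size in bytes
fetch_with_resume() {
    tries=0
    until is_complete "$2" "$3"; do
        tries=$((tries + 1))
        [ "$tries" -gt 5 ] && return 1
        # -C - asks curl to continue from the end of the existing partial file
        curl -C - -o "$2" "$1" || sleep 5
    done
}

# Example call (URL left as a placeholder on purpose):
#   fetch_with_resume "<full ftp URL>" h_sapiens_asm.ebwt.1.zip "$EXPECTED_BYTES"
```

A matching size only shows that the transfer finished; it is still worth comparing the MD5 before unzipping.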
Old 11-12-2009, 06:04 AM   #7
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

OK, I see that now too. Looks like the UMD FTP server is having issues:

Quote:
--08:43:46-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
(try: 5) => `h_sapiens_asm.ebwt.1.zip.1'
Connecting to ftp.cbcb.umd.edu|128.8.119.241|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /pub/data/bowtie_indexes ... done.
==> PASV ...
Cannot initiate PASV transfer.
==> PORT ...
Invalid PORT.
Retrying.
--08:43:51-- ftp://ftp.cbcb.umd.edu/pub/data/bowt...asm.ebwt.1.zip
(try: 6) => `h_sapiens_asm.ebwt.1.zip.1'
Connecting to ftp.cbcb.umd.edu|128.8.119.241|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /pub/data/bowtie_indexes ... done.
==> PASV ... done. ==> RETR h_sapiens_asm.ebwt.1.zip ... done.
Length: 1,749,293,837 (1.6G) (unauthoritative)

100%[=======================================================================================================================================================================>] 1,749,293,837 7.70M/s ETA 00:00

08:47:39 (7.32 MB/s) - `h_sapiens_asm.ebwt.1.zip.1' saved [1749293837]

$ unzip h_sapiens_asm.ebwt.1.zip
Archive: h_sapiens_asm.ebwt.1.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of h_sapiens_asm.ebwt.1.zip or
h_sapiens_asm.ebwt.1.zip.zip, and cannot find h_sapiens_asm.ebwt.1.zip.ZIP, period.
$ md5sum h_sapiens_asm.ebwt.1.zip
375e2b7af3f0b0b5a4cd885b4adb91c8 h_sapiens_asm.ebwt.1.zip
I'll email them.

Ben
Old 11-12-2009, 09:45 AM   #8
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

Hi all,

I have downloaded build hg18 from UCSC.

Command to build the index:
./bowtie-build -f chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa indexes/hg18

chr2.fa.1.ebwt, chr2.fa.2.ebwt, chr2.fa.3.ebwt and chr2.fa.4.ebwt plus the 2 rev.ebwt files are created, but the loop starts from chr2, not chr1. Secondly, the indexing stops once chr2.fa has been indexed. Comma-separating the .fa files does not help.



Thanks to anyone who can help
L
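The symptoms described above (output files named chr2.fa.*.ebwt, indexing stopping early) would be consistent with bowtie-build's usage line, `bowtie-build [options]* <reference_in> <ebwt_outfile_base>`, where `<reference_in>` is a single comma-separated list. With space-separated files, chr1.fa would be read as the only reference and chr2.fa as the index basename. Assuming that is the cause, the list needs commas and no spaces at all. `build_ref_list` below is a hypothetical helper that assembles such a list for the file names used in the post.

```shell
#!/bin/sh
# Join chr1..chr22, chrX and chrY FASTA file names with commas and no spaces.
build_ref_list() {
    list=""
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        # ${list:+$list,} adds a comma only when the list is already non-empty
        list="${list:+$list,}chr$c.fa"
    done
    printf '%s\n' "$list"
}

# Example call, run from the directory holding the .fa files:
#   ./bowtie-build -f "$(build_ref_list)" indexes/hg18
```

Quoting `"$(build_ref_list)"` keeps the whole list as one argument, so indexes/hg18 is the second positional argument and becomes the index basename.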
Old 11-12-2009, 03:16 PM   #9
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

I downloaded the human genome index today and it worked fine. Earlier I used the .asm index, but its headers are really long, so the generated results are not directly usable in UCSC unless you use sed to replace them with chr1 and so on.
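The renaming mentioned above can also be done with a lookup table instead of one sed expression per chromosome. This is a rough sketch under several assumptions: `rename_refs` is a hypothetical helper, the mapping file (long name, tab, short name) has to be written by hand because the actual .asm header strings are not shown in this thread, and the alignments are assumed to be in bowtie's default tab-separated output, where the reference name is the third column.

```shell
#!/bin/sh
# Replace long reference names in column 3 of tab-separated alignments.
# $1 = mapping file with two tab-separated columns: long name, short name
# Alignments are read from stdin and written to stdout.
rename_refs() {
    awk -F '\t' -v OFS='\t' -v map="$1" '
        BEGIN {
            # Load the mapping file into a lookup table before reading stdin
            while ((getline line < map) > 0) {
                split(line, f, "\t")
                short[f[1]] = f[2]
            }
        }
        $3 in short { $3 = short[$3] }   # swap the name only when it is mapped
        { print }
    '
}

# Example call (file names are placeholders):
#   rename_refs names.tsv < alignments.txt > alignments.renamed.txt
```

Lines whose reference name is not in the mapping file pass through unchanged, so a partial table does no harm.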
All times are GMT -8.