SEQanswers

Old 11-29-2010, 10:20 AM   #1
tbusch0000
Junior Member
 
Location: san jose, ca

Join Date: Nov 2010
Posts: 5
How to convert sra-lite format to fastq?

I am trying to dump sra-lite (sequence read archive) files to fastq format. On the NCBI Sequence Read Archive site it states:

...users are asked download runs of interest and execute dumps into the desired format using the SRA SDK toolkit available at http://www.ncbi.nlm.nih.gov/Traces/s...are&s=software

I downloaded the precompiled toolkit for 64-bit architecture onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".

Any guidance would be much appreciated!
Old 11-29-2010, 10:33 AM   #2
SongLi
Member
 
Location: Durham

Join Date: Oct 2010
Posts: 19

Although I can get their CentOS 64-bit version running, it's really slow: it takes about 10 hours to unpack one file. I am also interested in knowing more about these new SRA tools.
Old 11-29-2010, 10:45 AM   #3
tbusch0000
Junior Member
 
Location: san jose, ca

Join Date: Nov 2010
Posts: 5

I just noticed they released a new MacOSX beta package.

I downloaded that one and entered this in the terminal:

$ ./fastq-dump -A SRP000910 -D SRR070499.lite.sra

Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"
Old 11-29-2010, 10:49 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539

Quote:
Originally Posted by tbusch0000
I downloaded the precompiled toolkit for 64-bit architecture onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".
My guess is that you downloaded a 64-bit Linux binary, which won't work on the Mac.
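
A guess like this can be checked by looking at the first bytes of the executable. The snippet below is only a minimal sketch and is not part of the SRA toolkit: the function names `identify_binary` and `identify_file` are made up for illustration. It relies on the well-known ELF and Mach-O magic numbers.

```python
# Magic numbers as they appear in the first four bytes of the file.
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
    # Note: Java class files share this magic number.
    b"\xca\xfe\xba\xbe": "Mach-O universal (fat) binary",
}


def identify_binary(header: bytes) -> str:
    """Classify an executable from its first five bytes (hypothetical helper)."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        # Byte 4 of an ELF header is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
        ei_class = header[4] if len(header) > 4 else 0
        bits = {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown class")
        return f"ELF {bits} (Linux and other Unix systems, not Mac OS X)"
    return MACHO_MAGICS.get(magic, "unknown format")


def identify_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return identify_binary(handle.read(5))
```

Calling `identify_file("fastq-dump")` on the downloaded executable would report an ELF format if it really is the Linux build.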
Old 11-29-2010, 10:55 AM   #5
tbusch0000
Junior Member
 
Location: san jose, ca

Join Date: Nov 2010
Posts: 5

Quote:
Originally Posted by maubp
My guess is that you downloaded a 64-bit Linux binary, which won't work on the Mac.
Thanks, they've only just released the Mac binaries. It executes now, but gives the error message above.
Old 11-29-2010, 11:04 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539

Quote:
Originally Posted by tbusch0000 View Post
Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"
How much RAM do you have, and how big is SRR070499.lite.sra?
Old 11-29-2010, 11:07 AM   #7
tbusch0000
Junior Member
 
Location: san jose, ca

Join Date: Nov 2010
Posts: 5

Quote:
Originally Posted by maubp View Post
How much RAM do you have, and how big is SRR070499.lite.sra?
I have 6 GB of RAM, and the file is 3.5 GB.
Old 11-29-2010, 11:13 AM   #8
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 256

I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

It is slow, but it works. My guess is that the data are compressed with something like libbz2 (it is just a guess). That would explain the compression ratio as well as the slowness.

Quote:
[[email protected] MyShortReadArchive]$ ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump
linux-vdso.so.1 => (0x00007fff361ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00000033f5a00000)
libz.so.1 => /lib64/libz.so.1 (0x00000033f6600000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003403e00000)
libm.so.6 => /lib64/libm.so.6 (0x00000033f5600000)
libc.so.6 => /lib64/libc.so.6 (0x00000033f5200000)
/lib64/ld-linux-x86-64.so.2 (0x00000033f4e00000)
The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.
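
The speed difference between the two libraries can be measured directly. This is a rough sketch using Python's standard `zlib` and `bz2` modules on an arbitrary payload; it says nothing about what fastq-dump actually does internally, and `time_decompression` is a hypothetical helper name.

```python
import bz2
import time
import zlib


def time_decompression(payload: bytes, repeats: int = 3) -> dict:
    """Compress payload with zlib and bz2, then time decompression of each."""
    results = {}
    for name, module in (("zlib", zlib), ("bz2", bz2)):
        packed = module.compress(payload)
        start = time.perf_counter()
        for _ in range(repeats):
            restored = module.decompress(packed)
        elapsed = (time.perf_counter() - start) / repeats
        if restored != payload:
            raise ValueError(f"{name} round trip did not restore the payload")
        results[name] = {"compressed_bytes": len(packed), "seconds": elapsed}
    return results
```

Running it on a few megabytes of real read data gives both the compressed size and the average decompression time per library, which is the comparison the guess above depends on.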
Old 11-29-2010, 11:25 AM   #9
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539

I'm not 100% sure how memory mapping (mmap) works on the Mac, but it sounds like you should have enough RAM to load the whole file into memory (assuming no other memory-hungry applications are running at the same time). Can you find a smaller example to test?
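
One possibility worth ruling out, and it is only an assumption since the thread does not say how the Mac beta was built, is that the limit is address space rather than RAM. A 32-bit process can address at most 4 GiB in total, so mapping a 3.5 GB file in one piece can fail no matter how much RAM is installed. The sketch below shows the arithmetic; the 1 GiB `reserved` figure for code, stack, and libraries is an illustrative guess, not a measured value.

```python
import struct

GIB = 2 ** 30


def interpreter_pointer_bits() -> int:
    """Pointer width of the running Python interpreter (32 or 64 on common builds)."""
    return struct.calcsize("P") * 8


def can_map_whole_file(file_size: int, pointer_bits: int, reserved: int = GIB) -> bool:
    """Check whether file_size bytes fit in the address space left after 'reserved'.

    'reserved' stands in for address space already taken by the program itself;
    its default of 1 GiB is an assumption for illustration.
    """
    address_space = 2 ** pointer_bits
    return file_size <= address_space - reserved
```

With these assumptions, `can_map_whole_file(3_500_000_000, 32)` is false and `can_map_whole_file(3_500_000_000, 64)` is true.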
Old 11-29-2010, 11:42 AM   #10
SongLi
Member
 
Location: Durham

Join Date: Oct 2010
Posts: 19

Hi seb567,

How slow is fastq-dump for you?

My experience is this: on my computer (Xeon 2.4 GHz, 4 cores, 12 GB RAM), fastq-dump takes 600 minutes to finish one sra file.

I have tried the newest release and also different sra files. fastq-dump is always very slow.

Thanks,

Quote:
Originally Posted by seb567 View Post
I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

It is slow, but it works. My guess is that the data are compressed with something like libbz2 (it is just a guess). That would explain the compression ratio as well as the slowness.

The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.
Old 11-29-2010, 11:47 AM   #11
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 256

About 1-2 hours for a 2 GB sra file, though that is a very rough estimate.

I downloaded all sra files for SRA010766, converted them from sra to fastq, then to fastq.gz. The script started yesterday 6 PM (EST).

So yours is slower, way slower.

Quote:
[[email protected] Illumina-SRX015621]$ ls
batch-3 SRR033559_1.fastq.gz SRR033570_1.fastq.gz SRR033581_1.fastq.gz SRR033592_1.fastq.gz SRR033603_1.fastq.gz SRR033614_1.fastq.gz SRR033625_1.fastq.gz
download.log SRR033559_2.fastq.gz SRR033570_2.fastq.gz SRR033581_2.fastq.gz SRR033592_2.fastq.gz SRR033603_2.fastq.gz SRR033614_2.fastq.gz SRR033625_2.fastq.gz
files.txt SRR033560_1.fastq.gz SRR033571_1.fastq.gz SRR033582_1.fastq.gz SRR033593_1.fastq.gz SRR033604_1.fastq.gz SRR033615_1.fastq.gz SRR033626_1.fastq.gz
list-sra.sh SRR033560_2.fastq.gz SRR033571_2.fastq.gz SRR033582_2.fastq.gz SRR033593_2.fastq.gz SRR033604_2.fastq.gz SRR033615_2.fastq.gz SRR033626_2.fastq.gz
newFiles SRR033561_1.fastq.gz SRR033572_1.fastq.gz SRR033583_1.fastq.gz SRR033594_1.fastq.gz SRR033605_1.fastq.gz SRR033616_1.fastq.gz SRR033627_1.fastq.gz
nohup.out SRR033561_2.fastq.gz SRR033572_2.fastq.gz SRR033583_2.fastq.gz SRR033594_2.fastq.gz SRR033605_2.fastq.gz SRR033616_2.fastq.gz SRR033627_2.fastq.gz
README SRR033562_1.fastq.gz SRR033573_1.fastq.gz SRR033584_1.fastq.gz SRR033595_1.fastq.gz SRR033606_1.fastq.gz SRR033617_1.fastq.gz SRR033628_1.fastq
SRA010766 SRR033562_2.fastq.gz SRR033573_2.fastq.gz SRR033584_2.fastq.gz SRR033595_2.fastq.gz SRR033606_2.fastq.gz SRR033617_2.fastq.gz SRR033628_2.fastq
SRR033552_1.fastq.gz SRR033563_1.fastq.gz SRR033574_1.fastq.gz SRR033585_1.fastq.gz SRR033596_1.fastq.gz SRR033607_1.fastq.gz SRR033618_1.fastq.gz SRR033629_1.fastq
SRR033552_2.fastq.gz SRR033563_2.fastq.gz SRR033574_2.fastq.gz SRR033585_2.fastq.gz SRR033596_2.fastq.gz SRR033607_2.fastq.gz SRR033618_2.fastq.gz SRR033629_2.fastq
SRR033553_1.fastq.gz SRR033564_1.fastq.gz SRR033575_1.fastq.gz SRR033586_1.fastq.gz SRR033597_1.fastq.gz SRR033608_1.fastq.gz SRR033619_1.fastq.gz SRR033630_1.fastq
SRR033553_2.fastq.gz SRR033564_2.fastq.gz SRR033575_2.fastq.gz SRR033586_2.fastq.gz SRR033597_2.fastq.gz SRR033608_2.fastq.gz SRR033619_2.fastq.gz SRR033630_2.fastq
SRR033554_1.fastq.gz SRR033565_1.fastq.gz SRR033576_1.fastq.gz SRR033587_1.fastq.gz SRR033598_1.fastq.gz SRR033609_1.fastq.gz SRR033620_1.fastq.gz SRR033631_1.fastq
SRR033554_2.fastq.gz SRR033565_2.fastq.gz SRR033576_2.fastq.gz SRR033587_2.fastq.gz SRR033598_2.fastq.gz SRR033609_2.fastq.gz SRR033620_2.fastq.gz SRR033631_2.fastq
SRR033555_1.fastq.gz SRR033566_1.fastq.gz SRR033577_1.fastq.gz SRR033588_1.fastq.gz SRR033599_1.fastq.gz SRR033610_1.fastq.gz SRR033621_1.fastq.gz SRR033632_1.fastq
SRR033555_2.fastq.gz SRR033566_2.fastq.gz SRR033577_2.fastq.gz SRR033588_2.fastq.gz SRR033599_2.fastq.gz SRR033610_2.fastq.gz SRR033621_2.fastq.gz SRR033632_2.fastq
SRR033556_1.fastq.gz SRR033567_1.fastq.gz SRR033578_1.fastq.gz SRR033589_1.fastq.gz SRR033600_1.fastq.gz SRR033611_1.fastq.gz SRR033622_1.fastq.gz SRR033633_1.fastq
SRR033556_2.fastq.gz SRR033567_2.fastq.gz SRR033578_2.fastq.gz SRR033589_2.fastq.gz SRR033600_2.fastq.gz SRR033611_2.fastq.gz SRR033622_2.fastq.gz SRR033633_2.fastq
SRR033557_1.fastq.gz SRR033568_1.fastq.gz SRR033579_1.fastq.gz SRR033590_1.fastq.gz SRR033601_1.fastq.gz SRR033612_1.fastq.gz SRR033623_1.fastq.gz
SRR033557_2.fastq.gz SRR033568_2.fastq.gz SRR033579_2.fastq.gz SRR033590_2.fastq.gz SRR033601_2.fastq.gz SRR033612_2.fastq.gz SRR033623_2.fastq.gz
SRR033558_1.fastq.gz SRR033569_1.fastq.gz SRR033580_1.fastq.gz SRR033591_1.fastq.gz SRR033602_1.fastq.gz SRR033613_1.fastq.gz SRR033624_1.fastq.gz
SRR033558_2.fastq.gz SRR033569_2.fastq.gz SRR033580_2.fastq.gz SRR033591_2.fastq.gz SRR033602_2.fastq.gz SRR033613_2.fastq.gz SRR033624_2.fastq.gz
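
The fastq to fastq.gz step in a pipeline like this can be sketched in a few lines. This is not the script used above (which is not shown in the thread); `gzip_fastq` is a hypothetical helper built on Python's standard `gzip` module, and it assumes fastq-dump has already written the plain .fastq file.

```python
import gzip
import os
import shutil


def gzip_fastq(fastq_path: str, remove_original: bool = False) -> str:
    """Compress fastq_path to fastq_path + '.gz' and return the new path."""
    gz_path = fastq_path + ".gz"
    # Stream in blocks so large FASTQ files never have to fit in memory.
    with open(fastq_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        os.remove(fastq_path)
    return gz_path
```

Passing `remove_original=True` would leave only the .fastq.gz file behind, as in most of the listing above.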
Old 11-29-2010, 01:51 PM   #12
tbusch0000
Junior Member
 
Location: san jose, ca

Join Date: Nov 2010
Posts: 5

Thanks for the tips.

I got fastq-dump working on an extra-large Amazon cloud instance running a CentOS AMI.
Old 09-01-2011, 06:05 AM   #13
babaref
Junior Member
 
Location: Tehran

Join Date: Jul 2011
Posts: 2

How can I convert fastq files to sra format? Is there a Perl script for this conversion?
Old 08-21-2013, 12:43 AM   #14
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

I want the table that converts a byte from the sra file
into a sequence of nucleotides.

http://www.flutrackers.com/forum/showpost.php?p=507401

SRA toolkit sourcecode has "4na" and "2na"
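
For reference, "2na" and "4na" are the usual NCBI names for 2-bit and 4-bit nucleotide encodings. The sketch below decodes already-unpacked 2na/4na byte strings, assuming the conventional packing with the first base in the high-order bits. It does not read .sra files: the bytes of an .sra file are not a plain stream of packed bases, so this table cannot be applied to the raw file. The function names are hypothetical.

```python
# 2na: two bits per base, four bases per byte.
NCBI2NA = "ACGT"
# 4na: four bits per base (one bit each for A, C, G, T), two bases per byte.
# Index is the 4-bit code; 0 is a gap and 15 is N.
NCBI4NA = "-ACMGRSVTWYHKDBN"


def decode_2na(data: bytes, n_bases: int) -> str:
    """Decode packed 2na bytes, first base in the two highest bits."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NCBI2NA[(byte >> shift) & 0b11])
    # The last byte may be padded, so trim to the known base count.
    return "".join(bases[:n_bases])


def decode_4na(data: bytes, n_bases: int) -> str:
    """Decode packed 4na bytes, first base in the high nibble."""
    bases = []
    for byte in data:
        bases.append(NCBI4NA[byte >> 4])
        bases.append(NCBI4NA[byte & 0x0F])
    return "".join(bases[:n_bases])
```

For example, the single byte 0b00011011 decodes as 2na to ACGT, and the bytes 0x12 0x48 decode as 4na to ACGT.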
Old 08-21-2013, 03:40 AM   #15
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,473

Why don't you either use fastq-dump or just download the gzipped fastq files from ENA (such as this one)?

Last edited by dpryan; 08-21-2013 at 03:40 AM. Reason: forgot a word
Old 08-21-2013, 03:43 AM   #16
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

I found that other thread, which says that the format is complicated,
so there is no such table.

http://seqanswers.com/forums/archive...p/t-12054.html


I'm having problems with files >4GB and wanted to test it
on a partially downloaded file first.

I have to split the large files so that they work with my programs.
That is also faster and better for testing; dealing with 4GB files is tedious.

I doubt that sra-tools will work with such split files.

Last edited by gsgs; 08-21-2013 at 03:47 AM.
Old 08-21-2013, 04:06 AM   #17
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

OK, I tried to download the file to my external drive. It took 5.5 h,
until an error message said that the file couldn't be copied.

Then I searched my main HD and found that it had been put into a temporary file
of 4631463048 bytes, so apparently files >4GB are possible on my
main drive but not on the external one.
(Windows XP, computer bought in 2010 or 2011)


I copied that temporary file to another file on the main drive,
then closed the error window, and indeed the temporary file was
deleted, but luckily I had the copy.
As expected, I can't copy that file to the external drive, nor can I access it
with any of my programs.
But the DOS commands copy, type, and find do work.

So I need a program that splits the big file into 2 smaller files that can be accessed.
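
A byte-level splitter is short to write. The sketch below is one possible approach, with `split_file` as a made-up name; it cuts at fixed byte offsets, so the parts of an .sra file (or of a FASTQ file, where a cut may land mid-record) are only useful again once they are concatenated back in order.

```python
def split_file(path: str, chunk_size: int, buffer_size: int = 1 << 20) -> list:
    """Split path into numbered parts of at most chunk_size bytes each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    part_paths = []
    with open(path, "rb") as src:
        while True:
            # Read before creating the next part so no empty part is written.
            block = src.read(min(buffer_size, chunk_size))
            if not block:
                break
            part_path = f"{path}.part{len(part_paths):03d}"
            written = 0
            with open(part_path, "wb") as dst:
                while block:
                    dst.write(block)
                    written += len(block)
                    if written >= chunk_size:
                        break
                    block = src.read(min(buffer_size, chunk_size - written))
            part_paths.append(part_path)
    return part_paths
```

For a file of 4631463048 bytes, `split_file(path, 2_400_000_000)` would produce two parts, each below the 4 GB mark.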
Old 08-21-2013, 04:10 AM   #18
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,440

If you are using 32-bit Windows XP (which you likely are), this may not be possible. How is your external drive formatted? You may need NTFS for files > 4GB.
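
To make the size limit concrete: FAT32 stores each file's size in a 32-bit field, so a single file can be at most 2^32 - 1 bytes. Whether the external drive here is actually FAT32 is an assumption until the format has been checked; `fits_on_fat32` is a hypothetical helper.

```python
# Largest file FAT32 can hold: the size field is 32 bits wide.
FAT32_MAX_FILE_BYTES = 2 ** 32 - 1  # 4,294,967,295 bytes


def fits_on_fat32(size_bytes: int) -> bool:
    """Return True if a file of this size can be stored on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE_BYTES
```

The 4631463048-byte temporary file mentioned above is over this limit, which would be consistent with the failed copy.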
Old 08-21-2013, 04:12 AM   #19
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,473

Ah, yeah, from the various discussions of it I expect that the SRA format is pretty non-trivial. Honestly, if your computer is having issues with files ~4GB, then you might just be better off using someone else's (though check whether the drive is NTFS formatted), particularly if you're stuck on Windows. Got a labmate with a Mac?
Old 08-21-2013, 04:13 AM   #20
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

I cannot simply switch to 64-bit Windows, since I need all my old programs,
which were written for 16-bit or 32-bit systems.