SEQanswers

10-25-2012, 03:33 PM   #1
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6
fastq-dump for dummies

Can someone provide a dummies guide to fastq-dump? I mean a really dumb guide: download here, install here, open it by doing this, do this to an .sra file to output a FASTQ. Written instructions would work to start, but a video would be highly valuable.

I could see a SEQanswers YouTube channel here, full of short videos on the SRA Toolkit and beyond. This could really help the large number of biologists who will begin using dbGaP datasets to guide their research.

Am I the only one who sees a simple visual user guide as a useful resource?

Trust me, I have read everything Google can find on fastq-dump and still can't get it to work on an .sra file. I am working in a Windows environment, and I expect most beginners will be too.

Last edited by UpsetNotMad Scientist; 10-26-2012 at 10:52 AM.
10-26-2012, 12:23 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 403

a) Do you have access to a Linux server? If not, it shouldn't be too tricky to ask for a user account.

b) type the following:

fastq-dump mySRA.sra

On Windows (guessing here), open a command shell and copy the .sra file and fastq-dump into the same directory.

rem cd to the directory (/d also switches drives if needed)
cd /d c:\temp

rem run the program
fastq-dump mySRA.sra

Hope that helps
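If you would rather drive this from a script, the same two steps (change to the directory, run fastq-dump on the file) can be sketched in Python. This assumes Python 3 is installed and that `fastq-dump` is either on your PATH or given as a full path; `mySRA.sra` is a placeholder, and the helper names are hypothetical, not part of the SRA Toolkit.

```python
import subprocess


def build_fastq_dump_cmd(sra_file, fastq_dump="fastq-dump", split_files=False):
    """Return the argument list for converting one .sra file to FASTQ.

    fastq_dump can be a bare name found on PATH or the full path to
    fastq-dump.exe inside the extracted toolkit's bin folder.
    """
    cmd = [str(fastq_dump)]
    if split_files:
        cmd.append("--split-files")
    cmd.append(str(sra_file))
    return cmd


def run_fastq_dump(sra_file, workdir, **options):
    """Run fastq-dump from workdir, like `cd` followed by the command."""
    cmd = build_fastq_dump_cmd(sra_file, **options)
    # check=True raises CalledProcessError if fastq-dump exits with an error
    return subprocess.run(cmd, cwd=workdir, check=True)


print(build_fastq_dump_cmd("mySRA.sra"))  # ['fastq-dump', 'mySRA.sra']
```

For example, `run_fastq_dump("mySRA.sra", r"c:\temp")` mirrors the two commands above.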
10-26-2012, 01:34 AM   #3
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

You might also want to look into getting your data from the ENA rather than GEO. They already extract the files from the SRA archives, and you can download them individually. They mirror all GEO data, so you can just search with the GEO accession you want.

http://www.ebi.ac.uk/ena/
10-26-2012, 01:37 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Quote:
Originally Posted by colindaven
fastq-dump mySRA.sra
The only thing I'd add is that if your data has more than one read per spot (i.e. paired-end), this will produce a single file with the two reads concatenated together. If you want separate files for the different reads, you'll need to run:

fastq-dump --split-files mySRA.sra

...which really should have been the default behaviour.
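A small Python sketch of what each mode should leave on disk, assuming the usual naming scheme: a single `<name>.fastq` without `--split-files`, and one `<name>_1.fastq`, `<name>_2.fastq`, ... per read with it. The function is hypothetical and only predicts names; it does not read the .sra file.

```python
from pathlib import Path


def expected_fastq_names(sra_file, reads_per_spot, split_files):
    """Predict the FASTQ file names fastq-dump writes for one .sra file."""
    stem = Path(sra_file).stem  # "mySRA.sra" -> "mySRA"
    if not split_files:
        # all reads of a spot end up concatenated in one file
        return [f"{stem}.fastq"]
    # one file per read position: _1, _2, ...
    return [f"{stem}_{i}.fastq" for i in range(1, reads_per_spot + 1)]


print(expected_fastq_names("mySRA.sra", 2, split_files=False))  # ['mySRA.fastq']
print(expected_fastq_names("mySRA.sra", 2, split_files=True))   # ['mySRA_1.fastq', 'mySRA_2.fastq']
```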
10-26-2012, 04:46 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,854

Since the OP asked for directions on using the SRA Toolkit in a Windows environment, here goes:

1. Download the right software distribution (with the file sizes involved you should be using 64-bit Windows; if not, it is time to switch).

http://www.ncbi.nlm.nih.gov/Traces/s...?view=software (for 64-bit windows: http://ftp-private.ncbi.nlm.nih.gov/...1.16-win64.zip)

2. Extract the toolkit software folder and place it in a suitable location, e.g. c:\

3. Open a terminal window ("Start" --> type "cmd" in the search box --> press Enter). This will generally put you in your "home directory" (e.g. c:\Users\your_user_name).

4. In the terminal window, run:

Code:
rem Replace c:\my_sra_files with the right path for your SRA files
cd /d c:\my_sra_files
rem Verify that the directory contains the .sra files
dir *.sra
rem Adjust the toolkit folder name to match the version you extracted
c:\sratoolkit.2.1.10-win64\bin\fastq-dump.exe --split-files filename.sra
5. Be patient. The files are large, and the extraction will take some time (5-10 min). Make sure you have enough space available on the disk where you are extracting the files. The above command should write the FASTQ files to the same directory as your .sra files.

6. Repeat for additional files as needed.
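The disk-space warning in step 5 can be checked before you start. Below is a rough Python sketch, assuming the FASTQ output is about twice the size of the .sra input; that factor is a guess, not a documented ratio, so adjust it for your data. Function names are hypothetical.

```python
import shutil
from pathlib import Path


def needed_bytes(sra_files, expansion=2.0):
    """Estimate the FASTQ output size from the .sra input sizes.

    expansion is an ASSUMED FASTQ/.sra size ratio, a rough guess only.
    """
    return sum(Path(f).stat().st_size for f in sra_files) * expansion


def has_room(sra_files, out_dir, expansion=2.0):
    """Return (ok, needed, free) for writing the FASTQ output into out_dir."""
    needed = needed_bytes(sra_files, expansion)
    free = shutil.disk_usage(out_dir).free
    return needed <= free, needed, free
```

For example, call `has_room(["filename.sra"], r"c:\my_sra_files")` before running the command in step 4.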

Last edited by GenoMax; 10-26-2012 at 05:15 AM. Reason: simplified directions
10-26-2012, 10:31 AM   #6
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6

Quote:
Originally Posted by GenoMax
Since the OP was asking for directions on how to use SRA toolkit in a windows environment here goes:

Awesome. THANKS! Already more useful than any guide online.

Unfortunately, I instantly get:

The procedure entry point GetErrorMode could not be located in the dynamic link library KERNEL32.dll.

This happens when I run fastq-dump as directed, with the .sra file in the same directory and exactly the command you gave (adjusting for the directories). I am on a 32-bit Windows system running XP (I know, really dated).

Any suggestions?

While I wait for replies, I am going to try the same thing on a 64-bit system with the 64-bit toolkit.

Also, I looked at ENA; however, this SRA is restricted-access, so FASTQ is not available.

Start rant: It's nice that the Europeans don't put this burden on the less-equipped end user. I don't get the logic of SRA anyway, especially encrypted SRA. Let's say a WGS experiment is 50G (a low estimate, too). Decrypting gives 100G total (the encrypted copy is still there). Then making FASTQ adds another ~100G. This data now takes 200G of storage. And the original SRA saved 50% of the space? WTF? To save 50% (~50G) of space, you make the end user use four times the storage (200G instead of 50G)? This does not include the hours of lost productivity due to reformatting problems (like mine). End rant.
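The rant's arithmetic, worked through in Python under its own assumptions: a 50 GB encrypted .sra file, a decrypted copy of the same size, and FASTQ output roughly twice the .sra size. These are the poster's estimates, not measured values, and the function name is hypothetical.

```python
def storage_footprint(sra_gb, fastq_ratio=2.0, keep_encrypted=True):
    """Total disk use after decrypting and dumping one encrypted .sra file.

    Assumes the decrypted copy is the same size as the encrypted one and
    that FASTQ is about fastq_ratio times the .sra size.
    """
    encrypted = sra_gb if keep_encrypted else 0
    decrypted = sra_gb
    fastq = sra_gb * fastq_ratio
    return encrypted + decrypted + fastq


total = storage_footprint(50)
print(total, total / 50)  # 200.0 4.0
```

Deleting the encrypted copy after decryption (`keep_encrypted=False`) brings the total down to 150 GB, three times the original file.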

Last edited by UpsetNotMad Scientist; 10-26-2012 at 02:01 PM.
10-26-2012, 11:03 AM   #7
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6

Got it to work on a single SRA in Windows 7 64-bit with the 64-bit toolkit. I was previously using XP with the 32-bit toolkit. Now, how do I get it to do this on a full study instead of a single run, where the .sra files are spread across a crap load of folders?

BTW, GenoMax I could hug you right now. Not in an awkward way either.

Last edited by UpsetNotMad Scientist; 10-26-2012 at 11:08 AM.
10-26-2012, 11:09 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,854

Quote:
Originally Posted by UpsetNotMad Scientist
I unfortunately instantly get:

The procedure entry point GetErrorMode could not be located in the dynamic link library KERNEL32.dll.

When I run fastq-dump as directed with .sra file in the same directory and the exact cmd you said (adjusting for the directories). I am on a 32 bit windows system running XP (I know, really dated).

Any suggestions?
Do you have Service Pack 3 for Windows XP installed? If not, you may need to bite the bullet and install it. Apparently the error you mentioned may be related to the absence of Service Pack 3 for XP.

You are bound to run into one problem or another using 32-bit Windows. If you have access to a 64-bit machine (and NTFS-formatted disks that can handle large single files), you may want to switch.
10-26-2012, 11:11 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,854

Quote:
Originally Posted by UpsetNotMad Scientist
Got it to work on a single SRA in Windows 7 64-bit with 64 bit toolkit. I was previously using XP with 32-bit toolkit. Now how do I get it do this on a full study instead of a run where the sra files are in a crap load of folders?
Not sure if this will work, but give it a try:
Code:
fastq-dump --split-files *.sra
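One plausible explanation if the wildcard fails on Windows (not confirmed in this thread): cmd.exe, unlike UNIX shells, passes `*.sra` to the program unexpanded, so the program only sees the literal pattern unless it expands wildcards itself. A workaround sketch in Python, with hypothetical helper names, expands the pattern first and builds a single command that lists every file explicitly:

```python
from pathlib import Path


def expand_sra_pattern(directory, pattern="*.sra"):
    """Expand the wildcard in Python instead of relying on the shell."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


def build_batch_cmd(directory, fastq_dump="fastq-dump"):
    """Build one fastq-dump call that lists every matching file by name."""
    files = expand_sra_pattern(directory)
    if not files:
        raise FileNotFoundError(f"no .sra files found in {directory}")
    return [fastq_dump, "--split-files", *files]
```

You would run it with `subprocess.run(build_batch_cmd(r"c:\my_sra_files"), check=True)`. With very many files, one long command line can exceed Windows' command-line length limit, which is a reason to prefer one call per file.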
10-26-2012, 11:19 AM   #10
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6

Quote:
Originally Posted by GenoMax
Do you have service pack 3 for Windows XP installed? If not you may need to bite the bullet and install that. Apparently the error you mentioned may be related to absence of service pack 3 for XP.

You are bound to run into some problem or the other using 32-bit windows. If you do have access to a 64-bit machine (and NTFS formatted disks that can handle large single files) you may want to switch.
Yup, Service Pack 3 is installed. I will just use the 64-bit system (Windows 7, NTFS).
10-26-2012, 02:55 PM   #11
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6

Quote:
Originally Posted by GenoMax
Not sure if this would work. Give it a try
Code:
fastq-dump --split-files *.sra
fastq-dump --split-files *.sra doesn't work, even for .sra files in the same directory, let alone a directory of folders.

The decrypt.bin tool can decrypt all the files in a series of directories, but fastq-dump can't convert them?

Last edited by UpsetNotMad Scientist; 10-26-2012 at 04:13 PM.
10-29-2012, 05:09 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,854

Program input options are decided by the program authors, and not all programs accept input the same way. The following does seem to work within a single directory:

Code:
fastq-dump --split-files file1.sra file2.sra file3.sra
Out of curiosity, how many folders/files are you working with? You may be able to use Windows batch file processing, but if you are going to do that, it may be simpler to do it in a UNIX environment with shell scripts.
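If you do go the batch route, the main difference from one long command is that a single bad file should not stop the rest. A minimal Python sketch (hypothetical names, assuming fastq-dump is on PATH or given as an absolute path) runs one conversion per file inside that file's folder and collects the failures; the `runner` parameter exists only so the loop can be tested without the toolkit installed.

```python
import subprocess
from pathlib import Path


def dump_each(sra_files, fastq_dump="fastq-dump", runner=subprocess.run):
    """Run fastq-dump once per .sra file and return the ones that failed.

    Each run happens in the file's own folder, so the FASTQ output is
    written next to its .sra file. runner defaults to subprocess.run and
    can be replaced by a stub for testing.
    """
    failed = []
    for sra in map(Path, sra_files):
        cmd = [fastq_dump, "--split-files", sra.name]
        result = runner(cmd, cwd=sra.parent)
        if result.returncode != 0:
            # keep going; report the failures at the end
            failed.append(str(sra))
    return failed
```

The returned list tells you which files to retry, instead of the whole batch stopping at the first error.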

Quote:
Originally Posted by UpsetNotMad Scientist
fastq-dump --split-files *.sra doesn't work for the same sra files even in the same directory let alone a directory of folder.

the decrypt.bin can decrypt all the files in a series of directories but fastq-dump can't convert?
10-29-2012, 09:51 AM   #13
UpsetNotMad Scientist
Junior Member
 
Location: SF

Join Date: Oct 2012
Posts: 6

Quote:
Originally Posted by GenoMax

Code:
fastq-dump --split-files file1.sra file2.sra file3.sra
Out of curiosity how many folders/files are you working with.
Thanks for your help GenoMax!

It is ~1400 runs.

The folders are set up like this: SRP1\SRS1\SRX1\SRR1\SRA1.sra
Number of folders at each level (not all the same): 1\1\18\10\56\SRA1.sra

Typing every individual .sra file name into the command seems excessive. The .sra folder structure was created by the same people who made fastq-dump and released it for Windows. Either I am doing something wrong, or there has to be a reasonable solution. Is there a way to do something like file1-1400.sra?
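For a nested study layout like this, the file names do not have to be typed in: a recursive search can collect them. The Python sketch below (hypothetical names) finds every .sra file at any depth with `Path.rglob` and yields one fastq-dump command per run, using the toolkit's `-O`/`--outdir` option to gather all FASTQ files in one output folder. It assumes run file names are unique across the study, which holds if each file is named after its SRR accession.

```python
from pathlib import Path


def find_sra_files(study_root):
    """Find every .sra file under a study folder, at any depth.

    Works for nested study/sample/experiment/run folder layouts.
    """
    return sorted(Path(study_root).rglob("*.sra"))


def commands_for_study(study_root, out_dir, fastq_dump="fastq-dump"):
    """Yield one fastq-dump command per run, writing FASTQ into out_dir."""
    for sra in find_sra_files(study_root):
        # -O sends every run's output to one folder instead of ~1400 folders
        yield [fastq_dump, "--split-files", "-O", str(out_dir), str(sra)]
```

Usage: `for cmd in commands_for_study(r"c:\SRP1", r"c:\fastq"): subprocess.run(cmd, check=True)`, after importing `subprocess`.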

Last edited by UpsetNotMad Scientist; 11-05-2012 at 02:59 PM.
02-10-2014, 01:20 PM   #14
jrt@thompsonclan.org
Junior Member
 
Location: NJ

Join Date: Nov 2009
Posts: 2
What if PE/SE status is unknown?

I'm looking at a paper that doesn't specify what length of sequence they generated or whether it is paired end or single end.

What happens if you use --split-files and a .sra file that is really single end?
02-10-2014, 01:27 PM   #15
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Quote:
Originally Posted by jrt@thompsonclan.org
I'm looking at a paper that doesn't specify what length of sequence they generated or whether it is paired end or single end.

What happens if you use --split-files and a .sra file that is really single end?
It will work fine. You'll just end up with an extra _1 on the end of your file names.
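A rough Python heuristic once the dump has finished: with `--split-files`, a single-end run leaves only a `_1` file, while a paired-end run also leaves `_2`. This assumes the default `.fastq` naming and no extra index or technical reads (which could produce further numbered files); the run's metadata on its SRA page remains the more reliable source. The function name is hypothetical.

```python
from pathlib import Path


def layout_from_outputs(out_dir, accession):
    """Guess SINGLE vs PAIRED from the files `fastq-dump --split-files` wrote.

    Only _1 present -> single-end; _1 and _2 present -> paired-end.
    """
    out = Path(out_dir)
    has_1 = (out / f"{accession}_1.fastq").exists()
    has_2 = (out / f"{accession}_2.fastq").exists()
    if has_1 and has_2:
        return "PAIRED"
    if has_1:
        return "SINGLE"
    return "UNKNOWN"
```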
02-10-2014, 02:56 PM   #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,854

Quote:
Originally Posted by jrt@thompsonclan.org
I'm looking at a paper that doesn't specify what length of sequence they generated or whether it is paired end or single end.
That information should be on the SRA page for that dataset.
Tags: beginner, fastq-dump, sra format, video

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:54 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO