SEQanswers

Old 11-19-2014, 02:57 AM   #1
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27
New usage of SRA toolkit / SRA archive data download

It seems that the NCBI SRA archive has changed how files can be downloaded. Until now, we used the link on the SRA website to download files with Aspera Connect, and then used the SRA toolkit to extract FASTA sequences. Now there is a new, and I must say somewhat confusing, description that we have not been able to apply. It seems that the SRA toolkit can be used to process data directly from the NCBI website. Has anybody solved this? We are working in a Windows environment. Thanks.
Link to the SRA description:
http://www.ncbi.nlm.nih.gov/books/NB...sra_data_using
Old 11-19-2014, 03:06 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Can you post an example of an accession # that is not working as expected?
Old 11-19-2014, 03:55 AM   #3
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27

The change applies to all SRA files. A random example:
http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR617107

When I go to the download tab, there used to be links to FTP and Aspera downloads. Now there is only the new description of how to use the SRA toolkit.
Old 11-19-2014, 04:28 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

There is always the option of getting the fastq files directly from ENA, avoiding sratoolkit altogether.

Corresponding URLs for the example you posted above:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR617/SRR617107/

ftp://ftp.sra.ebi.ac.uk/vol1/srr/SRR617/SRR617107
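
For anyone scripting this: the first ENA URL above appears to follow the pattern vol1/fastq/<first six characters of the accession>/<accession>/. Below is a minimal shell sketch, assuming that pattern holds for 9-character run accessions like the one in this thread. The function name ena_fastq_url is made up for illustration, and longer accessions are deliberately rejected because their layout is not shown here.

```shell
#!/bin/sh
# Hypothetical helper, not part of sratoolkit or any ENA tool.
# Assumption: 9-character run accessions (e.g. SRR617107) live under
#   vol1/fastq/<first 6 characters>/<accession>/
# as in the URL posted above. Anything else is rejected.
ena_fastq_url() {
    case "$1" in
        [SED]RR[0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "unsupported accession: $1" >&2; return 1 ;;
    esac
    # First six characters of the accession form the parent directory.
    _prefix=$(printf '%s' "$1" | cut -c1-6)
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s/\n' "$_prefix" "$1"
}

ena_fastq_url SRR617107
```

The printed URL can then be handed to any FTP-capable downloader.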
Old 11-19-2014, 04:34 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Corresponding NCBI SRA direct URL (using information from the SRA link you included above):

ftp://ftp-trace.ncbi.nih.gov/sra/sra...617/SRR617107/
Old 11-19-2014, 05:00 AM   #6
Retro
Member
 
Location: Pennsylvania

Join Date: Apr 2011
Posts: 27

OK, that works, thanks.

However, it goes through a regular download, and the Aspera connection was much better. If I understand it correctly, the SRA toolkit now allows processing the files directly from the NCBI site without the need to download them, for example using fastq-dump to convert .sra files to fasta. Based on the description available on NCBI (link below), I was not able to do it though.

http://www.ncbi.nlm.nih.gov/books/NB...sra_data_using
Old 11-20-2014, 09:33 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Quote:
Originally Posted by Retro
However, it goes through a regular download, and the Aspera connection was much better. If I understand it correctly, the SRA toolkit now allows processing the files directly from the NCBI site without the need to download them, for example using fastq-dump to convert .sra files to fasta. Based on the description available on NCBI (link below), I was not able to do it though.

http://www.ncbi.nlm.nih.gov/books/NB...sra_data_using

After upgrading to the latest sratoolkit (v.2.4.2-1), I tried the new method out. Here is what I discovered.

In order to get the downloads to work, every user (especially if you are on a shared system/cluster) will have to run the configuration utility (help located at: http://trace.ncbi.nlm.nih.gov/Traces...lkit_doc&f=std) and set an appropriate path for storing configuration directories/files. Remember to save settings before you exit the utility.

Hint: Do the following in an xterm/X11 window if you want the text to be properly formatted.

Code:
$ /path_to/vdb-config -i
Once this is done, you will be able to download fastq files (and other data) directly from NCBI without first downloading the .sra files.

The following example prints only the first five reads to the screen:

Code:
$ /path_to/fastq-dump -X 5 -Z SRR390729
This command will download the full data file as fastq to the current directory:
Code:
$ /path_to/fastq-dump SRR390729
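
To fetch several runs the same way, one option is to generate the commands first and inspect them before running anything. A small sketch follows; print_dump_commands and the FASTQ_DUMP variable are names invented here, and /path_to/ is the same placeholder as in the commands above.

```shell
#!/bin/sh
# Sketch only: print (do not run) one fastq-dump command per accession.
# FASTQ_DUMP and print_dump_commands are placeholder names for illustration.
FASTQ_DUMP="${FASTQ_DUMP:-/path_to/fastq-dump}"

print_dump_commands() {
    for _acc in "$@"; do
        printf '%s %s\n' "$FASTQ_DUMP" "$_acc"
    done
}

print_dump_commands SRR390729 SRR617107
```

Once /path_to/ has been replaced with the real install location, piping the output to sh would run the commands one after another.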

Last edited by GenoMax; 11-20-2014 at 09:37 AM.
Old 08-20-2015, 02:05 PM   #8
fibar
Member
 
Location: Argentina

Join Date: Feb 2013
Posts: 19

I can confirm GenoMax's last reply. I updated my version to 2.5.2 and it's working with the commands mentioned above.

This new version includes a proxy setting in the 'vdb-config -i' window, which in my case I had to enable and fill in as 'proxy:port'. Otherwise, the process hung with no warnings.

If you don't specify an output directory, the files are downloaded to your current working directory.

Remember to use '--split-files' when you are downloading paired-end (PE) reads.
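
Putting the last two points together, here is a hedged sketch of a paired-end download command with an explicit output directory. The -O (--outdir) and --split-files options are fastq-dump options; the function name paired_dump_cmd and the directory name are made up, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Illustration only: print a fastq-dump command for a paired-end run.
# Usage: paired_dump_cmd <accession> <output directory>
# /path_to/ is a placeholder for the real sratoolkit location.
paired_dump_cmd() {
    printf '/path_to/fastq-dump --split-files -O %s %s\n' "$2" "$1"
}

paired_dump_cmd SRR390729 ./fastq_out
```

With --split-files, each read of a spot goes to its own file, so a paired-end run with accession X would typically end up as X_1.fastq and X_2.fastq in the output directory.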
All times are GMT -8.