SEQanswers




09-29-2011, 07:51 AM   #1
pcantalupo
Junior Member
 
Location: USA

Join Date: Sep 2011
Posts: 1
SRA to fastq conversion with fastq-dump loses sequences

Hello,

I converted an SRA archive (ftp://ftp-trace.ncbi.nlm.nih.gov/sra...953/SRR073769/) to fastq with the fastq-dump program (sratoolkit-2.1.6). The resulting fastq file had ~160,000 fewer sequences (2% of the total number of spots) than expected. Why does this happen?

Thank you,

Paul
08-15-2012, 08:28 AM   #2
eeh_021
Junior Member
 
Location: Boston, MA

Join Date: Aug 2012
Posts: 3

I've also experienced this problem. Did you find a solution?

Thank you,

Elizabeth
08-15-2012, 11:31 AM   #3
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 498

How do you know the expected number of sequences?
08-15-2012, 12:24 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

In the file I downloaded, I see the same number of sequences as reported on the SRA page:

http://www.ncbi.nlm.nih.gov/sra?term=SRR073769

Code:
../sratoolkit.2.1.16-centos_linux64/bin/fastq-dump.2.1.18 SRR073769.sra 
Written 8175900 spots for SRR073769.sra
Written 8175900 spots total
Code:
$ more SRR073769.fastq | grep "@SRR073769" | wc -l
8175900
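As a cross-check on a count like the one above, here is a minimal shell sketch (the function name `count_fastq_records` is hypothetical) that counts records by dividing the line count by four. It assumes standard unwrapped 4-line FASTQ records and also reads `.fastq.gz` files through `gzip -cd`. Counting lines that start with a bare `@` can overcount, because `@` is also a valid Phred+33 quality character and a quality line may begin with it; anchoring on the full accession prefix, as in the grep above, avoids most of that.

```shell
#!/bin/sh
# count_fastq_records FILE
# Hypothetical helper: prints the number of FASTQ records in FILE.
# Assumes every record is exactly 4 lines (header, sequence, "+", qualities),
# i.e. sequences and qualities are not wrapped over several lines.
count_fastq_records() {
    case "$1" in
        # Decompress gzipped files on the fly.
        *.gz) lines=$(gzip -cd -- "$1" | wc -l | tr -d ' ') ;;
        # tr strips the padding spaces some wc implementations add.
        *)    lines=$(wc -l < "$1" | tr -d ' ') ;;
    esac
    echo $(( lines / 4 ))
}

# Example: count_fastq_records SRR073769.fastq
```

The result can be compared directly with the "Written N spots" line that fastq-dump prints.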
08-15-2012, 02:15 PM   #5
eeh_021
Junior Member
 
Location: Boston, MA

Join Date: Aug 2012
Posts: 3

Oh, I see.

I'm still a little confused about something:

This file, SRR035116.sra, for example, is 3.9 GB.
When I convert it to fastq, however, the output is only 2.2 GB. I checked the number of spots and, surprisingly, it is the same.

Usually when I convert sra to fastq, my files get a lot bigger. Help?

Thank you!!
08-16-2012, 04:10 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

The .sra file is ~383 MB and the .fastq file is 1.6 GB (on my filesystem). If your .sra file is truly that large, something must be wrong.

Use the Aspera client that SRA provides to download the .sra file.


Quote:
Originally Posted by eeh_021

I'm still a little confused about something:

This file, SRR035116.sra, for example, is 3.9Gb
When I convert it to fastq, however, it is only 2.2Gb. I checked the number of spots, and it's, surprisingly, the same.

Usually when I convert sra to fastq, my files get a lot bigger. Help?

Thank you!!
08-16-2012, 07:57 AM   #7
eeh_021
Junior Member
 
Location: Boston, MA

Join Date: Aug 2012
Posts: 3

383 MB?
http://www.ncbi.nlm.nih.gov/sra?term=SRR035116
If you go there, it is supposed to be 3.9 GB, and that's about how big it is when I download it...
08-16-2012, 08:11 AM   #8
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by eeh_021
383Mb?
http://www.ncbi.nlm.nih.gov/sra?term=SRR035116
If you go there, it is supposed to be 3.9Gb, and that's about how big it is when I download it...
Perhaps it is more straightforward to fetch it from Europe (ENA) or Japan (DDBJ).

Compressed files (.fastq.gz or .fastq.bz2) are just easier to use than those .sra files.


Sébastien Boisvert
08-16-2012, 10:39 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

We are talking about two different data sets.

My response was for the dataset (SRR073769) that was in pcantalupo's original post.

The dataset you are referring to below (SRR035116) is indeed 3.9 GB.


Quote:
Originally Posted by eeh_021
383Mb?
http://www.ncbi.nlm.nih.gov/sra?term=SRR035116
If you go there, it is supposed to be 3.9Gb, and that's about how big it is when I download it...
08-27-2012, 08:57 AM   #10
csmatyi
Member
 
Location: Nebraska

Join Date: Oct 2011
Posts: 25

Quote:
Originally Posted by eeh_021
Oh, I see.

I'm still a little confused about something:

This file, SRR035116.sra, for example, is 3.9Gb
When I convert it to fastq, however, it is only 2.2Gb. I checked the number of spots, and it's, surprisingly, the same.

Usually when I convert sra to fastq, my files get a lot bigger. Help?

Thank you!!
I don't think it's a problem if the fastq file gets bigger, because the sra file is binary anyway and therefore more compact.
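To make the size argument concrete: an uncompressed FASTQ file spends one ASCII byte per base and one per quality score, plus a header line and a separator line per read, whereas a .sra file is a compressed binary container. A rough shell sketch of that arithmetic follows; the function name, the header length, and the assumption of a bare `+` separator line are illustrative only and are not necessarily what fastq-dump writes.

```shell
#!/bin/sh
# estimate_fastq_bytes SPOTS READ_LEN HEADER_LEN
# Hypothetical helper: rough size in bytes of an uncompressed single-end FASTQ.
# Assumes HEADER_LEN characters per header (including the leading "@"),
# a bare "+" separator line, and "\n" line endings.
estimate_fastq_bytes() {
    spots=$1 read_len=$2 header_len=$3
    # header + newline, sequence + newline, "+" + newline, qualities + newline
    per_record=$(( header_len + 1 + read_len + 1 + 2 + read_len + 1 ))
    echo $(( spots * per_record ))
}

# Example with made-up values: 1,000 reads of 36 bp, 30-character headers.
estimate_fastq_bytes 1000 36 30   # prints 107000
```

Plugging in a run's spot count and read length gives a ballpark for the FASTQ size to expect after conversion.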
06-10-2014, 04:05 PM   #11
alireda82
Junior Member
 
Location: Murfreesboro, TN, USA

Join Date: Jun 2014
Posts: 1

How do I convert an SRA file to FASTQ?
06-10-2014, 04:27 PM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Quote:
Originally Posted by alireda82
how to convert SRA file to FASTQ?
Use the SRA Toolkit: http://eutils.ncbi.nih.gov/Traces/sr...lkit_doc&f=std
10-08-2015, 03:00 PM   #13
VC87
Member
 
Location: Portugal

Join Date: Oct 2015
Posts: 18

Hi everyone, I'm new here!
Can someone tell me if it's possible to convert a WIG file to FASTQ? Thanks in advance.
10-08-2015, 04:09 PM   #14
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

No. WIG files do not contain the sequences, only the coverage.
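To illustrate the point, here is a minimal sketch: the `variableStep` WIG track below is made up for illustration, and the hypothetical `wig_values` function prints what each data line holds. Every data line is just a position and a value; there are no bases or quality scores that could be turned back into reads.

```shell
#!/bin/sh
# wig_values: read a WIG track on stdin and print position/value pairs.
# Hypothetical helper; it handles variableStep data lines ("POSITION VALUE")
# and skips the "track" and "variableStep" declaration lines.
wig_values() {
    awk '$1 ~ /^[0-9]+$/ { print "pos=" $1, "coverage=" $2 }'
}

# Made-up example track: coverage at three positions on chr1.
wig='track type=wiggle_0 name="example"
variableStep chrom=chr1 span=1
1001 5
1002 7
1003 6'

printf '%s\n' "$wig" | wig_values
# pos=1001 coverage=5
# pos=1002 coverage=7
# pos=1003 coverage=6
```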
All times are GMT -8.