SEQanswers

Old 06-15-2012, 05:07 AM   #1
hui_shi
Junior Member
 
Location: UK

Join Date: Oct 2010
Posts: 6
problem with sra toolkit fastq-dump sratoolkit.2.1.10-win64

Hello,

I was trying to use fastq-dump to convert a .sra file to .fastq format.

The data contain paired-end reads, so when I run the command
fastq-dump --split-files .sra, it produces two files, *_1.fastq and *_2.fastq.

But the problem is that the two files differ in size: one is a lot bigger than the other. This doesn't seem right.

Does anyone know why this happens and how to fix it? Shouldn't the two files contain the same number of reads, and therefore be the same size?

Thank you,

Hui

Last edited by hui_shi; 06-15-2012 at 05:11 AM.
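One way to test the assumption in this post is to count the records in each output file and tally their read lengths. The sketch below is a minimal illustration, assuming plain four-line FASTQ records (no wrapped sequence lines); the function name `fastq_summary` is hypothetical and not part of the SRA toolkit.

```python
from collections import Counter


def fastq_summary(path):
    """Return (read_count, Counter of read lengths) for a FASTQ file.

    Assumes each record is exactly four lines:
    header, sequence, '+', quality.
    """
    lengths = Counter()
    count = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\r\n"))] += 1
                count += 1
    return count, lengths
```

Running `fastq_summary` on the `_1.fastq` and `_2.fastq` files would show whether the read counts match while the read lengths differ, which would account for a size difference without any reads being lost.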
Old 06-15-2012, 05:43 AM   #2
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37

Could you post the run accession please?
Old 06-15-2012, 05:55 AM   #3
hui_shi
Junior Member
 
Location: UK

Join Date: Oct 2010
Posts: 6

Thanks for your reply. I didn't use --accession or -A, if that's what you mean. Actually, I don't understand what -A is for, other than changing the output name.

The full command I typed in was:

fastq-dump --split-files SRR443885.sra

It generates two files, SRR443885_1.fastq and SRR443885_2.fastq, but they are of different sizes.

Thanks,

Hui

Last edited by hui_shi; 06-15-2012 at 06:01 AM.
Old 06-15-2012, 06:57 AM   #4
vadim
Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 37

The spot descriptor claims that the spot length is 150 bp, 75 forward and 75 reverse, but the actual spot length is only 100 bases. I guess fastq-dump just obeys the descriptor and splits each spot at base 75, so the forward read gets 75 bases and the reverse read only the remaining 25, a third of the forward length.

The run metadata is here:
http://www.ncbi.nlm.nih.gov/Traces/s...&run=SRR443885

The experiment metadata is here:
http://www.ncbi.nlm.nih.gov/sra/SRX116341?&report=full

You can contact them at sra@ncbi about this.
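The splitting behaviour guessed at above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not fastq-dump's actual implementation: `split_spot` and the all-`A` placeholder sequence are made up for the example, and it assumes the spot is simply cut at the declared read lengths.

```python
def split_spot(spot, declared_lengths):
    """Cut a spot into reads at the declared read lengths.

    If the spot is shorter than the declared total, later reads
    simply get whatever bases remain.
    """
    reads = []
    start = 0
    for length in declared_lengths:
        reads.append(spot[start:start + length])
        start += length
    return reads


# A 100-base spot whose descriptor declares 75 + 75 bases.
spot = "A" * 100
forward, reverse = split_spot(spot, [75, 75])
print(len(forward), len(reverse))  # 75 25
```

Under that assumption both files hold the same number of reads, but the second file carries a third as many bases per read, which would make it noticeably smaller.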
Old 01-13-2014, 12:23 AM   #5
emp
Member
 
Location: india

Join Date: Jan 2014
Posts: 11
regarding small sra format file

I wanted to convert a .sra file of size 289.98 Mb. When I use the SRA toolkit's fastq-dump, it produces this error message:



An error occurred during processing.
A report was generated into the file '.../ncbi_error_report.xml'.
If the problem persists, you may consider sending the file


Kindly help me with this as soon as possible.
Old 01-13-2014, 12:30 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Did you try looking at what was in the file? Given that it's an error report, perhaps it's informative...
Old 01-13-2014, 12:37 AM   #7
emp
Member
 
Location: india

Join Date: Jan 2014
Posts: 11

Respected dpryan,


The file is in XML format, and opening it in Google Chrome just shows code. It doesn't seem to specify any error...

Here is the file I am trying to convert to fastq:
http://www.ncbi.nlm.nih.gov/sra/SRX370497
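Since the report is XML, one option is to print it as an indented outline rather than opening it in a browser. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes nothing about the report's schema and simply walks whatever elements are present. The function names `outline` and `outline_file` are hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return indented lines describing an element and its children."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    if text:
        line += ": " + text
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def outline_file(path):
    """Parse an XML file and return its outline as one string."""
    root = ET.parse(path).getroot()
    return "\n".join(outline(root))
```

Calling `print(outline_file("ncbi_error_report.xml"))` in the directory that holds the report would list every element with its attributes and text, which may make an error description easier to spot.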
Old 01-13-2014, 12:49 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Try upgrading your copy of the SRA toolkit; that usually fixes this sort of issue (I tried to extract the file you linked and couldn't until I upgraded my local copy of fastq-dump).
Old 01-13-2014, 01:02 AM   #9
emp
Member
 
Location: india

Join Date: Jan 2014
Posts: 11

Kindly tell me how I can upgrade the SRA toolkit.


Is it through verbose?
Old 01-13-2014, 01:04 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Try here, on NCBI
Old 05-21-2015, 03:08 AM   #11
tw7649116
Junior Member
 
Location: Yesan,Korea

Join Date: Mar 2014
Posts: 8

Dear hui_shi,

I have the same problem converting an SRA file to paired-end fastq.
Do you remember how you solved it?

Thank you very much!!
Old 05-21-2015, 04:23 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

Are you using the latest SRAtoolkit? Have you done the steps described in this post: http://seqanswers.com/forums/showpos...6&postcount=7?
Old 05-21-2015, 05:05 PM   #13
tw7649116
Junior Member
 
Location: Yesan,Korea

Join Date: Mar 2014
Posts: 8

Thank you GenoMax,

I used the latest version.
Does that approach mean we need to re-download the file?
Thank you! If I can't solve this, maybe I need to download it again!

Best wishes
Old 05-21-2015, 05:21 PM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

If you have downloaded the .sra file, have you tried the command this way?

Code:
c:\> fastq-dump.exe -F --split-files c:\path_to\SRAfile.sra
Sometimes people do submit data sets with unequal read lengths, so mismatched cycle numbers are a possibility. What SRA# are you looking at?
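For anyone scripting this step, the same command can be wrapped in Python. This is a minimal sketch, assuming `fastq-dump` is on PATH under that name (on Windows it may be `fastq-dump.exe`); the helper names `build_command` and `run_fastq_dump` are hypothetical, and the flags are copied unchanged from the command above.

```python
import shutil
import subprocess


def build_command(sra_path, executable="fastq-dump"):
    """Build the argument list for the command shown in the post above."""
    return [executable, "-F", "--split-files", sra_path]


def run_fastq_dump(sra_path, executable="fastq-dump"):
    """Run fastq-dump if it is on PATH; raise a clear error otherwise."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(executable + " was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_command(sra_path, executable), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.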
All times are GMT -8.