SEQanswers

10-07-2014, 07:05 AM   #1
MurielGB
Member
 
Location: Montpellier, France

Join Date: Oct 2013
Posts: 51
Convert SRA to FASTQ with fastq-dump but problem of read length

Hello,
I have Illumina paired-end reads of length 76 bp.
The problem is that when I use fastq-dump to obtain two files with the paired reads separated, it splits the reads into 101 bp and 51 bp rather than 76 + 76.
I tried the options --split-files, --split-spot and --split-3 and always get the same result.
I also tried different fastq-dump versions: 1, 2, 2.3.4 and 2.3.5.2.
Do you have any idea how I can get the reads split correctly?
Thanks!
10-07-2014, 07:07 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

It is possible that the dataset you are looking at has asymmetric reads. Have you looked at the record in SRA to see if that is the case?
10-07-2014, 07:11 AM   #3
MurielGB
Member
 
Location: Montpellier, France

Join Date: Oct 2013
Posts: 51

I don't know whether that is possible.
When I convert the SRA file to FASTQ without any option, I get a FASTQ file with 152 bp reads.

Here is the page where I downloaded the sra file : http://www.ncbi.nlm.nih.gov/sra/?term=SRR1174239
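
A quick way to see what fastq-dump actually wrote is to tally the read lengths in the output file. The following is only a minimal sketch, assuming a plain four-lines-per-record FASTQ; the function name `read_length_counts` and the sample records are hypothetical and not part of the SRA toolkit.

```python
from collections import Counter
from itertools import islice


def read_length_counts(lines):
    """Count sequence lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    # The sequence is the 2nd line of every 4-line record (index 1, 5, 9, ...).
    for seq in islice(lines, 1, None, 4):
        counts[len(seq.strip())] += 1
    return counts


# Hypothetical in-memory example; with a real file, pass an open file handle.
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "ACGT\n", "+\n", "IIII\n",
]
print(read_length_counts(example))
```

Run on the unsplit dump, a single length of 152 would match what is described above; run on the split files, it would show the 101 and 51 bp mates.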
10-07-2014, 07:14 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Based on the record it looks like a standard 2 x 101 bp PE dataset.

Update: The information in SRA appears to be incorrect, since the dataset dumps with a read length of 152 bp (so it would be 2 x 76).

Last edited by GenoMax; 10-07-2014 at 07:20 AM.
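
If the unsplit 152 bp records really are two 76 bp mates stored end to end (an assumption this thread does not settle), the mates could be recovered by cutting each record at a fixed offset instead of relying on the split points stored in the SRA file. A minimal sketch under that assumption; `split_record`, the `/1` and `/2` header suffixes and the default offset of 76 are illustrative choices, not SRA toolkit behaviour.

```python
def split_record(header, seq, qual, offset=76):
    """Split one FASTQ record into two mates at a fixed offset.

    Assumes the two mates are stored end to end, mate 1 first.
    Returns two (header, sequence, quality) tuples.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 < offset < len(seq):
        raise ValueError("offset must fall inside the read")
    mate1 = (header + "/1", seq[:offset], qual[:offset])
    mate2 = (header + "/2", seq[offset:], qual[offset:])
    return mate1, mate2


# Hypothetical 152 bp record: 76 A's followed by 76 C's.
m1, m2 = split_record("@spot1", "A" * 76 + "C" * 76, "I" * 152)
print(len(m1[1]), len(m2[1]))
```

Whether 76 is the right cut point still has to be confirmed from the data or from the submitter; the code only applies whatever offset it is given.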
10-07-2014, 07:19 AM   #5
MurielGB
Member
 
Location: Montpellier, France

Join Date: Oct 2013
Posts: 51

Yeah, I agree, but then why do I get 152 bp reads when using fastq-dump?!
10-07-2014, 07:24 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

This appears to be an asymmetric dataset (101 x 51, which also adds up to the 152 bp of the unsplit dump) as originally suspected. See attached screencap.
Attached Images
File Type: png Capture.PNG (20.4 KB, 8 views)
10-07-2014, 07:31 AM   #7
MurielGB
Member
 
Location: Montpellier, France

Join Date: Oct 2013
Posts: 51

OK, but when I run FastQC on the FASTQ file with the 101 bp reads, I get the attached graph, which to me is typical of reads that have been split at the wrong position.
Attached Images
File Type: png quality.png (9.8 KB, 4 views)
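
The same pattern can be checked without FastQC. If a read has been cut out of two concatenated mates at the wrong position, the mean quality would be expected to fall towards the end of the first mate and jump back up where the second one begins. The sketch below computes the mean Phred score per position, assuming Phred+33 quality strings; `mean_quality_by_position` is a hypothetical helper and the input strings are made up.

```python
def mean_quality_by_position(quality_strings, phred_offset=33):
    """Return the mean Phred score at each read position.

    Assumes Phred+33 encoded quality strings; shorter reads simply
    do not contribute to positions beyond their length.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First time this position is seen: extend both lists.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - phred_offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical qualities: 'I' is Q40 and '5' is Q20 in Phred+33.
means = mean_quality_by_position(["II5I", "I55I"])
print(means)
```

On the 101 bp file, a sharp rise in the mean right after position 76 would be consistent with the real mate boundary lying there, though that alone does not prove it.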
10-07-2014, 07:36 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

It does appear that something is fishy.

You may want to email SRA support and ask them to look into this data set. You could also alert the submitter independently.
10-07-2014, 07:39 AM   #9
MurielGB
Member
 
Location: Montpellier, France

Join Date: Oct 2013
Posts: 51

OK, thanks a lot!
10-07-2014, 09:10 AM   #10
aaronh
Member
 
Location: California

Join Date: Sep 2008
Posts: 45

I ran into this issue with another data set. The problem was that SRA had mis-parsed the FASTQ files. A few emails between the help desk and the original depositor resulted in SRA reformatting the files.

Tags
fastq-dump, illumina

All times are GMT -8.