SEQanswers

Old 12-08-2011, 07:40 PM   #1
mhayes
Member
 
Location: Cleveland, OH

Join Date: Aug 2011
Posts: 11
Staden package question: is srf2fastq returning an error, or is this normal?

I'm attempting to convert an SRF file into FASTQ format using srf2fastq, which is a tool included in the Staden package.

I'm executing the following command:

srf2fastq -s BC3 -a -n -c in.srf

This should split the resulting FASTQ into two files since it's paired read data.

When the program finishes, I get the following message:

'Block of unknown type '3'. Aborting'

The FASTQ files are created, but I don't know if they are truncated because of this "error".

Is the above error normal?
Old 12-09-2011, 05:21 AM   #2
rmdavies
Member
 
Location: Great Britain

Join Date: Dec 2009
Posts: 13

This isn't normal. srf2fastq is telling you that it found something unexpected in the SRF file, because it is either corrupt or it has some sort of junk on the end.

It is possible that srf2fastq managed to dump out all of the reads from the SRF file, but I wouldn't like to guarantee it in this case. Where did you get the SRF file from?
Old 12-09-2011, 08:40 AM   #3
mhayes
Member
 
Location: Cleveland, OH

Join Date: Aug 2011
Posts: 11

The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.

For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.

When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure whether that is because of the dataset or because not all of the reads are being generated.
Old 12-12-2011, 03:49 AM   #4
rmdavies
Member
 
Location: Great Britain

Join Date: Dec 2009
Posts: 13

Quote:
Originally Posted by mhayes
The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.
That's a pity; it makes diagnosing the problem much more difficult. Have you tried getting in touch with the EGA? They could try converting the file at their end to see if they get the same error.

Quote:
For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.

When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure whether that is because of the dataset or because not all of the reads are being generated.
srf2fastq goes through the file sequentially, so if you have the last entry in the SRF file then you should have everything. You could also try using srf_list -l to see where the last entry is. For example:

Code:
srf_list -l 2010_1.srf | tail
IL3_2010:1:100:1793:1774       6917868443 +  570 +  9625
IL3_2010:1:100:1793:1939       6917869029 +  576 +  9625
IL3_2010:1:100:1793:1988       6917869621 +  576 +  9625
IL3_2010:1:100:1793:2011       6917870213 +  573 +  9625
IL3_2010:1:100:1793:1851       6917870802 +  575 +  9625
IL3_2010:1:100:1793:1104       6917871393 +  582 +  9625
IL3_2010:1:100:1793:122        6917871991 +  577 +  9625
IL3_2010:1:100:1793:1577       6917872583 +  578 +  9625
IL3_2010:1:100:1793:1331       6917873177 +  578 +  9625
IL3_2010:1:100:1793:1827       6917873771 +  580 +  9625
The first number is the file position of the read, which should be near the end of the file for the last one. In this case the file is 7040277611 bytes long. The small difference is due to the name index on the end of the file. If you get a number that is much lower than the file length then you may well be missing some data. You can also try counting the number of reads this way and see if it matches what you get in the FASTQ output.
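
As a rough sketch of that check (not part of the Staden package; the helper names `last_offset`, `tail_gap` and `fastq_read_count` are made up for illustration), the shell functions below pull the last file position out of `srf_list -l` output, compare it with the file size, and count FASTQ records. They assume the column layout shown above and plain four-line FASTQ records.

```shell
# Hypothetical helpers for illustration only; they are not Staden tools.

# last_offset: read `srf_list -l` style output on stdin and print the
# file position (second column) of the final entry.
last_offset() {
    awk 'NF >= 2 { pos = $2 } END { if (pos != "") print pos }'
}

# tail_gap: print the number of bytes between the last read's position
# and the end of the file. $1 = last position, $2 = file size in bytes.
tail_gap() {
    echo $(( $2 - $1 ))
}

# fastq_read_count: count records in a FASTQ file, assuming exactly
# four lines per record (no wrapped sequence or quality lines).
fastq_read_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Example usage (assumes srf_list is on PATH and in.srf exists):
#   pos=$(srf_list -l in.srf | last_offset)
#   size=$(wc -c < in.srf)
#   tail_gap "$pos" "$size"
```

With the numbers from the example, `tail_gap 6917873771 7040277611` prints 122403840, so the last read starts roughly 122 MB before the end of the file.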
Old 12-12-2011, 10:54 AM   #5
mhayes
Member
 
Location: Cleveland, OH

Join Date: Aug 2011
Posts: 11

Thank you, rmdavies. The srf_list program also returns an "Unknown block type" error, and for all reads after this error, the file position is listed as "-1".

However, all of the reads seem to be written to the fastq file, so there is no truncation. Regarding SRF to FASTQ conversion, I will assume that the error can be ignored for now, but I will contact the data providers just to be sure.
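
For anyone wanting to quantify this, here is a minimal sketch that counts how many listed reads have a file position of -1. The function name `count_unindexed` is hypothetical, and it assumes the "-1" entries keep the same column layout as normal `srf_list -l` lines, which I have not verified.

```shell
# Hypothetical helper for illustration only; it is not a Staden tool.

# count_unindexed: read `srf_list -l` style output on stdin and print
# two numbers: the total entries seen and how many have position -1.
# Assumes the position is always the second whitespace-separated column.
count_unindexed() {
    awk 'NF >= 2 { total++; if ($2 == -1) bad++ }
         END { printf "%d %d\n", total, bad }'
}

# Example usage (assumes srf_list is on PATH and in.srf exists):
#   srf_list -l in.srf | count_unindexed
```

If the first number matches the number of records in the FASTQ output, that supports the conclusion that no reads were lost.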
All times are GMT -8.