**rmdavies** · 12-09-2011, 06:21 AM

This isn't normal. srf2fastq is telling you that it found something unexpected in the SRF file, because it is either corrupt or it has some sort of junk on the end.

It is possible that srf2fastq managed to dump out all of the reads from the SRF file, but I wouldn't like to guarantee it in this case. Where did you get the SRF file from?

**mhayes** · 12-09-2011, 09:40 AM

The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.

For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.

When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure whether that is because of the dataset itself or because not all of the reads are being generated.

**rmdavies** · 12-12-2011, 04:49 AM

> Originally posted by mhayes:
>
> The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.

That's a pity, as it makes diagnosing the problem much more difficult. Have you tried getting in touch with the EGA? They could try converting the file at their end to see if they get the same error.

> For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.
>
> When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure whether that is because of the dataset itself or because not all of the reads are being generated.

srf2fastq goes through the file sequentially, so if you have the last entry in the SRF file then you should have everything. You could also try using srf_list -l to see where the last entry is. For example:

```
srf_list -l 2010_1.srf | tail
IL3_2010:1:100:1793:1774       6917868443 +  570 +  9625
IL3_2010:1:100:1793:1939       6917869029 +  576 +  9625
IL3_2010:1:100:1793:1988       6917869621 +  576 +  9625
IL3_2010:1:100:1793:2011       6917870213 +  573 +  9625
IL3_2010:1:100:1793:1851       6917870802 +  575 +  9625
IL3_2010:1:100:1793:1104       6917871393 +  582 +  9625
IL3_2010:1:100:1793:122        6917871991 +  577 +  9625
IL3_2010:1:100:1793:1577       6917872583 +  578 +  9625
IL3_2010:1:100:1793:1331       6917873177 +  578 +  9625
IL3_2010:1:100:1793:1827       6917873771 +  580 +  9625
```

The first number is the file position of the read, which for the last read should be near the end of the file. In this case the file is 7040277611 bytes long. The small difference is due to the name index on the end of the file. If you get a number that is much lower than the file length then you may well be missing some data. You can also count the number of reads listed this way and see if it matches what you get in the fastq output.
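Both checks described above can be scripted. The following is only a rough sketch: it assumes the `srf_list -l` output has been saved to a text file and looks like the listing in this post (read name first, file position second), and that the FASTQ file uses plain four-line records. The function names are made up for illustration and are not part of the Staden package.

```python
def parse_srf_list_line(line):
    """Return (read_name, file_position) for one `srf_list -l` line.

    The layout is inferred from the example listing in this thread, not from
    a format specification. Lines that do not fit it (blank lines, error
    messages) yield None.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        return fields[0], int(fields[1])
    except ValueError:
        return None


def summarise_listing(lines, srf_size):
    """Summarise a listing against the size of the SRF file in bytes."""
    reads = 0
    unplaced = 0        # entries with a negative file position
    last_position = -1  # highest valid file position seen
    for line in lines:
        parsed = parse_srf_list_line(line)
        if parsed is None:
            continue
        reads += 1
        position = parsed[1]
        if position < 0:
            unplaced += 1
        else:
            last_position = max(last_position, position)
    gap = srf_size - last_position if last_position >= 0 else None
    return {
        "reads": reads,
        "unplaced": unplaced,
        "last_position": last_position,
        "gap_to_eof": gap,
    }


def count_fastq_records(lines):
    """Count FASTQ records, assuming exactly four lines per record."""
    total_lines = sum(1 for _ in lines)
    return total_lines // 4
```

To use it, something like `srf_list -l 2010_1.srf > listing.txt` would produce the input; then open `listing.txt`, pass the file object to `summarise_listing` along with `os.path.getsize("2010_1.srf")`, and compare `reads` with `count_fastq_records` run on the open FASTQ file. A `gap_to_eof` that is large relative to the file size, or a read count that differs from the FASTQ count, would suggest missing data. What counts as "large" depends on the size of the name index, so treat the number as a hint rather than a verdict.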

**mhayes** · 12-12-2011, 11:54 AM

Thank you, rmdavies. The srf_list program also returns an "Unknown block type" error, and for all reads after this error, the file position is listed as "-1".

However, all of the reads seem to be written to the fastq file, so there is no truncation. Regarding SRF to FASTQ conversion, I will assume that the error can be ignored for now, but I will contact the data providers just to be sure.

Staden package question: is srf2fastq returning an error, or is this normal?

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News