
**GenoMax** · 05-11-2015, 03:15 AM

fastq-dump appears to be rejecting reads, and reports this:

"Rejected 117005 SPOTS because SPOTLEN < 1".

These reads appear to have no sequence.

You can confirm this yourself by running:

Code:

$ fastq-dump -M 0 -F SRR2003880

You can download the original HDF5 files for this record (using the "Download" tab) and check whether they contain many zero-length sequences. You will need access to SMRTportal to process the raw data files properly.
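If you already have a FASTQ on disk (for example from the `-M 0` dump above), a quick way to see how many records are empty is to count the sequence lines with awk. This is only a minimal sketch: `count_empty_reads` is a made-up helper name, and it assumes plain 4-line FASTQ records with unwrapped sequences.

```shell
# Count total and zero-length records in a FASTQ file.
# Assumption: each record is exactly 4 lines (header, sequence, "+", quality).
# Reads from the file given as the first argument, or from stdin if none is given.
count_empty_reads() {
    awk 'NR % 4 == 2 {              # line 2 of every record is the sequence
             total++
             if (length($0) == 0) empty++
         }
         END { printf "%d total, %d empty\n", total, empty }' "$@"
}
```

For a gzipped file you could pipe it in instead, e.g. `gzip -dc reads.fastq.gz | count_empty_reads` (the file name here is just a placeholder).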

**Retro** · 05-11-2015, 03:32 AM

Thanks. But those reads show up in the NCBI website as not empty.

**GenoMax** · 05-11-2015, 03:46 AM

It is possible that the download from SRA is corrupt. The best recourse is to wait to hear back from SRA support. In my experience, they generally fix these files.

In the meantime, the HDF5 files from the download tab are the original data from the submitter. They do not appear to include the metadata.xml file that SMRTportal requires, so you may not be able to use the original files right away.

**GenoMax** · 05-11-2015, 03:53 AM

The ENA record appears to have the same number of spots: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/S...03880.fastq.gz

**Retro** · 05-11-2015, 06:21 AM

We downloaded the ENA FASTQ file. It is exactly what we get from the SRA toolkit, so probably only 46K sequences are usable. What is still unclear is why the NCBI archive website shows the "zero" reads as sequences, e.g. SRA|SRR2003880.1

**rwan** · 06-01-2015, 11:06 PM

Dear all,

Not sure if you have resolved your problem, but I had a similar problem with PacBio reads from a different data set. After reading this thread, I asked NCBI's Helpdesk, and they explained that PacBio data is special in that multiple error-prone reads are used to form consensus reads. It is these consensus reads that fastq-dump outputs when run with no options:

Code:

fastq-dump SRR2003880

If the raw reads are required, you need to supply the --table SEQUENCE option, i.e.,

Code:

fastq-dump --table SEQUENCE SRR2003880

I hope this helps someone!

Ray

**ynwh** · 12-04-2015, 06:56 AM

That is really helpful. Thank you, Ray.

In my case (SRR1284074), if I use the default command:
#fastq-dump SRR1284074
Rejected 163480 SPOTS because SPOTLEN < 1
Read 163482 spots for SRR1284074
Written 2 spots for SRR1284074

When I use "--table SEQUENCE" to dump SRR1284074, I still get 3 spots rejected.
#fastq-dump --table SEQUENCE SRR1284074
Rejected 3 SPOTS because SPOTLEN < 1
Read 163482 spots for SRR1284074
Written 163479 spots for SRR1284074
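
As a sanity check on summaries like the two above, the three counters can be pulled out of a saved log and compared, since rejected plus written should equal the spots read. A minimal sketch, assuming the log lines look exactly like the ones quoted here; `check_dump_summary` is a hypothetical helper, not part of the SRA toolkit, and I am assuming you captured both stdout and stderr of fastq-dump into the log.

```shell
# Parse the "Rejected / Read / Written" lines of a fastq-dump log and
# check that rejected + written == read.
# Reads from the file(s) given as arguments, or from stdin if none are given.
# Exit status is 0 when the counts add up, 1 otherwise.
check_dump_summary() {
    awk '/^Rejected/ { rejected = $2 }
         /^Read/     { read_n   = $2 }
         /^Written/  { written  = $2 }
         END {
             printf "read=%d written=%d rejected=%d\n", read_n, written, rejected
             exit (rejected + written == read_n ? 0 : 1)
         }' "$@"
}
```

With the default dump above this prints `read=163482 written=2 rejected=163480`, and with `--table SEQUENCE` it prints `read=163482 written=163479 rejected=3`; both add up, so the 3 remaining spots really are being dropped as empty rather than lost somewhere else.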

Any more suggestions or comments on this issue are very welcome.


PacBio data - problem with SRA toolkit
