SEQanswers

10-08-2010, 12:37 PM   #1
Bio.X2Y
Member

Location: Europe

Join Date: Apr 2010
Posts: 46
Why are Illumina paired-end SRA datasets made up of 3 FASTQ files?

I'm looking at some NCBI SRA datasets for paired-end Illumina RNA-seq.

In each case, the dataset is made up of 3 fastq files, even though I would only expect 2 (one for each end).

Example:

SRR018256.fastq (2,048,908 lines)
SRR018256_1.fastq (50,313,152 lines)
SRR018256_2.fastq (50,313,152 lines)

All files look OK, and the _1 and _2 files have the same number of lines, as I would expect.
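
As a rough sanity check on the numbers above: assuming the usual 4 lines per FASTQ record (no wrapped sequence lines), the line counts work out to 512,227 reads in the unsuffixed file and 12,578,288 reads in each of the _1 and _2 files. A minimal sketch of that conversion follows; the helper name `count_fastq_records` is made up for illustration.

```python
import io

def count_fastq_records(handle):
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4

# Tiny in-memory example with two records.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
print(count_fastq_records(demo))

# Line counts quoted in the post, converted to read counts.
quoted_line_counts = {
    "SRR018256.fastq": 2_048_908,
    "SRR018256_1.fastq": 50_313_152,
    "SRR018256_2.fastq": 50_313_152,
}
for name, n_lines in quoted_line_counts.items():
    print(name, n_lines // 4)
```

For a real file, the same function can be called on `open("SRR018256_1.fastq")` instead of the in-memory example.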

Does anyone have any idea what the third file might be?

Thanks.

10-08-2010, 03:23 PM   #2
Pepe
Member

Location: Germany

Join Date: Mar 2009
Posts: 28

What I do with my paired-end reads is filter out the ones that contain adapters or have bad quality. Then I take the surviving mates of the removed reads and put them in a separate file, so the two paired-end files have the same number of reads in the same order, but I can still use the 'pairless' reads in the analysis.
Maybe they did the same?
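
The filtering scheme described above could look roughly like this. It is only a sketch under stated assumptions: reads are represented as `(name, sequence, quality)` tuples, and the `keep` predicate (here, "contains no N") is a stand-in for whatever adapter or quality filter is actually used. The name `split_pairs` is hypothetical.

```python
def split_pairs(pairs, keep):
    """Split read pairs into two in-sync lists plus a list of orphans.

    pairs: iterable of (read1, read2) tuples, in the original order.
    keep:  predicate returning True for a read that passes filtering.
    """
    paired_1, paired_2, orphans = [], [], []
    for r1, r2 in pairs:
        ok1, ok2 = keep(r1), keep(r2)
        if ok1 and ok2:
            # Both mates survive: keep them at the same position in both lists.
            paired_1.append(r1)
            paired_2.append(r2)
        elif ok1:
            orphans.append(r1)  # mate 2 was removed
        elif ok2:
            orphans.append(r2)  # mate 1 was removed
        # If neither mate passes, the whole pair is dropped.
    return paired_1, paired_2, orphans

# Illustrative filter (an assumption, not the poster's actual criterion).
def no_n(read):
    return "N" not in read[1]

pairs = [
    (("a/1", "ACGT", "IIII"), ("a/2", "TTGA", "IIII")),
    (("b/1", "ACNT", "IIII"), ("b/2", "GGCA", "IIII")),
    (("c/1", "NNNN", "####"), ("c/2", "NNNN", "####")),
]
p1, p2, orphans = split_pairs(pairs, no_n)
print(len(p1), len(p2), len(orphans))
```

Written out to disk, `p1` and `p2` would correspond to the two in-sync paired files and `orphans` to a third, smaller file.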

10-12-2010, 07:22 AM   #3
Bio.X2Y
Member

Location: Europe

Join Date: Apr 2010
Posts: 46

Thanks Pepe, that makes sense.

Does anyone else have other possible explanations? Cheers

10-12-2010, 09:27 AM   #4
Chipper
Senior Member

Location: Sweden

Join Date: Mar 2008
Posts: 324

Unpaired reads.

10-12-2010, 12:20 PM   #5
Bio.X2Y
Member

Location: Europe

Join Date: Apr 2010
Posts: 46

Hi, is it possible to get unpaired reads from a paired-end experiment? I'm not very familiar with the procedure.

10-12-2010, 01:12 PM   #6
GW_OK
Senior Member

Location: Oklahoma

Join Date: Sep 2009
Posts: 411

I should think so. It is possible that something went wrong with one read or the other, leaving a lonely, unpaired read.

10-12-2010, 11:28 PM   #7
simonandrews
Simon Andrews

Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

I'm not sure the Illumina pipeline can create unpaired reads. The basis for the sequencing is an initial identification of regions followed by tracking those regions to determine sequence. When you do a paired end read there is no separate cluster detection in the second read, meaning that you use exactly the same regions as the first read.

For the output from the pipeline you get only two sequence files, one for each read, which always contain the same number of sequences and always come in the same order so you can match up pairs of sequences. If stuff goes wrong you'll just end up with a bunch of sequences full of poly-N.
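
The "same number of sequences, same order" property can be checked by walking both files in lockstep and comparing read names. The sketch below assumes 4-line FASTQ records and that mate names differ only by an optional trailing `/1` or `/2`; naming conventions vary between pipeline versions and SRA dumps, so that suffix rule is an assumption. The function names are made up for illustration.

```python
import io
from itertools import zip_longest

def read_names(handle):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            yield line[1:].split()[0]

def strip_mate_suffix(name):
    """Remove a trailing /1 or /2 if present (assumed naming convention)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def first_mismatch(handle_1, handle_2):
    """Return the 0-based index of the first out-of-sync record, or None."""
    names = zip_longest(read_names(handle_1), read_names(handle_2))
    for idx, (n1, n2) in enumerate(names):
        if n1 is None or n2 is None:
            return idx  # one file ended before the other
        if strip_mate_suffix(n1) != strip_mate_suffix(n2):
            return idx
    return None

file_1 = "@r1/1\nAC\n+\nII\n@r2/1\nGT\n+\nII\n"
file_2 = "@r1/2\nTT\n+\nII\n@r2/2\nCA\n+\nII\n"
print(first_mismatch(io.StringIO(file_1), io.StringIO(file_2)))
```

A result of `None` means the two files stayed in sync all the way through.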

If the file is for unpaired sequences then it must have been something which the researchers created from the original data, as the pipeline itself won't create this.

Could it be a trial run before the main sequencing run? We do this routinely with our libraries - doing 10% of a lane with them to see if they look OK before going on to do a full run.

10-16-2010, 12:42 PM   #8
Bio.X2Y
Member

Location: Europe

Join Date: Apr 2010
Posts: 46

Hi Simon, thanks for this. Out of interest, when you say you do 10% of a lane, how is this done? I'm not very familiar with the sequencing procedure itself, but I imagined it was an all-or-nothing, and you couldn't back out after 10%. Do you mean you are watching some results in real time (i.e. nothing to do with the GAPipeline), and making a decision to abandon if necessary after 10%? If you do abandon, does this mean the flowcell is effectively wasted? If that's what happened here (in the described experiment), would the authors have needed to run image analysis, etc. on partially complete reads? Wouldn't that mean that (a) they wouldn't have full-length reads, e.g. they would only have 5 bases per read out of 50 potential bases, and (b) they would still be paired? Thanks for your help!

10-17-2010, 11:58 PM   #9
simonandrews
Simon Andrews

Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by Bio.X2Y
Hi Simon, thanks for this. Out of interest, when you say you do 10% of a lane, how is this done? I'm not very familiar with the sequencing procedure itself, but I imagined it was an all-or-nothing, and you couldn't back out after 10%.
We still run a control lane on each flowcell because of the nature of many of our libraries. What we can therefore do is to mix in 10% of another sample alongside the PhiX and then extract out everything which doesn't map to PhiX at the end of the run to get a small scale view of the other library.

12-21-2010, 11:36 AM   #10
spadejac
Junior Member

Location: Newark, Delaware

Join Date: Sep 2009
Posts: 4

No other explanations. Here is NCBI documentation about it:

SRR000001.fastq – Fragment library data, or unpaired mates from a paired library.
SRR000001_1.fastq – First mate sequence.
SRR000001_2.fastq – Second mate sequence in the submitted orientation.
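
Going by the naming convention quoted above, the role of each dumped file can be read off its name. A small sketch follows; the regular expression and the function name `classify` are assumptions for illustration, and it only handles plain `.fastq` names with SRR/ERR/DRR run accessions.

```python
import re

# <run>.fastq, <run>_1.fastq or <run>_2.fastq, per the convention quoted above.
_PATTERN = re.compile(r"^(?P<run>[SED]RR\d+)(?:_(?P<mate>[12]))?\.fastq$")

_ROLES = {None: "unpaired", "1": "mate1", "2": "mate2"}

def classify(filename):
    """Return (run_accession, role) for an SRA-style FASTQ file name."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return match.group("run"), _ROLES[match.group("mate")]

for name in ["SRR018256.fastq", "SRR018256_1.fastq", "SRR018256_2.fastq"]:
    print(name, classify(name))
```

Applied to the files from the first post, this labels SRR018256.fastq as the unpaired (or fragment) reads and the other two as the first and second mates.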
Tags
fastq, illumina, sra
