SEQanswers

Old 08-15-2011, 04:05 PM   #1
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98
50 bp paired end reads vs. 100 bp single end reads

In an RNA seq experiment in which the goal is simply to count transcripts (vs., e.g., finding evidence for translocations or novel splice variants), is it better to have 100 base pair single end reads or 50 base pair paired end reads? (100 base pair single end reads seem better to me.)

Thanks.

Eric
Old 08-15-2011, 06:49 PM   #2
dcfactor
Junior Member
 
Location: Cleveland, OH

Join Date: Jan 2011
Posts: 2

50 bp paired-end reads span a longer region of the transcript. Each read represents one end of a ~200-300 bp RNA fragment, whereas a 100 bp single-end read only gives you information about 100 bases. A larger span means you are more likely to cross a splice junction, insertion, or deletion. Therefore 50 bp paired-end is preferable. Remember that with next-gen sequencing technology, for the most part your read is only a "tag" that tells you where in the genome the fragment originated. As long as the "tag" is long enough to be unique (and 50 bp is, for the most part), you are set.
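
To make the span argument concrete, here is a rough sketch of the arithmetic. The 250 bp fragment length is just an assumed midpoint of the ~200-300 bp range mentioned above, and the function names are made up for illustration.

```python
def sequenced_bases(read_len: int, paired: bool) -> int:
    """Bases actually read from one fragment."""
    return read_len * (2 if paired else 1)


def spanned_bases(read_len: int, paired: bool, fragment_len: int) -> int:
    """Transcript bases between the outermost sequenced positions.

    A pair is read inward from both ends of the fragment, so it spans the
    whole fragment. A single read spans only itself (capped at the fragment).
    """
    if paired:
        return fragment_len
    return min(read_len, fragment_len)


FRAGMENT_LEN = 250  # assumption: midpoint of a ~200-300 bp fragment

for label, read_len, paired in [("PE 2x50", 50, True), ("SE 1x100", 100, False)]:
    print(
        label,
        "sequenced:", sequenced_bases(read_len, paired),
        "spanned:", spanned_bases(read_len, paired, FRAGMENT_LEN),
    )
```

Both designs read 100 bases per fragment, but the pair pins down the full fragment while the single read pins down only its own 100 bases.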
Old 08-15-2011, 09:14 PM   #3
sarbashis
Member
 
Location: India

Join Date: Jun 2010
Posts: 17

An important property of paired reads is that both reads of a pair are supposed to come from the same gene. So with paired-end reads one can easily mark approximate gene boundaries.
Old 08-16-2011, 06:42 AM   #4
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 498

Hi Eric,

Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.
Old 08-16-2011, 07:31 AM   #5
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98

Quote:
Originally Posted by HESmith
Hi Eric,

Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.

Yes, the genomes are well annotated. I'm working with human and yeast.

Eric
Old 08-17-2011, 07:33 AM   #6
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 498

In that case, single end 50bp should be fine for obtaining gene counts.
Old 08-17-2011, 10:35 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Also, depending on the software you use, the RNA-seq module may not be able to use PE reads, at least not without some extra work. Yes, CASAVA, I'm looking at you!

I'll agree that for well annotated genomes 50 bp SE should be satisfactory.
Old 08-17-2011, 12:11 PM   #8
dreesbl
Registered Vendor
 
Location: Seattle, WA

Join Date: Nov 2009
Posts: 6

If you're sequencing an equal number of base pairs, I vote for paired end reads. I agree with dcfactor that there won't be much difference between estimates from 50 bp and 100 bp read data if you're counting the alignment hits per gene. You get more data for your sequencing buck because each aligned pair gives you information on not only the sequences covered by the reads but the region between them as well.

50 bp of sequence in a mate pair can be more useful for read mapping than an extra 50 bp in the read itself. If you don't find what you're looking for in the gene-level expression patterns, having paired-end data leaves more avenues open for other analyses.

www.spiralgenetics.com
Old 08-17-2011, 12:17 PM   #9
efoss
Member
 
Location: Seattle

Join Date: Jul 2011
Posts: 98

Thanks very much everyone. I'll be going with 50 bp paired reads.

Eric
Old 08-18-2011, 01:59 AM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992

If you want to look for differential expression, sequencing depth is usually the most limiting factor. Hence, I would go for 50bp single-end and invest the money you saved into sequencing another lane (ideally, with a biological replicate).

Longer reads are useful to see where splice sites are located, but for humans, we already know that quite well, and for yeast, there is hardly any splicing, anyway. Long reads don't help much for mapping, because the transcribed part of the genome is usually not that repetitive, and 50 bp is usually long enough to distinguish even most paralogs.

Paired-end reads may or may not help to distinguish isoforms of the same gene, but, at least for yeast, this is unimportant, of course.
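
As a back-of-the-envelope illustration of the depth argument, the sketch below counts how many fragments each design yields for a fixed number of sequenced bases. The 2 Gb budget is a hypothetical figure, and treating cost as proportional to bases sequenced is a simplifying assumption (real pricing is per lane or run).

```python
def fragments_for_budget(total_bases: int, read_len: int, paired: bool) -> int:
    """Number of fragments (countable units) obtained from a base budget."""
    bases_per_fragment = read_len * (2 if paired else 1)
    return total_bases // bases_per_fragment


BUDGET = 2_000_000_000  # hypothetical budget: 2 Gb of sequence

designs = [
    ("SE 1x50", 50, False),
    ("PE 2x50", 50, True),
    ("SE 1x100", 100, False),
]

for label, read_len, paired in designs:
    n = fragments_for_budget(BUDGET, read_len, paired)
    print(f"{label}: {n:,} fragments")
```

Under that assumption, 50 bp single-end yields twice as many fragments as either 2x50 or 1x100, and for gene counting each fragment contributes one count however many of its bases were read.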
Old 08-24-2011, 08:51 AM   #11
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 498

I'll second Simon's advice. You'll get very little additional information by paired-end sequencing (since, in essence, you're just counting each gene twice). Biological replicates provide much more information. In fact, I'd recommend triplicates at a minimum to provide statistical power to your analysis. If you index the samples, they can be sequenced in the same lane; the only additional cost would be preparing separate libraries (~$50 each for Illumina). Three biological replicates of 20 million reads each are much better than a single sample of 60 million.
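
A toy sketch of why replicates matter: with one sample per condition there is no way to estimate variance, so no test statistic can be formed at all. The counts below are invented for illustration, and Welch's t statistic is used only as a simple stand-in; dedicated RNA-seq tools model count data differently.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two groups of normalized counts."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two replicates per condition to estimate variance")
    # Standard error of the difference in means, from per-group sample variances
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical normalized counts for one gene, three biological replicates each
control = [100.0, 110.0, 90.0]
treated = [200.0, 190.0, 210.0]
print("t with triplicates:", round(welch_t(control, treated), 3))

# A single deep sample per condition gives a mean but no variance estimate
try:
    welch_t([100.0], [200.0])
except ValueError as err:
    print("single sample:", err)
```

However deep the single 60-million-read sample is, it still contributes only one observation per gene per condition.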
Old 01-15-2014, 08:03 PM   #12
alyamahmoud
Member
 
Location: Egypt, Saudi Arabia

Join Date: Nov 2013
Posts: 29

@Simon and HESmith

Does your recommendation apply to poorly annotated genomes as well?
Old 01-15-2014, 08:05 PM   #13
alyamahmoud
Member
 
Location: Egypt, Saudi Arabia

Join Date: Nov 2013
Posts: 29

I am also very confused as to why 50 bp SE gives a higher alignment rate than 100 bp PE. I ran the experiment myself using 100 bp PE reads and compared the result with trimmed 50 bp SE reads.
Thank you