SEQanswers

Old 10-04-2011, 04:30 AM   #1
Joke van Vugt
Junior Member
 
Location: The Netherlands

Join Date: Oct 2011
Posts: 2
What is the optimal read length to quantify splice variants: 50, 76 or 100 bp?

Does anyone know what the best read length is for quantifying splice variants from RNA-seq data on an Illumina HiSeq? The reference genome has been sequenced, so assembly is not much of a problem.
On the one hand, the longest possible read length will improve identification of splice variants. On the other hand, with a shorter read length more fragments can be sequenced for a similar price, which improves quantification.
Is there such a thing as an optimal read length in this case?

Thanks for any input!
Old 10-04-2011, 06:59 AM   #2
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

The longer the read, the more likely it is to span a splice junction. Also, the marginal cost-per-base of longer reads is less (i.e., 2X reads @ 50bp is more expensive than 1X @ 100bp), and you'd have to sequence more than twice the number of shorter reads to obtain the same number of mappable junctions. So, the longer reads are cheaper as well.
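A rough sketch of why the junction count grows faster than the read length, assuming a single junction, uniformly distributed read starts, exons longer than the read, and a hypothetical minimum anchor of 20 bases on each side of the junction (the function name and the 20-base threshold are illustrative choices, not taken from any aligner):

```python
def junction_spanning_starts(read_len: int, min_anchor: int) -> int:
    """Count the read start positions that leave at least `min_anchor`
    bases on each side of a single splice junction."""
    if read_len <= 0 or min_anchor <= 0:
        raise ValueError("read_len and min_anchor must be positive")
    # The left segment may be min_anchor .. read_len - min_anchor bases long.
    return max(0, read_len - 2 * min_anchor + 1)


MIN_ANCHOR = 20  # assumed minimum mappable segment length, for illustration only

for read_len in (50, 76, 100):
    starts = junction_spanning_starts(read_len, MIN_ANCHOR)
    print(f"{read_len} bp: {starts} usable start positions per junction")

# Relative chance that one 100 bp read usably spans the junction vs one 50 bp read
ratio = junction_spanning_starts(100, MIN_ANCHOR) / junction_spanning_starts(50, MIN_ANCHOR)
print(f"100 bp vs 50 bp: {ratio:.1f}x")
```

Under these assumptions the usable positions scale as read length minus twice the anchor, so doubling the read length more than doubles them. A different anchor threshold changes the numbers but not the direction.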
Old 10-04-2011, 08:02 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by Joke van Vugt View Post
Is there such a thing as an optimal read length in this case?

Probably. I do not have any data but rather just a rough guess.

An evenly split read would have half of its bases on one side of the splice junction and half on the other. Both of these segments need to be mapped to the reference. Thus a 50-base read would be mapping two 25-base partial reads. Offset the position where the splice occurs and you could be trying to map even fewer bases. And then there are sequencing errors and/or SNVs versus your reference. While some rescuing might occur (e.g., we know that both segments must map to the same chromosome, both should be within a reasonable distance of each other, depth of coverage can take care of a lot of mismatches, splicing characteristics can be taken into account, etc.), I am simply not fond of mapping 25-mers, much less 20-mers. So for me a 50-bp read is not good for splice variants.

76-base reads would have 38 bases on either side. Even considering offsets and sequencing errors, the worst partial read being mapped is around a 28-mer. That is much more comfortable.

100-base reads are, of course, even better, but if you are concerned about cost then don't use them.
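The partial-read arithmetic above can be sketched as follows. This is only an illustration: `offset` (how far the junction sits from the read's midpoint) and the example value of 5 bases are assumptions, and sequencing errors or SNVs are not modeled.

```python
def anchor_lengths(read_len: int, offset: int = 0) -> tuple[int, int]:
    """Return the (left, right) segment lengths of a read split by a
    splice junction located `offset` bases away from the read's midpoint."""
    left = read_len // 2 - offset
    right = read_len - left
    if left <= 0 or right <= 0:
        raise ValueError("offset places the junction outside the read")
    return left, right


def shortest_anchor(read_len: int, offset: int = 0) -> int:
    """Length of the shorter segment, i.e. the harder one to map."""
    return min(anchor_lengths(read_len, offset))


OFFSET = 5  # assumed shift of the junction from the read centre, illustration only

for read_len in (50, 76, 100):
    centred = anchor_lengths(read_len)
    worst = shortest_anchor(read_len, OFFSET)
    print(f"{read_len} bp: centred split {centred}, "
          f"shortest segment with a {OFFSET}-base offset = {worst}")
```

Larger offsets shrink the shorter segment further, which is why the comfort margin matters more than the centred split.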
Old 10-04-2011, 10:48 PM   #4
Joke van Vugt
Junior Member
 
Location: The Netherlands

Join Date: Oct 2011
Posts: 2

Quote:
Originally Posted by HESmith View Post
The longer the read, the more likely it is to span a splice junction. Also, the marginal cost-per-base of longer reads is less (i.e., 2X reads @ 50bp is more expensive than 1X @ 100bp), and you'd have to sequence more than twice the number of shorter reads to obtain the same number of mappable junctions. So, the longer reads are cheaper as well.
This is very useful! Thanks!
Old 10-06-2011, 05:12 AM   #5
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by Joke van Vugt View Post
Does anyone know what the best read length is for quantifying splice variants from RNA-seq data on an Illumina HiSeq? The reference genome has been sequenced, so assembly is not much of a problem.
On the one hand, the longest possible read length will improve identification of splice variants. On the other hand, with a shorter read length more fragments can be sequenced for a similar price, which improves quantification.
Is there such a thing as an optimal read length in this case?
If you're aiming to identify novel variants, longer is clearly better. This would be doubly true for de novo assembly, of course.

For quantifying known variants, it's a bit more complex, at least in theory. You want the maximum number of reads that hit only one variant. Longer reads are moderately more likely to cover an alternative splicing point, but once you have enough reads to confirm the splice, the rest are a waste. Then again, every read that fails to identify a specific splice variant is effectively a waste too.

I guess the priority should thus be on total bases, and most likely, with current pricing, that means 100 bp reads. I don't think you can apply paired reads easily, but if you can get them for less than 2x the cost, I would probably consider it.
Old 10-12-2011, 04:19 AM   #6
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

To follow up on this, http://www.biomedcentral.com/1471-2105/12/323 discusses the tradeoff of read length vs pairing vs more reads.