SEQanswers

Old 08-10-2014, 02:25 PM   #1
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 429
Insert sizes for RNA-seq

Has anybody come across studies that have looked into the optimal insert sizes for RNA-seq libraries?

Would you have recommendations? I assume the optimal size ranges might change with library prep protocols. I am especially interested in recommendations for protocols using RNA fragmentation, random-hexamer-primed 1st strand synthesis, and dUTP incorporation for strand specificity.

Thanks a lot in advance!
Old 08-11-2014, 11:31 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The optimal insert size depends on various factors...

1) Read length and sequencing platform
2) Gene and exon length distribution in the target organism
3) Use of data - assembly vs quantification

I don't think you can derive a useful number without specifying these things. I like long insert sizes, particularly in organisms with differential splicing, as they are more informative about the source isoform. But it's really experiment-specific.
Old 08-11-2014, 01:49 PM   #3
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 429

Thanks, Brian.

Yes, I should have specified that. I was thinking about Illumina HiSeq systems with transcript quantification as the purpose (usually single-end 50 bp reads).

I imagine random priming will cause some bias against smaller fragments. Illumina flowcell clustering, on the other hand, is more efficient for smaller fragments. The chemical fragmentation is very likely approximately random; nevertheless, is there likely to be some bias as to which transcript sizes (let's say about 400 bp transcripts compared to 3 kb transcripts) show up as fragments of a specific size range (e.g. 150 bp inserts or 300 bp inserts)?
Very likely it would be best to look at some ERCC spike-in data.
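
Until spike-in data is at hand, the question can at least be explored with a toy model. The sketch below assumes every bond in a transcript breaks independently with the same probability, which is an idealization of chemical fragmentation; it ignores priming, size selection, and clustering. The function names, the `p_break` parameter, and the value 1/200 are made up for illustration only.

```python
def expected_fragments(transcript_len, frag_len, p_break):
    """Expected number of fragments of exactly frag_len bases from one
    transcript, if each of its bonds breaks independently with p_break."""
    if frag_len < 1 or frag_len > transcript_len:
        return 0.0
    # All bonds inside the fragment must stay intact.
    intact = (1 - p_break) ** (frag_len - 1)
    if frag_len == transcript_len:
        return intact  # the whole unbroken transcript
    # Fragments touching a transcript end need only one break.
    ends = 2 * p_break * intact
    # Internal fragments need a break on both sides.
    internal = (transcript_len - frag_len - 1) * p_break ** 2 * intact
    return ends + internal


def fragments_per_kb(transcript_len, lo, hi, p_break):
    """Expected fragments with lo <= length <= hi, per kb of transcript."""
    total = sum(expected_fragments(transcript_len, n, p_break)
                for n in range(lo, hi + 1))
    return 1000 * total / transcript_len


if __name__ == "__main__":
    p = 1 / 200  # arbitrary break probability per bond (assumption)
    for t_len in (400, 3000):
        for lo, hi in ((125, 175), (275, 325)):
            print(t_len, (lo, hi), fragments_per_kb(t_len, lo, hi, p))
```

Comparing the per-kb yields for the two transcript lengths and the two size windows shows how much of any difference comes from end effects alone under this model, before any real protocol bias is considered.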

Last edited by luc; 08-11-2014 at 03:57 PM.
Old 08-11-2014, 05:01 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

With 50bp single-end reads, there is no reason to shoot for a long insert size, and for quantification, short inserts will be less biased anyway. I don't know what kind of biases are introduced by the different fragmentation methods, though I understand that "random hexamer priming" is actually pretty non-random, so it seems like something to avoid for accurate quantification of small transcripts.

Also, the shorter your insert sizes, the less genetic material or amplification you will need. So it seems like you should go as short as possible; maybe 100bp.

Last edited by Brian Bushnell; 08-11-2014 at 05:04 PM.
Old 08-13-2014, 05:17 AM   #5
turnersd
Senior Member
 
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

Don't mean to side-track this discussion too much, but I'm noticing very poor coverage of a relatively small transcript (1200bp) after rRNA reduction and 2x100 sequencing. I still need to check the insert size. Which of the upstream library prep steps discussed here could result in this poor coverage? That is, could you help me understand why random hexamer priming biases against coverage of small transcripts? How does the insert size affect this small-transcript coverage?

Thanks.
Old 08-13-2014, 09:48 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If the random hexamers are not completely random (in terms of their concentration or binding affinity), then transcripts rich in the more concentrated/better-binding hexamers will be overrepresented and those poor in them will be underrepresented. The shorter a transcript is, down to a limit of 6bp, the more highly skewed the abundance distribution of its hexamers is likely to be. 1200bp is probably too long for that to play a major role.
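
A rough way to see the length effect is to count how many of the 4096 possible hexamers a sequence of a given length actually contains. The sketch below uses uniformly random sequences as a stand-in for real transcripts, which is an assumption, and it does not model hexamer concentration or binding affinity at all. The function names are made up for illustration.

```python
from collections import Counter
import random


def hexamer_counts(seq, k=6):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fraction_of_hexamers_present(seq, k=6):
    """Fraction of the 4**k possible k-mers that occur at least once."""
    return len(hexamer_counts(seq, k)) / 4 ** k


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for length in (6, 100, 1200, 10000):
        # Random sequence as a placeholder for a transcript (assumption).
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        print(length, round(fraction_of_hexamers_present(seq), 4))
```

A very short sequence carries only a handful of hexamers, so whether it primes well depends on which few it happens to contain; a longer one samples many more of them and is less sensitive to any single hexamer.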

Also, the longer the insert relative to the transcript, the fewer available start/stop positions there are. Considering a 600bp transcript, there's no longer any place an 800bp insert fragment can originate. But assuming you kept 600bp and smaller fragments, the majority of fragments from that transcript would be expected to be the whole unsheared transcript, starting at one end and ending at the other with no coverage in the middle (since only the 2 outermost 100bp sections would be sequenced).
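
The arithmetic behind that can be written out in a few lines. This is only a sketch of the counting argument above, with made-up function names: it counts the positions where an insert of a given length can start, and marks which bases are covered when a single fragment is sequenced from both ends.

```python
def insert_start_positions(transcript_len, insert_len):
    """Number of distinct positions where an insert of insert_len can start."""
    return max(0, transcript_len - insert_len + 1)


def paired_end_coverage(transcript_len, insert_start, insert_len, read_len):
    """Per-base coverage from one fragment sequenced from both ends."""
    insert_end = insert_start + insert_len
    if insert_start < 0 or insert_end > transcript_len:
        raise ValueError("insert does not fit inside the transcript")
    coverage = [0] * transcript_len
    sequenced = min(read_len, insert_len)  # reads cannot exceed the insert
    for i in range(insert_start, insert_start + sequenced):
        coverage[i] += 1  # forward read
    for i in range(insert_end - sequenced, insert_end):
        coverage[i] += 1  # reverse read
    return coverage


if __name__ == "__main__":
    print(insert_start_positions(600, 800))  # 800bp insert, 600bp transcript
    print(insert_start_positions(600, 600))  # whole unsheared transcript
    cov = paired_end_coverage(600, 0, 600, 100)  # 2x100 on the whole transcript
    print(cov.count(0), "bases with no coverage")
```

For the 600bp example, the 800bp insert has no valid start position, the 600bp insert has exactly one, and sequencing that one fragment 2x100 leaves the 400 bases in the middle uncovered.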
Old 08-13-2014, 11:07 AM   #7
turnersd
Senior Member
 
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

Thanks for the helpful explanation, Brian.
All times are GMT -8.