SEQanswers


JohnK 03-03-2011 08:39 AM

3' Bias in RNA-Seq
 
Would anyone happen to know what specific factor in cDNA fragmentation causes the 3' bias? I've read a few papers that mention the bias, but they don't explain why. Thank you!

J

ECO 03-03-2011 08:52 AM

Moving to RNA-seq.

irit 03-03-2011 09:29 AM

If you are doing polyA enrichment, that would be the source of the 3' bias. I don't know whether there should be any bias if you aren't doing this (say, ribodepletion or DSN treatment of total RNA).

JohnK 03-03-2011 09:38 AM

Quote:

Originally Posted by irit (Post 36315)
If you are doing PolyA enrichment that would be the source of 3' bias. I don't know if there should be bias if not doing this (say, ribodepletion or dsn from total RNA).

Hi, irit. PolyA enrichment certainly sounds like a prime culprit. What are your thoughts on cDNA fragmentation, though? From "RNA-Seq: a revolutionary tool for transcriptomics":

"Conversely, cDNA fragmentation is usually strongly biased towards the identification of sequences from the 3' ends of transcripts, and thereby provides valuable information about the precise identity of these ends."

VanessaS 03-05-2011 06:13 AM

In our protocol, the cDNA isn't fragmented, the RNA is.

Seqasaurus 03-05-2011 10:07 AM

I'd have thought polyA selection would be the main source of 3' bias, though fragmentation of the cDNA may also contribute. On 454 at least, cDNAs in a certain size range don't nebulize well, which would bias reads from those fragments towards the transcript ends. If I remember correctly, this is no longer a problem with the newer, more rapid methods for non-normalized cDNA sequencing. For full-length, normalized cDNA sequencing (on 454), co-ligation before nebulization probably helps reduce the bias. I'm not sure whether fragmentation produces bias on Illumina, as I'm not familiar with the library prep, but I do know that our cDNA is also co-ligated before nebulization for subsequent RNA-seq on the HiSeq 2000.

Someone correct me if I'm wrong.

JohnK 03-05-2011 12:34 PM

Quote:

Originally Posted by VanessaS (Post 36459)
In our protocol, the cDNA isn't fragmented, the RNA is.

Hi, Vanessa. So, would you say it's definitely a result of the poly(A) selection?? Thanks!

VanessaS 03-06-2011 09:45 PM

Quote:

Originally Posted by JohnK (Post 36468)
Hi, Vanessa. So, would you say it's definitely a result of the poly(A) selection?? Thanks!

I'm not qualified to say anything is definite, other than that it's not from shearing the cDNA. Maybe someone can comment on the kinds of biases, if any, introduced by enzymatic shearing of the RNA? We use RNase III. I thought it was non-specific, so no bias?

JohnK 03-07-2011 09:32 AM

Quote:

Originally Posted by VanessaS (Post 36499)
I'm not qualified to say anything is definite, other than that it's not from shearing the cDNA. Maybe someone can comment on the kinds of biases, if any, introduced by enzymatic shearing of the RNA? We use RNase III. I thought it was non-specific, so no bias?

Hey, Vanessa. I didn't mean the fragmentation of the RNA so much as the poly(A) purification step, which, as someone stated above, you would expect to generate a 3' bias... What do you think of that?

steven 03-11-2011 09:37 AM

Consider RNA stability issues too. Combined with poly(A) selection, this can result in a dramatic enrichment of terminal fragments.
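To make the degradation-plus-selection argument concrete, here is a toy model (a sketch under simplifying assumptions, not something proposed in the thread): each bond in a transcript of length L breaks independently with a hypothetical probability p, and poly(A) selection recovers only the 3'-most piece still attached to the tail. A position d bases upstream of the 3' end is then recovered with probability (1 - p)^d, so the 5'/3' coverage ratio is (1 - p)^(L - 1) and the bias grows with transcript length. The function names and the value of p are illustrative assumptions.

```python
# Toy model (assumption): random breakage + poly(A) capture of the 3'-most fragment only.

def retained_coverage(length, break_prob):
    """Expected relative coverage at each position (index 0 = 5' end).

    A position is recovered only if none of the bonds between it and the
    3' end broke; there are (length - 1 - i) such bonds for position i.
    """
    keep = 1.0 - break_prob
    return [keep ** (length - 1 - i) for i in range(length)]


def five_to_three_ratio(length, break_prob):
    """Coverage at the 5'-most base divided by coverage at the 3'-most base."""
    cov = retained_coverage(length, break_prob)
    return cov[0] / cov[-1]


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-bond break probability
    for length in (1000, 3000, 6000, 10000):
        print(f"{length:>6} nt  5'/3' ratio = {five_to_three_ratio(length, p):.3f}")
```

The model ignores the deliberate fragmentation step, reverse-transcriptase processivity and the 5' cap, so it only illustrates why longer transcripts should show a stronger 3' bias, not how strong that bias would be in a real library.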

pbluescript 03-11-2011 09:50 AM

Another contributor (depending on how you isolate and prepare your RNA) would be priming the reverse transcription with oligo(dT).

NextGenSeq 03-11-2011 11:39 AM

Also, the 5' cap on mRNA stabilizes the 5' end of RNA over the 3' end.

roryk 04-15-2011 05:16 AM

I was wondering whether other people are also noticing a 3' bias in Illumina-prepped RNA-Seq samples made with the 8-sample bead-based poly-A selection kit. For shorter transcripts (< 6 kb or so) I don't notice any 3' bias, but for longer transcripts there is definitely a fairly severe bias towards the 3' end. I also see a peak at the 5' end of the longer transcripts, which makes me think it is not due to degradation; is that a reasonable thought? My total RNA looked great on a Bioanalyzer, but I'm not sure whether that still holds after the mRNA selection step, and I wasn't sure how to check. I visualized the size of the fragmented RNA by converting it to cDNA and running it on a gel, and saw a fairly broad smear, so at least I know the entire mRNA library was not degraded. I've done hundreds of total RNA preps without RNase contamination, so it's hard for me to imagine that I'm somehow introducing RNase during the poly-A selection, but this 3' bias sure looks like exactly that has happened.

Could the 3' bias in longer transcripts, as people have suggested above, simply be a byproduct of the poly-A selection? There are a couple of questionable vortexing steps in the Illumina protocol. I'm not worried that vortexing alone would shear the RNA, since I do it as a standard part of my total RNA prep, but they do have you briefly vortex the RNA-bound beads. Could that be shearing the RNA? Would the flopping ends of the RNA banging into those beads shear the longer transcripts? If so, why do I see a peak at the 5' end too? The 5' peak is about half the size of the 3' peak.

I visualized the 3'-5' bias using Simon Andrews' excellent SeqMonk visualizer, via view->probe trend plot on a probe list of mRNA. The 3' bias is also there if I use a probe list of the CDS. It is not there if I cut the annotations up into single exons and rerun the plot. It is also much, much less pronounced (roughly a 10% difference) if I look at single exons > 5 kb, and there is no spike at both ends of the probe list. Am I misunderstanding what the probe trend plot is showing me?

simonandrews 04-15-2011 06:34 AM

Quote:

Originally Posted by roryk (Post 39583)
I visualized the 3'-5' bias using Simon Andrews' excellent SeqMonk visualizer, using view->probe trend plot on a probe list of mRNA. The 3' bias is there if I look at a probe list of the CDS as well. It is not there if I cut the annotations up into single exons and run. It is also much, much less pronounced (think about 10% difference) if I look at single exons which are > 5kb and there is no spike at both ends of the probe list. Am I maybe not understanding what the probe-trend plot is showing me?

Rory - glad to hear you're liking SeqMonk!

If you've put probes over mRNA features and then done a trend plot then the peak you see at the end might not be due to true 3' bias.

Different transcripts will have exons at different places along their length, so the trend plot for any individual transcript will go up and down as you pass in and out of exons. If you average over all transcripts, you'll see the combined signals from all of them, which for the most part will even out. However, the only places you're guaranteed to be in an exon are the beginning and end of each transcript, so a trend plot over all transcripts will probably show a peak at each end because of the higher probability of being in an exon there. Since 3' exons are generally larger than 5' exons, you'll probably also see a bigger peak at the 3' end.
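This explanation can be checked with a quick simulation (a sketch, assuming perfectly uniform read coverage over exons and none over introns, so a probe spanning the whole unspliced locus sees coverage proportional to the exonic fraction of each bin). The transcript structures are hypothetical: 3 to 12 exons of 150 bp with a 1,500 bp last exon and introns of 200 to 5,000 bp. Averaging the per-bin exonic fraction across loci should show peaks at both ends, the larger one at the 3' end, even though no read bias is simulated.

```python
import random


def random_transcript(rng, n_exons, exon_len, last_exon_len, intron_len_range):
    """Hypothetical locus: half-open (start, end) exon intervals, ordered 5' -> 3'."""
    exons, pos = [], 0
    for k in range(n_exons):
        length = last_exon_len if k == n_exons - 1 else exon_len
        exons.append((pos, pos + length))
        pos += length
        if k < n_exons - 1:
            pos += rng.randint(*intron_len_range)  # intron between exons
    return exons


def genomic_trend(exons, n_bins):
    """Exonic fraction of each of n_bins equal bins over the unspliced locus.

    With uniform coverage over exons and none over introns, this is
    proportional to the depth a probe over the whole locus would show.
    """
    span = exons[-1][1]
    profile = []
    for b in range(n_bins):
        lo, hi = b * span / n_bins, (b + 1) * span / n_bins
        covered = sum(max(0.0, min(hi, e) - max(lo, s)) for s, e in exons)
        profile.append(covered / (hi - lo))
    return profile


def average_trend(n_transcripts, n_bins, seed=1):
    """Average the genomic trend over many simulated loci."""
    rng = random.Random(seed)
    totals = [0.0] * n_bins
    for _ in range(n_transcripts):
        exons = random_transcript(rng, rng.randint(3, 12), 150, 1500, (200, 5000))
        for i, v in enumerate(genomic_trend(exons, n_bins)):
            totals[i] += v
    return [t / n_transcripts for t in totals]


if __name__ == "__main__":
    for i, v in enumerate(average_trend(2000, 20)):
        print(f"bin {i:2d}  {v:.3f}")
```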

What you'd need for a true view of the trend over a spliced transcript would be to concatenate the exons for each transcript and do a trend plot over those, leaving out the introns. This could actually be a good addition to the program, so I'll look at adding it in a future release.
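The spliced view described above can be sketched as follows (an illustration, not SeqMonk's implementation): walk one transcript's exons in order, skip the introns, and average per-base depth into equal bins of the concatenated sequence. The `depth` callable is a hypothetical stand-in for whatever per-position coverage lookup you have; minus-strand transcripts are handled by reversing the concatenated positions so bins always run 5' to 3'.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic exon coordinates


def spliced_trend(exons: Sequence[Interval], strand: str,
                  depth: Callable[[int], float], n_bins: int) -> List[float]:
    """Mean depth in n_bins equal bins along the concatenated exons (introns skipped).

    Bins run 5' -> 3' in transcript orientation; for '-' strand transcripts the
    concatenated genomic positions are reversed.
    """
    positions = [p for start, end in sorted(exons) for p in range(start, end)]
    if strand == "-":
        positions.reverse()
    length = len(positions)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, pos in enumerate(positions):
        b = i * n_bins // length  # relative position along the spliced transcript
        sums[b] += depth(pos)
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


if __name__ == "__main__":
    exons = [(0, 10), (90, 100)]  # hypothetical two-exon transcript
    print(spliced_trend(exons, "+", lambda p: 5.0, 4))  # [5.0, 5.0, 5.0, 5.0]
```

With uniform exon coverage the spliced profile is flat, unlike the unspliced probe. To average across many transcripts you would probably want to scale each profile first (for example, divide by its mean) so highly expressed genes don't dominate; that normalization is an assumption, not something described in the thread.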

This same problem wouldn't apply to trend plots over exons where you would expect the signal from the reads to be continuous.

roryk 04-15-2011 07:34 AM

Ahh, that makes complete sense. I had it in my head that putting probes over the mRNA features put them on an introns-spliced-out, stitched-together version of the mRNA. Thank you so much for the clarification.

Thanks again, Simon. I say "again" because your answers to other people's forum posts had already answered about 15 other questions I'd had while looking at my data!

roryk 04-15-2011 08:33 AM

Even looking at single very large exons (> 6 kb), I can see a bit of 3' bias. Is this something to be concerned about for downstream quantitation? I have seen several papers that look at coverage across entire transcripts, and it appears mostly uniform; not so here. I attached an image of the probe trend plot for all exons > 6 kb.
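One way to put a single number on this kind of bias (a hypothetical metric, not one used in the thread or by SeqMonk) is to take each large exon's binned coverage profile, oriented 5' to 3', divide the mean of the 3'-most fraction of bins by the mean of the 5'-most fraction, and report the median across exons. A value near 1 means roughly uniform coverage; values above 1 indicate 3' bias. The 20% window is an arbitrary choice.

```python
from statistics import median
from typing import List, Sequence


def end_ratio(profile: Sequence[float], frac: float = 0.2) -> float:
    """Mean coverage of the 3'-most `frac` of bins over the 5'-most `frac`.

    `profile` must already be oriented 5' -> 3' (transcript direction).
    """
    k = max(1, int(len(profile) * frac))
    five_prime = sum(profile[:k]) / k
    three_prime = sum(profile[-k:]) / k
    if five_prime == 0:
        return float("inf")
    return three_prime / five_prime


def median_end_ratio(profiles: List[Sequence[float]], frac: float = 0.2) -> float:
    """Median 3'/5' end ratio across features, skipping profiles with no coverage."""
    ratios = [end_ratio(p, frac) for p in profiles if sum(p) > 0]
    return median(ratios)


if __name__ == "__main__":
    flat = [10.0] * 20
    sloped = [5.0 + 0.5 * i for i in range(20)]  # coverage rising toward the 3' end
    print(end_ratio(flat))                         # 1.0
    print(median_end_ratio([flat, sloped, flat]))  # 1.0 (median of 1.0, >1, 1.0)
```

The median keeps a few oddly covered exons from dominating the summary; a per-exon distribution of ratios would show whether the bias is consistent or driven by a subset of features.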

http://dl.dropbox.com/u/2822886/poss...prime-bias.png

censinis 04-17-2011 09:08 AM

Directional RNA-Seq for bacterial transcriptome analysis...
 
Hi guys,

I am particularly interested in directional (strand-specific) RNA-seq using the Illumina HiSeq 2000. Some authors (like N. Croucher of Sanger) have already described a few approaches, but I am wondering whether anyone has already tested them with bacterial total RNA. In particular, I am looking for protocols for ribosomal RNA depletion, RNA fragmentation and reverse transcription. Could you kindly help me?

Thanks in advance
Best
SC

adarob 04-17-2011 10:26 AM

This type of bias (as well as sequence-specific bias) is corrected for in Cufflinks. The importance of doing this correction is detailed in our paper here: http://genomebiology.com/2011/12/3/R22/

kalidaemon 05-10-2011 04:44 AM

SeqMonk bug?
 
I'm also trying to visualize and correct for potential 3' bias in my RNA-Seq data set and want to try SeqMonk. The problem is that I can't get it to run on my Windows XP PC. Have other people run into this problem? What did you do to fix it?

simonandrews 05-10-2011 04:49 AM

Quote:

Originally Posted by kalidaemon (Post 41270)
I'm also trying to visualize/correct for potential 3' bias in my RNA-Seq data-set and want to try Seqmonk. The problem is that I can't get it to run off my PC which has a Windows XP operating system. Have other people run into this problem? What have you done to fix it?

WinXP is not the problem. If SeqMonk won't start at all, then it's one of the following:
  • You don't have Java installed (or it hasn't been added to your PATH)
  • You have less than 2 GB of RAM in your machine

If you don't have Java installed, just get the latest version from java.com and install it.

If you have less than 2 GB of RAM in your machine, you'll need to lower the default memory allocation in the configuration file shipped with SeqMonk. Instructions for how to do this can be found here.


All times are GMT -8.
