SEQanswers

05-08-2013, 12:33 PM   #1
Eric Fournier
Member
 
Location: Quebec, Canada

Join Date: Jul 2011
Posts: 21
Normalizing with ERCC spike-in

Hello everyone,

I have RNA-seq libraries prepared from 10 different stages of embryonic development (3 replicates per stage), with each library constructed using the same number of embryos. The amount of total RNA should be variable between stages and follow a known pattern. Thus, to be able to compare "absolute quantities" of RNAs, we spiked each of the libraries with ERCC controls after RNA extraction, but prior to any other processing step.

I've aligned the sequencing results using TopHat2 and assembled transcripts using Cufflinks. However, now that I'm getting to the "normalize using ERCC" step, I am unsure how to proceed.

My first instinct would be to regress the ERCCs' FPKMs against their known concentrations for each library, then read all of the other transcripts' FPKMs off that curve. However, given that FPKMs are already normalized values, is this still a good idea?

Furthermore, going from FPKM to whichever measure I obtain will make it impossible to use standard RNA-seq comparison tools, such as Cuffdiff. Is there another kind of normalization that would be more "standard" or more sensible?

Thank you for any help,
-Eric Fournier
05-09-2013, 06:02 AM   #2
jparsons
Member
 
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62

The easy route is to just follow the "method" used in the Cell paper (Revisiting Global Gene Expression Analysis, 151, Oct 2012). They do more or less exactly what your first instinct suggests, fitting a regression of the ERCC spikes and renormalizing.
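For what it's worth, here is a minimal Python sketch of that kind of per-library regression. It is not the paper's actual code: it assumes a straight-line least-squares fit in log2-log2 space, and the function names (`fit_loglog`, `fpkm_to_conc`) and the numbers in the example are made up for illustration.

```python
import math

def fit_loglog(known_conc, spike_fpkm):
    """Least-squares fit of log2(FPKM) = slope * log2(conc) + intercept.

    known_conc and spike_fpkm are parallel lists for the ERCC spikes of ONE
    library. Spikes with zero concentration or zero FPKM are skipped because
    their log is undefined (an assumption of this sketch).
    """
    pairs = [(math.log2(c), math.log2(f))
             for c, f in zip(known_conc, spike_fpkm) if c > 0 and f > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two detected spikes to fit a line")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("spikes must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def fpkm_to_conc(fpkm, slope, intercept):
    """Invert the fitted line: map a transcript's FPKM to spike-equivalent units."""
    return 2 ** ((math.log2(fpkm) - intercept) / slope)

# Made-up numbers, only to show the mechanics (not real ERCC values).
slope, intercept = fit_loglog([1, 2, 4, 8], [3, 6, 12, 24])
estimate = fpkm_to_conc(48, slope, intercept)
```

You would fit one line per library and then push every non-ERCC transcript's FPKM through `fpkm_to_conc` for that library. Whether a plain linear fit is adequate across the full ERCC dynamic range is something to check on your own data.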

When running any Cufflinks/Cuffdiff analysis on a sample that contains ERCCs, you don't want to keep ERCC-mapped reads in the denominator of your FPKM calculation. You could either normalize them away (by multiplying through by total reads / total non-ERCC reads) or prevent them from showing up in the first place (by mapping them separately) and then factor in their relative ratios after the fact. I prefer the latter method, because I don't understand everything that Cufflinks does in its calculations, and I don't trust that the presence of spiked-in RNA won't lead one of Cufflinks' calculations to make an assumption that isn't true in my sample.

The renormalized values would still be FPKM, as you are merely correcting for the incorrect assumption that Cufflinks initially makes about your sample. You should be able to carry forward with Cuffdiff after you change the denominator to the proper value.
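As a rough sketch of the first option (multiplying through), assuming the FPKM you start from was computed with all mapped reads, ERCC included, in the denominator. The function name `rescale_fpkm` and the read counts are hypothetical, and this does not reproduce anything Cufflinks does internally:

```python
def rescale_fpkm(fpkm, total_mapped, ercc_mapped):
    """Rescale an FPKM so its denominator counts only non-ERCC reads.

    Assumes the original FPKM used total_mapped (ERCC included) as the
    per-million denominator; multiplying by total / non-ERCC swaps it out.
    """
    non_ercc = total_mapped - ercc_mapped
    if non_ercc <= 0:
        raise ValueError("no non-ERCC reads left in the denominator")
    return fpkm * total_mapped / non_ercc

# Hypothetical library: 100 mapped reads, 20 of them on ERCC spikes.
corrected = rescale_fpkm(10.0, 100, 20)
```

The correction factor is a single number per library, so it shifts all of that library's FPKMs by the same ratio and leaves the within-library ranking untouched.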

Last edited by jparsons; 05-09-2013 at 06:08 AM.
05-10-2013, 06:30 AM   #3
Eric Fournier
Member
 
Location: Quebec, Canada

Join Date: Jul 2011
Posts: 21

Thank you very much! The article was a very nice read.
05-13-2013, 03:58 PM   #4
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Eric, I'm curious: did you multiplex all your samples and run them on a single lane? So you needed 30 separate spikes, one for each library, which were then multiplex-tagged and combined? Or was each library on a different lane?

Also, did you use the ExFold mix to look at fold-change effects?
05-14-2013, 06:35 AM   #5
Eric Fournier
Member
 
Location: Quebec, Canada

Join Date: Jul 2011
Posts: 21

Hello Dan,

We ran our samples on five different lanes. Each lane used 6 of the 8 possible multiplex tags from the Encore multiplex kit (which uses 4 nt tags). This actually caused a small problem: one of the combinations of 6 tags that we used substantially reduced library complexity over the first four nucleotides, which was reflected in low quality values across the whole library.

The ERCC spikes were added immediately after RNA extraction, while the multiplexing was done just prior to sending the libraries to the sequencing center.

Since we had 10 different tissues and were not interested in any particular pairwise comparison, we did not use the ExFold mix to assess fold-change effects. Rather, we used only mix 1 from the ERCC to have one shared standard across all libraries.
Tags
cufflinks, ercc, fpkm, normalization
