SEQanswers

Old 01-07-2014, 07:54 AM   #1
swebb
Junior Member
 
Location: Edinburgh

Join Date: Jun 2009
Posts: 7
Default DESeq analysis with ERCC RNA spike ins

Hi, I want to perform differential expression analyses of multiple RNA-seq samples using DESeq. We have included ERCC spike in controls and wish to use these to normalise the count data.

I have seen a few posts that suggest I use estimateSizeFactors() on a DESeqDataSet consisting of only the ERCC RNAs then apply these size factors to the DESeqDataSet containing my experimental data.
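As a rough illustration of that approach, here is a minimal Python sketch of median-of-ratios size factors computed from the spike-in rows only and then applied to the gene counts. It is not DESeq's own code (in R you would call estimateSizeFactors() as described above); the helper name `size_factors` and all counts are made up for the example.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors; counts is a features x samples array."""
    counts = np.asarray(counts, dtype=float)
    # Keep only features with non-zero counts in every sample, so logs are finite.
    usable = (counts > 0).all(axis=1)
    log_counts = np.log(counts[usable])
    # Per-feature log geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor of a sample = median over features of count / geometric mean.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: 4 spike-in rows and 2 gene rows across 3 samples.
# The middle sample is sequenced twice as deeply as the other two.
ercc_counts = np.array([
    [10, 20, 10],
    [100, 200, 100],
    [50, 100, 50],
    [1000, 2000, 1000],
])
gene_counts = np.array([
    [30, 60, 33],
    [500, 900, 480],
])

sf = size_factors(ercc_counts)        # estimated from spike-ins only
normalized_genes = gene_counts / sf   # applied to the experimental counts
```

The point of the design is that the gene rows never influence `sf`, so any global shift in the genes is left in the normalized data rather than being normalized away.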

We have used the same total amount of RNA and spike in volume for each sample so there are no corrections applied first. However, we have used mix1 in our treatment samples and mix2 in our control. Would it make more sense then to only use subgroup B of the ERCC spike ins to estimate size factors as these are the same concentration in both mixes?

Is there perhaps a more accurate way to go about this? I have read the "Synthetic spike-in standards for RNA-seq experiments" paper, which suggests plotting expected FPKM fold change against observed fold change and fitting a curve. However, I would prefer to use DESeq and count-based differential expression so that I can compare with previous analyses performed without spike-ins.
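For what the expected-versus-observed comparison might look like, here is a small Python sketch on synthetic numbers. It fits a straight line rather than a general curve, and it assumes nominal Mix1:Mix2 ratios of 4, 1, 0.67 and 0.5 for subgroups A to D with 23 controls each; check those values against your own product sheet. The "observed" values are simulated, not real measurements.

```python
import numpy as np

# Assumed nominal Mix1:Mix2 ratios for ERCC subgroups A-D, 23 controls each.
nominal_ratio = np.repeat([4.0, 1.0, 0.67, 0.5], 23)
expected_log2fc = np.log2(nominal_ratio)

# Synthetic "observed" fold changes: expected value plus Gaussian noise.
rng = np.random.default_rng(1)
observed_log2fc = expected_log2fc + rng.normal(0.0, 0.2, size=expected_log2fc.size)

# Straight-line fit: observed = slope * expected + intercept.
slope, intercept = np.polyfit(expected_log2fc, observed_log2fc, deg=1)
```

A slope near 1 with an intercept near 0 would mean the observed fold changes track the expected ones; a slope below 1 would suggest compressed fold changes, and a non-zero intercept a global offset between the groups.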

Thanks in advance for any help with this.
Old 01-07-2014, 10:16 AM   #2
jparsons
Member
 
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62
Default

The DESeq size factors assume that most things will be 1:1, so the 1:1 sub pool would be a good fit.

That said, it's data analysis so there's nothing stopping you from doing it both ways - I'd be very surprised if there was any variation between the normalization factors you generate this way.

I did a quick spot check on some of my data and 9/10 libraries gave the exact same normalization factor using all of the ERCCs vs using only the 1:1 pool. The 10th was off by a bit but it was actually spiked differently than the others, so it's expected for that difference to be picked up.
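A synthetic version of that "both ways" comparison, as a Python sketch. It assumes one Mix2 library and one Mix1 library at equal depth, no noise, 23 controls per subgroup, and nominal Mix1:Mix2 ratios of 4, 1, 0.67 and 0.5 for subgroups A to D (all assumptions for illustration). With only two samples, the relative size factor reduces to the median of the per-control count ratios.

```python
import numpy as np

# Assumed nominal Mix1:Mix2 ratios per ERCC subgroup.
subgroup_ratio = {"A": 4.0, "B": 1.0, "C": 0.67, "D": 0.5}
subgroups = np.repeat(["A", "B", "C", "D"], 23)   # 92 controls in total
expected = np.array([subgroup_ratio[s] for s in subgroups])

# Noise-free toy counts: same depth in both libraries, only the mix differs.
mix2_counts = np.tile(np.logspace(1, 4, 23), 4)
mix1_counts = mix2_counts * expected

ratio = mix1_counts / mix2_counts

# Relative size factor (Mix1 library vs Mix2 library) from all controls...
scale_all = np.median(ratio)
# ...and from the 1:1 subgroup only.
scale_b = np.median(ratio[subgroups == "B"])
```

In this toy setup `scale_b` is 1, as it should be for equal depth, while `scale_all` lands between the subgroup C and subgroup B ratios. So when the two groups carry different mixes, the 1:1 pool looks like the safer choice; when every library carries the same mix, both versions should agree.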

Last edited by jparsons; 01-07-2014 at 10:25 AM.
Old 01-07-2014, 11:44 PM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

Why exactly did you decide to use spike-ins? In a standard RNA-Seq experiment, I would expect a normalization based on spike-ins to give worse results than one based on the counts from the biological data, but maybe yours is not a "standard" experiment.
Old 01-09-2014, 03:53 AM   #4
swebb
Junior Member
 
Location: Edinburgh

Join Date: Jun 2009
Posts: 7
Default

We have added RNA spike ins to give us the ability to check for sequencing bias, look at lower limits of detection and hopefully to aid normalisation of transcript abundance. Most of our RNA-seq experiments have spike in added by default.

I would say this analysis is fairly standard. We have extracted ribo-depleted RNA from treatment and control cells (with several reps) and want to test for differential expression between the 2 groups. Can you explain a little more why you would not normalize to the spike in?
Old 02-26-2015, 03:56 PM   #5
friducha
Junior Member
 
Location: UK

Join Date: Jan 2015
Posts: 9
Default

I would also like to understand why using spike-in controls for normalization is being discouraged here.
Old 02-26-2015, 11:58 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479
Default

Quote:
Originally Posted by friducha
I would also like to understand why using spike-in controls for normalization is being discouraged here.

Added noise, a likely different range of expression/presence, a lower number of species used for normalization, etc.

Spike-ins are useful when you are concerned about transcriptional amplification (or otherwise heavily asymmetrically distributed fold-changes between groups). When that's not the case, using them makes little sense.
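To make the transcriptional amplification case concrete, here is a Python sketch on made-up, noise-free numbers. It assumes every gene doubles per cell in the treated sample, the same spike-in amount is added per cell, and both libraries are sequenced to the same depth. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
genes_ctrl = rng.integers(50, 5000, size=2000).astype(float)  # per-cell abundance
spikes = np.logspace(1, 4, 40)                                # same spike-in per cell
genes_trt = 2.0 * genes_ctrl                                  # assumed global 2x increase

def sequence(genes, spikes, depth=1e6):
    """Scale abundances to a fixed library depth (expected counts, no sampling noise)."""
    scale = depth / (genes.sum() + spikes.sum())
    return genes * scale, spikes * scale

g_c, s_c = sequence(genes_ctrl, spikes)
g_t, s_t = sequence(genes_trt, spikes)

# With two samples, the relative size factor is the median per-feature ratio.
sf_from_genes = np.median(g_t / g_c)
sf_from_spikes = np.median(s_t / s_c)

# Median gene fold change after each normalization.
fc_gene_norm = np.median((g_t / g_c) / sf_from_genes)
fc_spike_norm = np.median((g_t / g_c) / sf_from_spikes)
```

Here gene-based normalization reports a fold change of 1 (the global doubling is normalized away), while spike-in normalization recovers the factor of 2. Without such a global shift the two agree, and the gene-based estimate has far more rows behind it.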
Old 02-27-2015, 12:31 AM   #7
plabaj
Member
 
Location: Vienna

Join Date: Oct 2010
Posts: 72
Default

Using ERCC spike-ins only makes sense when you are sure that, before they are added, the ratio of mRNA to total RNA is the same across your samples.
In the MAQC/SEQC studies this ratio was disturbed, which let us investigate the issue:
http://www.nature.com/nbt/collections/seqc/index.html

You can study how the ERCCs behave in your samples with the erccdashboard R package ("Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures" from the link above), and then use the ERCCs for normalization with RUV ("Normalization of RNA-seq data using factor analysis of control genes or samples" from the link above). However, we think that tools like PEER or SVA are better for removing unwanted variation (see "Detecting and correcting systematic variation in large-scale RNA sequencing data" from the link above).
__________________
Pawel Labaj

Last edited by plabaj; 02-27-2015 at 12:34 AM.
Old 02-27-2015, 07:00 AM   #8
aggp11
Member
 
Location: Wisconsin

Join Date: Jun 2011
Posts: 87
Default

As I understand it, spike-ins help us gauge the "breadth" of our sequencing, in other words how low in abundance a transcript can be and still be detected (sorry if I am not putting this well), but I agree with what dpryan said about the use of spike-ins.

I came across the following paper, which proposes a method for normalizing reads using "target genes", which could be housekeeping genes, ERCC spike-ins or any other gene set. The authors observe that the ERCC spike-ins alone weren't enough to normalize their RNA-Seq data, which I think was already suspected but had never really been shown before (unless it had).

http://www.nature.com/nbt/journal/v3.../nbt.2931.html

I found this R package very useful just to play with and use some housekeeping genes for normalization instead of the library size and other design factors.
Old 03-02-2015, 09:24 AM   #9
jparsons
Member
 
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62
Default

In my opinion, ERCC spike-ins are uniquely situated to determine the mRNA:totalRNA ratio between samples, and are best used when you expect that the ratio is NOT equal. The Nature Biotech paper referenced above didn't account for the mRNA:totalRNA ratio, even though they were using the MAQC/SEQC sample for half of the work, which is the main reason why they were unable to normalize the data.
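One back-of-the-envelope way to see why the spike-ins carry information about mRNA:totalRNA, sketched in Python. This is only a reading of the argument, not the method from any of the papers mentioned in this thread. It assumes the spike-in is added in proportion to total RNA, that the library is built from mRNA plus spike-in only, and that both are captured with equal efficiency. Under those assumptions the spike-in read fraction p behaves like s / (s + f), where f is the mRNA fraction of total RNA and s the spike-in fraction, so f is proportional to (1 - p) / p. The function name and read counts are hypothetical.

```python
def mrna_proxy(ercc_reads, total_reads):
    """Quantity proportional to the mRNA fraction of total RNA: (1 - p) / p."""
    p = ercc_reads / total_reads   # fraction of reads that map to spike-ins
    return (1.0 - p) / p

# Hypothetical libraries: same depth, but sample B has half the spike-in reads.
sample_a = mrna_proxy(ercc_reads=20_000, total_reads=1_000_000)
sample_b = mrna_proxy(ercc_reads=10_000, total_reads=1_000_000)

# Relative mRNA:totalRNA of B versus A (the unknown spike-in fraction s cancels).
relative_mrna_b_vs_a = sample_b / sample_a
```

With these numbers sample B comes out with roughly twice the mRNA fraction of sample A, which is the kind of between-sample difference a gene-based size factor cannot see.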

Dpryan's points are worth repeating, though - there are few data points to use, particularly with Ambion's 2^20 (roughly 10^6-fold) dynamic range pool. In cases where other normalization methods don't make sense, they can be a good fallback - but to quote the guidance from the Clinical and Laboratory Standards Institute about the ERCCs, "While it is possible to scale or normalize array data by matching the mean or median of a set of external RNA controls, this approach is problematic for a number of reasons…Third, normalization using hundreds or thousands of genes within the linear range of response of the assay is mathematically more robust than using a small number of external RNA controls." (The rest of the paragraph is centered on microarray-specific issues.)

I have a preprint that discusses the use of the ERCCs to account for mRNA:total, including in the context of the SEQC dataset, http://biorxiv.org/content/early/2015/02/11/015107
Old 03-02-2015, 09:32 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479
Default

I hadn't seen your paper on bioRxiv, that definitely looks to be worth a read!
Old 03-02-2015, 04:48 PM   #11
plabaj
Member
 
Location: Vienna

Join Date: Oct 2010
Posts: 72
Default

It seems that this bioRxiv paper is a nice complement to Sarah's NatBiotech ERCC paper. Good job!

I agree with jparsons that you can nicely characterize your samples based on the ERCCs (for example with the erccdashboard R package). For normalization, however, 'broader' approaches seem to work better. We have shown that both PEER and SVA (not yet the RNA-Seq-optimized version) work better than ERCC-based RUV (http://www.nature.com/nbt/journal/v3.../nbt.3000.html).
__________________
Pawel Labaj
Old 04-15-2015, 06:00 AM   #12
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 148
Default

Quote:
Originally Posted by dpryan
Added noise, a likely different range of expression/presence, a lower number of species used for normalization, etc.

I can understand the added noise and the differences in expression range, but what does the last part mean?
How does the number of species influence the normalization, and why would a lower number matter?

Thanks for the clarification.

Assa
Old 04-15-2015, 06:07 AM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479
Default

That just gets at the robustness. Your robustness increases as the number of rows in the matrix used for normalization increases (to an extent, of course).
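A quick simulation of that robustness point, as a Python sketch. It assumes two samples with identical true depth, Poisson counting noise, and a median-of-ratios style estimate; the function name, the range of means and the pseudo-count are arbitrary choices for illustration.

```python
import numpy as np

def size_factor_spread(n_features, n_trials=200, seed=0):
    """Std. dev. of the estimated log size-factor ratio when the true value is 0."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_trials):
        mu = rng.uniform(20, 200, size=n_features)   # true means, same in both samples
        # Poisson counts for two samples; the +1 pseudo-count avoids log(0).
        counts = rng.poisson(np.column_stack([mu, mu])) + 1
        log_ratio = np.log(counts[:, 1]) - np.log(counts[:, 0])
        estimates.append(np.median(log_ratio))
    return float(np.std(estimates))

spread_small = size_factor_spread(20)     # on the order of a usable spike-in set
spread_large = size_factor_spread(2000)   # on the order of a set of expressed genes
```

The estimate from the larger matrix scatters much less around the true value, which is the sense in which more rows give a more robust normalization.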
Old 12-07-2015, 04:36 AM   #14
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
Default What about cross batch comparison

We have recently added a 300-sample library with ERCC controls. These samples are part of a large-scale study that will run over time, which means we cannot be certain that all samples will be generated with the same chemistry, sequencer and prep kit. My hope is that the controls will help us compare batches produced with different technologies (i.e. Illumina 2000 vs Illumina 4000 vs Illumina 6000?). Does anybody know of a study comparing different RNA-Seq library preps / sequencers and how to normalize between them?
Old 12-08-2015, 12:05 AM   #15
plabaj
Member
 
Location: Vienna

Join Date: Oct 2010
Posts: 72
Default

Sounds very interesting!

As for your question, it is not all covered in one paper, but have a look at the SEQC paper on removing unwanted variation as well as the ABRF consortium paper here:
http://www.nature.com/nbt/collections/seqc/index.html
In general, the ABRF consortium might be interested in answering these types of questions.
__________________
Pawel Labaj
Old 06-14-2018, 06:39 AM   #16
finixtree
Junior Member
 
Location: NC

Join Date: Aug 2015
Posts: 3
Default

As others have said, the point of the ERCCs is to detect global shifts in expression, beyond technical variance. So the amount of ERCC you add must be based on cell number or DNA amount (which tracks cell number), not RNA amount, and it is better to add it to the lysate before RNA extraction, to reduce bias from technical issues. Done that way, I don't agree with the idea that spike-ins make the data noisy.
Spike-ins should be encouraged if there is enough rationale and you do it right and carefully.
If there is a global shift in gene expression, as with Myc activation, then the ERCCs definitely give you the chance to see the difference. Otherwise, if there is no real global biological change, simply ignore the spike-ins.
Rule No. 1 again: all results must be independently reproduced before you draw an 'amazing' conclusion.

Tags
deseq, deseq2, ercc, rna-seq, spike-in
