SEQanswers
-   Bioinformatics
-   -   Up to date ERCC spike ins RNA seq analysis

Alex852013 02-24-2016 03:24 AM

Up to date ERCC spike ins RNA seq analysis
Hello everybody,

I'm expecting to get my ERCC-spiked RNA-seq sequencing files soon.
Therefore I would like to find out the best way to analyze them.
There are several posts where people ask specific questions about parts of the analysis, but I couldn't find a beginning-to-end walkthrough.

My idea is to
- map the data with TopHat2
- count the reads per gene in the BAM file from TopHat2 with htseq-count (HTSeq)
- normalize these counts to the ERCCs. But I have no idea how to do that part
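
One possible way to do the third step is sketched below in Python with NumPy. It is only a sketch under stated assumptions, not a recommended pipeline: it computes median-of-ratios size factors from the spike-in rows only, so that each sample is scaled by how many spike-in reads it received. The function name `spikein_size_factors`, the toy count table, and the assumption that spike-in features can be recognised by an "ERCC-" prefix are all illustrative. DESeq2's estimateSizeFactors has, as far as I know, a controlGenes argument that serves a similar purpose.

```python
import numpy as np


def spikein_size_factors(counts, is_spike):
    """Median-of-ratios size factors computed from spike-in rows only.

    counts   : 2-D array, rows = features (genes and ERCCs), columns = samples
    is_spike : boolean array, True for rows that are ERCC spike-ins
    """
    spikes = np.asarray(counts, dtype=float)[np.asarray(is_spike, dtype=bool)]
    # Keep only spike-ins detected in every sample, so the logs are finite.
    spikes = spikes[(spikes > 0).all(axis=1)]
    if spikes.shape[0] == 0:
        raise ValueError("no spike-in is detected in all samples")
    log_spikes = np.log(spikes)
    # Reference = geometric mean of each spike-in across samples.
    log_ref = log_spikes.mean(axis=1, keepdims=True)
    # Size factor = median ratio of a sample to that reference.
    return np.exp(np.median(log_spikes - log_ref, axis=0))


# Hypothetical toy table (made-up counts): 3 genes, 3 ERCCs, 2 samples.
ids = ["geneA", "geneB", "geneC", "ERCC-00002", "ERCC-00004", "ERCC-00074"]
counts = np.array([
    [100, 100],
    [50, 40],
    [10, 30],
    [200, 400],
    [20, 40],
    [60, 120],
])
# Assumption: spike-in features carry an "ERCC-" prefix in the count table.
is_spike = np.array([name.startswith("ERCC-") for name in ids])

size_factors = spikein_size_factors(counts, is_spike)
normalized = counts / size_factors  # spike-in rows become equal across samples
```

In this toy table the second sample has twice as many reads for every spike-in, so its size factor comes out twice as large as that of the first sample, and geneA (same raw count in both) ends up halved after normalization.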

I found a post suggesting to use loess normalization

But there is also a paper which claims that loess normalization is not really a good way to go. Additionally, I don't get how they run their suggested solution.

So maybe someone has a suggestion for how to do the ERCC normalization in 2016, which program I can use, and which file or format one has to use. Maybe someone knows a thread where it is written down exactly how to use the code.

I guess I would finally need to use DESeq to compare my triplicates across the different time points.

Thanks a lot, Alex

SylvainL 02-24-2016 03:34 AM

Actually, in this paper, if you refer to the last sentence of the antepenultimate paragraph, I understand that you get better results NOT using the ERCC spike-ins... I opened a discussion about these spike-ins right after I read this paper...

Alex852013 02-24-2016 04:12 AM

Could you please say which part you mean? If you quote the first few words of the passage it will be easier to find. Thanks.

I used the ERCCs because I think that my factor might upregulate a lot of genes. If it does, normalization without ERCCs will distort the result.
This means upregulated genes might be interpreted as downregulated.

Figure 1 (without ERCCs) and figure 2A (with ERCCs) in the paper show the problem clearly. I'm afraid there is no way to rule this out other than ERCCs.

SylvainL 02-24-2016 04:22 AM

Of course, in some cases (as blancha wrote in the other discussion), they may help... Personally, I have never used them and I wanted to have the opinion of the SEQanswers community. It seems that, depending on the project, they can bring more issues than they solve (but again, depending on the project).

I was referring to the part where the authors say that their normalization is robust when applied to a set of control genes, or a set of replicates, while it gave reasonable results using the ERCC spike-ins...

GenoMax 02-24-2016 04:25 AM

@Alex852013: Search here using "ERCC" and you will find the threads that @SylvainL is alluding to.

Michael.Ante 02-24-2016 04:45 AM

Hi Alex,

the ERCC spike-ins do not contain any junctions. Thus, using TopHat2 solely on the ERCC reference will cause some trouble. Either you combine your "host" annotation with the ERCC spike-in one, or you run e.g. Bowtie2 on the ERCC sequences first and use the unmapped reads for the further analysis.
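
For the first option, a combined reference can be produced by simply appending the ERCC files to the host files. The helper below is a minimal sketch of that idea; the function name `concatenate` and all file names in the usage comments are hypothetical placeholders, and the aligner index would presumably have to be rebuilt from the combined FASTA afterwards.

```python
def concatenate(parts, destination):
    """Append files in order, making sure each one ends with a newline.

    Files are streamed in 1 MiB chunks so a whole genome FASTA
    never has to fit in memory.
    """
    with open(destination, "wb") as out:
        for part in parts:
            last = b"\n"  # an empty input file adds nothing
            with open(part, "rb") as src:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Guard against a missing final newline, which would glue the
            # next FASTA header or GTF record onto the previous line.
            if last != b"\n":
                out.write(b"\n")


# Usage with hypothetical file names (replace with your own paths):
# concatenate(["host_genome.fa", "ERCC.fa"], "host_plus_ERCC.fa")
# concatenate(["host_genes.gtf", "ERCC.gtf"], "host_plus_ERCC.gtf")
```

The same function is used for the FASTA and the GTF, so that the mapper and the read counter see the same set of sequence names.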

Moreover, I'd suggest using the ERCC-Dashboard to get an overview of how the ERCCs behave in your experiment.
IMHO, the ERCC transcripts do not reflect the complexity of the transcriptome. They can be useful for controlling coverage, strandedness, and input/gene-read correlation. But they are not designed to control for different junction/PAS usage, overlapping genes, SNP detection, ...
You might have a look at The 5' ends are not described correctly in the provided annotation files; whilst the polyA sequence is included in the fasta.
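
The input/gene-read correlation mentioned above can also be checked by hand with a straight-line fit on the log scale. The following is a rough sketch, not what the ERCC-Dashboard itself computes: the function name `dose_response` is hypothetical and the numbers are made up. Real concentrations would have to come from the manufacturer's table for the mix that was used.

```python
import numpy as np


def dose_response(expected_conc, observed_counts):
    """Fit log2(count) = slope * log2(concentration) + intercept.

    Only spike-ins with a positive count and concentration are used.
    Returns (slope, intercept, r_squared).
    """
    conc = np.asarray(expected_conc, dtype=float)
    obs = np.asarray(observed_counts, dtype=float)
    keep = (conc > 0) & (obs > 0)
    if keep.sum() < 2:
        raise ValueError("need at least two detected spike-ins")
    x = np.log2(conc[keep])
    y = np.log2(obs[keep])
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2


# Made-up illustration: counts are exactly 6 x concentration.
conc = [0.5, 2, 8, 32, 128]
counts = [3, 12, 48, 192, 768]
slope, intercept, r2 = dose_response(conc, counts)
```

A slope close to 1 and a high r squared would suggest that read counts track the input amount over the covered range. Undetected low-concentration spike-ins are dropped here, which hides the lower detection limit, so the number of dropped spike-ins is worth reporting as well.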

tl;dr The ERCCs were designed for microarrays and can control nicely for a limited set of quality parameters. For normalising data in a more complex sample space I would not use them.

SylvainL 02-24-2016 05:23 AM

I quickly read the paper you linked to, Alex, and I think it would be interesting to re-analyze their data with a pipeline adjusted for RNA-seq and a splice-aware mapper. They used Bowtie and the package "affy". They also used RPKM counts, and it does not seem they did replicates... To me, it looks like they more or less did everything wrong there (not an RNA-seq analysis pipeline).

Alex852013 02-24-2016 08:29 AM

@ SylvainL:

The paper "Revisiting Global Gene Expression Analysis" was meant to give people an idea of the problem I'm facing. I don't think it makes sense to discuss its quality here.
To give everybody an idea of the problem without checking the paper, the pic:

1st row: for a transcription factor regulating only a few genes, no spike-ins are required. Normalization works fine.
2nd row: if a transcription factor changes most of the genes (I've heard that already 20 % of all genes is enough), the normalization will be biased, because the normalization programs assume that the expression of most genes stays the same.
3rd row: with the ERCCs included, the normalization bias mentioned in row 2 can be avoided. That's what I want to use the spike-ins for.
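
The three rows can be reproduced with a small noise-free toy calculation. Everything in it is invented for illustration (5 genes, 2 spike-ins, a factor that halves 4 of the 5 genes, equal sequencing depth); it only shows the direction of the bias, not its size in real data.

```python
import numpy as np

# Hypothetical per-cell molecule numbers, not real data.
control_genes = np.array([1000.0, 800.0, 600.0, 400.0, 200.0])
treated_genes = np.array([500.0, 400.0, 300.0, 200.0, 200.0])  # 4 halved, 1 unchanged
spikes = np.array([100.0, 50.0])  # same amount added to the same number of cells
n_genes = len(control_genes)


def sequence(molecules, depth):
    """Idealised sequencing: reads proportional to molecule share, no noise."""
    return molecules / molecules.sum() * depth


depth = 1_000_000
control = sequence(np.concatenate([control_genes, spikes]), depth)
treated = sequence(np.concatenate([treated_genes, spikes]), depth)

# (a) Library-size scaling: both libraries have the same depth, so the
#     scaling changes nothing and the raw read ratio is reported.
naive_lfc = np.log2(treated[:n_genes] / control[:n_genes])
# naive_lfc is about [-0.15, -0.15, -0.15, -0.15, 0.85]

# (b) Spike-in scaling: divide by the factor by which spike-in reads changed.
spike_factor = (treated[n_genes:] / control[n_genes:]).mean()  # 1.8 here
spike_lfc = np.log2(treated[:n_genes] / spike_factor / control[:n_genes])
# spike_lfc is [-1, -1, -1, -1, 0], the true log2 fold changes
```

Without spike-ins the unchanged gene looks upregulated and the halved genes look almost unchanged; scaling by the spike-ins recovers the true changes, provided the same spike-in amount per cell really went into every sample.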

My protein is a transcriptional activator in a viral system, but in the human system it downregulated most of the genes. This was kind of unexpected. Therefore I want to rule out that I'm seeing the normalization bias described in the picture.

@ Thank you, I can also use Bowtie2; nevertheless, I already made a file which includes each ERCC RNA as a separate chromosome.

Maybe someone can nevertheless tell me how to do the normalization. I will certainly check both ways of analysis (with and without ERCCs), but for that I need to know how to normalize with the ERCCs.
Thanks a lot

nucacidhunter 02-24-2016 03:11 PM

I think a better approach would have been adding ERCC spike-in to cells prior to RNA extraction. In this case an equal number of cells should be used for all samples.

Alex852013 02-24-2016 11:55 PM

That's what I did. I used the same number of cells and added the ERCCs: I took 1 µl, diluted it 1:100 and added 10 µl to each tube (better than 1 µl from a 1:10 dilution, to avoid a strong pipetting bias).

SylvainL 02-25-2016 12:40 AM


I totally understand your points and why you want to use the ERCC spike-ins. Probably in your case it is really necessary. But I believe it is important to look at how people did their analysis, to be sure their normalization method really adds something... Unfortunately, I do not have time right now, but I will re-analyze the data of this paper using different pipelines to form my own opinion about this spike-in normalization...
