SEQanswers

Old 02-24-2016, 03:24 AM   #1
Alex852013
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 17
Default Up to date ERCC spike ins RNA seq analysis

Hello everybody,

I'm expecting to get my ERCC-spiked RNA-seq sequencing files soon.
Therefore I would like to find out the best way to analyze them.
There are several posts where people ask specific questions about parts of the analysis, but I couldn't find a beginning-to-end walkthrough.

My idea is to
- map the data with TopHat2
- count the reads per gene in the BAM file from TopHat2 with htseq-count
- normalize these counts to the ERCCs. But I have no idea how to do that part
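
As a rough sketch of what the step between counting and normalization could look like: the snippet below splits a per-sample count table into endogenous genes and spike-ins. It assumes the usual two-column, tab-separated htseq-count output with summary rows starting with "__", and it assumes the spike-in features are named with an "ERCC-" prefix in the annotation. The gene names and counts are made up.

```python
def split_htseq_counts(text):
    """Split htseq-count style output into (endogenous, ercc) dicts of feature -> count."""
    endogenous, ercc = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        feature, count = line.split("\t")
        # Summary rows such as __no_feature are not features; skip them.
        if feature.startswith("__"):
            continue
        # Assumption: spike-in IDs carry the "ERCC-" prefix in the annotation used.
        target = ercc if feature.startswith("ERCC-") else endogenous
        target[feature] = int(count)
    return endogenous, ercc


# Made-up example output for one sample.
example = (
    "GENE_A\t120\n"
    "GENE_B\t30\n"
    "ERCC-00002\t900\n"
    "ERCC-00004\t410\n"
    "__no_feature\t55\n"
    "__ambiguous\t3\n"
)
genes, spikes = split_htseq_counts(example)
print(genes)
print(spikes)
```

Keeping the two tables separate makes it possible to derive scaling factors from the spike-ins only and then apply them to the endogenous genes.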

I found a post suggesting to use loess normalization
https://www.biostars.org/p/81337/

But there is also a paper which claims that loess normalization is not a really good way to go. Additionally, I don't understand how to run their suggested solution.
http://www.nature.com/nbt/journal/v3.../nbt.2931.html

So maybe someone has a suggestion for how to do the ERCC normalization in 2016: which program I can use and which file or format it needs. Maybe someone knows a thread where it is written down exactly how to use the code.

I guess I would finally need to use DESeq to compare my triplicates across the different time points.

Thanks a lot, Alex
Old 02-24-2016, 03:34 AM   #2
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 177
Default

Actually, in this paper, if you refer to the last sentence of the antepenultimate paragraph, I understand that you will get better results NOT using the ERCC spike-ins... I opened a discussion about these spike-ins right after I read this paper...
Old 02-24-2016, 04:12 AM   #3
Alex852013
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 17
Default

Could you please write which part you mean? If you quote the first few words of the text, it will be easier to find. Thanks.

I used the ERCCs because I think that my factor might upregulate a lot of genes. If it does, normalization without ERCCs will distort the result.
This means upregulated genes might be interpreted as downregulated.

Figure 1 (without ERCCs) and Figure 2A (with ERCCs) in the paper http://www.sciencedirect.com/science...92867412012263 show the problem clearly. I'm afraid there is no way to exclude this other than ERCCs.
Old 02-24-2016, 04:22 AM   #4
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 177
Default

Of course, in some cases (as blancha wrote in the other discussion), it may help... Personally, I have never used them, and I wanted to have the opinion of the SEQanswers community. It seems that, depending on the project, they can bring more issues than they solve (but again, depending on the project).

I was referring to the part where the authors say that their normalization is robust when applied to a set of control genes or a set of replicates, while it gave reasonable results using the ERCC spike-ins...
Old 02-24-2016, 04:25 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@Alex852013: Search here using "ERCC" and you will find the threads that @SylvainL is alluding to.
Old 02-24-2016, 04:45 AM   #6
Michael.Ante
Senior Member
 
Location: Vienna

Join Date: Oct 2011
Posts: 121
Default

Hi Alex,

the ERCC spike-ins do not contain any junctions. Thus, using TopHat2 solely on the ERCC reference will cause some trouble. Either you combine your "host" annotation with the ERCC spike-in annotation, or you run e.g. Bowtie2 on the ERCC sequences first and use the unmapped reads for the further analysis.

Moreover, I'd suggest using the ERCC-Dashboard to get an overview of how the ERCCs behave in your experiment.
IMHO, the ERCC transcripts do not reflect the complexity of the transcriptome. They can be useful for controlling coverage, strandedness, and input/gene-read correlation. But they are not designed to control for differential junction/PAS usage, overlapping genes, SNP detection, ...
You might have a look at https://www.biostars.org/p/170234/. The 5' ends are not described correctly in the provided annotation files, whilst the polyA sequence is included in the FASTA.
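
The input/read correlation check mentioned above can be sketched roughly as follows: compare the log2 of the known spike-in input amount with the log2 of the observed read count and look at the Pearson correlation. This is only an illustration, not what the ERCC-Dashboard does internally. The spike-in names, input amounts and counts are made up; real input amounts would come from the concentration table supplied with the spike-in mix.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spike_in_dose_response(expected, observed, pseudocount=1.0):
    """Correlate log2 input amount with log2 (count + pseudocount) over shared spike-ins."""
    shared = sorted(set(expected) & set(observed))
    log_input = [math.log2(expected[k]) for k in shared]
    log_reads = [math.log2(observed[k] + pseudocount) for k in shared]
    return pearson(log_input, log_reads)


# Hypothetical input amounts (arbitrary units) and read counts for one sample.
expected = {"SPIKE_1": 1.0, "SPIKE_2": 4.0, "SPIKE_3": 16.0, "SPIKE_4": 64.0}
observed = {"SPIKE_1": 7, "SPIKE_2": 31, "SPIKE_3": 127, "SPIKE_4": 511}
r = spike_in_dose_response(expected, observed)
print(round(r, 6))
```

A correlation close to 1 in every sample suggests the spike-ins were recovered in proportion to their input; a poor correlation would be a reason to be careful about using them for normalization.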

tl;dr The ERCCs were designed for microarrays and can control nicely for a limited set of quality parameters. For normalising data in a more complex sample space I would not use them.
Old 02-24-2016, 05:23 AM   #7
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 177
Default

I quickly read the paper you linked to, Alex, and I think it would be interesting to re-analyze their data with a pipeline adjusted for RNA-seq and a splice-aware mapper. They used bowtie and the package "affy". They also used RPKM values, and it does not seem that they did replicates... To me, it looks like they did more or less everything wrong there (not an RNA-seq analysis pipeline).

Last edited by SylvainL; 02-24-2016 at 05:41 AM.
Old 02-24-2016, 08:29 AM   #8
Alex852013
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 17
Default

@ SylvainL:

The paper "Revisiting Global Gene Expression Analysis" was meant to give people an idea of the problem I'm facing. I don't think it makes sense to discuss its quality here.
To give everybody an idea of the problem without checking the paper, the pic:



1st row: for a transcription factor regulating only a few genes, no spike-ins are required. Normalization works perfectly.
2nd row: if a transcription factor changes most of the genes (I've heard that 20 % of all genes is already enough), the normalization will be biased, because the normalization programs assume that the expression of most genes stays the same.
3rd row: with the ERCCs included, the normalization bias mentioned in row 2 can be avoided. That's what I want to use the spike-ins for.
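
The three rows can be illustrated with a small toy calculation. All numbers are invented: three of four genes are truly doubled per cell in the treated sample, one gene is unchanged, the same amount of spike-in was added to both samples, and the treated library happens to be sequenced less deeply. Scaling by total reads is used here as a stand-in for any method that assumes most genes do not change.

```python
def fold_changes(control, treated, scale_control, scale_treated):
    """Treated/control ratio per gene after dividing each sample by its scale."""
    return {
        gene: (treated[gene] / scale_treated) / (control[gene] / scale_control)
        for gene in control
    }


# Invented read counts. True per-cell change: g1-g3 doubled, g4 unchanged.
control_genes = {"g1": 100, "g2": 200, "g3": 300, "g4": 400}
treated_genes = {"g1": 100, "g2": 200, "g3": 300, "g4": 200}
control_spike, treated_spike = 100, 50  # same spike-in amount added per cell

# Row 2: scale by total reads (endogenous + spike-in).
control_total = sum(control_genes.values()) + control_spike   # 1100
treated_total = sum(treated_genes.values()) + treated_spike   # 850
library_fc = fold_changes(control_genes, treated_genes, control_total, treated_total)

# Row 3: scale by the spike-in reads only.
spike_fc = fold_changes(control_genes, treated_genes, control_spike, treated_spike)

for gene in control_genes:
    print(gene, round(library_fc[gene], 3), round(spike_fc[gene], 3))
```

With total-read scaling the doubled genes appear only about 1.3-fold up and the unchanged gene appears down, while spike-in scaling recovers the 2-fold and 1-fold changes that were put into the toy data.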

My protein is a transcriptional activator in a viral system, but in the human system it downregulated most of the genes. This was rather unexpected. Therefore I want to rule out that I am getting the normalization bias described in the picture.

@ Michael.Ante: Thank you, I can also use Bowtie2; nevertheless, I already made a reference file which includes each ERCC RNA as a single chromosome.

Maybe someone can nevertheless tell me how to do the normalization. I will certainly check both ways of analysis (with and without ERCCs), but for that I would need to know how to normalize with the ERCCs.
Thanks a lot
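
One possible way to approach the question, sketched under assumptions: compute per-sample size factors with the median-of-ratios idea (as used for size factors in DESeq), but restricted to the spike-in rows, and then divide the endogenous counts by those factors. Nobody in the thread prescribes this; the sample names, spike-in names and counts below are hypothetical.

```python
import math
from statistics import median


def spike_in_size_factors(spike_counts):
    """spike_counts: {sample: {spike_id: count}} -> {sample: size factor}.

    Median-of-ratios computed on spike-ins only, using spike-ins that were
    detected (count > 0) in every sample.
    """
    samples = list(spike_counts)
    usable = [
        spike for spike in spike_counts[samples[0]]
        if all(spike_counts[s].get(spike, 0) > 0 for s in samples)
    ]
    if not usable:
        raise ValueError("no spike-in was detected in all samples")
    # Log of the geometric mean of each spike-in across samples.
    log_geo_mean = {
        spike: sum(math.log(spike_counts[s][spike]) for s in samples) / len(samples)
        for spike in usable
    }
    return {
        s: math.exp(median(math.log(spike_counts[s][spike]) - log_geo_mean[spike]
                           for spike in usable))
        for s in samples
    }


def normalize(gene_counts, size_factor):
    """Divide every endogenous gene count by the sample's size factor."""
    return {gene: count / size_factor for gene, count in gene_counts.items()}


# Hypothetical spike-in counts for two samples.
spikes = {
    "control": {"SPIKE_1": 100, "SPIKE_2": 400},
    "treated": {"SPIKE_1": 50, "SPIKE_2": 200},
}
factors = spike_in_size_factors(spikes)
print(factors)
print(normalize({"GENE_A": 100}, factors["treated"]))
```

In practice the factors would be computed over all replicates and time points at once and then passed on to the differential expression step, and it would be worth comparing the outcome against the default normalization, as planned above.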
Old 02-24-2016, 03:11 PM   #9
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,226
Default

I think a better approach would have been to add the ERCC spike-ins to the cells prior to RNA extraction. In that case an equal number of cells should be used for all samples.
Old 02-24-2016, 11:55 PM   #10
Alex852013
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 17
Default

That's what I did. I used the same number of cells and added the ERCCs: I took 1 µl, diluted it 1:100, and added 10 µl to each tube (better than adding 1 µl of a 1:10 dilution, since it avoids a strong pipetting bias).
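
A quick check of the arithmetic behind this, assuming "1:100" means one part stock in 100 parts total volume: both options put the same amount of undiluted spike-in mix into each tube, and the difference lies only in the volume that has to be pipetted.

```python
from fractions import Fraction


def stock_equivalent(volume_ul, dilution_factor):
    """Microlitres of undiluted stock contained in volume_ul of a 1:dilution_factor dilution."""
    return Fraction(volume_ul) / dilution_factor


used = stock_equivalent(10, 100)        # 10 µl of the 1:100 dilution
alternative = stock_equivalent(1, 10)   # 1 µl of a 1:10 dilution
print(used, alternative)
```

Both correspond to 0.1 µl of stock per tube; pipetting 10 µl is simply less sensitive to small volume errors than pipetting 1 µl.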
Old 02-25-2016, 12:40 AM   #11
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 177
Default

@Alex852013,

I totally understand your points and why you want to use the ERCC spike-ins. Probably in your case, it is really necessary. But I believe it is important to look at how people did their analysis, to be sure their normalization method really adds something... Unfortunately, I do not have time right now, but I will re-analyze the data of this paper using different pipelines to form my own opinion about this spike-in normalization...

Tags
2016, ercc spike ins, normalization, rna seq

All times are GMT -8.