SEQanswers





Old 09-03-2014, 01:27 AM   #1
pkstarstorm05
Member
 
Location: Melbourne

Join Date: Jun 2014
Posts: 14
RNA-seq ExFold Tutorial

Hi everyone,

This is a beefy one... sort of.

I've set up a big RNA-seq experiment comparing pooled mouse samples. I clipped off a bit of tissue from each animal and extracted the RNA. Because the tissue is very limited and yields little RNA, I pooled a few individuals for each time point, with 4 samples per pool. After pooling, I ran Ribo-Zero to get rid of the rRNA, and during the process I spiked the samples with the ERCC ExFold spike-in mix (0.5 µl, a dilution that seemed appropriate for my experiment).

This is the setup:

Mouse E11.5 Control (4 individuals pooled into the same tube)
Mouse E11.5 TEST (4 individuals pooled into the same tube)
Mouse E12.5 Control (individuals...etc)
Mouse E12.5 TEST... etc etc
All the way up to
Mouse E17.5 Control (4 individuals pooled)
Mouse E17.5 TEST (4 individuals pooled)

Each pool was sequenced on the Illumina HiSeq using v3 chemistry, and I have the data. My problem is analyzing the pools for differential expression and using the ERCC spike-ins for normalization.

So just to clear a couple of things up first:

--The point of this experiment is not to generate an end-all serial transcriptome data set for the tissue I'm studying. We were willing to spend the money to do this as an exploratory experiment to highlight specific genes that we would follow up later. So it's just exploratory and not necessarily for publishing.

--We are aware of the alternatives for the approach to this experiment, but decided that based on our goals and our budget that this would be the best approach.

Okay - so considering all of these details, I was hoping I might get some feedback on the following questions:

1. Was it necessary for us to use the ERCC ExFold spike-ins for this experimental setup? We went back and forth about this a little bit and decided it would be best to use them, but I wanted to get a feel from the community on this. I know the ERCC spikes are supposed to help control for platform variation, but since we multiplexed all of the pools during the run (across several lanes), does this even matter?

2. How on earth do I actually normalize the data using the ERCC spike-ins? I mean step by step. I have run Cuffdiff, which seems to apply its own normalization during the analysis and did produce some very interesting results... but surely it doesn't take the ERCC spike-ins into account automatically? I've also come across forum threads where people reference random functions with no context, like "loess.normalization()". What on Earth is that supposed to mean? Sounds like Excel! haha. I haven't been able to find a single how-to or tutorial on how to actually run the ERCC normalization. Maybe I'm not looking in the right place? I'm not hugely familiar with the bioinformatics skills needed for this, and there is no guidance or expertise on it at my institution/dept., but we also don't want to outsource. Can anyone give me a step-by-step or link to a guide for normalizing my RNA-seq data using the ERCC spike-ins? I don't have an intuitive knowledge of which programs I'm supposed to use, and I don't know what some random function is supposed to represent or where I'm supposed to implement it... but I do have the skills to learn how to use the tools with a little guidance.

Thanks so much for any help and please let me know if you need any more information!

Cheers!

Paul
Old 09-03-2014, 01:42 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

  1. No, they weren't needed. ERCC spike-ins are mostly useful for single-cell sequencing. I wouldn't bother with them here unless the library normalization goes weird.
  2. I seriously doubt that you can use spike-ins with cuffdiff. When you see people mentioning loess normalization, they're talking about doing things in R, which is pretty much what you'll have to do as well. The general idea is to align to a genome that also contains the ERCC sequences (just concatenate them onto your reference) and then get count information for the spike-ins as well as the real genes. You then import that into R using whatever method you prefer and use only the ERCC subset for library normalization. You then apply the computed size factors to the dataset (after removing the ERCC probes) and continue with the analysis. If you have no clue what that means, then either don't bother with the ERCC spike-ins (a good idea anyway, since they're likely to produce crappier results) or collaborate with a local bioinformatician.
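
For anyone following along, the normalization step described above (compute size factors from the ERCC rows only, then apply them to the real genes) can be sketched roughly as follows. This is only an illustration, written in plain Python rather than R, and not a workflow anyone in this thread actually ran. It uses a median-of-ratios size factor, which is one common choice swapped in here because the thread doesn't name a specific method. The "ERCC-" ID prefix, the function names, and the toy counts are all assumptions made up for the example.

```python
import math
from statistics import median


def split_spikeins(counts, prefix="ERCC-"):
    """Split {feature_id: [count per sample]} into spike-in and gene tables.

    The "ERCC-" prefix is an assumption about how the spike-ins are named
    in the count table; adjust it to match your annotation.
    """
    spikes = {k: v for k, v in counts.items() if k.startswith(prefix)}
    genes = {k: v for k, v in counts.items() if not k.startswith(prefix)}
    return spikes, genes


def size_factors(table):
    """Median-of-ratios size factors, one per sample, from the given rows."""
    # Keep only features counted in every sample so the logs are defined.
    usable = [row for row in table.values() if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has nonzero counts in every sample")
    n_samples = len(usable[0])
    # Log of each feature's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the feature's geometric mean.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(table, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {k: [c / f for c, f in zip(row, factors)] for k, row in table.items()}


# Toy count table (hypothetical numbers), two samples per feature.
counts = {
    "ERCC-00002": [100, 200],
    "ERCC-00003": [10, 20],
    "ERCC-00004": [50, 100],
    "GeneA": [30, 60],
    "GeneB": [30, 30],
}

spikes, genes = split_spikeins(counts)
factors = size_factors(spikes)         # computed from the spike-ins only
normalized = normalize(genes, factors)  # applied to the real genes only
print(factors)
print(normalized)
```

In this toy table the second sample has twice the spike-in counts of the first, so its size factor comes out twice as large, and GeneA ends up equal across samples after normalization while GeneB drops by half. The sketch assumes the same amount of the same spike-in mix went into every sample; if ExFold Mix 1 and Mix 2 were split between samples, only the spike-ins designed to be at a 1:1 ratio between the mixes would be suitable rows here. In R, DESeq2's `estimateSizeFactors` has a `controlGenes` argument that, as far as I know, covers the same idea.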
Old 10-17-2014, 06:44 PM   #3
munrosa
Junior Member
 
Location: California

Join Date: Oct 2014
Posts: 1

Hi Paul,

You might be interested in the new erccdashboard R package for analyzing your data. The package is available on Bioconductor: http://bioconductor.jp/packages/3.0/...dashboard.html

The publication describing the erccdashboard, "Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures", is here: http://www.nature.com/ncomms/2014/14...comms6125.html

These resources provide details and empirical evidence that should answer your questions about the utility of the ERCC spike-ins more substantively than replies to your post reasonably can. The ERCC spike-ins can be used for more than single-cell sequencing and normalization, although these are areas where they've seen a lot of use.

I'd be happy to work with you on your analysis of the ERCC spike-ins in your experiments and your use of the erccdashboard -- you can feel free to contact me directly.

Cheers,
Sarah

Tags
advice, exfold, normalization, rnaseq, tutorial





All times are GMT -8.

