#1
Member
Location: France Join Date: Jul 2013
Posts: 20
Hi All,
I've been testing several differential expression analyses on my single-cell RNA-Seq data, using either FPKM (generated by Cufflinks and used with the Fluidigm Singular Analysis Toolset or Monocle) or read counts (DESeq), but I've gotten very different results from the different programs. Judging by the genes that come out, FPKM is more likely to be correct; however, I've seen evidence of 3' bias in my own data and am wary of using FPKM, since most people doing single-cell RNA-Seq have found it to be problematic. So I'm really keen to try a non-FPKM approach, but I'm not sure how much I should or should not be manipulating the data.
Most of the normalization advice focuses on studying heterogeneity within a population of cells. Brennecke et al. (Nature Methods 2013) offer a great DESeq-based way to normalize to spike-ins and technical variability in order to find highly variable genes within a seemingly homogeneous population. Buettner et al. (Nature Biotechnology 2015) have a great follow-up looking at cell-cycle variation. While this does interest me eventually, for now I just want to know the differential expression between two different cell populations.
However, because this is single-cell data, it's highly variable. Does this matter? Can I consider each cell a "replicate"? Since the data are so variable, would the statistically significant genes that do come out be quite robust? Or should I be normalizing these populations to my spike-ins? (Please note I only have 3 spike-in controls, not 92 like most other published papers.) Does anybody else have experience with this? Thanks!
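The read-count route normalizes with per-cell size factors rather than FPKM. As a rough illustration, here is a minimal pure-Python sketch of the median-of-ratios idea that DESeq's default size-factor estimate is based on: each cell's factor is the median, over genes, of its count divided by that gene's geometric mean across all cells. The function name and toy matrix are hypothetical, and only genes detected in every cell are used, which with sparse single-cell data can leave very few genes.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors (the idea behind DESeq's default estimate).

    counts: list of genes, each a list of raw read counts, one per cell.
    Returns one size factor per cell.
    """
    n_cells = len(counts[0])
    # A zero count makes a gene's geometric mean zero, so keep only genes
    # detected in every cell. Sparse single-cell data may leave few such genes.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene is detected in every cell")
    # Log of each gene's geometric mean across cells (a pseudo-reference cell).
    log_ref = [sum(math.log(c) for c in row) / n_cells for row in usable]
    factors = []
    for j in range(n_cells):
        # Per-gene log ratio of this cell to the reference; the median is robust
        # to a minority of genes that are truly differentially expressed.
        log_ratios = [math.log(row[j]) - r for row, r in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy matrix (genes x cells): cell 2 is sequenced twice as deeply.
counts = [
    [10, 20],
    [5, 10],
    [100, 200],
    [0, 7],  # skipped: not detected in cell 1
]
size_factors = median_of_ratios_size_factors(counts)
normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]
print([round(s, 3) for s in size_factors])  # [0.707, 1.414]
```

After dividing by these factors, the first three genes have identical normalized values in both cells, which is what depth normalization should achieve when nothing is truly changing.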
#2
Senior Member
Location: California Join Date: Jul 2014
Posts: 198
If you're interested in differential expression between two cell populations, then a straightforward DESeq-like comparison using cells as replicates seems appropriate. You're right that there will be more variability, and that the hits that do reach significance will therefore tend to be robust. An alternative is the 'scde' package:
http://pklab.med.harvard.edu/scde/index.html
The issue of normalizing to spike-ins is a separate question; doing so is generally good if they are available.
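To make "cells as replicates" concrete: once counts are normalized, every cell in a group simply contributes one observation per gene. The sketch below uses a Welch t statistic on log2-normalized values, which is a deliberately simple stand-in and not DESeq's negative-binomial test or scde's error model; the function names, the pseudocount of 1 and the toy counts are assumptions for illustration.

```python
import math
from statistics import mean, variance


def log2_normalized(counts, size_factors, pseudocount=1.0):
    """log2 of size-factor-normalized counts; the pseudocount avoids log2(0)."""
    return [math.log2(c / s + pseudocount) for c, s in zip(counts, size_factors)]


def welch_t(group_a, group_b):
    """Welch's t statistic, treating each cell as one replicate of its group."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical counts for one gene in 4 cells per population (size factors all 1).
pop_a = log2_normalized([31, 63, 15, 31], [1.0] * 4)  # [5.0, 6.0, 4.0, 5.0]
pop_b = log2_normalized([3, 7, 1, 3], [1.0] * 4)      # [2.0, 3.0, 1.0, 2.0]

log2_fold_change = mean(pop_a) - mean(pop_b)
t_stat = welch_t(pop_a, pop_b)
print(log2_fold_change, round(t_stat, 3))  # 3.0 5.196
```

In a real analysis this would be repeated for every gene, converted to p-values and corrected for multiple testing; dedicated tools additionally model the dropouts that a plain t-test ignores.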
#3
Member
Location: France Join Date: Jul 2013
Posts: 20
Thank you for your input!
Sometimes it's just reassuring to have someone else confirm that what you are doing isn't completely off base. I've checked out the scde package like you suggested and will use it as a comparison to make sure the same genes are coming through.
#4
Junior Member
Location: Toronto, Canada Join Date: Jun 2015
Posts: 3
Hi all,
I'm new to the world of RNA-seq analysis. I'm doing single-cell RNA-seq as well, but I would like to do differential expression analysis within a population of cells (as opposed to between two different populations) to assess the level of heterogeneity in the population. I am planning to use the TopHat/Cufflinks/Monocle pipeline, but would also like to use a raw-count method to verify my hits. I have two questions:
1) Can I accomplish this with the SCDE package, or is this package only good for testing DE between two groups? If I can use SCDE, what will the output look like?
2) I've read that DESeq can be used for single-cell data. I'd appreciate a description of how this works and what its output would look like (i.e., would I get a table of p-values with genes as rows and cells as columns, or something along those lines?).
Thanks in advance!
#5
Member
Location: SF Bay Area Join Date: Feb 2012
Posts: 62
#6
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
#7
Member
Location: France Join Date: Jul 2013
Posts: 20
The issue of normalization in single-cell RNA-seq still seems to be up for debate. The 92 ERCC spike-ins seem to be the gold standard for now, and many of the big groups advancing the single-cell RNA-seq field rely mostly on them. They use them to assess biological and technical variation, to normalize, and to find the heterogeneity of the cells underneath the large amount of noise that is inevitable in single-cell RNA-seq. Groups with their own biostatisticians do this themselves, but Brennecke et al. (Nature Methods) published an R package that is available to anyone less advanced in mathematics and programming. This is what I've been using.
However, here's the big problem we're facing. The standard Fluidigm C1 protocol recommends only 3 Ambion spike-ins, and because we followed the protocol exactly, that is what we used. The normalization methods designed for 92 spike-ins don't necessarily apply well because we don't have enough data points. When I brought this up at a meeting with someone involved in developing single-cell bioinformatics methods, they were surprised we had used the Ambion spikes and told us simply to remove them from our dataset altogether, on the assumption that the amount of spike-in would be the same in every sample. However, we see varying spike-in counts between samples (for various reasons, some explained, some not), so I'm still torn between normalizing to the spikes and using traditional approaches. That said, when I normalize to even my three spikes, the data appear a bit "cleaner" in comparative analyses. So, if anyone out there hasn't started yet, I highly recommend using the ERCC spike-ins rather than the Ambion ones recommended by Fluidigm. I know of a paper currently under review, hopefully out in the next few months, that deals extensively with the ERCC spike-ins and may shed some light on this topic.
In response to amolinaro: differential expression analysis is used for comparing two populations of cells. In SCDE you have to define those groups (e.g. treated vs untreated) before it will run the comparison, just as in DESeq. However, if you have a single sample taken from a mixed population, or perhaps a stimulated one, then you might want to use the same method as the Brennecke paper I mentioned, or, if your cells are dividing, check out scLVM by Buettner et al. (Nature Biotechnology 2015). These methods are designed for finding highly variable genes within a population of cells. They can also find "new" populations within your group of cells, defined by similar gene expression and so on.
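For the spike-in question, one common way to "normalize to spikes" is to compute median-of-ratios size factors from the spike-in rows alone, which assumes the same amount of spike-in was added to every cell. The minimal sketch below uses three spike-ins to mirror the Ambion setup described above; the function name and all counts are hypothetical. With only three spikes, each cell's factor is the median of just three ratios, which is one concrete reason three data points feel thin.

```python
import math
from statistics import median


def spike_in_size_factors(spike_counts):
    """Median-of-ratios size factors computed from spike-in rows only.

    spike_counts: list of spike-ins, each a list of read counts, one per cell.
    Assumes an equal amount of every spike-in was added to each cell.
    """
    n_cells = len(spike_counts[0])
    log_ref = [sum(math.log(c) for c in row) / n_cells for row in spike_counts]
    factors = []
    for j in range(n_cells):
        log_ratios = [math.log(row[j]) - r for row, r in zip(spike_counts, log_ref)]
        factors.append(math.exp(median(log_ratios)))  # median of only 3 values here
    return factors


# Hypothetical counts for 3 spike-ins (rows) in 3 cells (columns).
spikes = [
    [100, 200, 50],
    [40, 80, 20],
    [10, 20, 5],
]
# Hypothetical endogenous gene counts for the same 3 cells.
genes = [
    [500, 1000, 250],
    [30, 60, 90],
]

factors = spike_in_size_factors(spikes)
normalized_genes = [[c / f for c, f in zip(row, factors)] for row in genes]
print([round(f, 3) for f in factors])  # [1.0, 2.0, 0.5]
```

Unlike scaling to total reads, this preserves genuine differences in total RNA content between cells, provided equal amounts of spike-in really were added to every well. Here the second gene ends up six-fold higher in the third cell after spike-in normalization.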
#8
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Nice summary travelk!
Yeah, the Ambion spike-ins seem to be of questionable utility; I suppose one could measure variability with them, but not much else. We've been using ERCC spike-ins in our initial dataset to hopefully add a bit of robustness (not that the ERCC spike-ins are perfect).
#9
Member
Location: France Join Date: Jul 2013
Posts: 20
I think the original purpose of the Ambion spike-ins was purely as a control to ensure that the lysis buffer was getting to all the wells in the C1 chip and that the RT was working efficiently for each cell (which isn't always the case so the spike-ins have been invaluable to us in that way). I don't think they were intended to be used as a normalization tool, but since they are there, it's tempting to use them. They are simply an artificial, theoretically controlled housekeeping gene in a way.
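The QC use of the Ambion spike-ins described here can be expressed as a simple per-cell check on spike-in reads. This is only a sketch, assuming (as the post describes) that the spike-ins are delivered with the lysis mix; the function name, the messages and the 0.5 cutoff for the spike-in fraction are hypothetical and would need tuning on real C1 runs.

```python
def spike_in_qc(spike_totals, endogenous_totals, max_spike_fraction=0.5):
    """Flag cells whose spike-in reads suggest a problem in that well.

    spike_totals / endogenous_totals: total reads per cell from spike-ins and
    from cellular genes. max_spike_fraction is a hypothetical cutoff.
    """
    flags = []
    for spike, endo in zip(spike_totals, endogenous_totals):
        if spike == 0:
            # Assuming spike-ins arrive with the lysis mix, none at all suggests
            # the mix did not reach the well or RT failed there.
            flags.append("no spike-in reads")
        elif spike / (spike + endo) > max_spike_fraction:
            # Mostly spike-in reads: little cellular RNA was captured.
            flags.append("spike-in dominated")
        else:
            flags.append("ok")
    return flags


# Hypothetical read totals for 3 cells.
qc = spike_in_qc([5000, 0, 90000], [1000000, 800000, 10000])
print(qc)  # ['ok', 'no spike-in reads', 'spike-in dominated']
```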
Yes, the ERCC spike-ins aren't perfect, but I think they do give a lot of information about the variability of the method in general and specifically in each data set. It's much better to have them and not need them than the other way around (which is what happened with us). I think a lot of new data and bioinformatics methods are going to be coming out in the next year or two and having the right tools available in your data now will allow you to access those methods in the future.
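To illustrate how spike-ins report technical variability, the sketch below follows the general idea of Brennecke et al.: fit technical noise as CV² = a1/mean + a0 on the spike-ins, then flag genes whose CV² sits well above that curve. It is a simplified stand-in: an ordinary least-squares fit replaces the paper's gamma GLM, there is no statistical test, and the function names, the 10-fold cutoff and all numbers are hypothetical.

```python
from statistics import mean, variance


def cv2(values):
    """Squared coefficient of variation across cells."""
    m = mean(values)
    return variance(values) / m ** 2


def fit_technical_noise(spike_means, spike_cv2s):
    """Least-squares fit of CV^2 = a1 / mean + a0 on the spike-ins."""
    xs = [1 / m for m in spike_means]
    x_bar, y_bar = mean(xs), mean(spike_cv2s)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, spike_cv2s))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def highly_variable(gene_rows, a0, a1, min_fold=10.0):
    """Indices of genes whose CV^2 exceeds the technical curve by min_fold."""
    hits = []
    for i, row in enumerate(gene_rows):
        m = mean(row)
        if m > 0 and cv2(row) > min_fold * (a1 / m + a0):
            hits.append(i)
    return hits


# Hypothetical spike-in summary (mean and CV^2 across cells) lying exactly on
# the Poisson-like curve CV^2 = 1 / mean.
a0, a1 = fit_technical_noise([1.0, 2.0, 4.0], [1.0, 0.5, 0.25])

# Hypothetical normalized expression of two genes across 4 cells.
genes = [
    [0, 0, 40, 0],    # on in one cell only: far above technical noise
    [9, 11, 10, 10],  # steady across cells
]
print(highly_variable(genes, a0, a1))  # [0]
```

With only three spike-ins, the technical curve is fitted from three points, so a single aberrant spike can move it considerably.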
#10
Junior Member
Location: Stockholm Join Date: Sep 2015
Posts: 4
Hi everyone,
I would be very grateful for any suggestions on the analysis of our single-cell RNA-seq data. We have two groups of single cells (normal cells and disease cells) on which we performed single-cell RNA sequencing. Our libraries were made with the Smart-seq2 protocol and sequenced single-end, at around 4 million reads per cell. We now want to use differential expression analysis to find genes that are significantly upregulated or downregulated in the disease group relative to the normal group. Which normalization technique would you recommend? Our bioinformatician normalizes the raw counts with TMM and uses the R package Monocle to perform DE. He believes that if we use RPKM we will get many false-positive genes, since we are not comparing genes within one sample but comparing across different samples. Do you think he is right? Many thanks in advance.
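A small worked example helps with the RPKM question. For one gene compared across cells, gene length cancels, so RPKM reduces to scaling by each cell's total read count; if one cell has a few genes with many more reads, every other gene in that cell looks lower. The hypothetical counts below show this, alongside a median-of-ratios factor used as a simpler stand-in for TMM (TMM similarly trims extreme genes before computing a scaling factor, but this code is not TMM).

```python
from statistics import median


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads, for one cell."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


lengths = [1000, 2000, 1000, 4000]
# Hypothetical cells: genes 1-3 have identical counts; only gene 4 is truly up in cell B.
cell_a = [100, 100, 100, 100]
cell_b = [100, 100, 100, 700]

rpkm_a, rpkm_b = rpkm(cell_a, lengths), rpkm(cell_b, lengths)
rpkm_ratios = [b / a for a, b in zip(rpkm_a, rpkm_b)]
print([round(r, 2) for r in rpkm_ratios])  # [0.4, 0.4, 0.4, 2.8]

# A robust alternative: scale cell B by the median of its per-gene count ratios.
robust_factor = median(b / a for a, b in zip(cell_a, cell_b))
print(robust_factor)  # 1.0
```

Under RPKM, genes 1-3 appear 2.5-fold lower in cell B although their counts are unchanged, which is the kind of false positive the bioinformatician is worried about; the robust factor leaves them unchanged.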
#11
Member
Location: HKUST, Hong Kong Join Date: Apr 2015
Posts: 32
Gary
#12
Junior Member
Location: New Haven Join Date: Mar 2016
Posts: 8
Your goal in scRNA-seq should be to use a measurement that approximates absolute transcript counts per cell: ideally with a UMI approach, though normalizing with spike-ins is also a solid alternative. Otherwise, there is no obvious answer for normalizing C1 data. It's hard to account for the non-linear distortion from amplification in C1 data without spike-ins or UMIs, as library generation involves over 20 PCR cycles. Without either, I would try both FPKM and TPM and see how the DE results look with each.
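Since FPKM and TPM are suggested as the fallback, here is a minimal sketch of how each is computed from the counts of one cell (for single-end data FPKM and RPKM coincide). Both divide by transcript length; the difference is the order of operations: TPM rescales length-adjusted counts so that every cell sums to one million, so each value is a proportion that can be compared between cells, whereas FPKM totals vary from cell to cell. Neither corrects for the amplification distortion mentioned above. The counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase per million mapped fragments, for one cell."""
    millions = sum(counts) / 1e6
    return [c / (length / 1e3) / millions for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rates) / 1e6
    return [r / per_million for r in rates]


counts = [500, 1000, 1500]   # hypothetical reads for 3 genes in one cell
lengths = [500, 2000, 3000]  # hypothetical transcript lengths in bp

f, t = fpkm(counts, lengths), tpm(counts, lengths)
print([round(x) for x in t])         # [500000, 250000, 250000]
print(round(sum(f)), round(sum(t)))  # 666667 1000000
```

TPM is just FPKM rescaled per cell, so the two give the same within-cell ranking; they differ only in how values compare across cells.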
#13
Member
Location: HKUST, Hong Kong Join Date: Apr 2015
Posts: 32
Gary
#14
Member
Location: U.S Join Date: Oct 2008
Posts: 76
Does anyone have a workflow yet for scRNA-seq that allows RT barcoding and UMI labeling while sequencing whole-transcript RNA? If so, what kit and analysis tools do you use?
#15
Member
Location: HKUST, Hong Kong Join Date: Apr 2015
Posts: 32
Gary
#16
Junior Member
Location: France Join Date: Jun 2016
Posts: 6
Hi Travelk, did you publish the paper related to the data generated with the C1? I'm in the same situation: I will soon have full-length RNA-seq data from the C1, and we used the Ambion spike-ins as recommended in Fluidigm's protocol. Do the Ambion spike-ins allow you to normalize these data? Thanks,
Tags |
normalization, single-cell |