Hi All,
I've been testing various differential expression analyses on my single-cell RNA-Seq data, using either FPKM values (generated by Cufflinks, used with Fluidigm Singular Analysis Toolset or Monocle) or read counts (DESeq), but I've gotten very different results from the different programs. Based on the genes that are coming out, I think the FPKM-based results are more likely to be correct; however, I've seen evidence of 3' bias in my own data and am wary of using FPKM, since most people doing single-cell RNA-Seq have demonstrated it to be problematic.
So, I'm really keen to try a non-FPKM approach, but I'm not sure how much I should or should not be manipulating the data.
Most of the normalization advice focuses on studying heterogeneity within a population of cells. Brennecke et al. (Nature Methods, 2013) offer a great DESeq-based way to normalize to spike-ins and account for technical variability in order to find highly variable genes within a seemingly homogeneous population. Buettner et al. (Nature Biotechnology, 2015) have a great follow-up looking at cell cycle variation.
While this does interest me eventually, for now I just want to know the differential expression between two different cell populations. However, because this is single-cell data, it's highly variable. Does this matter? Can I consider each cell a "replicate"? Given the high variability, would the statistically significant genes that do come out be quite robust? Or should I be normalizing these populations to my spike-ins? (Please note that I only have 3 spike-in controls, not 92 like the majority of published papers out there.)
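To make the spike-in question concrete, here is a rough sketch of what I mean by "normalizing to my spike-ins". It applies the median-of-ratios idea that DESeq uses for size factors, but only to the spike-in rows. The function names (`size_factors`, `normalize`) and the toy counts are made up for illustration; this is not the Brennecke et al. code or DESeq's own implementation.

```python
import math
from statistics import median


def size_factors(spike_counts):
    """Median-of-ratios size factors, one per cell.

    spike_counts: list of rows (one per spike-in), each a list of
    per-cell read counts. Hypothetical helper, for illustration only.
    """
    # A spike-in with a zero count in any cell has no usable geometric
    # mean, so it is skipped.
    usable = [row for row in spike_counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no spike-in has nonzero counts in every cell")

    # Log of each spike-in's geometric mean across cells.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]

    n_cells = len(usable[0])
    factors = []
    for j in range(n_cells):
        # Log ratio of this cell's count to the spike-in's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every cell's counts by that cell's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (made up): 3 spike-ins x 4 cells, where cell 2 has twice
# and cell 4 four times the spike-in signal of cell 1.
spikes = [
    [10, 20, 10, 40],
    [100, 200, 100, 400],
    [50, 100, 50, 200],
]
factors = size_factors(spikes)
normalized_spikes = normalize(spikes, factors)
```

The same `factors` would then be used to scale the endogenous gene counts. With only 3 spike-ins, each factor is the median of just three ratios, which is part of why I'm unsure how much to trust it.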
Anybody else have any experience with this?
Thanks!