**fanli** · 03-12-2015, 09:46 AM

If you're interested in differential expression between two cell populations, then a straightforward DESeq-like comparison using cells as replicates seems appropriate. You're right that there will be more variability and that your hits will tend to be robust. An alternative is the 'scde' package:

SCDE by Kharchenko Lab at Harvard DBMI

http://pklab.med.harvard.edu/scde/index.html

The issue of normalizing to spike-in is a separate question, and generally good if available.

**travelk** · 03-24-2015, 02:42 AM

Thank you for your input!

Sometimes it's just reassuring to have someone else confirm that what you are doing isn't completely off base. I've checked out the scde package like you suggested and will use it as a comparison to make sure the same genes are coming through.

**amolinaro** · 06-23-2015, 11:46 AM

Hi all,
I'm new to the world of RNAseq analysis. I'm doing single cell RNAseq as well, but I would like to do differential expression analysis within a population of cells (as opposed to between 2 different populations), to assess the level of heterogeneity in the population. I am planning to use the tophat/cufflinks/monocle pipeline, but would also like to use a raw count method to verify my hits.

I have 2 questions:

1) Can I accomplish this with the SCDE package, or is this package only good for testing DE between 2 groups? If I can use SCDE, what will the output look like?

2) I've read that DESeq can be used for single cell data. I'd appreciate a description of how this works and what the output from this would look like (i.e. would I be able to get a table of p values with rows of genes and columns of cells, or something along those lines)?

Thanks in advance!

**jparsons** · 06-29-2015, 12:30 PM

Originally posted by fanli:

The issue of normalizing to spike-in is a separate question, and generally good if available.

I'd be interested to hear more about your experience doing this. I have not heard many 'good' stories about normalizing to spike-ins in this space.

**dpryan** · 06-29-2015, 10:54 PM

Are there many alternatives for single-cell data? The traditional methods (TMM, quantile normalization, etc.) aren't appropriate, so I thought spike-ins were largely the only game in town. I fully expect that we're going to start producing single-cell sequencing data in the next 3-6 months and would love to hear about better ways.

**travelk** · 06-30-2015, 04:56 AM

The issue of normalization in single cell RNA-seq still seems to be up for debate. The 92 ERCC spike-ins seem to be the gold standard for now, and a lot of the big groups advancing the single cell RNA-seq field seem to rely mostly on these. They use them to test biological and technical variation, to normalize, and to find the heterogeneity of the cells underneath the huge noise that is inevitable in single cell RNA-seq. Those who have their own biostatisticians on hand do it themselves, but Brennecke et al. (Nature Methods) published an R package that is available to everyone who is less advanced in mathematics and programming. This is what I've been using.

However, here's the big problem we're facing. The standard C1 Fluidigm protocol recommends only 3 Ambion Spike-Ins and because we were following the protocol exactly, this is what we did. The normalization methods for 92 spike-ins don't necessarily apply very well because we don't have enough data points. When I brought this up at a meeting with one of the people who is involved in developing the bioinformatics of single cell, they were surprised we used the Ambion spikes and told us to simply take them out of our dataset altogether, on the assumption that the number of spikes would be the same between each sample. However, we see varying numbers of spikes between samples (for various reasons, some explained, some not) and so I'm still torn between normalizing to spikes vs using traditional routes. However, when I normalize to even my three spikes, the data appears to be a bit "cleaner" when doing comparative analyses.
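
For readers who want to see what "normalizing to spikes" can look like in practice, here is a minimal sketch in Python. It assumes one simple scheme (a per-cell size factor equal to that cell's total spike-in count divided by the mean total across cells); the function names and toy numbers are made up for illustration and are not taken from any package mentioned in this thread. With only three spike-ins the totals rest on very few data points, so the factors will be noisy.

```python
import numpy as np

def spike_in_size_factors(spike_counts):
    """Per-cell size factors from spike-in counts (rows = spike-ins, columns = cells)."""
    totals = np.asarray(spike_counts, dtype=float).sum(axis=0)
    if np.any(totals <= 0):
        raise ValueError("every cell needs at least one spike-in read")
    # Scale so that the average factor is 1 (an arbitrary but common convention).
    return totals / totals.mean()

def normalize_to_spikes(gene_counts, spike_counts):
    """Divide each cell's gene counts by its spike-in size factor."""
    factors = spike_in_size_factors(spike_counts)
    return np.asarray(gene_counts, dtype=float) / factors

# Hypothetical toy data: 3 spike-ins x 4 cells; the second cell
# yielded twice as many spike-in reads as the others.
spikes = np.array([[10, 20, 10, 10],
                   [ 5, 10,  5,  5],
                   [ 5, 10,  5,  5]])
genes = np.array([[100, 200, 100, 50],
                  [  0,  40,  20, 20]])

factors = spike_in_size_factors(spikes)
normalized = normalize_to_spikes(genes, spikes)
print(factors)
print(normalized)
```

In the toy data the first gene looks twofold higher in the second cell before normalization and identical to the first cell afterwards, because the spike-ins suggest that cell was simply captured or amplified more efficiently.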

So, if there's anyone out there who hasn't started yet, I highly recommend using the ERCC spike-ins and not the Ambion as recommended by Fluidigm. I know there is a paper currently under review that will hopefully come out in the next few months that extensively deals with the ERCC spike-ins and may hopefully shed some light on this topic.

In response to amolinaro, differential expression analysis is used for looking at two populations of cells. In SCDE you will have to define those groups (e.g. treated vs untreated) before it will calculate the data, just like in DESeq. However, if you have one sample that you have taken from a mixed population or perhaps stimulated, then you might want to use the same method as the Brennecke paper I mentioned, or, if your cells are dividing, then check out scLVM by Buettner et al., Nature Biotechnology 2015. These methods are specific for looking at highly variable genes within a population of cells. They can also find "new" populations within your group of cells as defined by similar gene expression etc.
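
To give a feel for the "highly variable genes" idea, here is a rough Python sketch. It is a simplification and not the published Brennecke code: it assumes technical noise can be summarized by fitting CV² = a1/mean + a0 to the spike-ins with ordinary least squares, and it flags genes whose CV² exceeds an arbitrary fold of that fit instead of running a formal statistical test. All names, the fold threshold and the simulated data are assumptions for illustration.

```python
import numpy as np

def mean_and_cv2(counts):
    """Row-wise mean and squared coefficient of variation (rows = features, columns = cells)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    return mean, var / mean**2  # assumes every row has a non-zero mean

def fit_technical_noise(spike_counts):
    """Least-squares fit of CV^2 = a1 / mean + a0 on the spike-ins."""
    mean, cv2 = mean_and_cv2(spike_counts)
    design = np.column_stack([1.0 / mean, np.ones_like(mean)])
    (a1, a0), *_ = np.linalg.lstsq(design, cv2, rcond=None)
    return a1, a0

def highly_variable(gene_counts, a1, a0, fold=2.0):
    """Flag genes whose CV^2 exceeds `fold` times the fitted technical CV^2."""
    mean, cv2 = mean_and_cv2(gene_counts)
    return cv2 > fold * (a1 / mean + a0)

# Simulated example: spike-ins with purely Poisson (technical) noise.
rng = np.random.default_rng(0)
n_cells = 500
spike_means = np.array([5, 20, 80, 320, 1000])[:, None]
spikes = rng.poisson(spike_means, size=(5, n_cells))

# Gene 0 varies only technically; gene 1 is switched off in half of the cells.
genes = np.vstack([
    rng.poisson(100, n_cells),
    rng.poisson(200, n_cells) * (np.arange(n_cells) % 2),
])

a1, a0 = fit_technical_noise(spikes)
flags = highly_variable(genes, a1, a0)
print(a1, a0, flags)
```

The gene that is off in half of the cells stands far above the spike-in trend and is flagged, while the gene with only Poisson noise is not. A fit like this needs spike-ins spread over a range of abundances, which is one reason a handful of spikes is limiting.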

**dpryan** · 06-30-2015, 05:10 AM

Nice summary travelk!

Yeah, the Ambion spike-ins seem to be of questionable utility. I suppose one could measure variability with them but not much else. We've been using ERCC spike-ins in our initial dataset to hopefully add a bit of robustness to things (not that the ERCC spike-ins are perfect).

**travelk** · 06-30-2015, 07:09 AM

I think the original purpose of the Ambion spike-ins was purely as a control to ensure that the lysis buffer was getting to all the wells in the C1 chip and that the RT was working efficiently for each cell (which isn't always the case so the spike-ins have been invaluable to us in that way). I don't think they were intended to be used as a normalization tool, but since they are there, it's tempting to use them. They are simply an artificial, theoretically controlled housekeeping gene in a way.

Yes, the ERCC spike-ins aren't perfect, but I think they do give a lot of information about the variability of the method in general and specifically in each data set. It's much better to have them and not need them than the other way around (which is what happened with us). I think a lot of new data and bioinformatics methods are going to be coming out in the next year or two and having the right tools available in your data now will allow you to access those methods in the future.

**immpdaf** · 02-19-2016, 04:39 AM

Hi everyone,
I would be very grateful if anyone could give me some suggestions for our single-cell RNA-seq data analysis.

We have 2 groups of single cells (one normal, one disease) on which we performed single-cell RNA sequencing. Our libraries were made using the SMART-SEQ2 protocol and are single-end. We have around 4 million reads per single cell.

Now, using differential gene expression analysis, we want to find genes that are significantly upregulated or downregulated in the disease group relative to the normal group.
So, which normalization technique would you recommend? Our bioinformatician uses TMM to normalize raw counts and applies the R package Monocle to perform DE.
He believes that if we use RPKM we will get many false positive genes, since we are comparing different samples rather than genes within one sample. Do you think that is right?

Many thanks in advance.

**kobeho24** · 11-23-2016, 09:27 PM

It is known that TMM normalization factors do not take library sizes into account, so I think it will be problematic if your library sizes are diverse.

Gary

**hideandSEQ** · 12-09-2016, 09:02 AM

Most analysis reviews released in the past year strongly discourage normalizing scRNA data based on library size (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4823857/ http://www.nature.com/nrg/journal/v1...or-information ). Transcript abundance is highly variable across cells, even within homogeneous populations, and by normalizing based on library size you make the assumption that initial abundances are the same.
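
A toy illustration of that concern, as a Python sketch with made-up numbers: counts-per-million is used here as a stand-in for library-size normalization in general, and the two cells are hypothetical.

```python
import numpy as np

def counts_per_million(counts):
    """Scale each cell (column) so that its counts sum to one million."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6

# Hypothetical molecule counts (rows = genes, columns = cells):
# the second cell holds twice as many transcripts of every gene.
true_molecules = np.array([[100,  200],
                           [300,  600],
                           [600, 1200]])

cpm = counts_per_million(true_molecules)
print(cpm)
```

After scaling, the two columns are identical, so the twofold difference in absolute abundance between the cells is no longer visible in the normalized values.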

Your goal in scRNAseq should be to try to use a measurement that approximates absolute transcript counts for the cells, ideally with a UMI approach; however, normalizing with spike-ins is also a solid alternative.
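
As a rough sketch of what a UMI approach does at the counting step, assuming reads have already been assigned a cell barcode, a gene and a UMI sequence: reads sharing all three are treated as PCR copies of one original molecule and counted once. The function name and the example reads are hypothetical, and this ignores sequencing errors in the UMIs, which real tools have to handle.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "AACG"),   # same cell, gene and UMI: a PCR duplicate
    ("cellA", "Actb", "TTGC"),
    ("cellA", "Gapdh", "AACG"),  # same UMI on another gene is a different molecule
    ("cellB", "Actb", "AACG"),
]

umi_counts = count_umis(reads)
print(umi_counts)
```

Three reads for Actb in cellA collapse to two molecules, which is how UMIs reduce the effect of amplification bias on the counts.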

Otherwise there's not much of an obvious answer in normalization for C1 data. It's hard to account for non-linear distortion of amplification in C1 data without spike-ins or UMIs, as there are over 20 PCR cycles involved in library generation. Without either, I would try out FPKM or TPM and see how DE looks with each of them.
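
For anyone comparing the two, here is a minimal Python sketch of the standard FPKM and TPM formulas applied to a count matrix. It assumes one count per fragment and uses plain gene lengths (no effective-length correction); the counts and lengths are made up.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    per_million = counts.sum(axis=0) / 1e6
    return counts / lengths_kb / per_million

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale each cell to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb
    return rate / rate.sum(axis=0) * 1e6

# Hypothetical data: 3 genes x 2 cells, gene lengths in base pairs.
counts = np.array([[10, 20],
                   [20, 20],
                   [30, 60]])
lengths = np.array([1000, 2000, 3000])

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(fpkm_values)
print(tpm_values)
```

TPM columns always sum to one million, while FPKM columns generally do not. Both divide by a per-cell total, so they remain relative measures.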

**kobeho24** · 12-10-2016, 04:09 AM

I disagree with the point that normalizing the data by library size leads to the assumption that the initial abundances are the same. I suppose it's just a matter of library quantity rather than the actual initial abundance of RNA molecules, since you cannot make sure that every single-cell library is sequenced and output with an equal number of reads. Indeed, normalizing scRNA-seq data is still a challenge in isoform or full-length transcript analysis, but the situation is much better in 5'/3' seq with UMIs and spike-ins. I have always wondered whether anybody has a reasonably solid pipeline for scRNA-seq analysis at the isoform level. I'd appreciate any pointers in advance!

Gary

**seqgirl123** · 12-10-2016, 09:09 AM

Does anyone have a workflow yet for scRNA that allows RT barcoding and UMI labeling on sequencing whole transcript RNA? If so, what kit and analysis tools do you use?

**kobeho24** · 12-10-2016, 09:55 PM

Frankly speaking, due to the short read length of NGS technology, there isn't such a kit or workflow. And it's not practical to do it on a 3rd-generation sequencing platform, since the throughput is still too low and not appropriate for any quantitative analysis.

Gary

Normalization in single cell RNA-seq data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News