**jiewencai** · 01-11-2013, 07:02 PM

I have run into the same problem as you and am looking forward to the answers.

**Wolfgang Huber** · 01-12-2013, 01:54 AM

Hi - you are getting huge dispersions, of the order of 10, indicating that the counts between your different "replicate" samples are very, very different. Have you tried looking at pairwise scatterplots of the data? I.e. something like (replace pasilla with your own data):

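library("DESeq")    # provides counts() for CountDataSet objects
library("pasilla")
data("pasillaGenes")
# log2 transform with a pseudocount c, so that zero counts do not give -Inf
trsf = function(x, c=1) log2(x+c)
# pairwise scatterplots of all samples against each other
pairs(trsf(counts(pasillaGenes)), pch=".")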

Best wishes
Wolfgang

**tellsparck** · 01-12-2013, 11:19 AM

Thanks! The high dispersions are somewhat expected because the data are from single-cell RNA that underwent amplification, so there is inherent cell-to-cell variability as well as technical variability coming from the amplification. The question now is how badly this will affect the statistical analysis that follows. Do you think using the per-gene estimates (or any other deviation from the default) may help? Have you tried to modify DESeq for this type of data?

**Simon Anders** · 01-12-2013, 11:45 AM

First: If you ask for advice on this forum, please always mention all relevant facts. Asking a question such as yours without mentioning that you are not talking about standard RNA-Seq but about something unusual and very experimental, namely single-cell RNA-Seq, just wastes everybody's time, as you will only get wrong advice.

Now: The fit (red line) is indeed not very good, and we have some ways to improve the fit in situations such as yours. This won't help much, though, because the raw estimates (black dots) show that nearly all of your genes have dispersions above one and hence vary by a factor of two or more between cells of the same cell type. Unless the differences between different cell types are really drastic (at least, say, ten-fold), you cannot see them in this noise. This is not a problem of the statistical analysis, but one of the experimental protocol.
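
(Editorial sketch, not part of Simon's post.) To make the scale of "dispersion above one" concrete: in the negative binomial model that DESeq uses, the variance is mu + alpha * mu^2, so the squared coefficient of variation is 1/mu + alpha. The base-R snippet below only illustrates that arithmetic; `nb_cv`, the mean of 1000, and the comparison dispersion of 0.01 are assumptions chosen for illustration, not values from the thread.

```r
# Coefficient of variation of a negative binomial count with
# mean mu and dispersion alpha: variance = mu + alpha * mu^2.
nb_cv <- function(mu, alpha) sqrt(1 / mu + alpha)

mu <- 1000                               # assumed mean count, illustration only
cv_low   <- nb_cv(mu, alpha = 0.01)      # assumed low dispersion, for comparison
cv_noisy <- nb_cv(mu, alpha = 1)         # dispersion of one, as discussed above

# Check the formula by simulation; rnbinom's size parameter is 1 / alpha.
set.seed(1)
x <- rnbinom(10000, size = 1 / 1, mu = mu)
sim_cv <- sd(x) / mean(x)

print(c(cv_low = cv_low, cv_noisy = cv_noisy, sim_cv = sim_cv))
```

With a dispersion of one, the standard deviation between replicates is roughly as large as the mean itself, which is why only very large differences between cell types can stand out.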

**tellsparck** · 01-13-2013, 04:26 AM

Sorry for not mentioning the nature of the data. Yes, it is noisy, but the differences between groups are also drastic: many genes are close to zero in one group and in the thousands in another. But of course there are also some with less drastic differences. Given this, how can I extract the maximum information out of it? I have tried using only samples that look similar in the PCA, have similar Q3, and so on...
Can you suggest any modifications in DESeq that can improve the analysis?
Many thanks

**Simon Anders** · 01-16-2013, 12:36 AM

"many genes are close to zero in one group and thousands in another" -- yes, this is a drastic difference, but have a look at your replicates: I guess you will see equally drastic changes between two cells of the same type. This is a quite common problem in single-cell RNA-Seq, and you are not the first one to find this out the hard way, sorry.

As you have many samples, you could try to switch to 'sharingMode="gene-est-only"'. This might be a little bit anticonservative, and if even this does not give you anything, there might simply be nothing in your data.

**tellsparck** · 01-17-2013, 10:01 AM

Thanks Simon. Yes, using "gene-est-only" mode works and outputs a good-sized list of differentially expressed genes, among them internal control genes which we know should be differentially expressed. (I can get this even with default settings in some comparisons.) Two of my groups are from very closely related cells, and it is here that I have to change the sharing mode.
Do you think I should fix the fit as you mentioned before? If so, how can I do this?
Thanks again!

**Simon Anders** · 01-17-2013, 10:16 AM

No, the whole point of "gene-est-only" is that it instructs DESeq to ignore the fit, so it doesn't matter any more if it's bad.
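
(Editorial sketch, not part of Simon's post.) The effect of the sharing mode can be illustrated in base R. The helper `pick_dispersion` and the dispersion values below are hypothetical and are not DESeq code. They only mimic the documented behaviour of the three `sharingMode` options: "maximum" takes the larger of the per-gene estimate and the fitted value, "fit-only" uses the fitted value, and "gene-est-only" uses the per-gene estimate.

```r
# Hypothetical per-gene and fitted dispersion values, for illustration only.
per_gene <- c(geneA = 0.05, geneB = 1.80, geneC = 0.40)
fitted   <- c(geneA = 0.30, geneB = 0.90, geneC = 0.40)

# Hypothetical helper (not a DESeq function) that picks the dispersion
# used for testing according to the chosen sharing mode.
pick_dispersion <- function(per_gene, fitted,
                            sharingMode = c("maximum", "fit-only", "gene-est-only")) {
  sharingMode <- match.arg(sharingMode)
  switch(sharingMode,
         "maximum"       = pmax(per_gene, fitted),  # conservative default
         "fit-only"      = fitted,                  # trust the fit alone
         "gene-est-only" = per_gene)                # ignore the fit entirely
}

print(pick_dispersion(per_gene, fitted, "maximum"))
print(pick_dispersion(per_gene, fitted, "gene-est-only"))
```

In "gene-est-only" mode the fitted values never enter the result, so a poor fit cannot affect it. The price is that genes whose per-gene estimate happens to be too low (geneA here) are no longer protected by the fit, which is the anticonservative tendency mentioned earlier in the thread.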

**tellsparck** · 01-22-2013, 12:50 PM

Thanks Simon. But I can still benefit from improving the fit in some comparisons where I do not use gene-est-only. Do you have code for this that you can share with me?

**billstevens** · 02-02-2013, 10:49 AM

Hello community!

I have 3 conditions with 5 replicates each, so my DOF is 15 samples - 3 conditions = 12. Therefore I am using gene-est-only for estimating my dispersions.

cds <- estimateDispersions(cds, method= "per-condition", sharingMode="gene-est-only" )

Can anyone point me to where it is discussed that gene-est-only can be used once enough DOF have been reached? I saw in a post that Simon mentions that once you get to 10 to 15, you can use gene-est-only, but I'm doing my proposal next week, and I want to be able to point to something more formal.

Thanks very much!!

DESeq-strange dispersion plot and using shorth
