**Michael Love** · 12-19-2014, 10:24 AM

Now is a great chance to read the just-published manuscript ;-)

http://genomebiology.com/2014/15/12/550/abstract (see Fig 2)

or see Fig 1 in the vignette:

vignette("DESeq2")

The relevant arguments are betaPrior in ?DESeq and addMLE in ?results.

**lmolokin** · 02-19-2015, 12:52 PM

high count outlier

Michael,
I have a question related to LFC shrinkage, but involving a high-count gene instead (the gene with the highest mean in my dataset, to be exact). From what I can tell, DESeq2 has deemed this gene a dispersion outlier, based on the gene's data from the DESeqDataSet object:

baseMean 8.65E+05
baseVar 2.16E+12
allZero FALSE
dispGeneEst 2.888
dispFit 0.030
dispersion 2.888
dispIter 7
dispOutlier TRUE
dispMAP 8.31E-02

Granted, its dispersion is high. But this gene is the target of my experiment's treatment and is expected to be heavily downregulated in the treated group vs. control (by more than a hundred fold according to edgeR). However, the results of the DESeq2 Wald test on this gene yield:

baseMean 864665.543
log2FoldChange -0.127
lfcSE 0.074
stat -1.704
pvalue 0.088
padj 0.533

I realize that outlier LFCs are shrunken based on several factors, but is it possible for a massive fold change to be shrunken to such a low and insignificant value? Is there an alternate approach here that would reduce the severity of the shrinkage?

Thanks!

**Michael Love** · 02-19-2015, 01:03 PM

You can turn off the LFC shrinkage, if you want to go with the MLE LFC estimate, by setting betaPrior=FALSE in DESeq(). You will then get an MLE estimate of the fold change, which is simply (average in group2)/(average in group1). The average can be thrown off when you have high within-group variance, as reflected by the very large dispersion estimate for this gene. What are the counts like for this gene?

Another way to understand the variance of the MLE LFC estimate is to ask: if you removed the sample with the largest count, how much would the estimate change?

Code:

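k <- counts(dds, normalized=TRUE)[gene.idx, ]  # gene.idx: row index or name of the gene of interest
idx <- which.max(k)                            # sample with the largest normalized count
cond <- dds$condition[-idx]                    # condition labels without that sample
k <- k[-idx]                                   # counts without that sample
log2(mean(k[cond == "B"]) / mean(k[cond == "A"]))  # MLE-style LFC (B over A) after dropping it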
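The same check can be tried without a DESeqDataSet. Below is a minimal base-R sketch that extends the idea to every sample in turn. The counts and group labels are invented for illustration (they are not the poster's data), and `lfc` and `loo_lfc` are hypothetical helper names, not DESeq2 functions.

```r
# Hypothetical normalized counts for one gene: 4 control ("A") and
# 4 treated ("B") samples. These numbers are invented for illustration.
k <- c(A1 = 2.0e6, A2 = 1.5e5, A3 = 9.0e5, A4 = 3.0e5,
       B1 = 1.2e5, B2 = 1.0e4, B3 = 6.0e4, B4 = 2.5e4)
cond <- factor(rep(c("A", "B"), each = 4))

# Ratio-of-means log2 fold change (B over A)
lfc <- function(k, cond) log2(mean(k[cond == "B"]) / mean(k[cond == "A"]))

# Hypothetical helper: recompute the LFC with each sample left out in turn
loo_lfc <- function(k, cond) {
  vapply(seq_along(k), function(i) lfc(k[-i], cond[-i]), numeric(1))
}

full <- lfc(k, cond)
loo <- setNames(loo_lfc(k, cond), names(k))

print(full)
print(round(loo, 2))
# A wide spread across the leave-one-out values signals an unstable estimate
print(diff(range(loo)))
```

If dropping a single sample moves the estimate by a large amount, the ratio of means is being driven by one or two samples, which is the situation a large dispersion estimate reflects.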

**lmolokin** · 02-19-2015, 01:28 PM

Here are the individual sample counts for this gene:

No code has to be inserted here.

**Michael Love** · 02-19-2015, 01:44 PM

Note that the range within each group is more than 10-fold. So while there is a consistent difference (treatment is always less than ~1/8 of the control), there is very large within-group variance, and this results in more shrinkage of the LFC for this particular gene. Are those normalized counts? Also, is the model ~ tx or ~ subj + tx?

**lmolokin** · 02-19-2015, 01:50 PM

These are just raw counts. Is there a way to pull the normalized counts?

Model includes subj to account for subj-to-subj variability.

The high degree of shrinkage makes more sense to me after looking at these counts again.

**Michael Love** · 02-19-2015, 02:09 PM

counts(dds, normalized=TRUE) will give normalized counts.

But then I might recommend using betaPrior=FALSE here. The reasoning is that you expect very large fold changes for a few individual genes, but there are not so many large fold changes that the width of the prior adjusts to allow them. The shrinking of fold changes requires that the software can estimate the range of reasonable values for LFC by looking at the distribution of LFCs (particularly the upper quantile of the absolute LFC). But there might be precise fold changes which are above this upper quantile, and so the prior is too narrow for the targeted genes. The prior might then be a bad assumption for this dataset, so I think it's reasonable to turn it off here. The inference works just as well without the prior included.
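To illustrate why a few large fold changes do not widen the prior, here is a simplified base-R sketch. It is not DESeq2's implementation, which works on the GLM likelihood. The simulated LFCs, the quantile level `p = 0.05`, the normal-normal shrinkage formula, and the helper name `shrink` are all assumptions made for illustration.

```r
set.seed(1)

# Simulated MLE log2 fold changes (invented): most genes barely change,
# and a handful of targeted genes change by roughly 100-fold (log2 of -7).
mle_lfc <- c(rnorm(10000, mean = 0, sd = 0.3), rep(-7, 5))

# Match the upper quantile of |LFC| to a zero-centered normal distribution
# to get a prior standard deviation (p = 0.05 is an assumed choice).
p <- 0.05
prior_sd <- unname(quantile(abs(mle_lfc), 1 - p) / qnorm(1 - p / 2))

# Normal-normal approximation of the shrunken estimate: the MLE is pulled
# toward zero more strongly when its standard error is large.
shrink <- function(beta, se, prior_sd) beta * prior_sd^2 / (prior_sd^2 + se^2)

precise <- shrink(-7, se = 0.1, prior_sd = prior_sd)
noisy <- shrink(-7, se = 2.0, prior_sd = prior_sd)
print(c(prior_sd = prior_sd, precise = precise, noisy = noisy))
```

In this toy setup the five large LFCs sit far above the upper quantile and so have almost no effect on the estimated prior width. A large fold change with a large standard error is then pulled most of the way to zero, while one with a small standard error is shrunk much less.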

**lmolokin** · 02-20-2015, 09:32 AM

Makes sense. Thanks Michael, always a huge help!

DESeq2 log2 fold change discrepancy
