**sindrle** · 01-30-2014, 11:14 AM

MDSplot-edgeR-no-outleiers.pdf

And here it is without the outliers.

Looks much better. The next step is to see whether the outliers affect my results much.

But what will decide whether I keep them or not?

**dpryan** · 01-30-2014, 11:32 AM

That has got to be the most outlying outlier I've ever seen.

Perhaps that sample is from the wrong organ (our core facility has swapped samples before...which became very very obvious when I made the equivalent graph). Alternatively, what's the size factor on that sample? One sample having vastly fewer reads can also cause this sort of thing.

There are algorithms to detect outliers, but if a sample doesn't stick out like a sore thumb on a graph like this then it should be kept in.
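
The library-size question raised above can be checked directly from the count matrix. What follows is a minimal base-R sketch, not part of the original exchange: the `counts` matrix, the sample names `S1` to `S4`, and the 0.25 cutoff are all made up for illustration. In an edgeR `DGEList` the same totals are stored in `d$samples$lib.size`.

```r
# Hypothetical toy count matrix (genes x samples); not data from this thread.
counts <- matrix(
  c(100, 120,  90,  5,
    250, 230, 260, 12,
     30,  45,  38,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("S", 1:4))
)

# Total reads per sample (library size).
lib_size <- colSums(counts)

# Each library relative to the median library; values far below 1
# flag samples with vastly fewer reads than the rest.
rel_size <- lib_size / median(lib_size)

# Illustrative cutoff (an assumption, not an edgeR default).
low_depth <- names(rel_size)[rel_size < 0.25]

print(round(rel_size, 2))
print(low_depth)
```

A sample flagged this way is not automatically bad, but it is worth ruling out low depth before blaming biology or a sample swap.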

**sindrle** · 01-30-2014, 01:27 PM

Haha!
Here are some comparisons:

With outlier: D253-B2
Variation: 49.21% BCV
-1 232
0 22578
1 905

Without outlier: D253-B2
Variation: 49.99% BCV
-1 120
0 23038
1 557

Conclusion: including the outlier gives roughly 1.7 times as many hits (1137 vs. 677), which is worrisome...

Here is the same thing with the world-record outlier N270-B2:

With outliers: N208-B1 & N270-B2
Variation: 48.87% BCV
-1 399
0 22131
1 1185

Without outliers: N208-B1 & N270-B2
Variation: 49.78% BCV
-1 147
0 22800
1 768

About the same result...

Should I exclude all three? Or run some more tests and maybe keep everyone except the insane one?
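
The two comparisons can be put side by side with a little arithmetic. This is a small base-R sketch using only the numbers posted above. It assumes the `-1` and `1` rows are the down- and up-regulated calls and the `0` row is "not significant", which is the usual layout of such a summary but is not stated in the thread.

```r
# DE calls reported in the thread: sum of the -1 and 1 rows of each summary.
hits <- rbind(
  "D253-B2"           = c(with = 232 + 905,  without = 120 + 557),
  "N208-B1 & N270-B2" = c(with = 399 + 1185, without = 147 + 768)
)

# Ratio of hits with the outlier(s) included to hits with them removed.
ratio <- hits[, "with"] / hits[, "without"]

print(cbind(hits, ratio = round(ratio, 2)))
```

Both removals shrink the hit list by a similar factor, so on these numbers alone neither set of samples looks clearly more influential than the other.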

**dpryan** · 01-31-2014, 01:54 AM

You might consider using the SVA package on your dataset just to see if it detects an obvious background variable to control for. I'm not surprised that removing D253-B2 gives fewer hits, since that one sample was driving a lot of the results (I've never been that happy with how edgeR deals with outliers, which is why I usually use DESeq2). So, I wouldn't be too worried by that.

For the other two samples, I suspect that sva will tell you that their variation is due to a component that can be compensated for.

**sindrle** · 01-31-2014, 06:33 AM

Thank you very much for your response, very good as always.

Let me just hijack my own thread here. How do you set the "prior.df" option?

d <- estimateGLMTagwiseDisp(d, design, prior.df = ???)

**dpryan** · 01-31-2014, 06:47 AM

I would have to search for that in the Bioconductor list (it's come up a few times, but I don't recall the answer since I rarely use edgeR). There's actually a way to avoid the issue, which is to use glmQLFTest(), which doesn't require that you estimate the tagwise dispersion (it ends up calling routines in limma that estimate the prior df). I've never actually tried that, but it should produce more conservative results.

**sindrle** · 01-31-2014, 06:53 AM

I have searched this to death...

And read every thread that pops up. But it's different between the old and new versions. Also, Gordon Smyth says: prior.df = G_0 * df.residual

df.residual = number of libraries - number of GLM coefficients

But what is G_0??

In the old version, n.prior = your libraries, which would help with the calculation. But I cannot find this option anymore.
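
The relation quoted above can at least be written out as arithmetic. This is a hedged base-R sketch, not an answer to what G_0 should be: the six-library two-group design and the value `G_0 = 2` are placeholders chosen only to show the calculation, and `prior_df` is a hypothetical helper, not an edgeR function.

```r
# Hypothetical design: 6 libraries in two groups (not the thread's data).
group <- factor(c("A", "A", "A", "B", "B", "B"))
design <- model.matrix(~ group)

# Residual df = number of libraries - number of GLM coefficients
# (assumes the design matrix has full column rank).
df_residual <- nrow(design) - ncol(design)

# prior.df as quoted in the thread; G_0 must be supplied by the user.
prior_df <- function(G_0, df_residual) G_0 * df_residual

# Placeholder G_0 purely to show the arithmetic, not a recommended value.
print(df_residual)
print(prior_df(G_0 = 2, df_residual = df_residual))
```

The resulting number is what would go in place of `???` in the `estimateGLMTagwiseDisp()` call, once a value for G_0 has been settled on.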

I'm running edgeR, DESeq2 and Cuffdiff 2.1.1 (soon also DEXSeq).

I don't want to exclude any of them, since each has its strengths and weaknesses. How come you only rely on DESeq2?

**dpryan** · 01-31-2014, 06:56 AM

It's consistently produced the most reliable results for my datasets. Cuffdiff can rarely handle my experimental designs, so it's not even in the running.

**sindrle** · 01-31-2014, 07:17 AM

I'm trying glmQLFTest, thanks.

How did you find out why DESeq2 was the most consistent?

**dpryan** · 01-31-2014, 07:21 AM

I don't really know why it's turned out to give a bit more reliable results, though it tends to deal with outlier values pretty well (it'll flag and ignore such genes by default, though sometimes you need to disable this). We've done enough qPCR validations of additional samples to give me some comfort in that. Of course that's for the datasets that I work on, YMMV!

**sindrle** · 01-31-2014, 08:05 AM

glmQLFTest does not work...

Error:

"Error in quantile.default(zresid, prob = prob) :
missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning message:
In fitFDistRobustly(var, df1 = df, covariate = covariate, winsor.tail.p = winsor.tail.p) :
small x values have been offset away from zero"

I've tried removing NAs:

# Replace NA counts with 0, then count the NAs remaining in each sample
d$counts[is.na(d$counts)] <- 0
colSums(is.na(d$counts))

That did not work.
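
The error message refers to `zresid` inside `fitFDistRobustly`, so the NaN values seem to arise during the fit rather than in the counts themselves. That would explain why replacing NAs in `d$counts` changes nothing. One thing that could be checked, as an assumption rather than a confirmed diagnosis, is whether some genes have no reads in any sample. The base-R sketch below uses a made-up `counts` matrix to show the check.

```r
# Hypothetical toy count matrix (genes x samples); not data from this thread.
counts <- matrix(
  c(10, 12,  9,
     0,  0,  0,
     5, NA,  7),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("S", 1:3))
)

# Number of missing counts per sample.
na_per_sample <- colSums(is.na(counts))

# Genes with no reads in any sample (NAs ignored when summing).
all_zero <- rowSums(counts, na.rm = TRUE) == 0

# Keep genes that have at least one read and no missing values.
keep <- !all_zero & rowSums(is.na(counts)) == 0
filtered <- counts[keep, , drop = FALSE]

print(na_per_sample)
print(rownames(filtered))
```

With a `DGEList`, the same logical vector could be used to subset the rows of `d` before estimating dispersions.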

I've read an answer here, but I don't have the development version...

[BioC] edgeR Quasi-likelihood with tagwise dispersion?

https://stat.ethz.ch/pipermail/bioconductor/2013-March/051549.html

*Will soon uninstall edgeR out of annoyance*

MDS-plot diagnostics