SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
edgeR MDS plot: any way to get genes contributing to axes out? seeker RNA Sequencing 4 01-12-2015 05:16 AM
PCA / MDS analysis useful if only small number of genes (e.g. 1-20) diff. expressed? Alex234 RNA Sequencing 3 12-10-2013 12:59 PM
cummeRbund MDS/PCA gene list jake13 Bioinformatics 1 08-29-2013 11:39 PM
PCA vs MDS plots hchang10 Bioinformatics 0 06-23-2013 02:01 AM
next-gen sequencing in diagnostics brjordan General 0 04-07-2009 05:08 AM

Reply
 
Thread Tools
Old 01-30-2014, 08:41 AM   #1
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default MDS-plot diagnostics

MDSplot-edgeR.pdf

So, this really does not look good...
Suggestions on whats going on here?
sindrle is offline   Reply With Quote
Old 01-30-2014, 10:14 AM   #2
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

MDSplot-edgeR-no-outleiers.pdf

And here it is without..

Looks much better, next step is to see if they affect my result much.


But what will decide if I keep then or not?
sindrle is offline   Reply With Quote
Old 01-30-2014, 10:32 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

That has got to be the most outlying outlier I've ever seen

Perhaps that sample is from the wrong organ (our core facility has swapped samples before...which became very very obvious when I made the equivalent graph). Alternatively, what's the size factor on that sample? One sample having vastly fewer reads can also cause this sort of thing.

There are algorithms to detect outliers, but if a sample doesn't stick out like a sore thumb on a graph like this then it should be kept in.
dpryan is offline   Reply With Quote
Old 01-30-2014, 12:27 PM   #4
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Haha!
Heres some comparisons...

With outlier: D253-B2
Variation: 49.21% BCV
-1 232
0 22578
1 905

Without outliers: D253-B2
Variation: 49.99% BCV
[,1]
-1 120
0 23038
1 557

Conclusion: With outlier gives twice as many hits, this is worrisome...

Here the same thing with the world record outlier N270-B2

With outliers: N208-B1 & N270-B2
Variation: 48.87 BCV
-1 399
0 22131
1 1185

Without outliers: N208-B1 & N270-B2
Variation: 49,78% BCV
-1 147
0 22800
1 768

About the same result...


Should I exclude all three? Or run some more test and maybe keep everyone, except the insane one?
sindrle is offline   Reply With Quote
Old 01-31-2014, 12:54 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You might consider using the SVA package on your dataset just too see if it detects an obvious background variable to control. I'm not surprised that removing D253-B2 gives fewer hits, that one sample was driving a lot of the results (I've never been that happy with how edgeR deals with outliers, which is why I usually use DESeq2). So, I wouldn't be too worried by that.

For the other two samples, I suspect that sva will tell you that their variation is due to component that can be compensated for.
dpryan is offline   Reply With Quote
Old 01-31-2014, 05:33 AM   #6
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Thank you very much for your response, very good as always.

Let me just hack my own thread here.. How do you set the "prior.df" option ?

d <- estimateGLMTagwiseDisp(d, design, prior.df = ???)
sindrle is offline   Reply With Quote
Old 01-31-2014, 05:47 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I would have to search for that in the Bioconductor list (it's come up a few times, but I don't recall the answer since I rarely use edgeR). There's actually a way to avoid the issue, which is to use glmQLFTest(), which doesn't require that you estimate the tagwise dispersion (it ends up calling routines in limma that estimate the prior df). I've never actually tried that, but it should produce more conservative results.
dpryan is offline   Reply With Quote
Old 01-31-2014, 05:53 AM   #8
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

I have search to death...

And read every thread popping up. But its different from the old vs new version. And also Gordon Smyth says: prior.df = G_0 * df.residual

df.residuals = my libraries - GLM coeffisients

But what is G_0??

In old version n.prior = your libraries, which would help the calculation. But this option I cannot find anymore..

Im running edgeR, DESeq2 and Cuffdiff 2.1.1 (soon also DEXseq).

I don't want to exclude anyone, since everyone has their strength and weaknesses. How come you only rely on DESeq2?
sindrle is offline   Reply With Quote
Old 01-31-2014, 05:56 AM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

It's consistently produced the most reliable results for my datasets. Cuffdiff can rarely handle my experimental designs, so it's not even in the running.
dpryan is offline   Reply With Quote
Old 01-31-2014, 06:17 AM   #10
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Im trying the glmQLFTest, thanks.

How did you find out why DEseq2 was the most consistent?
sindrle is offline   Reply With Quote
Old 01-31-2014, 06:21 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I don't really know why it's turned out to give a bit more reliable results, though it tends to deal with outlier values pretty well (it'll flag and ignore such genes by default, though sometimes you need to disable this). We've done enough qPCR validations of additional samples to give me some comfort in that. Of course that's for the datasets that I work on, YMMV!
dpryan is offline   Reply With Quote
Old 01-31-2014, 07:05 AM   #12
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

glmQLFTest does not work...

Error:

"Error in quantile.default(zresid, prob = prob) :
missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning message:
In fitFDistRobustly(var, df1 = df, covariate = covariate, winsor.tail.p = winsor.tail.p) :
small x values have been offset away from zero"

Ive tried remove NAs:

# d$counts[is.na(d$counts)] <- 0
# apply(d$counts,2,function(x) sum(is.na(x)))

Did not work..

Ive read an answer here, but I don't have the development version...

https://stat.ethz.ch/pipermail/bioco...ch/051549.html

*Will soon uninstall edgeR because of annoyed*
sindrle is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:40 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO