-
MDSplot-edgeR-no-outleiers.pdf
And here it is without the outliers.
It looks much better; the next step is to see whether they affect my results much.
But what should decide whether I keep them or not?
-
That has got to be the most outlying outlier I've ever seen
Perhaps that sample is from the wrong organ (our core facility has swapped samples before...which became very very obvious when I made the equivalent graph). Alternatively, what's the size factor on that sample? One sample having vastly fewer reads can also cause this sort of thing.
There are algorithms to detect outliers, but if a sample doesn't stick out like a sore thumb on a graph like this then it should be kept in.
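A quick way to check the "vastly fewer reads" possibility is to compare each sample's library size with the median. This is only a sketch in base R: the count matrix is simulated, the sample names S1 to S4 are made up, and the 0.25 cutoff is an arbitrary assumption, not an edgeR or DESeq2 default.

```r
# Hypothetical count matrix: rows are genes, columns are samples.
set.seed(42)
counts <- matrix(rpois(4000, lambda = 100), ncol = 4,
                 dimnames = list(NULL, c("S1", "S2", "S3", "S4")))

# Simulate one shallow library by thinning S4 to roughly 5% of its depth.
counts[, "S4"] <- rbinom(nrow(counts), size = counts[, "S4"], prob = 0.05)

# Library size per sample, expressed relative to the median library size.
lib_size  <- colSums(counts)
rel_depth <- lib_size / median(lib_size)

# Flag samples below a quarter of the median depth (arbitrary cutoff).
low_depth <- names(rel_depth)[rel_depth < 0.25]

print(round(rel_depth, 2))
print(low_depth)
```

With real data, `counts` would be the matrix you already feed to edgeR (for example `d$counts`), and a flagged sample is a candidate for a closer look rather than automatic removal.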
-
Haha!
Here are some comparisons...
With outlier: D253-B2
Variation: 49.21% BCV
-1 232
0 22578
1 905
Without outlier: D253-B2
Variation: 49.99% BCV
-1 120
0 23038
1 557
Conclusion: keeping the outlier gives about 1.7 times as many hits, which is worrisome...
Here is the same thing with the world-record outlier N270-B2:
With outliers: N208-B1 & N270-B2
Variation: 48.87% BCV
-1 399
0 22131
1 1185
Without outliers: N208-B1 & N270-B2
Variation: 49.78% BCV
-1 147
0 22800
1 768
About the same result...
Should I exclude all three? Or run some more tests and maybe keep everything except the insane one?
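To put the four tables above side by side, the counts can be tallied in base R. The numbers are copied from the post; reading -1 / 0 / 1 as down / not significant / up is an assumption based on the usual decideTestsDGE convention, and the column names are made up for this sketch.

```r
# Counts copied from the four tables in the post.
# Assumption: -1 = down-regulated, 0 = not significant, 1 = up-regulated.
de <- data.frame(
  comparison = c("D253-B2 kept", "D253-B2 removed",
                 "N208-B1 and N270-B2 kept", "N208-B1 and N270-B2 removed"),
  down   = c(232, 120, 399, 147),
  not_de = c(22578, 23038, 22131, 22800),
  up     = c(905, 557, 1185, 768)
)

# Total hits and total genes tested in each run.
de$hits  <- de$down + de$up
de$total <- de$down + de$not_de + de$up

# How many times more hits are called when the outlier(s) are kept.
ratio_d253 <- de$hits[1] / de$hits[2]
ratio_n    <- de$hits[3] / de$hits[4]

print(de)
print(round(c(D253 = ratio_d253, N208_N270 = ratio_n), 2))
```

All four runs test the same 23715 genes, so the hit counts are directly comparable: 1137 vs. 677 for D253-B2 and 1584 vs. 915 for the N208-B1 / N270-B2 pair.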
-
You might consider using the sva package on your dataset, just to see if it detects an obvious background variable to control for. I'm not surprised that removing D253-B2 gives fewer hits; that one sample was driving a lot of the results (I've never been that happy with how edgeR deals with outliers, which is why I usually use DESeq2). So, I wouldn't be too worried by that.
For the other two samples, I suspect that sva will tell you that their variation is due to a component that can be compensated for.
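As a rough idea of what that looks like in practice, here is a sketch using `svaseq()` from the sva package on simulated counts. Everything about the data is an assumption for illustration (the hidden batch, the group labels, fixing `n.sv = 1` instead of letting sva estimate it); on a real dataset you would pass your own count matrix and design.

```r
library(sva)

set.seed(1)
n_genes <- 1000
group  <- factor(rep(c("A", "B"), each = 4))
hidden <- rep(c(0, 1), times = 4)  # simulated unknown batch, not in the design

# Simulated counts: the first 300 genes are shifted by the hidden batch.
mu <- matrix(100, nrow = n_genes, ncol = 8)
mu[1:300, hidden == 1] <- 300
counts <- matrix(rnbinom(n_genes * 8, mu = mu, size = 10), nrow = n_genes)

# Full model (with the variable of interest) and null model (intercept only).
mod  <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# Estimate one surrogate variable; n.sv = 1 is fixed here for simplicity.
sv <- svaseq(counts, mod, mod0, n.sv = 1)

# Append the surrogate variable to the design for the downstream DE test.
design <- cbind(mod, SV1 = as.matrix(sv$sv)[, 1])
print(head(design))
```

The extended `design` can then be used in place of the original one in edgeR or DESeq2, so the samples stay in the analysis while the extra variation is modelled.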
-
I would have to search for that in the Bioconductor list (it's come up a few times, but I don't recall the answer since I rarely use edgeR). There's actually a way to avoid the issue, which is to use glmQLFTest(), which doesn't require that you estimate the tagwise dispersion (it ends up calling routines in limma that estimate the prior df). I've never actually tried that, but it should produce more conservative results.
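For reference, a minimal sketch of the quasi-likelihood route, assuming a recent edgeR release in which `glmQLFit()` does the fitting and `glmQLFTest()` does the test (older releases, such as the one current when this thread was written, organised these calls differently). The counts and the two-group design are simulated purely for illustration.

```r
library(edgeR)

set.seed(1)
# Simulated counts: 1000 genes, 6 samples in two groups of 3.
counts <- matrix(rnbinom(6000, mu = 50, size = 5), ncol = 6)
group  <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

design <- model.matrix(~ group)

# Estimate dispersions, then fit the quasi-likelihood model.
# robust = TRUE makes the prior df estimation less sensitive to outlier genes.
y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)

# Quasi-likelihood F-test for the group effect (second coefficient).
qlf <- glmQLFTest(fit, coef = 2)

# Same style of -1 / 0 / 1 summary as in the tables earlier in the thread.
print(summary(decideTests(qlf)))
```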
-
I have searched to death...
And read every thread that pops up. But the answer differs between the old and new versions. Also, Gordon Smyth says: prior.df = G_0 * df.residual
df.residual = number of libraries - number of GLM coefficients
But what is G_0??
In the old version there was n.prior = your libraries, which would help the calculation. But I cannot find this option anymore..
I'm running edgeR, DESeq2 and Cuffdiff 2.1.1 (soon also DEXSeq).
I don't want to exclude any of them, since each has its strengths and weaknesses. How come you only rely on DESeq2?
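The arithmetic in the relation quoted above can be written out in a few lines of base R. This is only a sketch: the number of libraries, the number of coefficients, and especially the value of G_0 are hypothetical placeholders, since the thread never says what G_0 is. Reading it as a prior weight is an assumption, not something confirmed here.

```r
# Hypothetical experiment: 12 libraries, design with 2 GLM coefficients.
n_libraries    <- 12
n_coefficients <- 2

# Residual degrees of freedom, as defined in the post.
df_residual <- n_libraries - n_coefficients

# G0 is a placeholder value; the thread does not define it.
G0 <- 2

# Relation quoted from the post: prior.df = G_0 * df.residual
prior_df <- G0 * df_residual

print(c(df.residual = df_residual, prior.df = prior_df))
```

Turning the relation around, a chosen `prior.df` implies `G0 = prior.df / df.residual`, which may be easier to reason about if the function you call only exposes `prior.df`.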
-
I don't really know why it's turned out to give a bit more reliable results, though it tends to deal with outlier values pretty well (it'll flag and ignore such genes by default, though sometimes you need to disable this). We've done enough qPCR validations of additional samples to give me some comfort in that. Of course that's for the datasets that I work on, YMMV!
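The flagging mentioned above is controlled through arguments to `results()` and `DESeq()`. Below is a sketch on DESeq2's built-in example data (`makeExampleDESeqDataSet()`), so the gene and sample numbers are arbitrary; it only shows where the switches are, not what settings suit a given dataset.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset shipped with DESeq2: 500 genes, 8 samples in two conditions.
dds <- makeExampleDESeqDataSet(n = 500, m = 8)
dds <- DESeq(dds, quiet = TRUE)

# Default: genes with an extreme Cook's distance get an NA p-value.
res_default <- results(dds)

# Flagging disabled: those genes keep their p-values.
res_no_flag <- results(dds, cooksCutoff = FALSE)

n_na_default <- sum(is.na(res_default$pvalue))
n_na_no_flag <- sum(is.na(res_no_flag$pvalue))
print(c(default = n_na_default, cooksCutoff_off = n_na_no_flag))
```

With seven or more replicates per group, DESeq2 can also replace outlying counts instead of flagging the gene; passing `minReplicatesForReplace = Inf` to `DESeq()` turns that replacement off.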
-
glmQLFTest does not work...
Error:
"Error in quantile.default(zresid, prob = prob) :
missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning message:
In fitFDistRobustly(var, df1 = df, covariate = covariate, winsor.tail.p = winsor.tail.p) :
small x values have been offset away from zero"
I've tried removing NAs:
d$counts[is.na(d$counts)] <- 0   # replace any NA counts with zero
colSums(is.na(d$counts))         # count remaining NAs per sample
Did not work..
I've read an answer here, but I don't have the development version...
*Will soon uninstall edgeR out of annoyance*