
**Simon Anders** · 01-27-2014, 11:07 AM

Originally posted by sindrle:

> You run DESeq2, you pick out 10 genes you want to look at, including p values.
> Say 6 genes have p < 0.05.
> You then use p.adjust in R.
> What FDR do you choose and why?
> Which n do you set?

If you pick the ten genes a priori, i.e., in a manner that is independent of the outcome, then you can run p.adjust only on the p values from these 10 genes.

By choosing "a priori", I mean that you knew before doing the analysis that these genes are worth looking at and others are not. If, however, you have chosen these ten genes precisely because their expression data in this very experiment looked so interesting that you want them in your result list, then you need to run p.adjust on all genes.

In the former case, you only wanted to look at these genes, so your test only has to reject the null hypothesis that precisely these genes show a signal that looks interesting but arose only by chance. In the latter case, you have to reject the null hypothesis that, somewhere in your data with its many genes (some of which will show strong signals merely due to chance fluctuations), there will be ten genes that look so far out as to appear interesting. As this is much more likely to happen if it may be any 10 genes rather than a fixed set of 10 genes, the signal has to be stronger to convince us that it is not mere chance. Hence the more stringent multiple-testing adjustment.
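The difference between the two cases can be illustrated numerically. The sketch below is in Python rather than R and is not DESeq2 code: `bh_adjust` is a hand-written Benjamini-Hochberg adjustment (the method behind R's `p.adjust(p, method = "BH")`), the ten p values are hypothetical, and the 9,990 other genes are simulated as having no real effect, i.e. uniform p values.

```python
import random


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    n = len(pvals)
    # Visit p values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # rank in increasing order (1 = smallest p value)
        # Scale by n / rank, then keep the running minimum so an adjusted
        # value never exceeds that of a larger p value.
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Ten genes picked a priori (hypothetical p values).
chosen = [0.001, 0.004, 0.01, 0.02, 0.03, 0.045, 0.2, 0.4, 0.6, 0.9]

# The same ten genes among 9,990 simulated genes with no real effect,
# whose p values are uniform on [0, 1).
random.seed(1)
all_genes = chosen + [random.random() for _ in range(9990)]

adj_subset = bh_adjust(chosen)       # adjust within the ten genes only
adj_all = bh_adjust(all_genes)[:10]  # adjust over all genes, keep the ten

for p, a, b in zip(chosen, adj_subset, adj_all):
    print(f"p = {p:<5}  BH over 10 genes: {a:.3f}  BH over 10,000 genes: {b:.3f}")
```

Within the ten genes the best gene's adjusted value is about 0.01; among 10,000 genes the same raw p values are scaled by far larger factors, which is the stricter adjustment described above.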

**dpryan** · 01-27-2014, 11:09 AM

Originally posted by sindrle:

> Do you know how to do this in edgeR?

See the "genefilter" package for some useful functions.

**Simon Anders** · 01-27-2014, 11:11 AM

Originally posted by rskr:

> I don't think FDR is very important for RNA-seq.

Sorry, I cannot let this stand, as it might be misunderstood to mean that accounting for multiple hypothesis testing is optional in RNA-Seq data analysis. Of course, you always need to account for multiple hypothesis testing when you test many hypotheses (here: many genes).

> On the flip side, you could get a fabulous FDR by simply not sequencing very much.

Um, no, you don't. Why should you?

**rskr** · 01-27-2014, 11:15 AM

Originally posted by dpryan:

> @rskr: Given that this is in the context of DESeq2 (I realize that the thread is titled with edgeR...), low-count genes are automatically dropped and power is maximized (I have to admit that it's handy not to have to do this myself anymore). So the critique that low-coverage genes screw up the p values doesn't apply.

Low-coverage genes can still be significant, just not at the same rate as higher-coverage genes. It may be possible to filter out certain genes that have zero chance of being significant, but the power to detect a difference depends on the proportion as well as the coverage, so, as I said, FDR isn't so important.
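The automatic filtering dpryan refers to (independent filtering) can be sketched roughly as follows. This is a simplified Python illustration, not DESeq2's implementation: the filter statistic is a per-gene mean count, the threshold grid and all data are hypothetical, and the threshold is picked simply by maximizing the number of Benjamini-Hochberg rejections. For FDR control to hold, the filter statistic has to be independent of the test statistic under the null hypothesis.

```python
def bh_rejections(pvals, alpha):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    k_max = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * alpha / m:
            k_max = k  # step-up: remember the largest rank that passes
    return k_max


def best_filter(mean_counts, pvals, thresholds, alpha=0.1):
    """Try each mean-count threshold; return (threshold, rejections) for the
    threshold that rejects the most genes (the first one wins on ties)."""
    best = (None, -1)
    for t in thresholds:
        kept = [p for c, p in zip(mean_counts, pvals) if c >= t]
        n_rej = bh_rejections(kept, alpha)
        if n_rej > best[1]:
            best = (t, n_rej)
    return best


# Hypothetical data: 4 well-covered genes with modest p values and
# 16 low-count genes whose p values carry no signal.
high_p = [0.01, 0.015, 0.02, 0.03]
low_p = [0.1 + 0.05 * i for i in range(16)]
mean_counts = [200] * 4 + [2] * 16
pvals = high_p + low_p

print(bh_rejections(pvals, 0.1))                      # 0: no filtering
print(best_filter(mean_counts, pvals, [0, 10, 300]))  # (10, 4)
```

Without filtering, the sixteen uninformative genes inflate the number of tests and nothing passes; dropping them lets the four well-covered genes through, while a threshold that removes every gene finds nothing.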

**Simon Anders** · 01-27-2014, 11:17 AM

Originally posted by rskr:

> [...], so as I said FDR isn't so important.

So, what do you suggest to do instead?

**raphael123** · 01-27-2014, 11:25 AM

P values are just a widely used joke. The meaning of a p value is hard to grasp, and it rests on assumptions that a lot of people don't know about.
FDR is an even bigger joke. Your best p value will, most of the time, be multiplied by the number of p values.
So if you have 10 genes to test, giving you 10 p values, the best is multiplied by 10, the second best by 5, then by 3.333, then by 2.5, then by 2, etc. (the p value of rank i is multiplied by n/i, and a running minimum is then taken from the largest p value down, so the adjusted values stay in the same order as the raw ones).

Here is the code in R:

Code:

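# Produce a vector of Benjamini-Hochberg FDR values from a p value vector
# sorted in increasing order (same result as p.adjust(pval, method="BH"))
fdr = function(pval){
 size=length(pval)
 if(size<2) return(pval)

 # the worst (largest) pval is multiplied by size/size = 1, capped at 1
 FDR=c( min( 1 , pval[size] ))

 # the pval of rank k is multiplied by size/k; taking the min with the
 # previous value keeps the adjusted values monotone
 for( i in 1:(size-1)) FDR=c(FDR,min(FDR[i] , pval[size-i]*(size)/(size-i)))

 # FDR was built from the largest pval down, so reverse it
 return(rev(FDR))
}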

**rskr** · 01-27-2014, 12:39 PM

Originally posted by Simon Anders:

> So, what do you suggest to do instead?

I don't know for sure. I first noticed the problem while doing meta-analysis hypothesis testing on coverage, merging the p values of bases that each had a different statistical power. Using Fisher's meta-analysis procedure, it became obvious that the underpowered bases were dominating the test, and that Fisher's method assumes all published results are adequately powered. It would be nice if the theory there were better, too. I came up with a heuristic involving weighted sums of -log(p values), with information entropy as the degrees of freedom, which has a certain appeal to it.
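For reference, Fisher's method mentioned above combines k independent p values as X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null. The Python sketch below also shows the weighted Z (Stouffer/Lipták) method, a standard way to give each p value its own weight. It is not rskr's entropy-based heuristic, which is not described in enough detail to reproduce, and the weights in the usage lines are hypothetical stand-ins for per-base power.

```python
import math
from statistics import NormalDist


def fisher_combined(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # X / 2
    # Survival function of chi-square with even df 2k (closed form):
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def stouffer_weighted(pvals, weights):
    """Weighted Z method: Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)),
    with z_i the standard normal quantile of 1 - p_i (one-sided)."""
    nd = NormalDist()
    z = [-nd.inv_cdf(p) for p in pvals]  # equals inv_cdf(1 - p), more accurate for tiny p
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(sum(w * w for w in weights))
    return nd.cdf(-z_comb)  # upper-tail p value of the combined Z


# Hypothetical: one well-covered base and three poorly covered ones.
pvals = [0.01, 0.6, 0.7, 0.8]
weights = [10.0, 1.0, 1.0, 1.0]  # assumed power-based weights, illustration only
print(fisher_combined(pvals))
print(stouffer_weighted(pvals, weights))
print(stouffer_weighted(pvals, [1.0] * 4))
```

Fisher's statistic treats every p value alike, so the only way to express differing power is outside the method; the weighted Z method makes that weighting explicit.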

**sindrle** · 01-27-2014, 01:09 PM

Originally posted by dpryan:

> See the "genefilter" package for some useful functions.

Interesting, thanks!
Another question, excuse my ignorance... but look at this code:

> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.05)

Is this the way to use an FDR cutoff of 0.1?

> FDR <- p.adjust(lrt$table$PValue, method="BH")
> sum(FDR < 0.1)
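What the two R snippets compute can be sketched in Python, assuming the p values come from a per-gene test such as edgeR's likelihood ratio test. After Benjamini-Hochberg adjustment, counting adjusted values below 0.05 or 0.1 gives the number of genes called at that FDR level, and it matches the BH step-up rule applied directly at that level (unless an adjusted value equals the cutoff exactly, since the R code uses a strict `<`). The p values and helper names below are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (same idea as p.adjust(p, "BH"))."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        # n - pos is the rank of this p value in increasing order
        running_min = min(running_min, pvals[i] * n / (n - pos))
        adjusted[i] = running_min
    return adjusted


def n_called(pvals, fdr_level):
    """Like sum(FDR < fdr_level) in R: count adjusted values below the cutoff."""
    return sum(q < fdr_level for q in bh_adjust(pvals))


def step_up_count(pvals, alpha):
    """BH step-up rule: the largest k with p_(k) <= k * alpha / m."""
    m = len(pvals)
    ranks = [k for k, p in enumerate(sorted(pvals), start=1) if p <= k * alpha / m]
    return max(ranks, default=0)


pvals = [0.0001, 0.0004, 0.002, 0.01, 0.02, 0.032, 0.05, 0.2, 0.5, 0.9]
for level in (0.05, 0.1):
    print(level, n_called(pvals, level), step_up_count(pvals, level))
# prints "0.05 5 5", then "0.1 7 7"
```

Raising the cutoff from 0.05 to 0.1 calls more genes at the price of a higher expected proportion of false discoveries among them.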

**csmatyi** · 02-07-2014, 09:58 AM

Originally posted by swbarnes2:

> I feel that this is an appropriate contribution:
>
> http://xkcd.com/882/

Oh, it's so appropriate.
