SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
edgeR ECHo Bioinformatics 13 04-18-2013 01:01 PM
Why edgeR ouput some p values larger than 1 idyll_ty RNA Sequencing 1 02-01-2013 03:14 PM
DESeq and edgeR papori Bioinformatics 3 05-15-2012 06:29 PM
edgeR separate offset values/normalizations for each tag sruddy RNA Sequencing 1 07-08-2011 03:59 PM
edgeR Puva Bioinformatics 2 05-19-2011 09:04 AM

Reply
 
Thread Tools
Old 03-14-2012, 05:07 AM   #1
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default help: adjusted p-values in edgeR

Hi everybody,

I am relatively new to RNA-Seq and DE statistics. We have chosen to use edgeR for DE analysis and I understood most functions so far or I could answer them via searching the literature. But now I have a problem in understanding the adjustment for multiple testing.

My questions:
1. Do I have to adjust? Are the not-adjusted p-values useless (In case I have some significant differences in DE when not-adjusted but none after adjustment)?

2. Which adjustment method is the best and why?

Thanks in advance,
philipp
pbrand is offline   Reply With Quote
Old 03-14-2012, 05:41 AM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

You should use the adjusted p-values. This controls for false positives that you get when making multiple comparisons.

EdgeR uses as its default the Benjamini-Hochberg procedure. This is also the default for other programs like DESeq. Other methods are more conservative. I would just stick with Benjamini-Hochberg.

Last edited by chadn737; 03-14-2012 at 05:46 AM.
chadn737 is offline   Reply With Quote
Old 03-14-2012, 05:47 AM   #3
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default

Quote:
Originally Posted by chadn737 View Post
You should use the adjusted p-values.
So I can consider them as 'truly' differentially expressed?

Can I say that I do not have DE, if all not-adjusted p-values which are significant show no significance anymore after adjustment?

Last edited by pbrand; 03-14-2012 at 05:53 AM.
pbrand is offline   Reply With Quote
Old 03-14-2012, 05:50 AM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by pbrand View Post
So I can consider them as 'truly' different expressed?

Can I say that I do not have DE, if all not-adjusted p-values which are significant show no significance anymore after adjustment?
First I should ask how many replicates you have.
chadn737 is offline   Reply With Quote
Old 03-14-2012, 05:54 AM   #5
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default

because of our low budget, I have just two samples with pooled RNA of 40 specimen each.
pbrand is offline   Reply With Quote
Old 03-14-2012, 05:57 AM   #6
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

And you are trying to find DE between these two samples? I don't think there is much you can say about differential expression without replicates.
chadn737 is offline   Reply With Quote
Old 03-14-2012, 06:13 AM   #7
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default

Quote:
Originally Posted by chadn737 View Post
And you are trying to find DE between these two samples? I don't think there is much you can say about differential expression without replicates.
Unfortunately, yes. But what we can do is trying to find out possible candidate DE - genes for realtime PCR. Therefore I wanted to follow the suggested way to estimate DE via estimated dispersion.

Hope that some money occurs if the results look promising to get replicates..
pbrand is offline   Reply With Quote
Old 03-14-2012, 06:25 AM   #8
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by pbrand View Post
Unfortunately, yes. But what we can do is trying to find out possible candidate DE - genes for realtime PCR. Therefore I wanted to follow the suggested way to estimate DE via estimated dispersion.

Hope that some money occurs if the results look promising to get replicates..
If you are using the data as an initial screen to get candidates that you will verify by qRT-PCR, then its fine to use the unadjusted p-values.

The qRT-PCR will either confirm these results or not. But for publication of the RNA-seq, you really do need replicates and adjusted p-values.
chadn737 is offline   Reply With Quote
Old 03-14-2012, 06:28 AM   #9
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default

Thanks a lot for your answers!
Philipp
pbrand is offline   Reply With Quote
Old 08-06-2014, 10:54 PM   #10
younko
Member
 
Location: USA

Join Date: May 2014
Posts: 24
Default

Hello chadn737

I also have similar problems.
I do not have a replicate. but I have several patients having same disease.

For example, in my case, I have 10 patients having same disease.
We treated the drug and did RNA-seq for before-drug/after-drug.

What we want to do is to find the differentially expressed genes reponsing to the drug. Instead of replications, we have 10 patients samples.. for pre/post.

I used edgeR and DESeq for this.. but I could not find the any DEG with adjusted p value .. So I am thinking to look at the p value instead of adjusted pvalue by considering the fact that we can have FP... of course...

Would it be okay?
younko is offline   Reply With Quote
Old 08-06-2014, 11:19 PM   #11
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45
Default

Actually, what you describe are replicates. It just so happens that the biological variability is overwhelming your treatment variability.

Did you try a paired analysis?

As mentioned above, if you are going to validate candidate genes by some other method then the pvalue doesn't matter as much (though you'll waste time wading through a bunch of false positives). If you want to publish it standalone, it has to pass the correction.
mikep is offline   Reply With Quote
Old 08-07-2014, 05:40 AM   #12
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245
Default

Quote:
Originally Posted by younko View Post
Hello chadn737

I also have similar problems.
I do not have a replicate. but I have several patients having same disease.

For example, in my case, I have 10 patients having same disease.
We treated the drug and did RNA-seq for before-drug/after-drug.

What we want to do is to find the differentially expressed genes reponsing to the drug. Instead of replications, we have 10 patients samples.. for pre/post.

I used edgeR and DESeq for this.. but I could not find the any DEG with adjusted p value .. So I am thinking to look at the p value instead of adjusted pvalue by considering the fact that we can have FP... of course...

Would it be okay?
As mikep said, your biological variation between patients is likely the problem, obscuring your treatment affect.

You could try simple pairwise, patient by patient, comparisons using some of the available non-parametric methods, like a Rank Product analysis (RANKPROD in R, for example uses permutation tests in the absence of replicates), the R tool GFOLD (which uses bayesion posterior prediction of fold change to calculate p-values and does not require replicates).
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
mbblack is offline   Reply With Quote
Old 08-07-2014, 06:48 PM   #13
younko
Member
 
Location: USA

Join Date: May 2014
Posts: 24
Default

Thank you for your response.(mikep and mbblack)!!

Q1 : So, are you recommending the "Wilcoxon signed-rank test " instead of GLM model from edgeR or DESeq?? This would not consider the variation like GLM anova analysis...... Do you think is it okay?

Actually, I made the glm model with EdgeR.

design <- model.matrix(~drug)


Quote:
If you want to publish it standalone, it has to pass the correction.
Q2 : As I mentioned, if I don't have any DEG with correted pvalue, then, I cannot publish the result??? Or can I validate in anyway with DEG filtered with pvalue instead of adjusted p-value for publication?
younko is offline   Reply With Quote
Old 08-07-2014, 07:19 PM   #14
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45
Default

Well I'm a fan of parametric over parametric, what I was suggesting is a better design in which you take the paired nature of the data into account

... my R is crap but something like design <- model.matrix(~drug+patientID). The paired analysis may well go a long way to removing the biological variability.

re Q2: correct, if the pvalues are not significant after correction you cannot publish alone. if you have some kind of orthagonal validation like qPCR then you can.
mikep is offline   Reply With Quote
Old 08-08-2014, 04:16 AM   #15
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245
Default

If you take a look at the GFOLD publication, for example, and the GFOLD documentation in BioConductor, it is more sophisticated than a simple Wilcoxon test. GFOLD represents one effort to deal with the (admittedly undesirable) situtation of not have any biological replicates, yet still attempt to maintain some rigor to how differentially expressed genes are selected.

People publish differential gene expression results based on nothing but fold change, but be prepared to fully justify why you did not include replication into your study design (e.g. sometimes, clinical studies, or any study using very difficult to obtain and/or highly limited samples are simply unable to include biological replicates). You would also be best to validate at least some of the genes, at least those critical to your final interpretation or conclusions, or you'll find you are required to anyway to get past peer review.

Given that there are published methods for making the most, statistically, from no or very few replicates, you would also be best to explore them, as a reviewer familiar with them will point them out to you if you do not.
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
mbblack is offline   Reply With Quote
Old 08-17-2014, 11:45 PM   #16
younko
Member
 
Location: USA

Join Date: May 2014
Posts: 24
Default

Dear mikep and mbblack

Thank you so much for your comments.
I have one more quick question. I tried to analyze my data to find DEG in two condition. I have used edgeR, DESeq, DESeq2, cuffdiff or just simple wilconxon rank sum test etc.
However, I could not find any DEG with adjusted p value in any of these methods..
TOO UPSET

There are two hypothesis.
1) Hypothesis 1

Maybe, our experimental design is wrong so we could not detect any significant gene at all.. That is the reason we could not identify any DEG gene.. (End of story..) we could not do anything !!!
( I hope it is not !! cause there should be some genes responsing our drug)

2) Hypothesis 2

In some reason(small sample, so little statistical power), we could not find statistically significant DEG genes with adjusted p-value. (Still we could find some DEGs with p value) even though there exist DEG genes responsing the drug.

==============================================

In this case, what option I can choose?
I could not just throw away our data. I would like to find meaningful output........

So, I am thinking that (based on your suggestion)

Just use the pvalue to detect DEG genes and do any additional analysis with these DEG gene based on pvalue.. and then if we filter some more meaningful DEG among previously detected DEG genes,we can validate with experimental validation. (qPCR etc I am not sure how it would be feasible because of lack of money! ) ...
Or is there any other way we can claim our result without experimental validation??? Any suggestion??

Or Experiemnental validation is the only way I can do??
younko is offline   Reply With Quote
Old 08-18-2014, 01:12 AM   #17
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319
Default

Did you try the design <- model.matrix(~drug+patientID) mentioned by mikep?

In many (perhaps most) of the analyses I have done of patient RNA-seq data, there are no differentially expressed genes found if you only look at the treatment/condition without telling the software to perform a paired analysis (specifying the patient ID in the design formula is one way of doing so). The reason being, as mbblack and mikep have already said, that the biological variation between individuals is obscuring the treatment effect.
kopi-o is offline   Reply With Quote
Old 08-18-2014, 01:22 AM   #18
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45
Default

That is unfortunate. You are however correct, further validation is absolutely required.

I have a question, how many genes pass uncorrected pv 0.05 and what is the range of the fold change you see?

EDIT: good point kopi-o, I assumed the "two condition" comment meant 2 factor, but it may not.

Last edited by mikep; 08-18-2014 at 01:26 AM.
mikep is offline   Reply With Quote
Old 08-18-2014, 07:05 PM   #19
younko
Member
 
Location: USA

Join Date: May 2014
Posts: 24
Default

Dear Kopi-o and mikep

Thank you for your comments!!
Here is the specific analysis result I have done!

I have 12 samples from 6 patients with pre- and post- drug treatment.

For my data format

Quote:
patient drug
KYM_pre 1 pre
KYM_post 1 post
GDY_pre 2 pre
GDY_post 2 post
HKS_pre 3 pre
HKS_post 3 post
KWK_pre 4 pre
KWK_post 4 post
PYM_pre 5 pre
PYM_post 5 post
SHM_pre 6 pre
SHM_post 6 post
------------------------------------------------------------------------------
For EdgeR

design <- model.matrix(~patient+drug)
data_e <- DGEList(counts=f_data)
data_e <- calcNormFactors(data_e)
data_e <- estimateGLMCommonDisp(data_e, design)
data_e <- estimateGLMTrendedDisp(data_e, design)
data_e <- estimateGLMTagwiseDisp(data_e, design)
## Fit the model, testing the coefficient for the treated vs untreated comparison
data_efit <- glmFit(data_e, design)
data_efit <- glmLRT(data_efit, coef="drugpost")
topTags(data_efit)

In this EdgeR analysis I got still only 125(16) DEG based on p value threshold 0.05 (pvalue threshold < 0.01) (no DEG based on adjusted p value)

==============================================

For DESeq

---------------------------------------------------------------------

meta <- data.frame(row.names=colnames(data), patient = patient, drug = drug)
d_fa <- newCountDataSet(data, meta)
d_fa <- estimateSizeFactors(d_fa)


d_fa <- estimateDispersions(d_fa, method="blind", sharingMode="fit-only")
(Is it okay??)
dh_fit1 = fitNbinomGLMs(d_fa, count ~ patient + enzyme)
dh_fit0 = fitNbinomGLMs(d_fa, count ~ patient)
str(dh_fit1)
dh_pvalsGLM = nbinomGLMTest( dh_fit1, dh_fit0 )
dh_padjGLM = p.adjust( dh_pvalsGLM, method="BH" )


dh_dtable <- transform(dh_fit1, pval=dh_pvalsGLM, padj=dh_padjGLM)
dh_dtable <- dh_dtable[order(dh_dtable$pval), ]

In this DESeq analysis , I only got 52(8) DEG gene based on p value threshold 0.05(pvalue < 0.01) (No DEG based on adjusted p values)..

And among those 125(16), 52(8) DEG genes from EdgeR, DESeq with p value threshold 0.05 (0.01 respectively) ..

In adddition, about the FC ratio, I got range of logFC value -2.7 to 2.7 for EdgeR.
DeSEq does not return FC ratio... so I am not sure how I can report FC value..


I assume that edgeR model and DEseq model should test the same thing... e.g. effect of drug considering the patient effect (paired analysis..) but it seems that reported DEG gene list looks halfly overlapped... though..

Anyhow, what do you think about this result?? Does it make sense??
younko is offline   Reply With Quote
Reply

Tags
differential expression, edger, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:20 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO