SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq: GLMs for RNA-Seq with interaction terms EMeyer General 2 07-21-2014 02:10 PM
Nested Model in DESeq SMcTaggart Bioinformatics 11 04-15-2014 02:38 AM
How to assign values to size factor (DESeq) kentnf Bioinformatics 4 01-14-2014 11:48 PM
Determining clinical significance of variants Rocketknight Bioinformatics 2 09-28-2012 08:54 AM
PubMed: A signal-noise model for significance analysis of ChIP-seq with negative cont Newsbot! Literature Watch 0 05-09-2010 07:00 PM

Reply
 
Thread Tools
Old 02-11-2013, 05:24 AM   #1
DJParker
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Question DESeq Multi-factor designs: determining the significance of model terms

Hello,

I have performed a two-factor design RNA seq experiment (treatment and sex) and would like to know if these factors have a significant effect on gene expression.

To do this I followed the instructions as in the DESeq manual and fitted the following models:

fit0 <- fitNbinomGLMs( cds, count ~ treatment)
fit1 <- fitNbinomGLMs( cds, count ~ treatment + sex)
fit2 <- fitNbinomGLMs( cds, count ~ treatment + sex + treatment * sex)

Is it possible to determine which of these models fits the data best?

Typically when fitting glms I would use AIC to determine model term significance (i.e. does fitting an interaction term significantly improve the fit of the model?). Is there any way to do this is DESeq?

Many thanks,

Darren
DJParker is offline   Reply With Quote
Old 02-17-2013, 12:27 AM   #2
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109
Default

Dear Darren,

it seems you have successfully mastered the mechanics of DESeq, but from the wording, perhaps it would be good to get an update on statistical theory, namely the topics 'hypothesis testing' and 'model selection' and their (different) associated concepts and tools. People often get confused between these two. It seems that what you actually want are tests against the null hypotheses:
  • no gene is affected by treatment,
  • no gene is affected by sex,
  • no gene is affected by treatment in a sex-dependent manner.
The DESeq vignette explains how to use the function
nbinomGLMTest to perform chi-squared tests on a gene-by-gene manner, which you then need to follow by the suitable multing testing computation to arrive at the experiment-wise test p-value.

Hope this helps
Best wishes
Wolfgang
__________________
Wolfgang Huber
EMBL
Wolfgang Huber is offline   Reply With Quote
Old 07-16-2014, 03:54 PM   #3
raphael123
Member
 
Location: Mc Gill -- Montreal

Join Date: Dec 2013
Posts: 37
Default

Reading that and the DESeq vignette I am still confuse :

Quote:
fit1 = fitNbinomGLMs( cdsFull, count ~ libType + condition )
fit0 = fitNbinomGLMs( cdsFull, count ~ libType )
pvalsGLM = nbinomGLMTest( fit1, fit0 )
Here the p value for a single row (in this case let s say RNA) is testing a null hypothesis.
I can t formulate it, is it the probability that this miRNA counts are explain better by one model than an other ?

Let s say taht RNA12 is significant (after FDR) what can I say ?

Let s say that I want to test for all my RNA if the libType is driving an effect, can I do that ? By doing something like averaging the p values ?
raphael123 is offline   Reply With Quote
Old 07-17-2014, 12:50 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

By definition, a likelihood ratio test compares two models. If RNA12 is significant, then including condition in the model produces a significantly better fit than not including this. In a paper, this is written as, "condition significantly affects the expression of RNA12."

To test for libType, you would compare fit1 to a model of ~condition.
dpryan is offline   Reply With Quote
Old 07-17-2014, 08:08 AM   #5
ecolliso
Junior Member
 
Location: UK

Join Date: Apr 2014
Posts: 1
Default Multi factor DESeq2 help

Dear all,
Looking for some advice on a multi-factor DESeq2 analysis please (Iím new to all this so bear with me!)

The experimental design involves a cross-factorial design in which subjects were exposed to one of three pesticide treatments (Control, Pesticide1, Pesticide2) and challenged with one of three injection treatments (NaÔve=no injection, Buffer, Injection), plus sampled at one of two time points.

The main questions I think I want to address are as follows:
(For each timepoint)
- What genes are differentially expressed as a result of an injection (i.e in Buffer vs NaÔve, Injection vs NaÔve, and Injection vs Buffer) in 1) Control group 2) Pesticide1 group 3) Pesticide2 group?
- What genes are differentially expressed as a result of exposure to either Pesticide1 or Pesticide2 compared to Control-fed in 1)NaÔve 2) Buffer-injected 3) Injected?
- And, ideally, what genes are differentially expressed as a result of injection in a pesticide-dependent manner (ie interaction between Pesticide and Injection)?

Having had a read through the DESeq2 vignette and other posts on here, I think the best way to address this might be to subset my samples (e.g. subset to only control-fed 8 hour samples) and run the DESeq2 differential expression pipeline for each subset, and compare the different levels of the (eg injection) treatment using the contrast arguments to results.

The main thing I wanted to check was if using this subsetting approach, do I need to make any added adjustment of the P-value to account for the fact Iíll be doing multiple tests? As far as I can understand (but believe me, stats is not my strong point) having already adjusted for the FDR, I donít need to make any further adjustment for the multiple testing due to subsetting, but please correct me if Iím wrong.

Or is there an alternative way to test my hypotheses without subsetting? Although I can use a GLM formula to include multiple treatments (and their interactions) (eg ~Pesticide*Injection) , my difficulty if I do this is how to address the contrasts. Using the resultsNames argument this gives me interaction combinations as follows

> resultsNames(dds)
[1] "Intercept"
[2] "Pesticide_Pesticide1_vs_Control"
[3] "Pesticide_Pesticide2_vs_Control"
[4] "Injection_Naive_vs_Injection"
[5] "Injection_Buffer_vs_Injection"
[6] "PesticidePesticide1.InjectionNaive"
[7] "PesticidePesticide2.InjectionNaive"
[8] "PesticidePesticide1.InjectionBuffer"
[9] "PesticidePesticide2.InjectionBuffer"

The first set of contrasts make sense to me as Pesticide ďControlĒ and Injection ďNaÔveĒ were set as the base levels, but could someone clarify what these interaction contrasts mean please. Does this give (for [6]) the genes that are differentially expressed between Pesticide1 vs Control in a manner dependent on whether the injection treatment was NaÔve vs Injection?

Also, for eg [2] this is the genes differentially expressed between Pesticide1 and Control accounting for Injection treatment. By accounting for injection treatment, does this mean that the test accounts for any genes that were either differentially expressed due to injection and/or with pesticide dependent on injection, and only gives those genes that are affected by Pesticide1 across all the injection treatments?

Apologies for the essay of a post, but any thoughts would be greatly appreciated. Many thanks in advance.
ecolliso is offline   Reply With Quote
Old 07-17-2014, 09:57 AM   #6
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi,

Interaction models are a bit complicated, so I really recommend that users take the time to read a statistics reference on interactions in linear models, or to consult a statistician at your local institution who can answer basic questions. There's no reason to expect that this tricky stuff can be best learned on forums.

That said, I just posted the following to the Bioc mailing list, for a brief explanation of how to interpret interaction terms:

Quote:
Suppose we have a design ~ A + B+ A:B, and A=0/1, B=0/1. This means
we are modeling a log fold change due to variable A=1 over A=0, and a
log fold change due to variable B=1 over B=0. The interaction term is
an additional log fold change when A=1 and B=1 beyond the main effect
for A and the main effect for B. If the log fold change for the
interaction term is 0, then we know: the fold change when A=1 and B=1
is simply the two main effect fold changes multiplied (the log fold
changes added).
One quick note: you might want to set the base level of Injection to naive. By default R will choose the base level using alphabetical order. See the DESeq2 vignette for code examples.

To answer some of your questions:

What about multiple testing across contrasts. The multiple test correction in DESeq2 is over genes, per results table. Alternative options for correction are described well in the limma vignette, section 13.3 "Multiple Testing Across Contrasts". As we cannot predict which contrasts the user will be interested in, in order to make the results table reproducible, the default correction is over genes. We leave alternate strategies using the pvalue column to the user.

You can generate all of the contrasts of interest without subsetting. See the 'contrast' argument of ?results, and you can ask more questions here. Please also though post your code and sessionInfo().

Quote:
By accounting for injection treatment, does this mean that the test accounts for any genes that were either differentially expressed due to injection and/or with pesticide dependent on injection, and only gives those genes that are affected by Pesticide1 across all the injection treatments?
In this interaction model, the Pesticide_Pesticide1_vs_Control effect is for the base level of injection. The pest1 vs control effect for other levels of injection is equal to the sum of the interaction term and the effect for the base level of injection. This will make more sense after checking a reference for interactions in linear models.
Michael Love is offline   Reply With Quote
Old 07-17-2014, 10:03 AM   #7
raphael123
Member
 
Location: Mc Gill -- Montreal

Join Date: Dec 2013
Posts: 37
Default

Thanks Ecolliso,

I would say that you have the same kind of design that I have, maybe more complex. I am trying to understand what I can do in DESeq1 or 2

I can answer to
Quote:
if using this subsetting approach, do I need to make any added adjustment of the P-value to account for the fact Iíll be doing multiple tests?
adjustment on p value should theorically be applied on all the p values ever computed on earth .... It s of course not mathematical it s just a widely use way to safely rely on something. So yes you should correct for all your p values but if you don t you can still publish for instance . More that that you should have a p value that is really significant, even 0.001 is still one chance over 1000 to be wrong, it s kind of a lot of chance no ?

About GLM, here is some questions:
  1. First, from here : One reason for calling the general linear model ďgeneralĒ is that it can handle an X that is not numerical as well as one that is numerical. I cant find in DESeq a way to check continuous covariates like age or time ....

  1. Second I wonder if puting the subject identifier as a covariate will help to make a pair wise test ...


  1. Last, GLM usually give a p value for the intercept and each coefficient, why DESeq does not do it ?
raphael123 is offline   Reply With Quote
Old 07-17-2014, 10:07 AM   #8
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

quick answers:

Search the DESeq2 vignette for "continuous covariate"

Yes including the subject identifier, i.e. "~ patient + condition" will test the change relative for each patient (pair-wise).

You can produce results tables for any coefficient using 'contrast' or 'name' argument to the results() function.
Michael Love is offline   Reply With Quote
Old 07-21-2014, 02:21 PM   #9
raphael123
Member
 
Location: Mc Gill -- Montreal

Join Date: Dec 2013
Posts: 37
Default

Thanks Mickael, I found the way to use continuous covariate, it was simple !
Now to test the change relative for each patient, I should fit a GLM including the subject identifier, i.e. "~ patient + condition", like you say.
Then I have to go to DESeq2 since DEseq have no result() function, I guess the p value associated with each coeficient is the usual that you find in the basic glm() funtion of R. So I can use it a as testing for my differential expression like the regular negative binomial test ! That s great ! Do you know why people use GLM only in case of covariate ?

Many Thanks!
raphael123 is offline   Reply With Quote
Reply

Tags
deseq, rnaseq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:13 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO