SEQanswers

01-13-2012, 01:29 PM   #1
arrchi
Member
 
Location: ma

Join Date: Mar 2011
Posts: 46
Can edgeR/DESeq have more than one covariate?

Besides diagnosis, can I include age, gender, etc. as covariates in the GLM? I could not find this information in their papers or manuals.

Thanks a lot.

Last edited by arrchi; 01-13-2012 at 01:31 PM.
01-14-2012, 12:24 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Sure, this was added to both packages in mid-2010. Are you reading the current versions of the manuals?
07-03-2012, 01:21 PM   #3
adumitri
Member
 
Location: Cambridge, MA

Join Date: Jan 2010
Posts: 27

Hi,

I read Simon's answer from a few months ago, but could not find details about covariates in the DESeq manual. The covariates I am interested in accounting for are quantitative and are known to affect gene expression (e.g. RIN can have a huge influence on mRNA levels). Could you clarify whether or not DESeq allows for this type of covariate? I previously ran into this post, which seemed to imply that quantitative covariates cannot be easily incorporated.

Thank you,
Alexandra
07-07-2012, 06:08 PM   #4
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91

edgeR provides complete support for any number of covariates and factors, provided of course you have enough libraries available to estimate the parameters, in particular more libraries than coefficients. See

http://nar.oxfordjournals.org/content/40/10/4288

or the edgeR User's Guide.

There are two case studies in the edgeR User's Guide which involve two experimental factors.
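
The "more libraries than coefficients" condition can be checked before fitting anything. The following is only a rough sketch in plain Python, not edgeR code: the function name, the design matrix and its values are all made up for illustration. It computes the rank of a design matrix and the residual degrees of freedom that are left for estimating dispersion.

```python
# Hypothetical illustration (not edgeR code): check that a design matrix
# leaves residual degrees of freedom, i.e. more libraries than coefficients.
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot row at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Made-up design: intercept, diagnosis (0 = control, 1 = case), sex (0/1), RIN.
design = [
    [1, 0, 0, 7.5],
    [1, 0, 1, 8.0],
    [1, 0, 0, 6.5],
    [1, 1, 1, 6.0],
    [1, 1, 0, 5.5],
    [1, 1, 1, 7.0],
]

n_libraries = len(design)
n_coefficients = matrix_rank(design)
residual_df = n_libraries - n_coefficients
print(f"{n_libraries} libraries, {n_coefficients} estimable coefficients, "
      f"{residual_df} residual df")
```

If the residual degrees of freedom come out as zero, or the rank is smaller than the number of columns, the design cannot be fitted as specified.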

Gordon
07-09-2012, 01:43 PM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

In principle, using quantitative covariates should be possible with both packages, even though I have not seen this actually being done, because it is rarely useful in practice: as we are talking about (generalized) linear models, there should be some reason to assume that your covariate influences log expression in a linear manner. For your example, RIN, I do not think that this would be a good assumption.
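
To make the linearity assumption concrete, a log-link GLM with one quantitative covariate can be written roughly as follows. The notation is an assumption introduced here for illustration, not taken from either package's documentation: \mu_{gj} is the expected count of gene g in sample j, s_j is a normalization (size) factor, and x_j is the covariate value, e.g. RIN.

```latex
\log \mu_{gj} = \log s_j + \beta_{g0} + \beta_{g1}\, x_j
\qquad\Longrightarrow\qquad
\frac{\mu_{gj}/s_j \,\big|_{x_j = x + 1}}{\mu_{gj}/s_j \,\big|_{x_j = x}} = e^{\beta_{g1}}
```

Under this model every one-unit increase in the covariate multiplies the expected normalized expression by the same factor e^{\beta_{g1}}, regardless of where on the covariate's scale the increase occurs. That is the assumption being questioned for RIN.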
07-10-2012, 08:30 AM   #6
adumitri
Member
 
Location: Cambridge, MA

Join Date: Jan 2010
Posts: 27

Hi Simon,

I do not fully understand your statement regarding RIN. Why did you say that RIN most likely does not influence expression in a linear fashion? Do you have any literature evidence that would suggest this to be the case?

The RNA-Seq samples we are looking at have been previously analyzed in a larger Parkinson disease/control expression study that used microarrays for assessing gene expression levels. In the microarray study, we included RIN in the linear model because of its high impact on gene expression. Actually, in many cases RIN would be a stronger predictor of gene expression than the case/control status itself. From my point of view, it makes sense that samples with low(er) RIN will have more RNA degradation, and the apparent level of expression will be influenced by RIN. To a lesser extent, post-mortem interval and age are also predictors of gene expression.

Additionally, even after multiple RNA extractions, some of the samples obtained from diseased tissue that we included in our recent RNA-Seq study had a tendency for lower RINs than the samples obtained from control tissue. Therefore, for our analyses, we need to compare a set of diseased samples and a set of control samples that differ in terms of average RIN, a covariate that we know to be predictive of gene expression. In this case, it seems mandatory to include RIN in our analyses. What would your opinion be?

Gordon, thank you for the article reference.

Alexandra
07-10-2012, 09:07 AM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Interesting. If you say that you saw a linear dependence on RIN in earlier studies, it certainly makes sense to try to add it in a quantitative manner. It did not occur to me that sample integrity can be so hard to control that one needs to account for its variation, but then, I never had to work with post-mortem samples.

I might not have thought things through in my previous post, because, in principle, the GLM approaches of edgeR and DESeq should not care whether covariates are categorical or quantitative. Just proceed according to the vignettes and use a numerical vector instead of only factors in the model frame, and ask again if this throws an error.
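
To illustrate why the GLM machinery need not care about the covariate type, here is a small sketch in plain Python (not R, and not code from either package) of how a model frame containing a factor and a numeric vector turns into a design matrix. The helper name, the treatment coding with the alphabetically first level as reference, and the sample values are all assumptions made for illustration.

```python
# Hypothetical sketch: expand a model frame into a design matrix.
# Factors become 0/1 dummy columns; numeric covariates are used as-is.

def model_matrix(frame, factors, numerics):
    """Intercept + treatment-coded dummies for factors + one column per numeric covariate."""
    n = len(next(iter(frame.values())))
    names, columns = ["(Intercept)"], [[1.0] * n]
    for f in factors:
        levels = sorted(set(frame[f]))
        for level in levels[1:]:  # the first level acts as the reference
            names.append(f"{f}{level}")
            columns.append([1.0 if v == level else 0.0 for v in frame[f]])
    for x in numerics:
        names.append(x)
        columns.append([float(v) for v in frame[x]])
    # Transpose columns into one row per sample.
    rows = [list(r) for r in zip(*columns)]
    return names, rows


# Made-up model frame with one factor and one quantitative covariate.
frame = {
    "diagnosis": ["control", "control", "case", "case"],
    "RIN": [7.5, 8.0, 6.0, 5.5],
}
names, rows = model_matrix(frame, factors=["diagnosis"], numerics=["RIN"])
print(names)
for row in rows:
    print(row)
```

Either kind of covariate ends up as an ordinary numeric column, so the fitting step sees no difference between them; a quantitative covariate simply contributes one coefficient (a slope) instead of one coefficient per non-reference level.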
10-28-2013, 11:55 AM   #8
koduu
Junior Member
 
Location: estonia

Join Date: Sep 2013
Posts: 1

Hello,

I am a complete newbie in statistics, so please forgive me if this question sounds really illogical. I basically tried to do the same thing as the OP wanted, but I got stuck at the estimateDispersions function. I have a relatively large sample size (32) and I want to include 2 covariates: one binary, e.g. "sex", and one quantitative, e.g. "RIN", that is partially replicated (e.g. 2, 2, 2.5, 3, 3.5, 3.5, ...).
If I do the estimateDispersions stage, then the internal function modelMatrixToConditionFactor(modelFrame) creates a condition vector with 22 levels (each having 1 to 3 replicates) and then estimates dispersions.
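
The behaviour described above, where every distinct combination of covariate values becomes its own condition level, can be mimicked with a few lines of plain Python. This is only a sketch of the idea as described in this post, not the DESeq source; the function name and the (sex, RIN) values are made up.

```python
# Hypothetical sketch: samples with identical covariate rows share a condition
# label, so a quantitative covariate splits the samples into many small groups.
from collections import Counter


def condition_labels(model_rows):
    """Give identical covariate rows the same condition label (1, 2, ...)."""
    seen = {}
    labels = []
    for row in model_rows:
        key = tuple(row)
        if key not in seen:
            seen[key] = len(seen) + 1
        labels.append(seen[key])
    return labels


# Made-up (sex, RIN) pairs for 8 samples.
model_rows = [
    (0, 2.0), (0, 2.0), (1, 2.0), (0, 2.5),
    (1, 3.0), (1, 3.5), (1, 3.5), (0, 3.5),
]
labels = condition_labels(model_rows)
replicates = Counter(labels)
print("condition labels:", labels)
print("number of condition levels:", len(replicates))
print("levels without replicates:",
      sum(1 for count in replicates.values() if count == 1))
```

With a partially replicated quantitative covariate, most levels end up with only one or a few samples, which is why per-condition replication is a poor basis for estimating dispersion in this kind of design.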

My question is: can I just leave the quantitative variable out of the estimateDispersions stage and only use it in fitNbinomGLMs(), or is that mathematically incorrect and hence wrong?
What I am trying to do is to prove that the quantitative variable has a significant effect on expression.

Any advice would be greatly appreciated.
10-28-2013, 03:37 PM   #9
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91

Mathematically incorrect.

Try the edgeR package. It has fast, reliable glm features and has no trouble with this sort of scenario.