I thought I understood what the contrast= option in glmLRT means as opposed to coef= (which just looks for DE wrt the index=1 sample, presumed to be the control), but I'm getting crazy results from using contrast= to get what I think should be the equivalent to coef=.
I've got a 4 times X 5 conditions design matrix, so 20 total "treatments." I want to get the DE for condition vs control for each of the four times. The first non-control condition is called AS2 and the control is called Col (for AS2-enhanced transgenic samples and "Columbia", a standard Arabidopsis wild type, if you're curious). So, for example, I'd like to get DE{Col(0)|AS2(0)}, DE{Col(30)|AS2(30)}, etc., where 0 and 30 minutes are two of the four times. But I'll focus on t=0 here, which seems to be where the problem lies. The "control" as edgeR is concerned is condition=Col and time=0 with index 1.
The design matrix is, in order:
(Intercept), ConditionAS2, ConditionKan, ConditionRev, ConditionSTM,
ConditionCol:Timet030, ConditionAS2:Timet030, ConditionKan:Timet030, ConditionRev:Timet030, ConditionSTM:Timet030,
ConditionCol:Timet060, etc.
Here's the two calls that I think should be equivalent:
(a) glmLRT(f, coef=2)
This returns a very reasonable dataset for DE{Col(0)|AS2(0)} which compares favorably to other DE analyses (edgeR and DESeq2) using only the appropriate samples. I don't doubt that it's fine. However,
(b) glmLRT(f, contrast=c(-1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
returns complete garbage with logFC values in the stratosphere. To make matters more confusing, it does return seemingly correct data for non-t=0 contrasts, for example,
(c) glmLRT(f, contrast=c(0,0,0,0,0, -1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
seems to return a very reasonable dataset for DE{Col(30)|AS2(30)}.
What am I missing about the meaning of the first index in the array passed to contrast= in glmLRT??? Is there some reason why contrast= can only be used for contrasts between non-control samples? I don't get it.
I've got a 4 times X 5 conditions design matrix, so 20 total "treatments." I want to get the DE for condition vs control for each of the four times. The first non-control condition is called AS2 and the control is called Col (for AS2-enhanced transgenic samples and "Columbia", a standard Arabidopsis wild type, if you're curious). So, for example, I'd like to get DE{Col(0)|AS2(0)}, DE{Col(30)|AS2(30)}, etc., where 0 and 30 minutes are two of the four times. But I'll focus on t=0 here, which seems to be where the problem lies. The "control" as edgeR is concerned is condition=Col and time=0 with index 1.
The design matrix is, in order:
(Intercept), ConditionAS2, ConditionKan, ConditionRev, ConditionSTM,
ConditionCol:Timet030, ConditionAS2:Timet030, ConditionKan:Timet030, ConditionRev:Timet030, ConditionSTM:Timet030,
ConditionCol:Timet060, etc.
Here's the two calls that I think should be equivalent:
(a) glmLRT(f, coef=2)
This returns a very reasonable dataset for DE{Col(0)|AS2(0)} which compares favorably to other DE analyses (edgeR and DESeq2) using only the appropriate samples. I don't doubt that it's fine. However,
(b) glmLRT(f, contrast=c(-1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
returns complete garbage with logFC values in the stratosphere. To make matters more confusing, it does return seemingly correct data for non-t=0 contrasts, for example,
(c) glmLRT(f, contrast=c(0,0,0,0,0, -1,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
seems to return a very reasonable dataset for DE{Col(30)|AS2(30)}.
What am I missing about the meaning of the first index in the array passed to contrast= in glmLRT??? Is there some reason why contrast= can only be used for contrasts between non-control samples? I don't get it.
Comment