Old 03-10-2015, 01:04 PM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Quote:
However, in the results, I do not understand why the interaction effect of Location and Status gives nearly the exact opposite log fold change.
Is this something normal or it indicates something wrong with the analysis?
Short answer: Don't worry, it's all fine. The values are nearly the same because they should be.

Long answer: Well, this will be long...

This is normal, but a bit unfortunate. In DESeq2, we use a non-standard way of setting up model matrices, which we call "extended model matrices".
As explained in more detail in our paper, this is necessary to get proper shrinkage of log-fold-change estimates via a ridge penalty.

In essence, using your example, and first explaining it without interaction: With standard model matrices, the intercept coefficient would be the expected log expression for a sample from patient level 1, location level 1 (ileum), status level 1 (NI). Then there are coefficients for all other patients, giving the difference in expression if the sample comes from one of the other patients, plus one coefficient for the other location level (caecum) and one for the other status level (I). The location coefficient, for example, is the difference between log expression for the other location level (caecum) and the first location level (ileum). For each factor, there is one coefficient fewer than there are levels.

For extended model matrices, the intercept is the average over all levels rather than the value for all factors being at their first level, and there is one coefficient for each level of each factor. These coefficients give the difference to the grand average (not to the first level as before).
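To make the two layouts concrete, here is a minimal sketch in plain Python (not DESeq2 code). It only builds the columns for Location and Status; the patient factor is left out for brevity, and the column names are illustrative, modelled on the names used in this post.

```python
# Illustrative sketch only: four hypothetical samples, one per combination.
samples = [
    ("ileum", "NI"),
    ("ileum", "I"),
    ("caecum", "NI"),
    ("caecum", "I"),
]

standard_columns = ["Intercept", "LocationCaecum", "StatusI"]
extended_columns = ["Intercept", "LocationIleum", "LocationCaecum",
                    "StatusNI", "StatusI"]

def standard_row(location, status):
    # Standard coding: the first level of each factor is absorbed
    # into the intercept, so each factor gets (levels - 1) columns.
    return [1, int(location == "caecum"), int(status == "I")]

def extended_row(location, status):
    # Extended coding: one indicator column for every level of every factor.
    return [1,
            int(location == "ileum"), int(location == "caecum"),
            int(status == "NI"), int(status == "I")]

standard = [standard_row(loc, st) for loc, st in samples]
extended = [extended_row(loc, st) for loc, st in samples]

for name, cols, rows in (("standard", standard_columns, standard),
                         ("extended", extended_columns, extended)):
    print(name, cols)
    for sample, row in zip(samples, rows):
        print("  ", sample, row)
```

The extended matrix has more columns than can be estimated without a penalty, which is why it only makes sense together with the shrinkage mentioned above.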

So, LocationIleum is the difference between ileum samples and the average over all samples, and LocationCaecum is the difference between caecum samples and the average. The difference caecum-vs-ileum is hence the difference between the LocationCaecum and the LocationIleum coefficient -- and this is what 'results' will calculate for you if you ask for this contrast. If you had a balanced design, these two coefficients would be exactly the same value with opposite signs (because the grand average would be in the middle of both), and each would be half of what the coefficient with standard design matrices would be. As your design is not quite balanced, the values are only about the same, but this is not an issue.
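A small numerical sketch of this, in plain Python with made-up log expression values. It uses simple group means and ignores the shrinkage that DESeq2 applies, so it only illustrates the arithmetic, not the actual fit.

```python
def mean(values):
    return sum(values) / len(values)

def coefficients(ileum, caecum):
    # "Extended" view: each level is expressed relative to the average
    # over all samples; "standard" view: caecum relative to ileum.
    grand = mean(ileum + caecum)
    location_ileum = mean(ileum) - grand
    location_caecum = mean(caecum) - grand
    standard_coef = mean(caecum) - mean(ileum)
    return location_ileum, location_caecum, standard_coef

# Balanced design: three samples per location (made-up values).
bal_ileum, bal_caecum, bal_standard = coefficients(
    [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
print(bal_ileum, bal_caecum, bal_standard)   # -1.5 1.5 3.0

# Unbalanced design: one extra ileum sample pulls the average towards ileum.
unb_ileum, unb_caecum, unb_standard = coefficients(
    [4.0, 5.0, 6.0, 5.0], [7.0, 8.0, 9.0])
print(unb_ileum, unb_caecum, unb_standard)

# In both cases the contrast caecum-vs-ileum recovers the standard coefficient.
bal_contrast = bal_caecum - bal_ileum
unb_contrast = unb_caecum - unb_ileum
```

In the balanced case the two coefficients are exact mirror images and each is half the standard coefficient; in the unbalanced case they are only roughly mirror images, but their difference is still the caecum-vs-ileum difference.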

Now, for the interaction: An interaction is a difference of differences. In your case, it is the difference between (a) the difference between I and NI for caecum samples and (b) the difference between I and NI for ileum samples:

interaction = (caecumI - caecumNI) - (ileumI - ileumNI)

You could also ask for other interactions, e.g.

interaction' = (caecumI - ileumI) - (caecumNI - ileumNI)

and besides these two, there are two more ways to arrange the terms in this way. But all four of these are the same and differ only by sign, as you will notice quickly if you write the equations without parentheses. It is hence fully expected that all four interaction coefficients are the same, up to sign. There is only one interaction value -- it's only due to the need for extended design matrices that the same information appears four times. And the lack of balance causes them to differ, but only slightly.
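The sign argument is easy to check numerically. A minimal Python sketch with made-up cell means (hypothetical values, not taken from your data); the last two arrangements are one possible reading of the "two more ways" mentioned above:

```python
# Hypothetical mean log expression for the four Location x Status cells.
caecumI, caecumNI = 9.0, 6.0
ileumI, ileumNI = 7.0, 5.0

arrangements = {
    "(caecumI - caecumNI) - (ileumI - ileumNI)":
        (caecumI - caecumNI) - (ileumI - ileumNI),
    "(caecumI - ileumI) - (caecumNI - ileumNI)":
        (caecumI - ileumI) - (caecumNI - ileumNI),
    "(caecumNI - caecumI) - (ileumNI - ileumI)":
        (caecumNI - caecumI) - (ileumNI - ileumI),
    "(ileumI - caecumI) - (ileumNI - caecumNI)":
        (ileumI - caecumI) - (ileumNI - caecumNI),
}

for label, value in arrangements.items():
    print(f"{label} = {value}")

# All four carry the same magnitude; only the sign depends on the arrangement.
magnitudes = {abs(value) for value in arrangements.values()}
```

With these numbers the first two arrangements give +1 and the last two give -1, so there is a single magnitude and only the sign changes.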

Two final remarks:

- The interaction in my double difference above is twice what the coefficient says, again because these are the differences to the grand average, not to each other. (Mike: If you read this, please double-check whether I'm right.)

- Strictly speaking, we need extended design matrices only if a factor has more than two levels. Hence, normally, DESeq2 switches to standard design matrices (with so-called sum contrasts) if all factors have only two levels -- to save users from exactly the confusion we have here. Unfortunately, your patient factor has more than two levels, and our code is not smart enough to realize that we need extended coding only for this factor.

Last edited by Simon Anders; 03-10-2015 at 01:07 PM.