Hi all,
I am trying to work out the correct design formula for an RNA-seq experiment that I am analyzing with DESeq2.
The experiment includes 10 patients, whose samples were processed in 4 batches; no batch contains samples from only one patient. The patients are split equally between two groups: Group 1 contains 2 male and 3 female patients, and Group 2 contains 3 male and 2 female patients. One male patient in each group is Asian; the remaining patients are Caucasian. We collected 3 different tissue types from each patient.
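As a starting point, the design above can be encoded as a sample table in base R. This is a sketch: `patients`, `coldata`, `sex_by_group`, and `batches_per_patient` are hypothetical names, and the values are transcribed from the sample matrix at the end of this post. Cross-tabulating the table confirms the sex split per group and that each patient's samples sit in a single batch.

```r
# Patient-level covariates, transcribed from the sample matrix in this post.
patients <- data.frame(
  patient   = c("X30844", "X30855", "X30999", "X31002", "X31122",
                "X31132", "X31134", "X31188", "X31193", "X31195"),
  group     = c("G2", "G2", "G1", "G1", "G1", "G2", "G1", "G1", "G2", "G2"),
  gender    = c("M", "M", "F", "F", "F", "M", "M", "M", "F", "F"),
  ethnicity = c("Asian", "Caucasian", "Caucasian", "Caucasian", "Caucasian",
                "Caucasian", "Asian", "Caucasian", "Caucasian", "Caucasian"),
  batch     = c("Batch_1", "Batch_1", "Batch_3", "Batch_3", "Batch_4",
                "Batch_4", "Batch_4", "Batch_5", "Batch_5", "Batch_5")
)

# Cross join with the three tissues (merge() with no shared columns returns
# the Cartesian product), then drop the sample removed in QC.
coldata <- merge(patients, data.frame(tissue = c("T1", "T2", "T3")))
coldata <- coldata[!(coldata$patient == "X31002" & coldata$tissue == "T1"), ]
coldata[] <- lapply(coldata, factor)

# Sex split per group, counted once per patient.
sex_by_group <- table(patients$group, patients$gender)
print(sex_by_group)

# Number of distinct batches per patient (1 everywhere means all of a
# patient's samples were processed in the same batch).
batches_per_patient <- tapply(coldata$batch, coldata$patient,
                              function(b) length(unique(b)))
print(batches_per_patient)
```

Because every patient maps to exactly one group, gender, ethnicity, and batch, all four behave as patient-level covariates in this table.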
The primary goals of the experiment are (1) to understand how each tissue differs between groups and (2) to understand how the tissues differ from one another within each group. The gender, race, and batch effects are not of particular interest, but I would like to control for them. We moderately expect differences between genders; whether race has an effect is less clear.
I initially tried "~ patient + batch + ethnicity + gender + group + tissue + group:tissue", but besides being unsure whether this is correct, I found that the design matrix is not full rank. No matter which variables I drop or which interaction terms I add, R reports that the design matrix has a rank of at most 14.
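The rank ceiling of 14 can be reproduced with base R's `model.matrix()` and `qr()`. A hedged reading, assuming the reconstructed table in this sketch matches the real sample sheet: group, gender, ethnicity, and batch are all constant within a patient, so once `patient` is in the formula their columns are linear combinations of the patient indicators and add no rank. What remains is the intercept plus 9 patient columns, 2 tissue columns, and 2 group:tissue columns, which is 14. Variable names such as `mm_full` and `aliased` are hypothetical.

```r
# Reconstructed sample table (transcribed from the sample matrix in this post).
patients <- data.frame(
  patient   = c("X30844", "X30855", "X30999", "X31002", "X31122",
                "X31132", "X31134", "X31188", "X31193", "X31195"),
  group     = c("G2", "G2", "G1", "G1", "G1", "G2", "G1", "G1", "G2", "G2"),
  gender    = c("M", "M", "F", "F", "F", "M", "M", "M", "F", "F"),
  ethnicity = c("Asian", "Caucasian", "Caucasian", "Caucasian", "Caucasian",
                "Caucasian", "Asian", "Caucasian", "Caucasian", "Caucasian"),
  batch     = c("Batch_1", "Batch_1", "Batch_3", "Batch_3", "Batch_4",
                "Batch_4", "Batch_4", "Batch_5", "Batch_5", "Batch_5")
)
coldata <- merge(patients, data.frame(tissue = c("T1", "T2", "T3")))
coldata <- coldata[!(coldata$patient == "X31002" & coldata$tissue == "T1"), ]
coldata[] <- lapply(coldata, factor)

full <- ~ patient + batch + ethnicity + gender + group + tissue + group:tissue
mm_full <- model.matrix(full, data = coldata)
qr_full <- qr(mm_full)
n_cols    <- ncol(mm_full)   # 20 columns requested...
rank_full <- qr_full$rank    # ...but only 14 are linearly independent

# Columns that qr() set aside as linear combinations of earlier columns.
aliased <- colnames(mm_full)[qr_full$pivot[-seq_len(rank_full)]]

# Patient alone already spans every patient-level covariate.
rank_patient <- qr(model.matrix(~ patient, data = coldata))$rank
rank_patient_plus <- qr(model.matrix(~ patient + batch + ethnicity + gender + group,
                                     data = coldata))$rank

print(c(columns = n_cols, rank = rank_full,
        patient = rank_patient, patient_plus = rank_patient_plus))
print(aliased)
```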
Eventually, I followed the example in section 3.5 of the edgeR manual (even though I am using DESeq2) and used only "~ group + group:patient + group:tissue", nesting the patients within the groups. This design matrix has 14 columns and is full rank, but it does not use the gender, batch, or ethnicity information. It is also unclear to me whether I can correctly assess differences between two tissues within the same group. I know that comparing the groups within a tissue should be group1 + group1.tissue1 versus group2 + group2.tissue1, but I cannot likewise compare tissue1 + group1.tissue1 with tissue2 + group1.tissue2, because I am not fitting main effects for the tissues.
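The edgeR-style trick of numbering patients 1 to 5 inside each group can be sketched in base R as follows. Assumptions: `patient_in_group` and `make_contrast()` are hypothetical names, and patients are numbered by sorted ID. In this parametrization T1 is the reference tissue inside each group, so the `group:tissue` columns are themselves the within-group tissue differences (for any patient in G1, T2 minus T1 equals the `groupG1:tissueT2` coefficient), and no tissue main effect is needed to form them.

```r
# Reconstructed sample table (transcribed from the sample matrix in this post).
patients <- data.frame(
  patient   = c("X30844", "X30855", "X30999", "X31002", "X31122",
                "X31132", "X31134", "X31188", "X31193", "X31195"),
  group     = c("G2", "G2", "G1", "G1", "G1", "G2", "G1", "G1", "G2", "G2"),
  tissue_dummy = NA
)[, c("patient", "group")]
coldata <- merge(patients, data.frame(tissue = c("T1", "T2", "T3")))
coldata <- coldata[!(coldata$patient == "X31002" & coldata$tissue == "T1"), ]
coldata[] <- lapply(coldata, factor)

# Number patients 1..5 within each group (hypothetical column name).
coldata$patient_in_group <- factor(ave(as.integer(coldata$patient), coldata$group,
                                       FUN = function(x) as.integer(factor(x))))

mm <- model.matrix(~ group + group:patient_in_group + group:tissue, data = coldata)
print(colnames(mm))
stopifnot(ncol(mm) == 14, qr(mm)$rank == 14)

# Hypothetical helper: named weights -> numeric contrast aligned with mm.
make_contrast <- function(...) {
  w <- c(...)
  stopifnot(all(names(w) %in% colnames(mm)))
  v <- setNames(numeric(ncol(mm)), colnames(mm))
  v[names(w)] <- w
  v
}

# Within-group tissue differences read straight off the group:tissue columns.
g1_T2_vs_T1 <- make_contrast("groupG1:tissueT2" = 1)
g1_T3_vs_T2 <- make_contrast("groupG1:tissueT3" = 1, "groupG1:tissueT2" = -1)
# Does the T2-vs-T1 difference itself differ between the groups?
T2_vs_T1_G2_vs_G1 <- make_contrast("groupG2:tissueT2" = 1, "groupG1:tissueT2" = -1)

# With a DESeqDataSet `dds` (hypothetical) built on this design, such a vector
# could be passed as results(dds, contrast = g1_T3_vs_T2), assuming
# resultsNames(dds) lists the coefficients in the same order as colnames(mm).
```

One caution about this coding: because `patient_in_group` uses treatment contrasts, `(Intercept)` and `groupG2` describe the first-numbered patient of each group at T1 on the model's log scale, not the group averages, so between-group contrasts in this parametrization need more thought than the within-group ones.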
I suspect the complication is that there are multiple blocking variables and I need both within-group and between-group comparisons. I considered using sva or RUVSeq to remove the effects of the variables that are not of real interest, but sva runs into the same problem, since it needs a full model that includes those variables, and RUVSeq does not take advantage of the known relationships between those variables and the samples (plus, I don't have any well-defined control genes).
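A different trade-off, not taken from the original post, is to drop `patient` so that the patient-level covariates become estimable. A minimal base-R check, assuming the same reconstructed sample table, shows that "~ batch + ethnicity + gender + group + tissue + group:tissue" gives an 11-column matrix of full rank. The cost is that the model no longer pairs the three tissues from the same patient, so this is a sketch of the rank arithmetic rather than a recommended design. `mm_np` and `batch_by_group` are hypothetical names.

```r
# Reconstructed sample table (transcribed from the sample matrix in this post).
patients <- data.frame(
  patient   = c("X30844", "X30855", "X30999", "X31002", "X31122",
                "X31132", "X31134", "X31188", "X31193", "X31195"),
  group     = c("G2", "G2", "G1", "G1", "G1", "G2", "G1", "G1", "G2", "G2"),
  gender    = c("M", "M", "F", "F", "F", "M", "M", "M", "F", "F"),
  ethnicity = c("Asian", "Caucasian", "Caucasian", "Caucasian", "Caucasian",
                "Caucasian", "Asian", "Caucasian", "Caucasian", "Caucasian"),
  batch     = c("Batch_1", "Batch_1", "Batch_3", "Batch_3", "Batch_4",
                "Batch_4", "Batch_4", "Batch_5", "Batch_5", "Batch_5")
)
coldata <- merge(patients, data.frame(tissue = c("T1", "T2", "T3")))
coldata <- coldata[!(coldata$patient == "X31002" & coldata$tissue == "T1"), ]
coldata[] <- lapply(coldata, factor)

# Without patient, the patient-level covariates are no longer absorbed.
no_patient <- ~ batch + ethnicity + gender + group + tissue + group:tissue
mm_np <- model.matrix(no_patient, data = coldata)
print(c(columns = ncol(mm_np), rank = qr(mm_np)$rank))

# Batch_1 and Batch_3 each hold patients from only one group, so batch and
# group are partly confounded; worth keeping in mind when reading batch terms.
batch_by_group <- table(patients$batch, patients$group)
print(batch_by_group)
```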
Here is my actual sample matrix, in case it helps (one of the 30 samples, tissue T1 from patient X31002, was removed for QC reasons):
Thanks so much for your help and sorry about the long post!
```
group  patient  gender  ethnicity  tissue  batch
G2     X30844   M       Asian      T2      Batch_1
G2     X30844   M       Asian      T3      Batch_1
G2     X30844   M       Asian      T1      Batch_1
G2     X30855   M       Caucasian  T2      Batch_1
G2     X30855   M       Caucasian  T3      Batch_1
G2     X30855   M       Caucasian  T1      Batch_1
G1     X30999   F       Caucasian  T2      Batch_3
G1     X30999   F       Caucasian  T3      Batch_3
G1     X30999   F       Caucasian  T1      Batch_3
G1     X31002   F       Caucasian  T2      Batch_3
G1     X31002   F       Caucasian  T3      Batch_3
G1     X31122   F       Caucasian  T2      Batch_4
G1     X31122   F       Caucasian  T3      Batch_4
G1     X31122   F       Caucasian  T1      Batch_4
G2     X31132   M       Caucasian  T2      Batch_4
G2     X31132   M       Caucasian  T3      Batch_4
G2     X31132   M       Caucasian  T1      Batch_4
G1     X31134   M       Asian      T2      Batch_4
G1     X31134   M       Asian      T3      Batch_4
G1     X31134   M       Asian      T1      Batch_4
G1     X31188   M       Caucasian  T2      Batch_5
G1     X31188   M       Caucasian  T3      Batch_5
G1     X31188   M       Caucasian  T1      Batch_5
G2     X31193   F       Caucasian  T2      Batch_5
G2     X31193   F       Caucasian  T3      Batch_5
G2     X31193   F       Caucasian  T1      Batch_5
G2     X31195   F       Caucasian  T2      Batch_5
G2     X31195   F       Caucasian  T3      Batch_5
G2     X31195   F       Caucasian  T1      Batch_5
```