Hello All,
I'm looking for your opinions on the proper analysis of gene expression data from RNAseq with a rather complicated design. I am running a controlled lab infection experiment, exposing two populations of hosts to parasites, and looking for changes in gene expression associated with either infection or failure of infection (Exposed). Here are the basics:
Total n = 95
Population:
GG:55
RR:40
Family:
19 total
~5 individuals/family (family members are either control, exposed, or infected)
Status:
Control:36
Exposed:37
Infected:22
Batch is the lane they were sequenced on (yes, I should have multiplexed properly, but that mistake has already been made).
I also have data on sex, size of host, and size of parasite.
In the PCA, families cluster together, so I believe that family should be considered a random effect. After looking around a bit on various threads, I see it's often suggested to use a fixed effect instead of a random effect, but those threads usually involve far fewer levels than I have here (3-4 vs. 19).
How would you incorporate family into the analysis? I'm open to all suggestions.
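One thing worth checking before choosing between a fixed and a random family effect is whether family is nested within population. The following sketch assumes each family comes from a single population, which seems likely but is not stated above. It uses a small hypothetical sample sheet (4 families with 3 animals each, not the real data) and plain numpy dummy coding. It shows that adding family as a fixed effect makes the population column redundant, so the design loses a rank. Status, which varies within families, stays estimable.

```python
import numpy as np

# Hypothetical toy sample sheet (NOT the real data): 4 families, 3 animals each.
# Assumption: every family belongs to exactly one population (family nested in population).
families = ["F1"] * 3 + ["F2"] * 3 + ["F3"] * 3 + ["F4"] * 3
population_of = {"F1": "GG", "F2": "GG", "F3": "RR", "F4": "RR"}
status = ["Control", "Exposed", "Infected"] * 4  # one of each status per family

def dummies(values, levels):
    """Treatment-coded indicator columns; the first level is the reference."""
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values])

intercept = np.ones((len(families), 1))
pop_cols = dummies([population_of[f] for f in families], ["GG", "RR"])
status_cols = dummies(status, ["Control", "Exposed", "Infected"])
family_cols = dummies(families, ["F1", "F2", "F3", "F4"])

# ~ population + status + family   (family as a fixed effect)
X_full = np.hstack([intercept, pop_cols, status_cols, family_cols])
# ~ status + family                (population dropped)
X_nopop = np.hstack([intercept, status_cols, family_cols])
# ~ population + status            (family ignored)
X_nofam = np.hstack([intercept, pop_cols, status_cols])

for name, X in [("full", X_full), ("no population", X_nopop), ("no family", X_nofam)]:
    print(f"{name}: {X.shape[1]} columns, rank {np.linalg.matrix_rank(X)}")
```

With family as a fixed effect the full design has 7 columns but rank 6, because the population column equals the sum of the RR family indicators. If the population contrast matters, that is one argument for treating family as a random (blocking) effect. An example is limma's `duplicateCorrelation()` with `block = family`, which keeps population in the model while accounting for correlation among members of the same family.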
Alternatively, a general strategy for model testing in gene expression would be equally welcome.
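A common general strategy is to compare nested models for each gene: a full model that includes the term of interest and a reduced model without it. DESeq2's likelihood ratio test (`test = "LRT"` with a `reduced` formula) applies this idea to counts with a negative binomial likelihood. The sketch below is a simplified stand-in rather than that method. It is a plain least-squares F-test on one hypothetical, simulated log-expression gene (no shrinkage or moderation), asking whether status adds anything beyond batch. All names and numbers in it are illustrative assumptions.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares and column rank from an ordinary least-squares fit."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

def nested_f_test(y, X_reduced, X_full):
    """F statistic comparing a reduced model nested inside a full model for one gene."""
    rss_r, rank_r = rss(y, X_reduced)
    rss_f, rank_f = rss(y, X_full)
    df1 = rank_f - rank_r   # extra parameters in the full model
    df2 = len(y) - rank_f   # residual degrees of freedom of the full model
    f_stat = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return f_stat, df1, df2

# Hypothetical simulated gene: log-expression with a batch shift and an infection effect.
rng = np.random.default_rng(1)
n = 30
batch = np.repeat([0.0, 1.0], n // 2)        # two lanes, 15 samples each
status = np.tile([0, 1, 2], n // 3)          # 0 = Control, 1 = Exposed, 2 = Infected
exposed = (status == 1).astype(float)
infected = (status == 2).astype(float)
y = 5.0 + 0.8 * batch + 3.0 * infected + rng.normal(0.0, 0.5, n)

X_reduced = np.column_stack([np.ones(n), batch])                   # ~ batch
X_full = np.column_stack([np.ones(n), batch, exposed, infected])   # ~ batch + status

f_stat, df1, df2 = nested_f_test(y, X_reduced, X_full)
print(f"F = {f_stat:.1f} on ({df1}, {df2}) df")
```

The p-value would come from the F distribution with (`df1`, `df2`) degrees of freedom. Repeating this across genes and controlling the false discovery rate gives a basic genome-wide test. The same full-vs-reduced comparison can be used to ask whether family, sex, or host size improve the fit.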
Thanks in advance for your help
Lohman