Hi All,
Sorry, this is going to take some explanation. I am trying to use DESeq to identify genes that are differentially expressed between pre- and post-treatment with a drug, using a repeated measures design:
We have five patients. From each patient we have RNA-seq data pre and post treatment.
I believe that this should be well modelled by the GLM functionality of DESeq. The problem I have is with estimating the variance. DESeq regards each patient x treatment combination as a separate condition, and thus my design has only one replicate per condition. This means that, as far as DESeq is concerned, the experiment has no biological replication, when in fact we have five biological replicates. Indeed, it's difficult to see how more biological replication could be included: more samples from the same patients would be technical replicates and would lead to an underestimate of the true variance.
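To make the replication argument concrete, here is a minimal sketch in plain Python/NumPy (not DESeq itself) that just counts degrees of freedom. The patient labels P1 to P5 are hypothetical placeholders, and the additive model (patient effect plus treatment effect) is my assumption about how the design should be written down.

```python
import numpy as np

# Hypothetical labels: five patients, each sampled pre and post treatment.
patients = ["P1", "P2", "P3", "P4", "P5"]
treatments = ["pre", "post"]
samples = [(p, t) for p in patients for t in treatments]


def design_matrix(samples, patients):
    """Additive design: intercept, patient indicators (first patient is
    the reference level), and a post-treatment indicator."""
    rows = []
    for p, t in samples:
        patient_cols = [1.0 if p == q else 0.0 for q in patients[1:]]
        treatment_col = 1.0 if t == "post" else 0.0
        rows.append([1.0] + patient_cols + [treatment_col])
    return np.array(rows)


X_additive = design_matrix(samples, patients)
n_samples = X_additive.shape[0]

# Residual degrees of freedom under the additive (paired) model.
residual_df_additive = n_samples - np.linalg.matrix_rank(X_additive)

# If every patient x treatment cell is its own condition, each of the
# ten cells holds exactly one sample and nothing is left over.
n_cells = len(set(samples))
residual_df_cells = n_samples - n_cells

print("additive model residual df:", residual_df_additive)
print("one-condition-per-cell residual df:", residual_df_cells)
```

Under the additive model, ten samples and six coefficients leave four residual degrees of freedom, whereas treating each cell as its own condition leaves none. That is the sense in which I think the design does contain replication.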
Although it would seem to defeat the whole point of the paired/repeated design, it is possible to estimate the variance from all samples using the method = "blind" setting of DESeq. Apart from the fact that power must be lost by pooling conditions like this, the estimates of the pooled variance don't seem very good: see attached plot. I also tried using local regression rather than the parametric fit, but it made things worse.
This approach finds only a very small number of differentially expressed genes. This could of course be correct, but the number is much smaller than what I find if I treat this as a two-condition problem with five replicates in each condition (although the variance estimation doesn't look any better, even when done on a per-condition basis), and I thought treating the problem as a GLM should improve power.
If anyone has any thoughts on how best to do this analysis, either from a statistical point of view or from a how-to-use-DESeq point of view, they would be much appreciated.
Yours,
Ian
===