I guess I am mostly asking the DESeq team, i.e. Simon Anders, but wanted to put this out there for others doing similar experiments.
I want to make a series of pairwise comparisons: A vs. B, A vs. C, A vs. D, etc. I have 2 replicates per time point. When estimating dispersion with DESeq, is it more sound to do so from the full count table (a1 a2 b1 b2 c1 c2...) or just from the data used in that particular test (e.g. a1 a2 c1 c2 for A vs. C)?
Having set it up both ways, I get more genes passing any given padj cutoff if I estimate dispersion on the more limited set, but is this being sloppy?
Somewhat relatedly, suppose my dispersion plot is really ugly (many points are far from the red line), but I am still getting tons of genes with padj < 0.05. Can I assume that the differences I am seeing are statistically valid and probably real, and that I am just missing smaller changes hidden by the noise in my data? Or are even my statistically significant changes suspect?
Thanks,
Anna