**Simon Anders** · 11-19-2011, 02:40 AM

Sure, you can analyse more complex designs. See the section on GLMs in the vignette. Precisely how to set up the test depends on which hypothesis you want to test.

**mbjohnson** · 10-17-2012, 01:04 PM

Hi Simon,
Can you please clarify this for me? If I have more than one factor, e.g. treatment and timepoint, I use the GLM full-model approach. If I have only one factor, but it has more than two levels (A, B, C), should I still use the GLM approach? Or is it better to use the simpler model and run nbinomTest several times, once for each two-way comparison (A vs B; A vs C; B vs C)? Is there a way to use the simpler model but still test for differential expression in one step (e.g. an ANOVA, especially for many-level factors)?
Many thanks for all your work on DESeq!
Matt

**Simon Anders** · 10-18-2012, 05:53 AM

For pair-wise comparisons, you have to subset your data set to only the samples involved. To be consistent with the ANOVA-style result for all levels, you should do the subsetting after the dispersion estimation.

**mbjohnson** · 10-18-2012, 06:31 AM

Thanks, Simon. By subsetting, I assume you mean to simply run a number of nbinomTest commands, one for each comparison, using the same countDataSet (after dispersion estimation). For example:

Code:

design <- data.frame(
	sample.names = sampleTable$V1,
	count.files = sampleTable$V2,
	condition = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
)

cds <- newCountDataSetFromHTSeqCount(design, directory="/data/dir")
cds <- estimateSizeFactors( cds )
cds <- estimateDispersions( cds )

AvsB <- nbinomTest(cds, "A", "B")
AvsC <- nbinomTest(cds, "A", "C")
BvsC <- nbinomTest(cds, "B", "C")

**NateP** · 10-18-2012, 06:50 AM

I was just going to make a thread in a similar vein, so I may as well ask my question in this one.

I am also dealing with a subset of pairwise comparisons in an analysis, and with the correct way to run it with DESeq.

Say you have a time course analysis with 3 biological replicates collected from 6 different time points. The comparisons we are interested in looking at are how all of the time points differ from time 1.

So 5 different pairwise tests: t1 vs t2, t1 vs t3, t1 vs t4, t1 vs t5, and t1 vs t6.

So Simon, would the appropriate way to run this analysis using DESeq be to have them all in one count data set and then just run 5 different nbinomTests with (cds, "t1", "t2"), (cds, "t1", "t3")... etc.? Or would it be to take the raw counts for "t1" and "t2", put them in their own table, and create and test a separate count data set for each pair?

In addition, since we are doing multiple tests on the same data set, is there a need to re-do the False Discovery Rate calculation by combining the raw p-values from all 5 pairwise tests into a single list, and re-running p.adjust on the full set of results? Or is keeping the FDR values for each individual test acceptable?
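To make the two FDR options in this question concrete, here is a minimal base R sketch. It does not call DESeq: the list `results` is a simulated stand-in for the data frames that the five `nbinomTest` calls would return, and the names `results`, `padj_separate` and `padj_pooled` are made up for illustration. It only shows the mechanics of both options and does not settle which one is appropriate.

```r
set.seed(42)
n_genes <- 1000
timepoints <- paste0("t", 2:6)

# Simulated stand-in for:
#   lapply(timepoints, function(tp) nbinomTest(cds, "t1", tp))
# Only the 'id' and 'pval' columns are mimicked here.
results <- lapply(timepoints, function(tp) {
  data.frame(id = paste0("gene", seq_len(n_genes)),
             pval = runif(n_genes),
             stringsAsFactors = FALSE)
})
names(results) <- paste0("t1_vs_", timepoints)

# Option 1: adjust the p-values within each comparison separately
padj_separate <- lapply(results, function(res) p.adjust(res$pval, method = "BH"))

# Option 2: pool the raw p-values from all comparisons, adjust once,
# then split the adjusted values back out per comparison
all_pvals <- unlist(lapply(results, `[[`, "pval"), use.names = FALSE)
padj_pooled_vec <- p.adjust(all_pvals, method = "BH")
groups <- factor(rep(names(results), each = n_genes), levels = names(results))
padj_pooled <- split(padj_pooled_vec, groups)
```

Option 2 treats all 5 x n_genes tests as a single family, so its adjusted values can differ from those of Option 1 for the same gene and comparison.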

**john_nl** · 10-23-2012, 01:14 AM

I have a similar experimental set-up to the one above and therefore face the same decision. Essentially the question is: assuming more than 2 samples (comparisons), should the variance estimation (estimateDispersions) be performed using ALL of the samples before performing the pairwise DE test, or should it be restricted to the pair of samples that one is testing for DE?

Cheers,

**huali** · 11-01-2012, 07:14 AM

Will the comparison between the full model and the reduced model (only intercept) give the overall significance of time effect?

dfit1 <- fitNbinomGLMs(d, count ~ condition)
dfit0 <- fitNbinomGLMs(d, count ~ 1)
dpval <- nbinomGLMTest(dfit1, dfit0)
dpadj <- p.adjust(dpval, method="BH")
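As an illustration of what a full-versus-reduced comparison tests, here is a minimal single-gene sketch in base R. It is not DESeq code: a Poisson GLM is swapped in for the negative binomial fit that `fitNbinomGLMs` performs, and the counts are simulated. It only shows how the drop in deviance from `counts ~ 1` to `counts ~ condition` turns into one p-value for the overall condition effect.

```r
set.seed(1)

# One hypothetical gene, three conditions with three replicates each
condition <- factor(rep(c("A", "B", "C"), each = 3))
counts <- c(rpois(3, 20), rpois(3, 20), rpois(3, 80))

# Full model: a separate mean per condition; reduced model: intercept only
fit1 <- glm(counts ~ condition, family = poisson)
fit0 <- glm(counts ~ 1, family = poisson)

# Likelihood-ratio statistic: the drop in deviance, compared against a
# chi-squared distribution with (number of levels - 1) degrees of freedom
lr_stat <- deviance(fit0) - deviance(fit1)
df_diff <- df.residual(fit0) - df.residual(fit1)
pval <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

A small `pval` says that at least one condition differs from the others; it does not say which one, so specific pairwise questions still need their own tests.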

**Gabriele Zoppoli** · 07-05-2013, 05:34 AM

This is indeed an important point, john_nl; my impression is that estimating dispersion for only the two levels you're going to compare is a bit like cheating on the statistics... One would expect the dispersion to be calculated on all the condition levels, followed by an ANOVA with contrasts... Does DESeq support this?

**Wolfgang Huber** · 07-07-2013, 03:50 AM

Gabriele,
yes, DESeq supports GLMs of any type. See also Simon's earlier posts.

**haggardd** · 09-03-2013, 03:52 PM

Hello,

I have a couple of questions regarding biological replicates in DESeq. I have HTSeq output files from RNA-seq data examining the effects of three chemicals on gene expression. There are two experiments for this data. One examines the expression changes at a high concentration of chemical (10uM) compared to a vehicle control (1%), and the other examines gene expression changes at a lower concentration of the chemicals (1uM) with a lower vehicle concentration (0.1%). Currently, I have been running DESeq on the two experiments separately, i.e. with a separate R script for each experimental setup, so the design variable contains 4 conditions corresponding to their respective HTSeq output files for each experiment (hopefully this all makes sense).
My first question is whether I should run DESeq on the experiments combined instead of keeping them separate. In this case I would have a design variable containing all 8 conditions. My (potentially naive) reasoning for combining the two experiments is that I would better estimate the overall dispersions for all genes examined and yet still be able to run 'nbinomTest()' normally if I define the conditions correctly. (Maybe I'm getting confused by the vignette's definitions of condition and factor?)
My second question concerns outliers as identified by the PCA plot function of DESeq. I have generated PCA plots for both experiments (keeping them separate) in order to see whether the treatments group together and to see the general pattern of the data. For both the high-concentration and low-concentration experiments, the PCA plots show that some of the replicates differ rather substantially from their respective treatment groups (see images). Now, I know that removing outliers from analyses is risky business and needs to be justified, but based on how different these replicates are from their treatment groups, would it be OK to take them out?

Thanks for all the help!!


DESeq: more than 2 levels per condition?
