SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2 vs EdgeR for multi-factor designs keysoon Bioinformatics 10 01-13-2015 05:04 AM
DESeq2 multi-factor designs id0 Bioinformatics 6 01-08-2015 12:23 PM
DESeq2 multi-factor designs id0 Bioinformatics 3 01-10-2014 06:25 AM
DESeq2: Multi-factor designs sindrle Bioinformatics 10 10-21-2013 07:47 AM
Cuffdiff time series with two groups sindrle Bioinformatics 11 10-14-2013 03:41 PM

Reply
 
Thread Tools
Old 01-14-2015, 07:37 AM   #1
jutos
Junior Member
 
Location: canada

Join Date: Jan 2010
Posts: 5
Default time course,multi groups with DESeq2

Hi All
I am currently working on a set of RNAseq data which contain 4 genotypes (A, B, C, D) and 5 time points. I'd like to compare gene expression across all time points between 4 genotypes. Differential expressed genes accross 5 time points will be identified based on p values. However, the LRT considered full data and gives one padj. I also tried the contrast function between genotype A and B, A and C for example. A similar question was asked previously by someone else @ https://www.biostars.org/p/119342/. Looks like LRT is for testing all levels at once EVEN contrast is between A and B or A and C or A and D.

below is my design and some codes:
design = ~ genotype + timepoint + genotype:timepoint
> dds_time <- DESeq(dds, test = "LRT", reduced = ~ genotype + timepoint)
> res_time <- results(dds_time)

#CONTRAST_RESULTS
> results_A_B <- results(dds_time, contrast=c("genotype","A","B"))
> results_A_C <- results(dds_time, contrast=c("genotype","A","C"))


After reading some other posts and manuals, looks like I have two options:
1, using the current analysis but recalculate p values using Wald test.
I did this:
> dds_wald <- nbinomWaldTest(dds_time, betaPrior=FALSE)
found results columns, replacing these
> results_A_B_wald <- results(dds_time, contrast=c("genotype","A","B"))
> results_A_C_wald <- results(dds_time, contrast=c("genotype","A","C"))

padj values for specific comparisons were given using this method.

2, rerun pair-wise analysis (6 comparisons) using LRT separately?

Which option is better and why?
Am I correct if I want to identify the significant differences between genotypes across all time points?
Another question is that is it possible to cluster the differential expressed genes (e.g. A vs B) and plot different clusters using DESeq2?
Thanks,
jutos is offline   Reply With Quote
Old 01-14-2015, 09:48 AM   #2
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi jutos,

Just to make sure I get it right, do you want to find (A) genes which show time-specific differences between two genotypes or (B) genes which are different between two genotypes, but not necessarily in a time-specific way?

Imagine two lines which have a sinusoidal pattern, moving up and down in parallel (≈). If these are the time course gene expression of two genotypes, then these would have a small p-value for (B) but not for (A).

Can you print out as.data.frame(colData(dds))? also packageVersion("DESeq2")
Michael Love is offline   Reply With Quote
Old 01-14-2015, 11:33 AM   #3
jutos
Junior Member
 
Location: canada

Join Date: Jan 2010
Posts: 5
Default

Hi Michael
Thanks for your response! Not sure if I understand correctly.
(A) indicates gene differences at a specific timepoint.
E.g. gene1.timepoint1 in genotype A vs gene1.timepoint1 genotype B.

If (B) means gene1 in A vs gene1 in B across all timepoints, then that's what I want. I'd like to see a pattern but not specific timepoint.

Yes, basically I am comparing two lines(significantly different or not). Sinusoidal pattern is one case. Other cases like, gene1 moving up in A but going down in B (<) or gene1 moving up in A but no change in B. etc.

>as.data.frame(colData(dds_time))
genotype timepoint sizeFactor
A_t1_rep1.ht A 1 1.1361501
A_t1_rep2.ht A 1 0.8687581
A_t2_rep1.ht A 2 1.2196618
A_t2_rep2.ht A 2 1.1299548
A_t3_rep1.ht A 3 1.050075
A_t3_rep2.ht A 3 0.9519595
A_t4_rep1.ht A 4 0.8285208
A_t4_rep2.ht A 4 1.0376003
A_t5_rep1.ht A 5 0.6985669
A_t5_rep2.ht A 5 0.9170495
B_t1_rep1.ht B 1 1.1789369
B_t1_rep2.ht B 1 0.9942072
B_t2_rep1.ht B 2 1.1213339
B_t2_rep2.ht B 2 1.0135584
B_t3_rep1.ht B 3 1.1813726
B_t3_rep2.ht B 3 1.080713
B_t4_rep1.ht B 4 1.1646828
B_t4_rep2.ht B 4 1.2200219
B_t5_rep1.ht B 5 0.9224539
B_t5_rep2.ht B 5 0.8493717
C_t1_rep1.ht C 1 1.0054339
C_t1_rep2.ht C 1 1.1109099
C_t2_rep1.ht C 2 1.0307073
C_t2_rep2.ht C 2 1.1476691
C_t3_rep1.ht C 3 1.1442486
C_t3_rep2.ht C 3 1.3069118
C_t4_rep1.ht C 4 0.977317
C_t4_rep2.ht C 4 0.9845946
C_t5_rep1.ht C 5 0.7378932
C_t5_rep2.ht C 5 0.7934279
D_t1_rep1.ht D 1 0.9474263
D_t1_rep2.ht D 1 1.0351776
D_t2_rep1.ht D 2 1.0540152
D_t2_rep2.ht D 2 1.1222096
D_t3_rep1.ht D 3 0.9024823
D_t3_rep2.ht D 3 1.3581637
D_t4_rep1.ht D 4 1.1031467
D_t4_rep2.ht D 4 1.1892484
D_t5_rep1.ht D 5 0.9206771
D_t5_rep2.ht D 5 0.6865151

> packageVersion("DESeq2")
[1] ‘1.6.3’
jutos is offline   Reply With Quote
Old 01-14-2015, 12:00 PM   #4
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

All the testing is gene by gene, so we can leave gene out of the mix. If there are global changes across time for all genes, these are normalized away inside the model by the size factor.

For your test, suppose there is a gene where the expression is higher in genotype B than A, but the expression is flat across timepoints (the = symbol). Are you interested in this gene?

Your example where expression goes up for one genotype and down in another (the < symbol), I would call looking for a "time-specific difference between genotypes".

However, the example of lines moving in parallel (the ≈ symbol) this is not a time-specific difference between genotypes, but a difference in baseline expression, or what we would call a main effect.

Of these cases, which are you interested in?
Michael Love is offline   Reply With Quote
Old 01-14-2015, 12:32 PM   #5
jutos
Junior Member
 
Location: canada

Join Date: Jan 2010
Posts: 5
Default

Hi Michael

I am mostly interested in what you call the "time-specific difference between genotypes".
would LRT or wald test do the job?
Thanks,
jutos is offline   Reply With Quote
Old 01-14-2015, 02:45 PM   #6
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

The LRT will find all genes with time-specific differences:

design(dds) = ~ genotype + timepoint + genotype:timepoint
dds = DESeq(dds, test="LRT", reduced=~ genotype + timepoint)
results(dds)

Note that the LFC column only describes the fold changes from a single coefficient in the model, although the likelihood ratio test statistic and p-values are from removing all of the terms associated with genotype:timepoint from the design.

See the workflow for an example of visualizing the results with heatmaps:

http://bioconductor.org/help/workflows/rnaseqGene/#time

Also there is a section of the vignette on clustering using rlog or VST.
Michael Love is offline   Reply With Quote
Old 01-19-2015, 05:20 PM   #7
jutos
Junior Member
 
Location: canada

Join Date: Jan 2010
Posts: 5
Default

Hi Michael

The workflow for time course works fine with two strains. I have two more questions regarding to the pvalues of LRT and wald test. sorry for not being a statistics person.
1, in the example, padj is for differences between two strains. I was wondering what is biological meaning of the padj if there are 3 or more strains? is it accounting for any differences in all strains over all time points?

2, wald test was used for specific time point in the example.
How about the res_mut_vs_wt? Is it testing linear model for all time points between mut and wt?

res_mut_vs_wt <- results(ddsTC, name="strain_mut_vs_wt", test="Wald")

Thanks,
jutos is offline   Reply With Quote
Old 01-20-2015, 11:04 AM   #8
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

"is it accounting for any differences in all strains over all time points?"

Basically yes, the LRT for ~ genotype + timepoint + genotype:timepoint vs
~ genotype + timepoint gives p-values which are small if there are any differences for any strain at any time point (which are not explained by time differences which are shared across all genotypes or genotype differences which are present at the first time point).

The Mut vs WT effect here is the difference which is present at the first time point.

(Make sure that the levels(dds$timepoint) are in the correct order, and if not, you should refactor using factor() and specifying the levels.)
Michael Love is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:16 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO