SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Number of clusters for Timecourse SylvainL Bioinformatics 0 04-27-2015 12:32 AM
How do I create a summarized experiment object for DeSeq2?? flytibia RNA Sequencing 2 03-10-2015 06:07 AM
Cuffdiff timecourse replicates Chinney14 RNA Sequencing 5 11-06-2014 11:42 AM
RNA-seq wheat-fungal timecourse Gordona RNA Sequencing 0 04-24-2014 07:16 AM
Differential expression DESeq using GLM for timecourse experiment jaymiRNA Bioinformatics 2 08-06-2012 05:41 AM

Reply
 
Thread Tools
Old 11-05-2015, 02:26 AM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default DESeq2 dynamics in a timecourse experiment

dear all,

we are trying to analyse a time-course data set of several time-point with replica. we're looking for genes which are changing across all TP as well as genes with a difference on a pair-wise comparison between two time points.

My questions regards the sensitivity of the dynamics in the analysis of DSeq2.
Identifying significant changes when considering multiple time-points isn't as simple as a consistent change over all time-points continuously, because a gene can be high at 1 or 2 time-points and off at the rest, and that is a significant biological difference.

my question is therefore does DESeq2 timecourse model design is able to identify genes which are going up and down over multiple time-points in my time course experiments

for example, if I have a gene which goes high up after 1h, than normalized and than again go down after 24h, will deseq2 be able to identify this genes as a significantly deregulated genes over all time-points?
Do I need to check for all coefficients to get a complete overview of the genes changed over time?

thanks

Assa
frymor is offline   Reply With Quote
Old 11-05-2015, 03:31 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

For the question of, "What genes change over time?", what you want to do is an LRT test where the reduced model lacks the time component. This will give you everything that changes, regardless of whether it's a consistent change or only up/down at one time point.

Other than that, I think people are typically encouraged to spend some time with the PCA plots. You can often make some guess as to useful pairwise comparisons from them.

Edit: Note that fold changes from the LRT results probably aren't meaningful, but the p-values are.
dpryan is offline   Reply With Quote
Old 11-06-2015, 12:06 AM   #3
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Quote:
Originally Posted by dpryan View Post
For the question of, "What genes change over time?", what you want to do is an LRT test where the reduced model lacks the time component. This will give you everything that changes, regardless of whether it's a consistent change or only up/down at one time point.
this are my full and reduced models
Code:
# create the DESeq object from a matrix. 
dds<-DESeqDataSetFromMatrix(countData=countTable, colData=phenotype, design= ~ replica + hours )
dds = DESeq(dds, test="LRT", reduced=~replica)
What will be the difference though both in term of my question and in the results I am getting, if I'll add the interaction term replica:time to the two models line that.

Code:
# create the DESeq object from a matrix. 
dds<-DESeqDataSetFromMatrix(countData=countTable, colData=phenotype, design= ~ replica + replica:hours + hours )
dds = DESeq(dds, test="LRT", reduced=~replica + replica:hours)
Does the interaction term must be always at the end of the model (as this is usually, the part I'm interested in, when using such interactions?
frymor is offline   Reply With Quote
Old 11-07-2015, 08:51 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The odds are quite good that you want the interaction. Without it you're just looking for consistent changes over time. So the full model would be "~replica*hours" and the reduced model just "~replica".

The order of things in the formula only matters for a few plotting functions (they usually default to coloring according to the last model component), the statistics will be the same regardless.
dpryan is offline   Reply With Quote
Old 11-08-2015, 01:34 AM   #5
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Quote:
Originally Posted by dpryan View Post
The odds are quite good that you want the interaction. Without it you're just looking for consistent changes over time. So the full model would be "~replica*hours" and the reduced model just "~replica".
Is there a difference between these two models?
full - ~replica * hours
reduced - ~replica

and

full - ~replica + hours + replica:hours
reduced - ~replica +replica:hours

Assa
frymor is offline   Reply With Quote
Old 11-08-2015, 01:44 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The only difference is the reduced model, which in one case has the interaction. It's then a matter of what exact question you want to ask. A reduced model of "~replica" will give you "all hours dependent changes, including those only occurring due to a hours:replica interaction", while the "~replica+replica:hours" excludes the aforementioned interaction.
dpryan is offline   Reply With Quote
Old 11-08-2015, 01:59 AM   #7
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Quote:
Originally Posted by dpryan View Post
The only difference is the reduced model, which in one case has the interaction. It's then a matter of what exact question you want to ask. A reduced model of "~replica" will give you "all hours dependent changes, including those only occurring due to a hours:replica interaction", while the "~replica+replica:hours" excludes the aforementioned interaction.
Thanks for the quick response.

Maybe this is a very basic statistical question, but to be honest I am not quite sure where is the difference

In this workflow I am trying to identify all the genes, which are showing a significant difference of behaviour across all time points, either in one specific time-point, in two consecutive or alternate time points.

What exactly would the interaction term "replica:hours" would tell me / add to the information I already have?
What changes can happen due to an hours:replica interaction?

Does it mean, with this term I am getting the genes, which are sig. DE because the intensity of gene X from replica1 of TP1 is higher than ctrl1 of TP1, even though the same gene from the same condition, but from replica2 and ctrl2 of TP1 is not significant?
so basically, If I am excluding this interaction - Is it possible to have replica-specific DE genes?

Why would i want to have such genes at all (if I am correct in my assumption)?

Last edited by frymor; 11-08-2015 at 10:26 PM.
frymor is offline   Reply With Quote
Old 11-08-2015, 11:28 PM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If "replica" is doing anything then I presume that it can have a time-dependent effect. This is known as an "interaction" and is the "replica:hour" part of your model. Not including that in the reduced model says, "include any genes that might change due to this," which I presume you want...though without knowing the actual context that's just an assumption. If you just want genes that change as a function of time, ignoring any "replica"-dependent changes, then go ahead and keep "replica:hours" in you reduced model.

BTW, I hope "replica" doesn't denote the biological replicates
dpryan is offline   Reply With Quote
Old 11-09-2015, 02:07 AM   #9
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Post

Quote:
Originally Posted by dpryan View Post
BTW, I hope "replica" doesn't denote the biological replicates
Now I am confused. "replica" are the biological replica I have.

This is my colData:
Code:
sampleName	hours	replica	batch
IFM_Myoblast_1	0	1	2014
IFM_Myoblast_2	0	2	2014
IFM_Myoblast_3	0	3	2014
IFM16h_1	16	1	2013
IFM16h_2	16	2	2015
IFM16h_3	16	3	2015
IFM24h_1	24	1	2013
IFM24h_2	24	2	2015
IFM24h_3	24	3	2015
IFM30h_1	30	1	2013
IFM30h_2	30	2	2014
IFM48h_1	48	1	2013
IFM48h_2	48	2	2014
IFM48h_3	48	3	2014
IFM72h_1	72	1	2013
IFM72h_2	72	2	2014
IFM90h_1	90	1	2013
IFM90h_2	90	2	2013
IFM100h_1	100	1	2013
IFM100h_2	100	2	2014
As you can see, replica are the biological replica. I am assuming that thee replica are not significantly different from each other and therefore would like to assume, that within each time-point, I don't have any significant differences.

BUT

If i do have them there, I would like them to be ignored (=not taken into account, when calculating the differential expression). For that reason I do like to add them to the reduced model, if I understood it correctly.
frymor is offline   Reply With Quote
Old 11-09-2015, 02:11 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Don't include a term for replicates in your models.

Your full model should just be ~hours and the reduced ~1. Note that "hours" MUST be a factor in order for this to work. I see that you have a batch effect, in which case a full model of ~batch+hours and a reduced model of ~batch will hopefully work. Make sure that "batch" is also a factor.
dpryan is offline   Reply With Quote
Old 11-09-2015, 02:21 AM   #11
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

why shouldn't the biological replicates be included in the model?

I haven't included the batch, because we don't see any batch effect when plotting a PCA. But i will try it with that two, when removing the replica factors from the model.
frymor is offline   Reply With Quote
Old 11-09-2015, 02:24 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If you include a coefficient for replicates then they won't be used as replicates anymore, rather they'll be treated as though they're different groups. This drastically reduces your statistical power. You should only ever include a factor for replicates when you have something like a case-control study and need specific samples to be paired to each other.
dpryan is offline   Reply With Quote
Old 11-17-2015, 04:39 AM   #13
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Hi all,

thanks Devon for all the help. It seems to works quite good.

Just to make sure I have understood the way to create the multi-factor designs I was asked by a college of mine about a different experiment
We have different mouse samples.
we have two conditions - wild-type and KO
we have stimulated (s) and unstimulated (us) samples
(=all together 12 samples, 3xWTus, 3xKOus, 3xWTs and 3xKOs samples).

Code:
name	condition	stimulation
Vav_KO_1	KO	no
Vav_KO_2	KO	no
Vav_KO_2_C	KO	yes
Vav_KO_4_C	KO	yes
Vav_KO_5	KO	no
Vav_KO_5_C	KO	yes
Vav_WT_1	wildtype	no
Vav_WT_1_C	wildtype	yes
Vav_WT_2	wildtype	no
Vav_WT_2_C	wildtype	yes
Vav_WT_4	wildtype	no
Vav_WT_4_C	wildtype	yes
We are interested in two groups of genes
1. What genes are differentially regulated due to the different conditions (knock-out)? In other words I would like to know what genes are influenced by the KO, and the KO only?
2. What genes are changed in the knock-out due to the stimulation?

As this is a pair-wise comparison I will use the Wald test.
This is the design formulae I would use in this case is
Code:
~condition + stimulation + condition:stimulation
or, as these two are identical
Code:
dds <- DESeqDataSetFromMatrix(countData = countTable,
                              colData = Phenotype,
                              design = ~condition*stimulation)
dds <- DESeq(dds)
to identify the group of DE genes for each of the questions at hand, I will use the results() command with the following parameters:
to answer the first question and identify how the knock-out
Code:
resultsWT.KO <- results(dds, contrast=c("condition", "KO", "wildtype"))
for the second question, the defaults results command should suffice if I understnad it correctly, so running this command will give me the genes, which are changed between the KO and the WT due to the stimulation of the samples (and not due to the KO).
Code:
resultsStimulations <- results(dds)
Are my assumptions correct?

Thanks for any help and corrections if needed,

Assa
frymor is offline   Reply With Quote
Old 11-17-2015, 04:58 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

1. This isn't actually a pair-wise comparison, it's a classical factorial design (sorry, I know this gets overly confusing...I swear this makes sense if someone just draws stuff on a white board!). For this, you just want the results() from "condition" (using a contrast as you did is nice since then you know whether WT or KO is the numerator).

2. That should work, though you might specify results(dds, name="condition:stimulation") or something like that to be absolutely sure.
dpryan is offline   Reply With Quote
Old 11-17-2015, 05:08 AM   #15
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

thanks again,

Quote:
Originally Posted by dpryan View Post
2. That should work, though you might specify results(dds, name="condition:stimulation") or something like that to be absolutely sure.
if I take one of the names from my resultNames() vector:
Code:
> resultsNames(dds)
[1] "Intercept"                         "condition_wildtype_vs_trippleKO"  
[3] "stimulation_none_vs_Curdlan"       "conditionwildtype.stimulationnone"
it would be as such:

Code:
resultsStimulations <- results(dds, name="conditionwildtype.stimulationnone")
frymor is offline   Reply With Quote
Old 11-17-2015, 05:10 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Ah, yes, that's correct. I always forget that it uses a "." rather than ":".
dpryan is offline   Reply With Quote
Old 11-17-2015, 07:47 AM   #17
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Sorry to be so picky, but I still have some difficulties understanding it, even if it might not seems this way.

What would be the difference between the two models:
Code:
~condition + stimulation
and
Code:
~condition + stimulation + condition:stimulation
in term of the question answered by the model?
I know what the user guide says about interactions, but to be honest, I'm not sure what it really means.
Is there a way to simplify it better?

my try would be this:
the first model try to identify the differences in the two groups of stimulation, buy accounting for the differences in the condition groups (this is reformulating the user guide). Does it means, that I sort of trying to discard all changes happening due to conditions and concentrate only on the effect causes by the stimulation?

the second model adds an interaction. according to the user guide, it means that i am trying to calculate the differences of one condition (here my term condition) based on the second condition (here my term stimulation)?
Does this means here that I'm trying to see the differences happening in my samples between the WT and KO based on the stimulation changes?

This doesn't sound very simplified, but maybe someone can do it better
frymor is offline   Reply With Quote
Old 11-17-2015, 11:46 PM   #18
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

No worries about being picky, this is important to get correct

This boils down to, "so what's an interaction anyway?". Let's take your experiment as the example and consider the design:
Code:
~condition + stimulation
Let's just consider one example gene, called foo. Suppose in the WT unstimulated case, foo has a value of 1. Upon stimulation, that increases to 2. If don't stimulate but instead knock out some other gene, the value instead goes to 3. That covers 3 of the 4 groups. What this model says is that in the case of both stimulating AND knocking out some other gene, we expect the resulting value to be 2*3=6 (the fold changes multiply).

But suppose the effect due to knocking out some gene is dependent upon stimulation (e.g., the stimulation activates a pathway that we've partly inactivated due to the knockout). We then would have what's referred to as an interaction between the stimulation and the condition. So we would no longer expect the knockout and stimulated group to have a value of 6, but something else entirely. The resulting interaction coefficient tells you the fold change from what you would expect if the knockout and stimulation effect don't interfere with each other (or have a synergistic effect). It could well be the case that instead of this group having a value of 6 we instead see no change from the baseline WT unstimulated group, so the value is 1, meaning that the fit coefficient is 1/6. Interactions always describe the additional change on top of what would be expected if your various conditions acted independently.

Hope that helps.
dpryan is offline   Reply With Quote
Old 11-18-2015, 12:25 AM   #19
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 149
Default

Yes, it helps a lot. I think I start understanding it.

I have another question about the colData. Does the order of the samples here must be the same as in the count table.
I am asking because I have done the analysis mentioned above and found no DE genes. When I than changed the colData file to that:

Code:
name	condition	stimulation
Vav_KO_1	KO	no
Vav_KO_2	KO	no
Vav_KO_5	KO	no
Vav_KO_2_C	KO	yes
Vav_KO_4_C	KO	yes
Vav_KO_5_C	KO	yes
Vav_WT_1	wildtype	no
Vav_WT_2	wildtype	no
Vav_WT_4	wildtype	no
Vav_WT_4_C	wildtype	yes
Vav_WT_1_C	wildtype	yes
Vav_WT_2_C	wildtype	yes
I suddenly get 106 genes with an adjp<=0.1

the sample names was not changed and is identical to the column names of the count table.
Now I also get a different last comparison possiblity:
Code:
> resultsNames(dds)
[1] "Intercept"                        "condition_wildtype_vs_KO"        
[3] "stimulation_yes_vs_no"            "conditionwildtype.stimulationyes"
compared to "conditionwildtype.stimulationnone" from before.
Is there an explanation for this kind of changes?

thanks in advance
frymor is offline   Reply With Quote
Old 11-18-2015, 01:14 AM   #20
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

That's...weird. I don't have a good explanation for that. Well, actually I don't know why you got "Curdlan" as part of a coefficient name before given that it wasn't in the original colData. The coefficient names at least make sense this time. Given that, my only guess is that something went wonky previously.
dpryan is offline   Reply With Quote
Reply

Tags
coefficients, deseq2, time-series, timecourse

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:05 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO