SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Offline Tools or packages (prefer R) for miRNA-seq time series counts data analysis ? unique379 Bioinformatics 0 10-10-2014 04:31 AM
RNAseq time series data w/ controls for each time point Mocca RNA Sequencing 2 08-07-2013 11:12 PM
Comparative Time Series Analysis of RNA-Seq data? ashuchawla Bioinformatics 8 07-25-2013 12:12 PM
How to analyse time-series data with edgeR ca85 RNA Sequencing 0 10-06-2011 05:04 AM
time series analysis of count data? greigite Bioinformatics 0 06-14-2011 05:11 PM

Reply
 
Thread Tools
Old 02-01-2015, 02:11 PM   #1
Beebola
Junior Member
 
Location: Canada & Prague

Join Date: Jan 2015
Posts: 4
Default DESeq2 Analysis of Time Series data and model comparison

I have a question regarding defining and comparing models in DESeq2. My data are counts (index of abundance) for each taxon identified through fungal ITS high-throughput amplicon data from soils sampled seasonally under three different stand types.

Experimental design:
• randomized complete block - blocks=6; treatment=3 (standType = spruce, beech, oak) = 18 experimental units
• sampling times = 5 seasons (fall_1, winter, spring, summer, fall_2)
• total sample size = 90 samples

I am interested in looking at the response of species over time in my treatments with the goal of identifying species that show time dependency differences in counts among stand types (i.e. significant stand x time interactions). I have been following the “RNA-Seq workflow:gene-level exploratory analysis and differential expression” vignette (http://www.bioconductor.org/help/workflows/rnaseqGene/), specifically the section on “Time series experiments”.
I used the two models below to test if the interaction of standType:Season was an important factor.

m1 (full) ~standType+ standType:Season
m2 (reduced) ~ standType

This is a relatively similar set up to the vignette and my understanding is that since the only difference between the two models is the interaction term, species with low p values in the Likelihood Ratio Test results table are the species that show stand-specific effects in time. Correct?

I can pull out the species that show different patterns over time among the three treatments by querying all the contrast via standType and comparing the species with significant padj values between treatments. From here I can visualize patterns of any particular species by plotting the counts using code in the vignette.

This seems good so far, but after reading more on setting up models, I realize I have not properly defined my models. Since I have repeatedly sampled the same 18 sites five different times, I wonder if
• m1 (full) should be defined as a repeated measures design keeping samples = 90 which would more properly account for the non-independence of samples collected at the same site.
• and m2 (reduced) should identify samples from the same site as replicates collapsing the 90 samples to = 18 to avoid pseudoreplication and associated concerns about artificial inflation of statistical power.

It seems that the model statements should be structured to ensure the data are processed as repeated measures of 18 experimental units not as 90 independent samples. Am I correct in being concerned about this? If so, should I be looking into how to specify repeated measures model statements for m1 and m2?
Beebola is offline   Reply With Quote
Old 02-02-2015, 01:10 PM   #2
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Your initial design doesn't include Season in the full and reduced, which is pretty important (we have this in our rnaseqGene example as 'time'). This term keeps track of differences over season, while the interaction term keeps track of differences over season which are specific for individual standTypes.

re: the distinct sites, there is a useful trick for specifying such a design (originally from the edgeR user guide). I posted this on the Bioc support site here: https://support.bioconductor.org/p/62357/#62368 Take a look at that post to get the idea. Your model should take into account the 18 sites, by including a new variable 'nested.site' which keeps track of the 6 sites within each standType. This will be a column of numbers 1-6 identifying the different sites within each standType, and tracking these across all seasons.

You can then use a design (corrected):

Code:
~ season + standType + standType:nested.site + season:standType
The reduced model would only remove season:standType, in order to find genes where there are standType specific differences across season.

Last edited by Michael Love; 02-06-2015 at 07:31 AM.
Michael Love is offline   Reply With Quote
Old 02-05-2015, 08:23 AM   #3
Beebola
Junior Member
 
Location: Canada & Prague

Join Date: Jan 2015
Posts: 4
Default

Thanks for the reply and advice on model design!

I didn’t originally include season in my model as I did not set out to specifically test for seasonal effect and communities group strongly to standType. I was thinking that leaving out the term would just mean the variation caused by season would be attributed to the unexplained variation, but maybe this is not fully correct.

I ran the models as describe above using a data column with siteID (1-18) for the information of nested.site but ran into this error:
litDEnf_seas1<-DESeqDataSet(litternDEf, ~ standType + season + season:siteID + season:standTtye)

Error in DESeqDataSet(litternDEf, ~standType + season + season:siteID + season:standType) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed


Shoko also notes this issue here: https://support.bioconductor.org/p/63134/ where it seemed to be a result of missing samples.

Quote: "You are encountering this error, because you have missing samples dispersed in the cross of treatment and time"

I cannot see any missing sample/treatment combinations but it is expected that a species may not be present in all of the 90 samples (or in this model, in all 18 sites or in all 5 seasons). Would species absences at a given site in a given season cause the error? The error only occurs when I add the interaction term season:siteID into the model.

Thanks in advance
Beebola is offline   Reply With Quote
Old 02-06-2015, 06:54 AM   #4
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Can you show the full column data, including the new siteID columns?
Michael Love is offline   Reply With Quote
Old 02-06-2015, 07:24 AM   #5
Beebola
Junior Member
 
Location: Canada & Prague

Join Date: Jan 2015
Posts: 4
Default

Hi,
the data are attached in a .txt file.

Thanks,

Barbara
Attached Files
File Type: txt DESeq_litter_samdata.txt (5.0 KB, 16 views)
Beebola is offline   Reply With Quote
Old 02-06-2015, 07:37 AM   #6
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

So you don't have the same problem as Shoko.

Go back and check the Bioc support link I posted.

You need to create a new column, nested.site, which should only be values "site_1" to "site_6". These keep track of the different sites within each stand type.

Here is a small example of how it should look:

spruce site_1 fall1
spruce site_2 fall1
spruce site_3 fall1
...
beech site_1 fall1
beech site_2 fall1
beech site_3 fall1
...
oak site_1 fall1
oak site_2 fall1
oak site_3 fall1

It's not a problem that spruce site_1 is not the same site as oak site_1, because we use an interaction term standType:nested.site, which produces unique coefficients in the model for each combination of stand and site.
Michael Love is offline   Reply With Quote
Old 02-06-2015, 08:15 AM   #7
Beebola
Junior Member
 
Location: Canada & Prague

Join Date: Jan 2015
Posts: 4
Thumbs up

Okay,I see the issue. I just ran the model with the new column data and it ran without error. Thanks for the help and prompt reply!
Beebola is offline   Reply With Quote
Old 07-13-2015, 04:26 PM   #8
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14
Default

Quote:
Originally Posted by Michael Love View Post
So you don't have the same problem as Shoko.

Go back and check the Bioc support link I posted.

You need to create a new column, nested.site, which should only be values "site_1" to "site_6". These keep track of the different sites within each stand type.

Here is a small example of how it should look:

spruce site_1 fall1
spruce site_2 fall1
spruce site_3 fall1
...
beech site_1 fall1
beech site_2 fall1
beech site_3 fall1
...
oak site_1 fall1
oak site_2 fall1
oak site_3 fall1

It's not a problem that spruce site_1 is not the same site as oak site_1, because we use an interaction term standType:nested.site, which produces unique coefficients in the model for each combination of stand and site.
I have a longitudinal study where individuals are nested within diet, and also there may be a batch processing effect (samples processed in two batches).

Would I make two of these new id columns, one for individuals within diet, and one for individuals within batch?

Thanks!
lbragg is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:45 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO