SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2 Contrasts cacti Bioinformatics 4 08-04-2015 12:20 PM
DESeq2 - using contrasts vs. multiple runs for DE analysis rutgers2015 Bioinformatics 2 07-14-2015 11:34 AM
Contrasts with DESeq2 mistrm RNA Sequencing 3 07-24-2014 10:03 AM
Paired design versus unpaired design in DESeq2 KristenC RNA Sequencing 1 05-29-2014 12:05 PM
Differential expression analysis on RPKMs - Contrasts and contrasts of contrasts giorgifm Bioinformatics 6 08-16-2013 11:50 AM

Reply
 
Thread Tools
Old 10-28-2014, 05:16 AM   #21
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

I'm not sure about this, but I would recommend following the message and updating R to the latest GUI version.
Michael Love is offline   Reply With Quote
Old 10-28-2014, 07:50 AM   #22
gamb
Member
 
Location: USA

Join Date: Oct 2014
Posts: 11
Default

Thats the thing... I am running the latest GUI
gamb is offline   Reply With Quote
Old 10-28-2014, 09:55 AM   #23
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

It might be a problem in .RData, which is loaded every time. Try deleting this file. (although doesn't explain why you get it on two separate computers)
Michael Love is offline   Reply With Quote
Old 12-03-2014, 01:46 PM   #24
gtosser
Junior Member
 
Location: Toulouse

Join Date: Sep 2014
Posts: 4
Default

Quote:
Originally Posted by Michael Love View Post
hi gamb,

Firstly, as you are just starting to analyze, best to update to Bioconductor 3.0 and DESeq2 version 1.6 which was just released.

As the analysis is a bit complicated, I'd recommend using DESeq(dds, betaPrior=FALSE) as this simplifies the building of contrasts with interactions.

The design should be

Code:
~ tissue*chem1*chem2
...because you're interested in both the interactions of chem1 and chem2, and in the specific effects for each tissue.

The list of effects is given by

Code:
resultsNames(dds)
You can build up the contrasts using a list like below. I'll give some examples and see if you can extend to the others. Comparing the tissues alone is simple by specifying the name of the factor and the two levels to compare, e.g.:

Code:
results(dds, contrast=c("tissue","C","B"))
Then some basics about interaction terms: the effect of chem 1 in tissue B will be:

Code:
results(dds, contrast=list(c("chem1","chem1.tissueB")))
...because you need to add the main chem 1 effect (the effect for tissue A, the base level of tissue) and the extra chem 1 effect for tissue B.

If you wanted to test if chem 1 has an effect in tissue B different than tissue A you would only test the interaction term "chem1.tissueB", because this is that difference.

Tissue A is the base level, so the effect of chem 1 in tissue A is simply

Code:
results(dds, name="chem1")
The effect of chem 1 and chem 2 in tissue B is a bit lengthy, you need to add main effects and all relevant interaction terms:

Code:
results(dds, contrast=list(c("chem1","chem2","chem1.tissueB","chem2.tissueB","chem1.chem2","chem1.chem2.tissueB")))
It's simpler for tissue A because this is the base level:

Code:
results(dds, contrast=list(c("chem1","chem2","chem1.chem2")))
And if you want to test only the interaction, answering if the effect of the two chems is different than each separately, you remove some of these effects. For tissue B this contrast would be:

Code:
results(dds, contrast=list(c("chem1.chem2","chem1.chem2.tissueB")))
For tissue A it is again simpler:

Code:
results(dds, name="chem1.chem2")
Note that another way to compare individual groups would be to create a new variable which describes all groups:

Code:
dds$group = factor(paste(dds$chem1, dds$chem2, dds$tissue))
design(dds) = ~ group
Then use the following to compare to specific groups:

Code:
results(dds, contrast=c("group","...","..."))
Dear Michael,

I am afraid I cannot build my contrsats either.
I have cells from 2 species (bovin, porc) that have been treated or not (treatment=pos or treatment=neg)

Here is my model
design (ddsX)<-~species+treatment+species:treatment

resultsNames(ddsX)
#[1] "Intercept" "species_porc_vs_bovin" #"treatment_pos_vs_neg" "speciesporc.treatmentpos"

If I understand well, I can find the genes with a treatment effect with:
results(ddsX, contrast=c("treatment","neg","pos"))

then the genes with a species effect with:
results(ddsX, contrast=c("species","bovin","porc"))

But how can I find the genes:
- from species "bovin" with a treatment effect
- from species "porc" with a treatment effect
- with interaction (that do not react to the treatment in a same way in "bovin" or "porc")?

I am using DESeq2_1.6.2

Thanks for your help

Gwenola
gtosser is offline   Reply With Quote
Old 12-03-2014, 01:51 PM   #25
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi Gwenola,

Check in the Examples section of the help for ?results:

You should find, Example 2: two conditions, two groups, with interaction term.
Michael Love is offline   Reply With Quote
Old 12-04-2014, 07:19 AM   #26
gtosser
Junior Member
 
Location: Toulouse

Join Date: Sep 2014
Posts: 4
Default Thanks

Thanks a lot.

## Example 2: two conditions, two groups, with interaction term

# the condition effect. Is this the condition effect in groupX?
results(dds, contrast=c("condition","B","A"))


# the condition effect in group B. Is there a typo (condition effect in group Y?)
results(dds, contrast=c(0,0,1,1))
# or, equivalently using list to add these two effects
results(dds, contrast=list(c("condition_B_vs_A","groupY.conditionB")))

If I want the condition effect (whatever the group), is that
results(dds, contrast=c(0,0,1,1/2))?


Best regards
Gwenola

Quote:
Originally Posted by Michael Love View Post
hi Gwenola,

Check in the Examples section of the help for ?results:

You should find, Example 2: two conditions, two groups, with interaction term.
gtosser is offline   Reply With Quote
Old 12-04-2014, 09:08 AM   #27
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Is this the condition effect in groupX?

yes

Is there a typo (condition effect in group Y?)

yes, thanks! I've just fixed this in devel

If I want the condition effect (whatever the group), is that results(dds, contrast=c(0,0,1,1/2))?


yes, that's the condition effect averaged over both groups
Michael Love is offline   Reply With Quote
Old 12-08-2014, 09:17 AM   #28
gtosser
Junior Member
 
Location: Toulouse

Join Date: Sep 2014
Posts: 4
Default Thanks

Thanks for your answer. It is clear now.
Gwenola
gtosser is offline   Reply With Quote
Old 03-30-2015, 05:38 AM   #29
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

Quote:
Originally Posted by Michael Love View Post
There is a way to control for the animal effect in this experiment, by recoding the animals within blocks of chem1*chem2 and adding an interaction term. It gets a bit trickier to form contrasts, but I will show how to do this.

It helps to clean up the colData a bit, I'll show on a clean version, and you can reorder your samples correspondingly. Note that you should reorder the columns of the count matrix so that they match. If you have a SummarizedExperiment, the colData and matrix are tied together, so simply reordering the columns of the SummarizedExperiment does both for you.

I made a sample data.frame like your data:

Code:
d <- data.frame(anim=factor(rep(1:12,each=3)),
  chem1=factor(rep(rep(c("N","Y"),each=6),3)),
  chem2=factor(rep(rep(c("N","Y"),each=3),6)),
  tiss=factor(rep(1:3,12)))
this looks like

Code:
    anim chem1 chem2 tiss
 1     1     N     N    1
 2     1     N     N    2
 3     1     N     N    3
 4     2     N     Y    1
 5     2     N     Y    2
 6     2     N     Y    3
 7     3     Y     N    1
 8     3     Y     N    2
 9     3     Y     N    3
 10    4     Y     Y    1
 11    4     Y     Y    2
 12    4     Y     Y    3
 13    5     N     N    1
 14    5     N     N    2
 15    5     N     N    3
 16    6     N     Y    1
 17    6     N     Y    2
 18    6     N     Y    3
 19    7     Y     N    1
 20    7     Y     N    2
 21    7     Y     N    3
 22    8     Y     Y    1
 23    8     Y     Y    2
 24    8     Y     Y    3
 25    9     N     N    1
 26    9     N     N    2
 27    9     N     N    3
 28   10     N     Y    1
 29   10     N     Y    2
 30   10     N     Y    3
 31   11     Y     N    1
 32   11     Y     N    2
 33   11     Y     N    3
 34   12     Y     Y    1
 35   12     Y     Y    2
 36   12     Y     Y    3
Now reorder by chem1, chem2 and tiss. Then add a new variable which will be used to control animal specific effects within each block of chem1*chem2.

Code:
idx <- with(d, order(chem1, chem2, tiss))
d <- d[idx,]
d$anim.nested <- factor(rep(1:3,12))
You can examine d to see how this works.

The model matrix is then the chem1 and chem2 terms and interaction, as well as the interactions of these with animal, and the interaction of these with tissue:

Code:
m <- model.matrix(~ chem1 + chem2 + chem1:chem2 + chem1:chem2:anim.nested + chem1:chem2:tiss, data=d)
The column names of this are:

Code:
> colnames(m)
  [1] "(Intercept)"                "chem1Y"
  [3] "chem2Y"                     "chem1Y:chem2Y"
  [5] "chem1N:chem2N:anim.nested2" "chem1Y:chem2N:anim.nested2"
  [7] "chem1N:chem2Y:anim.nested2" "chem1Y:chem2Y:anim.nested2"
  [9] "chem1N:chem2N:anim.nested3" "chem1Y:chem2N:anim.nested3"
 [11] "chem1N:chem2Y:anim.nested3" "chem1Y:chem2Y:anim.nested3"
 [13] "chem1N:chem2N:tiss2"        "chem1Y:chem2N:tiss2"
 [15] "chem1N:chem2Y:tiss2"        "chem1Y:chem2Y:tiss2"
 [17] "chem1N:chem2N:tiss3"        "chem1Y:chem2N:tiss3"
 [19] "chem1N:chem2Y:tiss3"        "chem1Y:chem2Y:tiss3"

Many of the contrasts become hard to form, though possible with numeric contrasts. A trick for getting the numeric contrasts of the other comparisons, is to take the column means of the model matrix, for the groups of samples you want to compare. For instance, if you want to find the main chem1 effect in tissue 2, the contrast would be:

Code:
ctrst = colMeans(m[d$tiss == "2" & d$chem1 == "Y" & d$chem2 == "N",]) -
        colMeans(m[d$tiss == "2" & d$chem1 == "N" & d$chem2 == "N",])
This ends up being the chem1 base effect, the chem1Y effect in tissue 2 minus the chem1N effect in tissue 2 and a few 1/3 and -1/3 to the animal interaction effects.

Edit: note that interaction terms can be tested using the same logic, but you need to subtract the groups with only one chem and then add the base group. So it looks like: chem1Y chem2Y - chem1N chem2Y - chem1Y chem2N + chem1N chem2N.

Also, I had to update that custom line of size factor code for when all rows have a zero. If you were using that line of code to avoid the issue that all rows of your count matrix have one or more zeros, you should get the edited version:

https://support.bioconductor.org/p/62246/

We're implementing a solution here which was on the list of todos for a while now, but it won't be out for 6 months. Typical RNA-Seq doesn't have this issue, but since phyloseq we are seeing more metagenomics datasets which often have no genes without a zero.

Hi Michael,

I have also kinda same experimental design and I tried to use your suggestions in this thread but still I get the same error regarding to linear combination.
In my project each sample has been treated with one type of gut_microbiota: A, B, C, D, E and we have measured expression level of each mouse at three different tissues: s, p, l. Now I would like to see DE genes resulted from comparing of every gut_microbiota status (B, C, D, E) to the baseline (A) on each tissue separately. For each type of gut_microbiota I have different number of replicates, varying from 2 up to 7 replicates. Here is my design matrix:

Code:
d <- data.frame(animal=c(rep(c('5231', '5232', '5233', '5234', '5161', '5162', '5163'), each=3), rep('5164', 4) , rep(c('5165', '5166', '5167', '5168', '5169', '5170', '5171', '5172', '5173', '5174', '5175', '5176', '5177', '5178'), each=3), rep('5179', 2)), 
                           gut_microbiota = c(rep('A', 12), rep('B', 9), rep('E', 10), rep('D', 12), rep('B', 9), rep('D', 9), rep('C', 8)), 
                           tissue=c(rep(c('s', 'p', 'l'), each=1, 7), 's', 'p', 'l', 'l', rep(c('s', 'p', 'l'), each=1, 14), c('s', 'p')))
resulted in:

Code:
animal gut_microbiota tissue
1    5231              A      s
2    5231              A      p
3    5231              A      l
4    5232              A      s
5    5232              A      p
6    5232              A      l
7    5233              A      s
8    5233              A      p
9    5233              A      l
10   5234              A      s
11   5234              A      p
12   5234              A      l
13   5161              B      s
14   5161              B      p
15   5161              B      l
16   5162              B      s
17   5162              B      p
18   5162              B      l
19   5163              B      s
20   5163              B      p
21   5163              B      l
22   5164              E      s
23   5164              E      p
24   5164              E      l
25   5164              E      l
26   5165              E      s
27   5165              E      p
28   5165              E      l
29   5166              E      s
30   5166              E      p
31   5166              E      l
32   5167              D      s
33   5167              D      p
34   5167              D      l
35   5168              D      s
36   5168              D      p
37   5168              D      l
38   5169              D      s
39   5169              D      p
40   5169              D      l
41   5170              D      s
42   5170              D      p
43   5170              D      l
44   5171              B      s
45   5171              B      p
46   5171              B      l
47   5172              B      s
48   5172              B      p
49   5172              B      l
50   5173              B      s
51   5173              B      p
52   5173              B      l
53   5174              D      s
54   5174              D      p
55   5174              D      l
56   5175              D      s
57   5175              D      p
58   5175              D      l
59   5176              D      s
60   5176              D      p
61   5176              D      l
62   5177              C      s
63   5177              C      p
64   5177              C      l
65   5178              C      s
66   5178              C      p
67   5178              C      l
68   5179              C      s
69   5179              C      p
I have reordered design matrix as you mentioned and add a new term called, animal.nested:
Code:
idx <- with(d, order(gut_microbiota, tissue))
d <- d[idx, ]
d$animal.nested <- factor(c(rep(1:4, 3), rep(1:6, 3), c(1:2), rep(1:3, 2), rep(1:7, 3), 1, rep(1:3, 3)))
And finally this is my fitting design:
Code:
dds = DESeqDataSetFromHTSeqCount(sampleTable=sample_table, directory='~/htseq/', design= ~ gut_microbiota + gut_microbiota:animal.nested + gut_microbiota:tissue)
Can you please help with this? What is wrong in my codes?

Last edited by rozitaa; 03-30-2015 at 05:41 AM.
rozitaa is offline   Reply With Quote
Old 03-30-2015, 06:57 AM   #30
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi rozitaa,

The issue here is that there are more animals in certain tissues. So when the R function model.matrix tries to form the model matrix internal to the DESeq() function, this results in a column of zeros for the nested animal 6, microbiota A combination for example.

There is an easy solution though. In the next release of Bioconductor, which will be released 17 April 2015, and has DESeq2 v1.8, you can provide a custom model matrix to DESeq(). (You can test the development version of DESeq2 v1.7 already, but this requires downloading the development version of R, and this is not impossible but not always easy.)

So all you need to do is to remove these interaction columns which have no samples:

Code:
design <- ~ gut_microbiota + gut_microbiota:animal.nested + gut_microbiota:tissue
mm <- model.matix(design, data=colData(dds))
## remove columns of mm which have all zeros:
(all.zero <- apply(mm, 2, function(x) all(x == 0)))
mm <- mm[, !all.zero]
dds <- DESeq(dds, full=mm, betaPrior=FALSE)
We do have to turn off the log2 fold change shrinkage for this options though (betaPrior=FALSE). But the inference and results tables work the same.

Last edited by Michael Love; 03-31-2015 at 06:45 AM. Reason: fixed code
Michael Love is offline   Reply With Quote
Old 03-30-2015, 07:51 AM   #31
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

Quote:
Originally Posted by Michael Love View Post
hi rozitaa,

The issue here is that there are more animals in certain tissues. So when the R function model.matrix tries to form the model matrix internal to the DESeq() function, this results in a column of zeros for the nested animal 6, tissue A combination for example.

There is an easy solution though. In the next release of Bioconductor, which will be released 17 April 2015, and has DESeq2 v1.8, you can provide a custom model matrix to DESeq(). (You can test the development version of DESeq2 v1.7 already, but this requires downloading the development version of R, and this is not impossible but not always easy.)

So all you need to do is to remove these interaction columns which have no samples:

Code:
design <- ~ gut_microbiota + gut_microbiota:animal.nested + gut_microbiota:tissue
mm <- model.matix(design, data=colData(dds))
## remove columns of mm which have all zeros:
(all.zero <- apply(mm, 2, function(x) all(x == 0)))
mm <- mm[, !all.zero]
dds <- DESeq(dds, design=mm, betaPrior=FALSE)
We do have to turn off the log2 fold change shrinkage for this options though (betaPrior=FALSE). But the inference and results tables work the same.

Thanks a lot Micheal! I got the idea!

The only problem that I have now is I am using HTSeq-count data this my code:
Code:
design <- ~ gut_microbiota + gut_microbiota:animal.nested + gut_microbiota:tissue
mm <- model.matrix(design, data=sample_table)
(all.zero <- apply(mm, 2, function(x) all(x == 0)))
mm <- as.data.frame(mm[, !all.zero])
dds = DESeqDataSetFromHTSeqCount(sampleTable=sample_table, directory='~/htseq/', design=mm)
But I get an error:
Code:
Error in terms.default(object, data = data) : 
  no terms component nor attribute
Is it because of the way I am fitting the design matrix into the htseq-count data?
rozitaa is offline   Reply With Quote
Old 03-30-2015, 07:54 AM   #32
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

So, again, you'll only be able to do this custom model matrix in version 1.8.

packageVersion("DESeq2")

Additionally, you can only provide the matrix to the 'full' argument of DESeq(). In order to construct a DESeqDataSet object, you can use design=~1, because this will be ignored and the custom matrix will be used instead.

Last edited by Michael Love; 03-31-2015 at 06:46 AM.
Michael Love is offline   Reply With Quote
Old 03-30-2015, 08:04 AM   #33
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

Quote:
Originally Posted by Michael Love View Post
So, again, you'll only be able to do this custom model matrix in version 1.8.

packageVersion("DESeq2")

Additionally, you can only provide the matrix to DESeq(). In order to construct a DESeqDataSet object, you can use design=~1, because this will be ignored and the custom matrix will be used instead.
I see!! I cannot wait till April 17th since I have deadline. And it is hard for me to install the developmental version of R! Is there any other solution for my problem?
rozitaa is offline   Reply With Quote
Old 03-30-2015, 08:35 AM   #34
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

"Now I would like to see DE genes resulted from comparing of every gut_microbiota status (B, C, D, E) to the baseline (A) on each tissue separately."

Sorry, reading over your column data and question more carefully, I realized that this advice about nesting doesn't apply to your experimental design. The nesting approach works for experimental designs where the comparison of interest is nested within individuals/animal. For example if you wanted to compare tissue differences for each microbiota. However, you want to compare microbiota differences for each tissue. The problem for analysis with the nesting approach is that you cannot control for the animal-specific effects and compare across microbiota, because these variables are confounded. Whereas, animal and tissue are not confounded, which allows the nesting approach to work.

I'd recommend you subset each tissue into 3 DESeqDataSets and run a design of ~ gut_microbiota for each.
Michael Love is offline   Reply With Quote
Old 03-31-2015, 04:58 AM   #35
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
Default

Quote:
Originally Posted by Michael Love View Post
I'd recommend you subset each tissue into 3 DESeqDataSets and run a design of ~ gut_microbiota for each.
Thanks a lot Micheal!
rozitaa is offline   Reply With Quote
Old 06-22-2015, 09:28 PM   #36
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14
Default

Quote:
Originally Posted by Michael Love View Post
Also, I had to update that custom line of size factor code for when all rows have a zero. If you were using that line of code to avoid the issue that all rows of your count matrix have one or more zeros, you should get the edited version:

https://support.bioconductor.org/p/62246/

We're implementing a solution here which was on the list of todos for a while now, but it won't be out for 6 months. Typical RNA-Seq doesn't have this issue, but since phyloseq we are seeing more metagenomics datasets which often have no genes without a zero.
Hi there,

I have a metatranscriptomic dataset where every gene has at least one zero count. I tried the geoMeans function you've supplied via the link, but I keep getting the following error:

Code:
dds = estimateSizeFactors(dds, geoMeans=geoMeans)
Error in if (any(value <= 0)) { : missing value where TRUE/FALSE needed
This is using DESeq2 1.8.1 on R 3.2.

Happy to provide a MRE if you like.

Lauren
lbragg is offline   Reply With Quote
Old 06-23-2015, 06:25 AM   #37
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

what does summary(geoMeans) give you?

Last edited by Michael Love; 06-23-2015 at 06:28 AM. Reason: replying to wrong post
Michael Love is offline   Reply With Quote
Old 06-24-2015, 03:50 PM   #38
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14
Default

Quote:
Originally Posted by Michael Love View Post
what does summary(geoMeans) give you?
Code:
summary(geoMeans)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.414   2.289   2.828   3.731   4.000  28.650
lbragg is offline   Reply With Quote
Old 06-25-2015, 07:19 AM   #39
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Another test (I suspect there is a column with all zeros):

table(colSums(counts(dds)) == 0)
Michael Love is offline   Reply With Quote
Old 06-25-2015, 03:06 PM   #40
lbragg
Member
 
Location: Brisbane

Join Date: Sep 2009
Posts: 14
Default

Quote:
Originally Posted by Michael Love View Post
Another test (I suspect there is a column with all zeros):

table(colSums(counts(dds)) == 0)
You're right, there is one sample with all zeros. Thanks.

Last edited by lbragg; 06-25-2015 at 03:31 PM.
lbragg is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:06 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO