SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Reply
 
Thread Tools
Old 08-27-2013, 07:40 AM   #21
NateP
Junior Member
 
Location: Nebraska

Join Date: Sep 2011
Posts: 9
Default

Glad to see contrasts appear in DESeq2, thanks!

One minor thing I noticed when running contrasts:
When you use the Cooks Outlier filtering on the base analysis, the outlier genes (with "NA" pvalues in the regular analysis) are no longer filtered when you do a contrast. I'm not sure if that is intentional or not?
NateP is offline   Reply With Quote
Old 08-27-2013, 08:18 AM   #22
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

You're right. Thanks for pointing this out.

I will work on a fix, probably moving the cooksCutoff argument to results() function, so it can be modified without rerunning the GLM fitting.
Michael Love is offline   Reply With Quote
Old 08-29-2013, 06:18 AM   #23
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

I committed this fix in version 1.1.31 of the devel branch. It made sense to move the outlier detection by Cook's distance as well as all p-value adjustment to the results() function.
Michael Love is offline   Reply With Quote
Old 10-16-2013, 08:36 AM   #24
KHubbard
Junior Member
 
Location: Baltimore

Join Date: May 2011
Posts: 8
Default baseMeanA and baseMeanB

Hello,

In DESeq, it was possible to view/export the baseMean values for each condition (baseMeanA and baseMeanB) in addition to the overall baseMean. In DESeq2, I have only been able to view/export the overall baseMean. Is it possible to acquire the baseMean values for each condition as well? Thanks!
KHubbard is offline   Reply With Quote
Old 10-16-2013, 08:53 AM   #25
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

We generalized the code, and as these columns are only appropriate for a 2 condition analysis, they were dropped.

You can get these with some custom code though:

baseMeanA <- rowMeans(counts(dds,normalized=TRUE)[,colData(dds)$condition == "A"])

More general:

baseMeanPerLvl <- sapply(levels(colData(dds)$condition), function(lvl) rowMeans(counts(dds,normalized=TRUE)[,colData(dds)$condition == lvl]))
Michael Love is offline   Reply With Quote
Old 10-16-2013, 09:57 AM   #26
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Hi,

I'm new in DESeq2 and i'm having problems to create my dds.

I have a matrix with eXpress results (read counts) with 2 conditions (SP, EB) and 2 replicates/condition.

This is what i'm doing:

Code:
Cele_SPvsEB = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsEB),
  condition = factor(c("SP", "SP", "EB", "EB")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsEB,
                              colData = CeleDesign,
                              design = ~ condition)
dds
And i'm getting this error message:

Code:
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : 
  object '.setDummyField' not found
Any help?
Thanks.
alisrpp is offline   Reply With Quote
Old 10-16-2013, 10:04 AM   #27
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Can you show the output of traceback()

and also can you show the output of head(Cele_SPvsEB)
Michael Love is offline   Reply With Quote
Old 10-16-2013, 10:13 AM   #28
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Sure!

Code:
> traceback()
14: get(name, envir = asNamespace(pkg), inherits = FALSE)
13: methods:::.setDummyField
12: (function (value) 
    {
        if (missing(value)) 
            `.->data`
        else {
            methods:::.setDummyField(.self, ".->data", "SimpleList", 
                "data", FALSE, value)
            value
        }
    })(quote(<S4 object of class "SimpleList">))
11: assign(field, value, envir = env)
10: envRefSetField(.Object, field, classDef, selfEnv, elements[[field]])
9: methods::initRefFields(.Object, classDef, selfEnv, list(...))
8: initialize(value, ...)
7: initialize(value, ...)
6: methods::new(def, ...)
5: .ShallowSimpleListAssays$new(data = assays)
4: .local(assays, ...)
3: SummarizedExperiment(assays = SimpleList(counts = countData), 
       colData = colData, ...)
2: SummarizedExperiment(assays = SimpleList(counts = countData), 
       colData = colData, ...)
1: DESeqDataSetFromMatrix(countData = Cele_SPvsEB, colData = CeleDesign, 
       design = ~condition)
Code:
> head(Cele_SPvsEB)
                    C4_APR10_spermatic_cysts CRL_2APR10_spermatic_cysts CRL_1_15JUL11_embryos
comp100_c0_seq1                           21                         24                     0
comp100004_c0_seq1                        48                          0                   494
comp1000074_c0_seq1                       42                          0                    55
comp1000109_c0_seq1                       32                         18                     0
comp100011_c0_seq1                        40                          3                   298
comp100013_c0_seq1                        92                          0                    62
                    CRL_2_15JUL11_embryos
comp100_c0_seq1                         0
comp100004_c0_seq1                      0
comp1000074_c0_seq1                     0
comp1000109_c0_seq1                     0
comp100011_c0_seq1                      0
comp100013_c0_seq1                      0
alisrpp is offline   Reply With Quote
Old 10-16-2013, 10:29 AM   #29
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Googling the setDummyField error I came to this thread

http://permalink.gmane.org/gmane.sci...onductor/48611

Where they suggest there might be a problem with using R 3.0.0 with packages built with R 3.0.1. Could this be the case? Can you paste your sessionInfo()
Michael Love is offline   Reply With Quote
Old 10-16-2013, 01:04 PM   #30
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Done! I updated R and now is working perfecly.
Thanks Michael!
alisrpp is offline   Reply With Quote
Old 10-16-2013, 01:35 PM   #31
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Now i'm getting this error:

Code:
Warning messages:
1: glm.fit: algorithm did not converge 
2: In estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
  parametric fit failed, trying local fit. use plotDispEsts() to check the quality of the local fit
This is my code:

Code:
#Count matrix input
Cele_SPvsEB = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsEB$target_id),
  condition = factor(c("SP", "SP", "EB", "EB")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsEB,
                              colData = CeleDesign,
                              design = ~ condition)
dds
I obtained this:

Code:
class: DESeqDataSet 
dim: 198821 4 
exptData(0):
assays(1): counts
rownames(198821): comp100_c0_seq1 comp100004_c0_seq1 ... comp99994_c0_seq1 comp99999_c0_seq1
rowData metadata column names(0):
colnames(4): 1 2 3 4
colData names(1): condition
And after running:

Code:
#Differential expression analysis
dds <- DESeq(dds)
(...) the error warning is appearing.

Any idea how to fix that?

I also want to change the colnames, how can i do that?

Thanks.
alisrpp is offline   Reply With Quote
Old 10-17-2013, 05:11 AM   #32
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

You don't need to fix anything.

There are 3 levels of messages to the user in R: message, warning and stop/error. A warning is a non-fatal problem. In this case, I am warning the user that the default method for fitting the dispersion trend line didn't work so the software is doing something else instead.

The way to read the warning is that "algorithm did not converge" within the function estimateDispersionsFit. Then it says that the parametric fit failed, and that a local fit will be tried instead.

If you look up ?estimateDispersionsFit it will link over to ?estimateDispersions, here I describe the 3 options for the fitType argument, and the default is parametric. The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be. If the local fit does not seem to capture the trend in the black points, you can use the last option fitType="mean" instead. Note that the fitType arguments can be provided to DESeq() as well as to estimateDispersions().

For changing columns names, check ?colnames.

Last edited by Michael Love; 10-17-2013 at 06:13 AM.
Michael Love is offline   Reply With Quote
Old 10-17-2013, 04:25 PM   #33
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Quote:
The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be.
This is my dispersion plot, is not exactly like in the manual... so i don't know what to think.
Can you give me your opinion, please?

Thanks,
Attached Images
File Type: jpeg Rplotdispersion_SPvsLR.jpeg (53.5 KB, 95 views)
alisrpp is offline   Reply With Quote
Old 10-17-2013, 07:19 PM   #34
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".
Michael Love is offline   Reply With Quote
Old 10-19-2013, 06:06 AM   #35
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

This is what I got with default setting (parametric), does it look ok or should I try mean or local fit?

One question:
What happens when I increase the "maxit" value? From 100 to 200, 159 was reduced to 110 rows that did not converge in beta. I dont know what this means.

Thanks!

Rplot.pdf
sindrle is offline   Reply With Quote
Old 10-19-2013, 06:28 AM   #36
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

This dispersion plot looks good/typical.

re:maxit: Unlike with linear regression, there is not a closed form solution for generalized linear models (the model used by DESeq2). Instead there is an iterative algorithm to find the optimal betas (the log fold changes). 'maxit' is the maximum number of iterations to run, and if the convergence criteria has not been met then the software prints a warnings and tells you how to find these genes for which this was the case.

However, I believe most datasets would have all genes converge if you used DESeq2 v1.2 which was just released.

This involves upgrading R to 3.0.2 and Bioconductor to v2.13. Information here:

http://www.bioconductor.org/install/
Michael Love is offline   Reply With Quote
Old 10-19-2013, 05:45 PM   #37
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

I upgraded all as you said, but still 159 not fitted at maxit 100.
sindrle is offline   Reply With Quote
Old 10-20-2013, 01:49 PM   #38
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Then these might just be difficult rows for some reason.

You can examine them with:

head(counts(dds[!mcols(dds)$betaConv,]))

You can exclude them from the results table with:

res <- results(dds[mcols(dds)$betaConv,])
Michael Love is offline   Reply With Quote
Old 10-21-2013, 08:04 AM   #39
alisrpp
Member
 
Location: New York

Join Date: Dec 2010
Posts: 40
Default

Quote:
Originally Posted by Michael Love View Post
As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".
Hi,

By doing what you suggested:

Code:
dds<- estimateSizeFactors(dds)
ddsMean <- estimateDispersions(dds, fitType="mean")
ddsMean <- nbinomWaldTest(ddsMean)
plotDispEsts(ddsMean)
I'm obtaining the subsequent plot.
It's really weird, right?
Should i continue with fitType="mean"?

Thanks.
Attached Files
File Type: pdf SPvsLR_plotDispEsts(ddsMean).pdf (4.39 MB, 90 views)
alisrpp is offline   Reply With Quote
Old 10-21-2013, 01:50 PM   #40
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.
Michael Love is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:07 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO