Seqanswers Leaderboard Ad

**alisrpp** · 10-16-2013, 01:35 PM

Now i'm getting this error:

Code:

Warning messages:
1: glm.fit: algorithm did not converge 
2: In estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
  parametric fit failed, trying local fit. use plotDispEsts() to check the quality of the local fit

This is my code:

Code:

#Count matrix input
Cele_SPvsEB = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsEB$target_id),
  condition = factor(c("SP", "SP", "EB", "EB")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsEB,
                              colData = CeleDesign,
                              design = ~ condition)
dds

I obtained this:

Code:

class: DESeqDataSet 
dim: 198821 4 
exptData(0):
assays(1): counts
rownames(198821): comp100_c0_seq1 comp100004_c0_seq1 ... comp99994_c0_seq1 comp99999_c0_seq1
rowData metadata column names(0):
colnames(4): 1 2 3 4
colData names(1): condition

And after running:

Code:

#Differential expression analysis
dds <- DESeq(dds)

(...) the error warning is appearing.

Any idea how to fix that?

I also want to change the colnames, how can i do that?

Thanks.

**Michael Love** · 10-17-2013, 05:11 AM

You don't need to fix anything.

There are 3 levels of messages to the user in R: message, warning and stop/error. A warning is a non-fatal problem. In this case, I am warning the user that the default method for fitting the dispersion trend line didn't work so the software is doing something else instead.

The way to read the warning is that "algorithm did not converge" within the function estimateDispersionsFit. Then it says that the parametric fit failed, and that a local fit will be tried instead.

If you look up ?estimateDispersionsFit it will link over to ?estimateDispersions, here I describe the 3 options for the fitType argument, and the default is parametric. The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be. If the local fit does not seem to capture the trend in the black points, you can use the last option fitType="mean" instead. Note that the fitType arguments can be provided to DESeq() as well as to estimateDispersions().

For changing columns names, check ?colnames.

**alisrpp** · 10-17-2013, 04:25 PM

The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be.

This is my dispersion plot, is not exactly like in the manual... so i don't know what to think.
Can you give me your opinion, please?

Thanks,

Attached Files

Rplotdispersion_SPvsLR.jpeg (53.5 KB, 95 views)

**Michael Love** · 10-17-2013, 07:19 PM

As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".

**sindrle** · 10-19-2013, 06:06 AM

This is what I got with default setting (parametric), does it look ok or should I try mean or local fit?

One question:
What happens when I increase the "maxit" value? From 100 to 200, 159 was reduced to 110 rows that did not converge in beta. I dont know what this means.

Thanks!

Rplot.pdf

**Michael Love** · 10-19-2013, 06:28 AM

This dispersion plot looks good/typical.

re:maxit: Unlike with linear regression, there is not a closed form solution for generalized linear models (the model used by DESeq2). Instead there is an iterative algorithm to find the optimal betas (the log fold changes). 'maxit' is the maximum number of iterations to run, and if the convergence criteria has not been met then the software prints a warnings and tells you how to find these genes for which this was the case.

However, I believe most datasets would have all genes converge if you used DESeq2 v1.2 which was just released.

This involves upgrading R to 3.0.2 and Bioconductor to v2.13. Information here:

Bioconductor - Install

http://www.bioconductor.org/install/

The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.

**sindrle** · 10-19-2013, 05:45 PM

I upgraded all as you said, but still 159 not fitted at maxit 100.

**Michael Love** · 10-20-2013, 01:49 PM

Then these might just be difficult rows for some reason.

You can examine them with:

head(counts(dds[!mcols(dds)$betaConv,]))

You can exclude them from the results table with:

res <- results(dds[mcols(dds)$betaConv,])

**alisrpp** · 10-21-2013, 08:04 AM

Originally posted by Michael Love View Post

As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".

Hi,

By doing what you suggested:

Code:

dds<- estimateSizeFactors(dds)
ddsMean <- estimateDispersions(dds, fitType="mean")
ddsMean <- nbinomWaldTest(ddsMean)
plotDispEsts(ddsMean)

I'm obtaining the subsequent plot.
It's really weird, right?
Should i continue with fitType="mean"?

Thanks.

Attached Files

SPvsLR_plotDispEsts(ddsMean).pdf (4.39 MB, 90 views)

**Michael Love** · 10-21-2013, 01:50 PM

There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.

**alisrpp** · 10-21-2013, 08:06 PM

Originally posted by Michael Love View Post

There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.

Hi Michael,

First of all, thank you for all the help.

I tried fitType="mean" with the code:

Code:

#Count matrix input
Cele_SPvsLR = read.csv (file.choose(), header=TRUE, row.names=1)
CeleDesign <- data.frame(
  row.names = colnames(Cele_SPvsLR),
  condition = factor(c("SP", "SP", "LR", "LR")))
dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsLR,
                              colData = CeleDesign,
                              design = ~ condition)
dds


#Est size factor = normalize for library size
dds<- estimateSizeFactors(dds)
ddsMean <- estimateDispersions(dds, fitType="mean")
ddsMean <- nbinomWaldTest(ddsMean)
#For a parametric dispersion
dds <- DESeq(dds)
#plot dispersion
plotDispEsts(ddsMean)
plotDispEsts(dds)

#Differential expression analysis
resultsNames(ddsMean)
res <- results(ddsMean, name= "condition_SP_vs_LR") 
res <- res[order(res$padj),]
head(res)
plotMA(ddsMean)
sum(res$padj < .1, na.rm=TRUE)

This resulted in:

Code:

> sum(res$padj < .1, na.rm=TRUE)
[1] 0

But when I tried fitType="local" I get:

Code:

> sum(res$padj < .1, na.rm=TRUE)
[1] 337

(*see attached plots)

The other "weird" (at least for me) thing is that by filtering by overall count to my fitType="local" results I get less genes:

Code:

> plot(res$baseMean, pmin(-log10(res$pvalue), 50),
+ log="x", xlab="mean of normalized counts",
+ ylab=expression(-log[10](pvalue)))
Warning message:
In xy.coords(x, y, xlabel, ylabel, log) :
  19407 x values <= 0 omitted from logarithmic plot
> abline(v=10, col="red", lwd=1)
> use <- res$baseMean >= 10 & !is.na(res$pvalue)
> table(use)
use
 FALSE   TRUE 
 93445 105376 
> resFilt <- res [use, ]
> resFilt$padj <- p.adjust(resFilt$pvalue, method="BH")
> sum(resFilt$padj < .1, na.rm=TRUE)
[1] 152

Just as a refresher, I'm working with 4 samples (2 replicates/reproductive stages).

Do you think that it is a bad idea to use fitType="local" given that it is the fitType that gives me results?

Can you help me understand why by filtering my "local" results I am obtaining less genes?

Thanks, thanks, and thanks.

Attached Files

**Michael Love** · 10-22-2013, 04:46 AM

The local fit helps you out by dipping down at the high counts, giving you lower estimates of dispersion there than by taking the mean overall. The large circles are genes whose dispersion estimates are not shrunk because they deviate too far from the fitted line. The local fit is useable as well I would say.

I presume -- because you haven't posted the output of sessionInfo() -- that you could be using a version of DESeq2 which performs automatic independent filtering to optimize the number of rejections. Check the Details section of ?results and the argument 'independentFiltering'.

**sindrle** · 10-22-2013, 07:37 AM

Hi!
Quick question; what is the default normalisation method in DESeq2?
Cant find anything in the manual.

Thanks!

**Michael Love** · 10-22-2013, 07:45 AM

estimateSizeFactors() is the same function used for normalization as in DESeq, so the method is the same median ratio method as in DESeq. I can try to make this more clear in the man page for DESeq().

**sindrle** · 10-22-2013, 07:59 AM

Ok, but typing in "estimateSizeFactors()" to look for options gives no info just (object, ...)

But what is the default? Im curious because Im running edgeR with "upper quartile".

Thanks!

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 19 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News