Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Now i'm getting this error:

    Code:
    Warning messages:
    1: glm.fit: algorithm did not converge 
    2: In estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
      parametric fit failed, trying local fit. use plotDispEsts() to check the quality of the local fit
    This is my code:

    Code:
    #Count matrix input
    Cele_SPvsEB = read.csv (file.choose(), header=TRUE, row.names=1)
    CeleDesign <- data.frame(
      row.names = colnames(Cele_SPvsEB$target_id),
      condition = factor(c("SP", "SP", "EB", "EB")))
    dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsEB,
                                  colData = CeleDesign,
                                  design = ~ condition)
    dds
    I obtained this:

    Code:
    class: DESeqDataSet 
    dim: 198821 4 
    exptData(0):
    assays(1): counts
    rownames(198821): comp100_c0_seq1 comp100004_c0_seq1 ... comp99994_c0_seq1 comp99999_c0_seq1
    rowData metadata column names(0):
    colnames(4): 1 2 3 4
    colData names(1): condition
    And after running:

    Code:
    #Differential expression analysis
    dds <- DESeq(dds)
    (...) the error warning is appearing.

    Any idea how to fix that?

    I also want to change the colnames, how can i do that?

    Thanks.

    Comment


    • #32
      You don't need to fix anything.

      There are 3 levels of messages to the user in R: message, warning and stop/error. A warning is a non-fatal problem. In this case, I am warning the user that the default method for fitting the dispersion trend line didn't work so the software is doing something else instead.

      The way to read the warning is that "algorithm did not converge" within the function estimateDispersionsFit. Then it says that the parametric fit failed, and that a local fit will be tried instead.

      If you look up ?estimateDispersionsFit it will link over to ?estimateDispersions, here I describe the 3 options for the fitType argument, and the default is parametric. The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be. If the local fit does not seem to capture the trend in the black points, you can use the last option fitType="mean" instead. Note that the fitType arguments can be provided to DESeq() as well as to estimateDispersions().

      For changing columns names, check ?colnames.
      Last edited by Michael Love; 10-17-2013, 06:13 AM.

      Comment


      • #33
        The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be.
        This is my dispersion plot, is not exactly like in the manual... so i don't know what to think.
        Can you give me your opinion, please?

        Thanks,
        Attached Files

        Comment


        • #34
          As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".

          Comment


          • #35
            This is what I got with default setting (parametric), does it look ok or should I try mean or local fit?

            One question:
            What happens when I increase the "maxit" value? From 100 to 200, 159 was reduced to 110 rows that did not converge in beta. I dont know what this means.

            Thanks!

            Rplot.pdf

            Comment


            • #36
              This dispersion plot looks good/typical.

              re:maxit: Unlike with linear regression, there is not a closed form solution for generalized linear models (the model used by DESeq2). Instead there is an iterative algorithm to find the optimal betas (the log fold changes). 'maxit' is the maximum number of iterations to run, and if the convergence criteria has not been met then the software prints a warnings and tells you how to find these genes for which this was the case.

              However, I believe most datasets would have all genes converge if you used DESeq2 v1.2 which was just released.

              This involves upgrading R to 3.0.2 and Bioconductor to v2.13. Information here:

              The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.

              Comment


              • #37
                I upgraded all as you said, but still 159 not fitted at maxit 100.

                Comment


                • #38
                  Then these might just be difficult rows for some reason.

                  You can examine them with:

                  head(counts(dds[!mcols(dds)$betaConv,]))

                  You can exclude them from the results table with:

                  res <- results(dds[mcols(dds)$betaConv,])

                  Comment


                  • #39
                    Originally posted by Michael Love View Post
                    As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".
                    Hi,

                    By doing what you suggested:

                    Code:
                    dds<- estimateSizeFactors(dds)
                    ddsMean <- estimateDispersions(dds, fitType="mean")
                    ddsMean <- nbinomWaldTest(ddsMean)
                    plotDispEsts(ddsMean)
                    I'm obtaining the subsequent plot.
                    It's really weird, right?
                    Should i continue with fitType="mean"?

                    Thanks.
                    Attached Files

                    Comment


                    • #40
                      There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.

                      Comment


                      • #41
                        Originally posted by Michael Love View Post
                        There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.
                        Hi Michael,

                        First of all, thank you for all the help.

                        I tried fitType="mean" with the code:

                        Code:
                        #Count matrix input
                        Cele_SPvsLR = read.csv (file.choose(), header=TRUE, row.names=1)
                        CeleDesign <- data.frame(
                          row.names = colnames(Cele_SPvsLR),
                          condition = factor(c("SP", "SP", "LR", "LR")))
                        dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsLR,
                                                      colData = CeleDesign,
                                                      design = ~ condition)
                        dds
                        
                        
                        #Est size factor = normalize for library size
                        dds<- estimateSizeFactors(dds)
                        ddsMean <- estimateDispersions(dds, fitType="mean")
                        ddsMean <- nbinomWaldTest(ddsMean)
                        #For a parametric dispersion
                        dds <- DESeq(dds)
                        #plot dispersion
                        plotDispEsts(ddsMean)
                        plotDispEsts(dds)
                        
                        #Differential expression analysis
                        resultsNames(ddsMean)
                        res <- results(ddsMean, name= "condition_SP_vs_LR") 
                        res <- res[order(res$padj),]
                        head(res)
                        plotMA(ddsMean)
                        sum(res$padj < .1, na.rm=TRUE)
                        This resulted in:

                        Code:
                        > sum(res$padj < .1, na.rm=TRUE)
                        [1] 0
                        But when I tried fitType="local" I get:

                        Code:
                        > sum(res$padj < .1, na.rm=TRUE)
                        [1] 337
                        (*see attached plots)

                        The other "weird" (at least for me) thing is that by filtering by overall count to my fitType="local" results I get less genes:

                        Code:
                        > plot(res$baseMean, pmin(-log10(res$pvalue), 50),
                        + log="x", xlab="mean of normalized counts",
                        + ylab=expression(-log[10](pvalue)))
                        Warning message:
                        In xy.coords(x, y, xlabel, ylabel, log) :
                          19407 x values <= 0 omitted from logarithmic plot
                        > abline(v=10, col="red", lwd=1)
                        > use <- res$baseMean >= 10 & !is.na(res$pvalue)
                        > table(use)
                        use
                         FALSE   TRUE 
                         93445 105376 
                        > resFilt <- res [use, ]
                        > resFilt$padj <- p.adjust(resFilt$pvalue, method="BH")
                        > sum(resFilt$padj < .1, na.rm=TRUE)
                        [1] 152
                        Just as a refresher, I'm working with 4 samples (2 replicates/reproductive stages).

                        Do you think that it is a bad idea to use fitType="local" given that it is the fitType that gives me results?

                        Can you help me understand why by filtering my "local" results I am obtaining less genes?

                        Thanks, thanks, and thanks.
                        Attached Files

                        Comment


                        • #42
                          The local fit helps you out by dipping down at the high counts, giving you lower estimates of dispersion there than by taking the mean overall. The large circles are genes whose dispersion estimates are not shrunk because they deviate too far from the fitted line. The local fit is useable as well I would say.

                          I presume -- because you haven't posted the output of sessionInfo() -- that you could be using a version of DESeq2 which performs automatic independent filtering to optimize the number of rejections. Check the Details section of ?results and the argument 'independentFiltering'.

                          Comment


                          • #43
                            Hi!
                            Quick question; what is the default normalisation method in DESeq2?
                            Cant find anything in the manual.

                            Thanks!

                            Comment


                            • #44
                              estimateSizeFactors() is the same function used for normalization as in DESeq, so the method is the same median ratio method as in DESeq. I can try to make this more clear in the man page for DESeq().

                              Comment


                              • #45
                                Ok, but typing in "estimateSizeFactors()" to look for options gives no info just (object, ...)

                                But what is the default? Im curious because Im running edgeR with "upper quartile".

                                Thanks!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                22 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                24 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                52 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X