Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • plotDEXseq failure- all log2fold values set to NA or NaN

    Dear all,

    I am having issues with my DEXSeq analysis. The symptom appears with plotDEXSeq but I think this is a deeper issue with the underlying computation. The error message I am getting is with plotDEXSeq is:

    Error in plot.window(xlim = c(0, 1), ylim = c(0, max(matr))) :
    need finite 'ylim' values

    The issue is that in my Exon Count set object the log2fold values are all either NA or NaN:
    head(featureData(DexSeqExons.loc)@data$log2fold)
    [1] NaN NaN NaN NaN NaN NaN

    I suspect, but I am not sure, that the following warning points to the problem:
    In fitDispersionFunction(DexSeqExons.loc) :
    Negative intercept value in the dispersion function, it will be set to 0. Check fit diagnostics plot section from the vignette.

    Indeed I see:
    > DexSeqExons.loc@dispFitCoefs
    (Intercept) I(1/means[good])
    0.0000000 0.9973345

    which does not seem right. I can somehow get around the error by changing the function fitDispersionFunction and set
    coefs[1] <- 0.001
    instead of
    coefs[1] <- 0
    when the estimated coefficient is negative. But tinkering with the underlying numerical computations hardly seems like a good thing to do.


    I had never seen such an issue before, and I had been working on the same dataset. It is not clear to me what I may have changed that caused this odd behaviour but I cannot find a way to fix it. There could be something wrong with my gff file which I changed recently but I don't think it is very likely.

    Has anyone experienced a similar issue before? Any tip to fix the problem would be very welcome,

    Thanks,

    Vincent
    Last edited by vplagnol; 01-17-2013, 05:19 PM.

  • #2
    Can you post the plot of dispersion versus mean. Use, e.g.

    Code:
    plot( 
       rowMeans( counts( ecs, normalized=TRUE ) ), 
       fData(ecs)$dispBeforeSharing, 
       log="xy", pch=19, cex=.3, col="#00000040" )

    Comment


    • #3
      Thank you for your quick reply Simon.

      So if I don't use my "fudge", the coefficients are:
      (Intercept) I(1/means[good])
      0.0000000 0.9973345

      I attach my mean vs dispersion plot, which hopefully will make sense to you.
      Note that this is only for a small subset of 300 exons which I use for debugging but the same exact thing happens for the full genome-wide set.
      Attached Files

      Comment


      • #4
        This plot seems perfectly fine to me. Is the red line now done with the original or the fudged coefficients? The intecept (y value of the red line where x=exp(0)=1) looks positive and correct. In any case, it seem to trace the point well, so it should be correct to continue the calculation with this fit.

        By the way, you have called the estimateLog2FoldChanges function, have you? Otherwise, it would be no wonder that you don't have fold change estimates.

        Comment


        • #5
          Argh sorry... this plot was in fact using my fudge. Just tired I suppose. Now here is the problematic plot, I think you will see the issue.

          And here are the messed up coefficients for the fit:
          > DexSeqExons.loc@dispFitCoefs
          (Intercept) I(1/means[good])
          -0.0001751386 0.9973344545

          Basically my understanding is that the first iteration results in a negative intercept. If I set it back to a positive value and let the algorithm iterate, I get the reasonable graph you saw above. But as it stands the fitting gives up right away as soon as such a negative intercept occurs.
          Attached Files

          Comment


          • #6
            It's annoying that these negative intercepts happen, and we need to figure out how to get rid of them. Your fudge, however, seems to do the trick for your data, because with it, the fit looks reasonable.

            After you have modified the dispersion coefficients, use the following code to recalculate the dispersion values from it:

            Code:
            fData(ecs)$dispFitted <- ecs@dispFitCoefs[1] + 
               ecs@dispFitCoefs[2] / colMeans( t(counts(ecs)) / sizeFactors(ecs) )
            fData(ecs)$dispersion <- pmin( pmax( fData(ecs)$dispBeforeSharing, 
               fData(ecs)$dispFitted, na.rm = TRUE), 1e+08)
            (These are the last two lines from 'fitDispersionFunction'.)

            Comment


            • #7
              OK thanks, this is what I needed to hear.

              For what it's worth my solution is not very elegant but because I remove the "break" that occurs after the negative coefficients, the fit continues with more iterations and I do not think I need these extra 2 lines. In my situation the first iteration gives negative values, but resetting the starting coefficients after that seems to converge toward proper parameter values after that. This solves that dataset a least.

              See below my modification to your fitDispersion function, again no claim to have a fix, just a rough idea:

              if (coefs[1] < 0) {
              coefs[1] <- 0.001
              warning("Negative intercept value in the dispersion function, it will be set to 0. Check fit diagnostics plot section from the vignette.")
              #break
              }

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM
              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 11:49 AM
              0 responses
              13 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 08:47 AM
              0 responses
              16 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              61 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              60 views
              0 likes
              Last Post seqadmin  
              Working...
              X