  • DESeq 1.5.30 - estimateDispersions

    Dear all,

    I have a dataset consisting of a matrix of gene counts for two different conditions with 3 biological replicates each and I want to call diff-expressed genes in response to the treatment.
    However, I am wondering which parameters are most appropriate for me to use in the estimateDispersions function:
    1. method: per-condition or pooled
    2. fitType: parametric, local

    1. I don't fully understand how the two methods differ computationally; in general, per-condition seems more logical to me. I counted the diffex genes with both methods and, contrary to my expectation, per-condition gave more significant diffex genes.
    Can anyone give me some advice on what to choose and WHY it might be more realistic?

    2. Here I also don't know which setting makes more sense and why. I'd be thankful for suggestions!

    Thanks a lot!

  • #2
    Sometimes the two conditions have unequal variance (for example, knock-down samples might differ more strongly from each other than untreated control samples do, because knock-down efficiency is so hard to keep constant), and then "per-condition" can give more power. This is why it was the default. However, I realized recently that our way of avoiding outliers (see the discussion of 'sharingMode="maximum"' in the vignette) does not work as reliably as I had hoped when using "per-condition" estimation. This is why I changed the default to "pooled" and added a note about this to the help page. I have some ideas on how to improve this, but pending that I recommend "pooled".

    For fitType, both options should give good results, and so far the choice does not seem to make much of a difference. If you plot the dispersions against the means, as shown in the vignette, you can see which of the two fit types gives a fit that seems to follow the data more closely.
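
As a rough illustration of the "per-condition" versus "pooled" distinction described above, here is a minimal base R sketch. It is not DESeq's actual estimator: the counts are made up, the helper `moment_dispersion` is hypothetical, and the simple method-of-moments relation (variance = mean + dispersion * mean^2) ignores the size-factor correction, the fitted mean-dispersion trend, and `sharingMode`.

```r
# Hypothetical normalized counts for ONE gene (made-up numbers).
control   <- c(100, 104, 96)    # replicates agree closely
knockdown <- c(150, 260, 190)   # replicates vary a lot

# Simplified method-of-moments dispersion, assuming Var = mu + alpha * mu^2.
# Negative estimates are truncated at zero.
moment_dispersion <- function(m, v) max((v - m) / m^2, 0)

# "per-condition" idea: one empirical estimate per condition.
disp_control   <- moment_dispersion(mean(control), var(control))
disp_knockdown <- moment_dispersion(mean(knockdown), var(knockdown))

# "pooled" idea: a single estimate from the within-condition variances,
# pooled across conditions and weighted by degrees of freedom.
n <- c(length(control), length(knockdown))
pooled_var  <- sum((n - 1) * c(var(control), var(knockdown))) / sum(n - 1)
pooled_mean <- mean(c(control, knockdown))
disp_pooled <- moment_dispersion(pooled_mean, pooled_var)

print(c(control = disp_control, knockdown = disp_knockdown, pooled = disp_pooled))
```

With these made-up numbers the tight control replicates get a dispersion of 0, the noisy knock-down replicates about 0.07, and the pooled estimate lands in between at about 0.06, so which method is used can change how much variability each condition is assumed to have.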

    • #3
      Hi Simon,
      thanks a lot for your answer.

      However, I still don't understand how a single pooled empirical dispersion value ("pooled"), as opposed to one empirical dispersion value for each condition with biological replicates ("per-condition"), is used in the subsequent calculation. Knowing this would help me understand in which cases I should expect more or fewer diffex genes.
      In my case, using two different examples (each with 3 biological replicates per condition), the pooled option reduced the number of diffex genes. Is this what you would have expected?

      I'm sorry if this is already answered in the thread you mentioned (see the discussion of 'sharingMode="maximum"' in the vignette), which I unfortunately couldn't find (would be great if you could post the link).

      Thanks a lot!

      • #4
        Additionally, I have attached the "funnel" plots of the diffex results, with the respective number of identified genes, for two treatments (each with 3 biological replicates) using two different parameter settings for the estimateDispersions function:

        cds.1 <- estimateDispersions( cds.1, sharingMode="maximum", method="per-condition", fitType="local" ); s="max"; m="per-cond"; f="local"
        --> # diffex genes: 415


        cds.1 <- estimateDispersions( cds.1, sharingMode="maximum", method="pool", fitType="local" ); s="max"; m="pool"; f="local"
        --> # diffex genes: 214

        Is it to be expected that a lot of genes with "high" log2FCs and "high" mean expression are not identified as significant?!
        Do these plots look "normal" to you?!

        Thanks a lot!

        • #5
          Originally posted by horizon
          I'm sorry if this is already answered in the thread you mentioned (see the discussion of 'sharingMode="maximum"' in the vignette), which I unfortunately couldn't find (would be great if you could post the link).

          I mean the vignette, not a thread. See pages 4 to 6 here.

          Originally posted by horizon
          Is it to be expected that a lot of genes with "high" log2FCs and "high" mean expression are not identified as significant?!
          Do these plots look "normal" to you?!

          Use the 'identify' function of R to get the gene IDs of some of those black points with high mean and high log FC, and then look at the individual normalized counts. I expect you will find that they vary a lot from replicate to replicate, and this is why DESeq (at least the new version) does not call them as differentially expressed.
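
The check suggested above can be sketched in base R once you have the gene IDs (for example from `identify`, which is interactive and so not shown here). Everything in this sketch is an assumption for illustration: the counts, gene and sample names, and size factors are made up. In DESeq itself, `counts(cds, normalized=TRUE)` should give the normalized table directly.

```r
# Hypothetical raw counts (made-up numbers): rows are genes, columns are samples.
raw <- matrix(c(500, 520, 480, 1000, 1040, 960,
                 50,  40,  60, 2000,  100, 150),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"),
                              c("ctrl_1", "ctrl_2", "ctrl_3",
                                "trt_1", "trt_2", "trt_3")))
condition <- factor(rep(c("ctrl", "trt"), each = 3))

# Hypothetical size factors, one per sample (same order as the columns).
size_factors <- c(1.00, 1.04, 0.96, 1.00, 1.04, 0.96)

# Normalize: divide each sample's column by its size factor.
normalized <- sweep(raw, 2, size_factors, "/")

# Per-gene, per-condition mean and coefficient of variation (sd / mean).
cond_mean <- t(apply(normalized, 1, function(x) tapply(x, condition, mean)))
cond_cv   <- t(apply(normalized, 1, function(x) {
  tapply(x, condition, function(v) sd(v) / mean(v))
}))

# Log2 fold change of treated over control, from the condition means.
log2fc <- log2(cond_mean[, "trt"] / cond_mean[, "ctrl"])

print(round(normalized, 1))
print(round(cond_cv, 2))
print(round(log2fc, 2))
```

In this toy table geneB has the larger log2 fold change, but it is driven by a single treated replicate and its coefficient of variation within the treated group is above 1, which is the kind of replicate-to-replicate variation described above.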

          • #6
            Simon,

            I have a similar question. I have miRNA data and am looking for differentially expressed miRNAs, and I get a lot more DE miRNAs with the previous version of DESeq than with the newer version.

            I would like to understand whether the newer version could be more conservative than the older version for data with low read counts.

            FYI, the size factors for our datasets are:

                  u_1       u_2       s_1       s_2
            1.4265463 1.0675662 0.6081645 1.1458061


            where u and s are the conditions and 1 and 2 are the replicates.

            Thanks,
            Praful
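
For readers unfamiliar with size factors like the ones quoted above, here is a minimal base R sketch of the median-of-ratios idea behind DESeq's `estimateSizeFactors`. The function name `median_ratio_size_factors` and the counts are hypothetical, and this is a simplified sketch under those assumptions rather than the package's own code.

```r
# Median-of-ratios size factors (simplified sketch).
# counts: numeric matrix, rows are genes, columns are samples.
median_ratio_size_factors <- function(counts) {
  # Log geometric mean of each gene across samples (-Inf if any count is 0).
  log_geo_mean <- rowMeans(log(counts))
  # Genes with a zero count in any sample are left out.
  usable <- is.finite(log_geo_mean)
  # For each sample: median over usable genes of count / geometric mean.
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_mean[usable]))
  })
}

# Made-up example: sample "b" has twice the counts of sample "a"
# for every gene that is expressed in both.
counts <- cbind(a = c(10, 100, 1000, 0),
                b = c(20, 200, 2000, 5))
sf <- median_ratio_size_factors(counts)
print(sf)
```

A size factor above 1 marks a sample with more reads than the typical sample, and raw counts are divided by it before samples are compared; in the toy example the two factors differ by the expected factor of 2.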
