  • Question about DESeq2 LFC shrinkage estimation.

    I'm currently working my way through the paper "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2" by Love, Huber and Anders.

    In the model specification, eqn(2) uses a term βir, which the supplemental tables list as the logarithmic fold change for gene i and covariate r. This may be a silly question, but what exactly does r refer to? My first thought was that it indexed the replicates of a treatment, extracted from the design matrix. If it's instead meant to be something that co-variation can be measured against, does that mean it refers to each of the different conditions?

    I ask because if my first thought was correct, then the model in eqn(10) doesn't make sense. It only makes sense to me if βir is the LFC between two conditions, i.e. all the LFCs are distributed around 0. It's all rather confusing really.

    Is anyone able to make this clearer for me?

    Cheers
    Ben.

  • #2
    Perhaps an example is the simplest approach:
    Code:
    design = ~age+gender
    "age" and "gender" would then be different "r"s. In the case of "age", there may in fact be multiple "r"s, if for example age is a factor with more than two levels (e.g., "young", "middleAged" and "old"). So yes, it's saying that the coefficients are distributed about 0.
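
To make the "r"s concrete, here is a minimal base-R sketch of the standard model matrix that this design formula produces. The sample table is made up purely for illustration, and this is plain `model.matrix`, not DESeq2's expanded model matrix.

```r
# Hypothetical sample table: 6 samples, a 3-level age factor and a 2-level gender factor.
samples <- data.frame(
  age    = factor(c("young", "young", "middleAged", "middleAged", "old", "old")),
  gender = factor(c("F", "M", "F", "M", "F", "M"))
)

# Standard (non-expanded) model matrix built from the design formula.
X <- model.matrix(~ age + gender, data = samples)

# Each column is one "r" (one coefficient); each row is one sample j.
print(colnames(X))
print(X)
```

With default treatment contrasts the columns should be `(Intercept)`, `ageold`, `ageyoung` and `genderM`, so the three-level age factor contributes two "r"s and gender contributes one. Replicates show up as rows, not as columns.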


    • #3
      Okay. That makes sense. Thanks. A further question if I may though.

      The definition of βir provided is the LFC for gene i, covariate r. Does this mean that βir is the LFC between the replicates within a given r or is it between two r's?

      If it's the former, isn't that what the NB distribution was being fitted on in eqn(1)/(2)? In which case, if I have 3 replicates, does the variance between those three replicates fit a NB while the LFCs between the 3 are normally distributed?

      If it's the latter, which two r's (I have multiple)? In which case, is the assumption in eqn(10) that the LFCs of all possible comparisons of r's are normally distributed?

      It feels like there's a conceptual point here that I'm not getting.

      Cheers
      Ben.


      • #4
        Well, it's the LFC due to that particular coefficient. Whether it's measured against the mean across samples or against a traditional intercept (i.e., the base level of a factor) will depend on whether the expanded model matrix (something particular to DESeq2) is being used.
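
For reference, here is the model under discussion written out, as I read the paper's notation. The equation numbers are the ones quoted earlier in this thread, and the two-condition example at the end is only an illustration.

```latex
% Counts K_{ij} for gene i, sample j (the thread's eqn(1)/(2)):
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_{ij}\, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr}\, \beta_{ir}

% Zero-centred normal prior on the coefficients (the thread's eqn(10)):
\beta_{ir} \sim \mathcal{N}(0, \sigma_r^2)

% Illustration: two conditions, standard model matrix with columns
% r = 0 (intercept) and r = 1 (treated indicator).
\log_2 q_{ij} =
\begin{cases}
  \beta_{i0}               & \text{sample } j \text{ in the base condition } (x_{j1} = 0) \\
  \beta_{i0} + \beta_{i1}  & \text{sample } j \text{ in the treated condition } (x_{j1} = 1)
\end{cases}
```

In that illustration, β_{i1} is the log2 fold change between the two conditions. The replicates enter only as the samples j whose counts scatter around μ_{ij} according to the NB; there is no separate LFC between replicates.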

        In the traditional method (i.e., with no expanded model matrix), R will select the alphabetically first level of a factor as the base level for further comparisons, so choose that wisely. This is actually one of the clever things about DESeq2, since the expanded model matrix allows shrinkage with a prior while keeping the log2FC due to a contrast constant regardless of which base level of a factor you choose. You might read "help(nbinomWaldTest)" for some more information.
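
A small base-R sketch of the base-level point, using a made-up condition factor. It only shows how R orders factor levels and how `relevel` changes the reference. It does not touch DESeq2 itself.

```r
# Hypothetical condition factor; the level names are chosen for illustration only.
condition <- factor(c("untreated", "untreated", "untreated",
                      "treated", "treated", "treated"))

# Levels are sorted, so "treated" becomes the base level by default.
print(levels(condition))

# Set the base level explicitly instead of relying on sort order.
fixed <- relevel(condition, ref = "untreated")
print(levels(fixed))

# The non-intercept column is named after the level compared against the base.
default_cols <- colnames(model.matrix(~ condition))
fixed_cols   <- colnames(model.matrix(~ fixed))
print(default_cols)
print(fixed_cols)
```

With the default ordering the coefficient is untreated versus treated. After `relevel` it is treated versus untreated, which is usually the comparison people actually want.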


        • #5
          help(nbinomWaldTest) appears to be a much more succinct summary of the paper. It helps, thanks.

          Now that you mention it, I do recall the DESeq2 manual saying somewhere that it uses the first factor level as the base level. I appear to have completely failed to connect the dots betwixt the practical instructions in the manual and the theory in the paper.

          Many thanks.
          Ben.
