  • Flipped Results

    I have a total of twelve samples

    3 replicates per condition

    Mutant(3) and Wild-Type(3) in exponential

    Mutant(3) and Wild-Type(3) in stationary

    Before I submit the raw counts to DESeq2 for the Mutant/WT (exponential) comparison, one of the genes has higher counts in the Mutant than in the Wild-Type under the exponential condition.

    After running DESeq2 and getting the results and normalized counts, the counts are flipped and the change is insignificant.


    When I compare all four conditions using contrasts, the normalized counts are also flipped relative to the original raw counts, but this time the result is significant.


    Which one do I go with? Why would the raw counts be completely flipped?

    It's literally the difference between positive and negative regulation...


    I took the normalized counts from the comparison of all four conditions, ran my own glm.nb analysis, and found that this is a statistically significant change. I would not have found this using the normalized counts from the two-condition comparison. Moreover, the counts are completely flipped. Any thoughts on this?

    Looking more closely, I guess counts flip quite often, which has a lot to do with the size factor. But which comparison is correct? Do I go with the significant or the insignificant result? I am rather lost on this.
    Last edited by TheSeqGeek; 04-24-2014, 06:17 PM.

  • #2
    You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.


    • #3
      Originally posted by dpryan
      You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.

      If I do a log fold change with the normalized values I get roughly the same log fold change with the same sign as what is being spat out in the end by DESeq in the final result. Those normalized counts must be used for the final results otherwise I would get a sign reversal.

      In the code below I am comparing just the exponential samples.

      As far as I understand, DESeq2 computes the "sizeFactor" for each sample and then basically weights each sample by that factor. In my opinion it's almost no different from weighting by the total library size. That's where the reversal comes from.


      Code:
      
      library(DESeq2)
      
      #import the data
      directory<-"/Users/Nme/Documents/R/Files//New-Counts"
      sampleFiles <- grep("E",list.files(directory),value=TRUE)
      
      #Give names to experimental conditions
      sampleCondition<-c("Mutant","Mutant","Mutant","WT","WT","WT")
      sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
      sampleTable
      
      ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
      colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("WT","Mutant"))
      
      
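      #Run the DESeq2 pipeline and sort the results by adjusted p-value
      dds<-DESeq(ddsHTSeq)
      res<-results(dds)
      res<-res[order(res$padj),]
      head(res)
      resultsNames(dds)
      
      #Inspect the size factors and divide the raw counts by them
      sizeFactors(dds)
      normalizedCounts <- counts(dds, normalized=TRUE)
      
      #Save the normalized counts and the results table
      write.csv(normalizedCounts, "Exponential_normalized_counts.csv")
      write.csv(as.data.frame(res), "Exponential_results.csv")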


      • #4
        I think if you switch to this you’ll get what you want. In the conditions table, the condition listed first is the base level, relative to which the fold changes are calculated.

        Code:
        sampleCondition<-c("WT","WT","WT","Mutant","Mutant","Mutant")
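
A minimal base-R sketch of how the reference (base) level is picked, assuming standard `factor()` behavior. The baseline is the first factor level. By default the levels are sorted alphabetically rather than by order of appearance, so it may be worth checking `levels()` instead of relying on the row order. The variable names here are illustrative only.

```r
# Sample order as in the original script: Mutant rows listed first.
condition_chr <- c("Mutant", "Mutant", "Mutant", "WT", "WT", "WT")

# Default: levels are sorted alphabetically, so "Mutant" is the first level.
default_factor <- factor(condition_chr)
print(levels(default_factor))

# Reordering the samples does not change the sorted levels.
reordered_factor <- factor(rev(condition_chr))
print(levels(reordered_factor))

# Explicit levels or relevel() put "WT" first, making it the reference.
explicit_factor <- factor(condition_chr, levels = c("WT", "Mutant"))
releveled_factor <- relevel(default_factor, ref = "WT")
print(levels(explicit_factor))
print(levels(releveled_factor))
```

Under these assumptions, a script that already sets `levels=c("WT","Mutant")` explicitly would report Mutant relative to WT regardless of how the rows are ordered.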


        • #5
          Originally posted by Wallysb01
          I think if you switch to this you’ll get what you want. In the conditions table, the condition listed first is the base level, relative to which the fold changes are calculated.

          Code:
          sampleCondition<-c("WT","WT","WT","Mutant","Mutant","Mutant")

          It doesn't change the fact that Gene1 and Gene2 both had more counts for the mutant than for the WT, but after normalization Gene2 was flipped and Gene1 was not.

          It has everything to do with the size factor...

          If I weight my mutants by 1.3, 1.2, 1.1 and my WTs by 0.9, 0.95, 0.87, then depending on the counts some genes will be flipped...

          If you look at the sample table it is correct... so changing WT relative to Mutant in sampleCondition won't change any of this...
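
The effect described here can be reproduced with a small base-R sketch. The size factors are the numbers quoted above; the two genes and their counts are made up for illustration, and the fold change is a plain ratio of group means, not DESeq2's shrunken estimate.

```r
# Size factors quoted in the post (three mutants, three wild-types).
size_factors <- c(Mut1 = 1.3, Mut2 = 1.2, Mut3 = 1.1,
                  WT1 = 0.9, WT2 = 0.95, WT3 = 0.87)

# Hypothetical raw counts: both genes look higher in the mutant.
raw <- rbind(
  geneA = c(130, 120, 110, 100, 105, 95),
  geneB = c(260, 240, 220, 90, 95, 87)
)
colnames(raw) <- names(size_factors)

# Normalized counts: divide each column by its sample's size factor.
normalized <- sweep(raw, 2, size_factors, "/")

# Naive log2 fold change (mutant over WT) from group means.
is_mutant <- grepl("^Mut", colnames(raw))
log2fc <- function(m) {
  log2(rowMeans(m[, is_mutant]) / rowMeans(m[, !is_mutant]))
}

raw_lfc <- log2fc(raw)
norm_lfc <- log2fc(normalized)
print(round(cbind(raw_lfc, norm_lfc), 2))
```

In this toy case geneA changes sign after normalization because its raw difference is smaller than the difference in sequencing depth, while geneB keeps its sign because its difference is much larger.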


          • #6
            Originally posted by dpryan
            You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.

            I used a Bonferroni adjustment. The most stringent one as I understand it to be.


            • #7
              Well, I guess I should have said that it uses them for some things but not everything (after all, the negative binomial distribution itself needs integers). The normalized values will give you a decent guesstimate of the fold-change you'll get, or at least its sign. However, they'll never give you the exact value, particularly since the fold-change is shrunken (that's what the "moderated estimation of fold-change" part of the title of the DESeq2 paper is referring to). How similar the size factor used by DESeq2 is to that produced by just normalizing by library size will be experiment dependent, though library-size normalization alone should be avoided for the reasons laid out in the original DESeq and edgeR papers.

              Of course the size factors may cause some fold-changes to change sign. Not using a size factor (or using an incorrect one) will produce meaningless results.
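
To make the contrast with library-size normalization concrete, here is a simplified base-R sketch of a median-of-ratios size factor, in the spirit of the method described in the original DESeq paper. It is not DESeq2's actual code. The count matrix, the `rRNA` row, and both function names are hypothetical.

```r
# Toy counts: rows are genes, columns are samples (all values made up).
# Samples A and C are identical except for extra reads on one abundant gene.
toy_counts <- rbind(
  gene1 = c(100, 200, 100),
  gene2 = c(50, 100, 50),
  gene3 = c(30, 60, 30),
  gene4 = c(20, 40, 20),
  rRNA  = c(1000, 2000, 9000)
)
colnames(toy_counts) <- c("A", "B", "C")

# Median-of-ratios: compare each sample to a per-gene geometric mean
# and take the median ratio across genes.
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))
  usable <- is.finite(log_geo_means)  # skip genes with a zero count
  apply(m, 2, function(sample_counts) {
    exp(median((log(sample_counts) - log_geo_means)[usable]))
  })
}

# Library-size factors, rescaled so their geometric mean is 1.
library_size_factors <- function(m) {
  totals <- colSums(m)
  totals / exp(mean(log(totals)))
}

mor <- median_of_ratios(toy_counts)
lib <- library_size_factors(toy_counts)
print(round(rbind(median_of_ratios = mor, library_size = lib), 3))
```

In this toy example the median-based factors for A and C agree, because most genes have the same counts in both samples. The library-size factor for C is inflated by the single abundant gene.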


              • #8
                Bonferroni is better referred to as the most conservative and really shouldn't be used anymore since you'd just be throwing away meaningful results with it. The "BH" method has supplanted it for good reasons.
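
For illustration, a short base-R sketch comparing the two adjustments on made-up p-values. `p.adjust()` is part of the stats package that ships with R; the manual version is only there to show the rank-based scaling and the cumulative minimum step.

```r
# Hypothetical raw p-values, one per gene.
p <- c(0.001, 0.008, 0.02, 0.03, 0.20)
n <- length(p)

# Bonferroni: multiply every p-value by the number of tests, capped at 1.
bonferroni <- pmin(1, p * n)

# Benjamini-Hochberg by hand: scale each p-value by n / rank, then
# enforce monotonicity from the largest p-value downwards.
ord <- order(p, decreasing = TRUE)
bh_sorted <- pmin(1, cummin(p[ord] * n / (n:1)))
bh <- bh_sorted[order(ord)]

# Same adjustments from the built-in function.
bh_builtin <- p.adjust(p, method = "BH")
bonferroni_builtin <- p.adjust(p, method = "bonferroni")

print(data.frame(p, bonferroni, bh, bh_builtin))
print(c(bonferroni_hits = sum(bonferroni < 0.05), bh_hits = sum(bh < 0.05)))
```

With these numbers BH keeps more genes below 0.05 than Bonferroni does, which is the loss of results the post is warning about.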


                • #9
                  Originally posted by dpryan
                  Bonferroni is better referred to as the most conservative and really shouldn't be used anymore since you'd just be throwing away meaningful results with it. The "BH" method has supplanted it for good reasons.

                  Sure, I can get the ranks of the p-values by looping over all the genes and adjust each p-value by the number of tests divided by its rank.

                  I guess maybe now that I know how it works I'm not quite sure what the real benefits are of DESeq2 compared to doing this on your own by applying a more sophisticated normalization technique and extracting differentially expressed genes.


                  • #10
                    It depends a bit on what you have in mind in terms of methodological difference. DESeq2 can accept any size factors you give it (including different factors for each gene), so if that's your main complaint then just don't use that particular function. Other than that the methods are generally quite good, so I'm curious what you find lacking (after all, why reinvent the wheel if the one you have is doing what you want).
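
As a rough sketch of supplying your own size factors, assuming the DESeq2 package is installed (the block is skipped otherwise): the toy counts, sample names, and factor values below are all made up, and only the per-sample case is shown.

```r
# Hypothetical toy counts: three genes, four samples.
toy_counts <- matrix(
  c(10L, 20L, 30L, 40L,
    15L, 25L, 35L, 45L,
     5L, 10L, 15L, 20L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("WT1", "WT2", "Mut1", "Mut2"))
)
sample_info <- data.frame(
  condition = factor(c("WT", "WT", "Mutant", "Mutant"),
                     levels = c("WT", "Mutant")),
  row.names = colnames(toy_counts)
)

# User-chosen size factors (made-up numbers), rescaled to geometric mean 1.
custom_sf <- c(WT1 = 0.8, WT2 = 1.0, Mut1 = 1.1, Mut2 = 1.3)
custom_sf <- custom_sf / exp(mean(log(custom_sf)))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  dds <- DESeqDataSetFromMatrix(countData = toy_counts,
                                colData = sample_info,
                                design = ~ condition)
  # Assign the factors directly instead of calling estimateSizeFactors().
  sizeFactors(dds) <- custom_sf
  # Normalized counts are the raw counts divided by the supplied factors.
  print(counts(dds, normalized = TRUE))
}
```

A later call to `DESeq()` on such an object should reuse the supplied factors rather than estimating new ones, so the rest of the pipeline stays unchanged.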


                    • #11
                      Originally posted by dpryan
                      It depends a bit on what you have in mind in terms of methodological difference. DESeq2 can accept any size factors you give it (including different factors for each gene), so if that's your main complaint then just don't use that particular function. Other than that the methods are generally quite good, so I'm curious what you find lacking (after all, why reinvent the wheel if the one you have is doing what you want).

                      Not reinvent... adjust... or mold to my expectations as they grow...

                      I guess I don't find anything special in normalizing data with fudge factors... I can do that upfront with my library size as one of the factors... I compared the data normalized my way to DESeq2's, and there's hardly a difference... I guess I see it as "I know more variables than a statistical package".

                      Anyway, it's a good program for some applications; I've just figured out its insides and am moving on.


                      • #12
                        Library size normalization is known to not be robust to things like differences in rRNA depletion between samples, which is a common occurrence, so I'd advise you to not just use that as a factor. It is, of course, good to not just blindly accept the values produced by DESeq2, it's certainly not a magic wand (I've certainly had to tweak things on occasion).


                        • #13
                          Originally posted by dpryan
                          Library size normalization is known to not be robust to things like differences in rRNA depletion between samples, which is a common occurrence, so I'd advise you to not just use that as a factor. It is, of course, good to not just blindly accept the values produced by DESeq2, it's certainly not a magic wand (I've certainly had to tweak things on occasion).

                          Completely agree. I am trying to develop something on my own with information that I ALREADY KNOW about certain genes, but anyway, I'm not sure I have the patience for it.

                          What do you think about conditional quantile normalization by Irizarry?


                          • #14
                            I've needed to use cqn once, when there seemed to be a weird difference introduced in the library prep (it never reoccurred, so who knows why). For most of my datasets it doesn't really change anything since I don't normally have much in the way of the biases that it's normally intended to control for. For the one case when I needed it it seemed to perform nicely (I think I was using it with limma, which is also a convenient package).


                            • #15
                              Originally posted by dpryan
                              I've needed to use cqn once, when there seemed to be a weird difference introduced in the library prep (it never reoccurred, so who knows why). For most of my datasets it doesn't really change anything since I don't normally have much in the way of the biases that it's normally intended to control for. For the one case when I needed it it seemed to perform nicely (I think I was using it with limma, which is also a convenient package).

                              Neat, thanks!

                              Have a nice weekend
