  • Filtering out genes with a very low expression value / ANOVA.

    Hi All,

    I've been analysing RNA-Seq data for differentially expressed genes with DESeq & Partek, and I have a few questions:-

    i) Is it a good idea to filter out genes which are expressed at a very low level? - Sometimes a gene is flagged as being differentially expressed and then when I go back to the raw data I find that the read count is only, say, 5.

    ii) Partek appears to use ANOVA. Having read some of the discussion about using the poisson or negative binomial distribution for count data, is ANOVA a valid statistical test for this kind of data?

    Many thanks in advance.

    Adam

  • #2
    Hi,

    ad i) No. The whole point of using the negative binomial distribution is that it automatically takes into account that, for genes with low counts, higher fold changes are required to reach significance. So, if a gene is called differentially expressed with only 5 or so counts in one condition, its count in the other condition will be much higher.

    ad ii) ANOVA means "Analysis of Variance" and makes sense only for normally distributed data fitted with ordinary least squares (OLS) regression. Here, we assume a negative binomial instead of a normal distribution, and hence have to use the proper generalizations of OLS and ANOVA, namely generalized linear models (GLMs) and analysis of deviance (ANODEV). Of course, many people use the term "ANOVA" (rather than ANODEV) for GLMs as well, but if Partek assumes normality, I wouldn't trust it. Poisson is even worse, of course.

    NB GLMs and ANODEV are now available in DESeq.

    Simon
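
As a rough illustration of the point in ad i), the negative binomial variance can be written as mean + dispersion * mean^2, so the relative noise is larger at low counts. The sketch below only evaluates that formula; the dispersion value 0.1 is an arbitrary assumption for illustration, and none of this is DESeq's actual code.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


def nb_cv(mean, dispersion):
    """Coefficient of variation (standard deviation divided by mean)."""
    return math.sqrt(nb_variance(mean, dispersion)) / mean


if __name__ == "__main__":
    dispersion = 0.1  # hypothetical value, chosen only for illustration
    for mean in (5, 50, 500):
        print(mean, round(nb_cv(mean, dispersion), 3))
```

The coefficient of variation shrinks as the mean grows, which is why a gene with about 5 reads needs a larger fold change than a gene with hundreds of reads before the difference stands out from the noise.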



    • #3
      Hi Adam and Simon,

      two clarifications:

      ad i) - from a type-I error control perspective, Simon is right: if DESeq computes a small p-value, that will have taken the relatively higher variability at low counts into consideration, so the observed difference is significant at that level. From a power point of view, it can (and often does) make sense to filter out genes that have low counts throughout (assessed, e.g., by the average count across all replicates). See Pubmed ID 20460310 (Bourgon et al., PNAS 2010).

      ad ii) - the applicability of ANOVA and least squares (LS) fitting is of course much wider than normally distributed data. However, LS does not do well when the data are very skewed (asymmetric), which is what happens with count data, esp. in the lower range. Indeed, GLM / ANODEV is the way to go.

      Best wishes
      Wolfgang
      Last edited by Wolfgang Huber; 12-21-2010, 06:52 AM.
      Wolfgang Huber
      EMBL
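
A minimal sketch of the kind of filter described in ad i) of this post: drop genes whose average count across all replicates is low, without looking at the condition labels. The function name `filter_low_counts`, the threshold of 10 and the toy counts are assumptions for illustration; they are not taken from DESeq or from Bourgon et al.

```python
from statistics import mean


def filter_low_counts(counts, min_mean=10.0):
    """Keep genes whose mean count over ALL samples is at least min_mean.

    counts maps a gene name to its list of raw counts, one per sample.
    The condition labels are deliberately not used, so the filter
    statistic does not depend on which group a sample belongs to.
    """
    return {gene: row for gene, row in counts.items() if mean(row) >= min_mean}


if __name__ == "__main__":
    toy_counts = {  # made-up numbers
        "geneA": [0, 1, 2, 1],
        "geneB": [50, 60, 55, 70],
        "geneC": [3, 30, 4, 5],
    }
    print(sorted(filter_low_counts(toy_counts)))
```

Filtering on the overall average, rather than on per-condition averages, is the design choice that matters here: the test is then run only on the genes that pass, which reduces the multiple-testing burden.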



      • #4
        Thanks guys - these answers were very useful.

        Regards, Adam



        • #5
          Hi,

          First, I want to be clear that Partek does not assume normality, but some of the statistical tests within our software do make that assumption. Also, what is called "ANOVA" in Partek is the General Linear Model (GLM). While the GLM does assume normality, that does not mean it is invalid for non-normal data; it only means that there may be a more powerful test.

          As I mentioned, Partek has many more statistical tests than just GLM, including a wide variety of parametric and non-parametric tests that address the vast majority of medical research studies, including multi-factor studies, repeated measures time series, survival analysis, and prediction based on multiple biomarkers. For simple one-factor NGS studies without replicates, the default test in Partek software is Pearson's Chi-Square for alternative splicing in RNA-Seq studies, and the log-likelihood ratio test for differential expression, neither of which assumes a normal distribution.

          Our philosophy at Partek is to provide easy access to a wide variety of statistical methods, as we don't believe that one single method is best for all data types and all experiment designs.

          Regards,

          Xiaowen



          • #6
            Hi Xiaowen,

            many papers on RNA-Seq, when talking about log likelihood ratio (LLR) tests, mean a chi^2 test on the log ratio of two model fits with Poisson GLMs. Is this what Partek does, too? I am asking because this method is as widely used as it is completely inappropriate because it implicitly assumes the absence of biological variation.

            Simon
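
To make the concern in this post concrete, here is a hedged sketch of a two-group Poisson likelihood ratio test written from the textbook formula. It assumes equal library sizes and is not a description of what Partek or any particular paper implements. With equal library sizes the statistic depends only on the group totals, so replicate-to-replicate scatter never enters the p-value.

```python
import math


def poisson_lrt(group_a, group_b):
    """Two-group Poisson likelihood ratio test, assuming equal library sizes.

    Returns (statistic, p_value), with the statistic compared against a
    chi-square distribution with 1 degree of freedom.
    """
    total_a, total_b = sum(group_a), sum(group_b)
    n_a, n_b = len(group_a), len(group_b)
    pooled_mean = (total_a + total_b) / (n_a + n_b)

    def term(total, n):
        # Contribution total * log(group_mean / pooled_mean); zero if total is 0.
        return total * math.log((total / n) / pooled_mean) if total > 0 else 0.0

    stat = max(2.0 * (term(total_a, n_a) + term(total_b, n_b)), 0.0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    tight = poisson_lrt([100, 100, 100], [150, 150, 150])
    noisy = poisson_lrt([10, 280, 10], [150, 150, 150])  # same totals, wild replicates
    print(tight)
    print(noisy)
```

Both calls return the same statistic and p-value even though the second data set has replicates that disagree wildly, which is one way to see that the Poisson model leaves no room for biological variation.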



            • #7
              So, I also have a question about using ANOVA. If the statistical test you are performing (let's say between 10 cancers and 10 normals) uses normalized count data (RPKM) instead of raw counts, and if the test is calculated on RPKM values on a gene-by-gene basis (which for each individual gene are usually normally distributed), how is ANOVA ineffective? Just curious, would love to hear your thoughts.
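
For reference, RPKM (reads per kilobase of transcript per million mapped reads) can be computed from a raw count as sketched below. The function and argument names are illustrative assumptions, not taken from any particular tool.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Made-up numbers: a 2 kb gene in a library of 20 million mapped reads.
    for reads in (0, 1, 2, 5, 500):
        print(reads, rpkm(reads, 2_000, 20_000_000))
```

Because RPKM is a rescaling of the raw count within a sample, a gene with only a handful of reads can take only a handful of distinct RPKM values, so the discreteness of low counts carries over.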
