  • Statistical tests for differential gene expression in RNA-Seq

    Dear all,

    I'm a beginner in the RNA-Seq world and recently got some results to analyze and process. The data were analyzed by two pipelines in parallel: TopHat/Bowtie --> HTSeq-count --> DESeq2, and the CLC Genomics Workbench. So I now have outcomes from 3 different statistical approaches: DESeq2, plus EDGE and Baggerley's test from CLC Genomics. I then tried to find agreement among them: I filtered the adjusted p-values from each test (with the same threshold) and compared the filtered gene lists to see how similar they are.
    The results do not look very consistent to me. DESeq2 reports around 1500 differentially expressed genes, EDGE around 2000, and Baggerley's test around 3000. I have read that DESeq2 and EDGE assume the counts follow a negative binomial distribution, while Baggerley's test assumes a beta-binomial distribution.
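
    The comparison described above (the same adjusted p-value cutoff applied to each test, then the overlap of the resulting gene lists) can be sketched in Python as follows. This is a minimal illustration, not part of either pipeline: the input format (one dict of gene ID to adjusted p-value per test), the 0.05 cutoff and the gene names are assumptions. Missing values (None) are treated as not significant, since DESeq2, for example, reports NA adjusted p-values for genes removed by its independent filtering.

```python
from itertools import combinations


def significant(padj_by_gene, alpha=0.05):
    """Genes whose adjusted p-value is below alpha; None (NA) counts as not significant."""
    return {g for g, p in padj_by_gene.items() if p is not None and p < alpha}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| (defined as 1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(results, alpha=0.05):
    """For each pair of tests: (number of shared DE genes, Jaccard index)."""
    calls = {name: significant(p, alpha) for name, p in results.items()}
    return {
        (x, y): (len(calls[x] & calls[y]), jaccard(calls[x], calls[y]))
        for x, y in combinations(sorted(calls), 2)
    }


# Hypothetical adjusted p-values, for illustration only.
results = {
    "DESeq2": {"g1": 0.01, "g2": 0.20, "g3": 0.03, "g4": None},
    "EDGE": {"g1": 0.04, "g2": 0.01, "g3": 0.50, "g4": 0.02},
}
print(pairwise_overlap(results))  # {('DESeq2', 'EDGE'): (1, 0.25)}
```

    Reporting the Jaccard index next to the raw overlap count helps when, as here, the lists differ a lot in size.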

    Any clue why I get such large differences in the number of significantly differentially expressed genes among these 3 statistical approaches? Which one should I use?

    Thanks a lot in advance,
    regards

  • #2
    I'm not sure there is even an answer to that, other than perhaps running several tests, as you have, and taking the intersection of the genes, or only the genes that show up in the majority of the tests (so if gene X is DE in 2 of the 3 tools, keep it). Keep in mind that these tools try hard to avoid reporting false positives, but their false-negative rates can be pretty bad. The following paper is relevant:
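
    A minimal Python sketch of the two options above (strict intersection vs. majority vote), assuming each tool's output has already been reduced to a set of DE gene IDs; the tool names and genes are placeholders. Setting min_votes to the number of tools gives the strict intersection.

```python
from collections import Counter


def consensus(calls, min_votes=2):
    """Genes called DE by at least `min_votes` of the tools in `calls`."""
    # Count, for every gene, how many tools called it DE.
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    return {gene for gene, n in votes.items() if n >= min_votes}


# Placeholder DE calls from three tools.
calls = {
    "DESeq2": {"A", "B", "C"},
    "EDGE": {"B", "C", "D"},
    "Baggerley": {"C", "D", "E"},
}
print(sorted(consensus(calls)))                        # majority: ['B', 'C', 'D']
print(sorted(consensus(calls, min_votes=len(calls))))  # intersection: ['C']
```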

    Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.


    Figures 3 and 4 are telling: even with 12 replicates per condition, the true positive rate is still below 50%.
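
    For reference, the true and false positive rates used in that kind of simulation study can be computed as below when the truly DE genes are known. This is a generic sketch with made-up gene IDs, not code from the paper: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A TPR below 50% means that more than half of the truly DE genes are missed.

```python
def rates(called, truly_de, all_genes):
    """Return (true positive rate, false positive rate) for a set of DE calls."""
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)             # truly DE and called
    fp = len(called - truly_de)             # called but not truly DE
    fn = len(truly_de - called)             # truly DE but missed
    tn = len(all_genes - called - truly_de)  # neither called nor truly DE
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Made-up example: 10 genes, 4 truly DE, the tool calls 3 genes.
genes = [f"g{i}" for i in range(10)]
truth = {"g0", "g1", "g2", "g3"}
called = {"g0", "g1", "g5"}
print(rates(called, truth, genes))  # TPR = 2/4, FPR = 1/6
```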
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */



    • #3
      Did you select differentially expressed genes solely by a statistical threshold? What if you also add a fold-change threshold: do you get more consistent lists? You can look at the rank-order correlation of the fold changes to see how well they agree across the different analyses.
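
      A rough Python sketch of both suggestions: a combined adjusted p-value plus |log2 fold change| filter, and a Spearman rank correlation of the log2 fold changes between two analyses. The input format (gene -> (padj, log2FC)), the 0.05 and 1.0 (2-fold) cutoffs, and the gene names are assumptions for illustration. scipy.stats.spearmanr computes the same statistic directly; the stdlib version below just makes explicit that it is the Pearson correlation of tie-averaged ranks.

```python
import math


def de_calls(results, alpha=0.05, min_abs_lfc=1.0):
    """Genes passing both an adjusted p-value and an |log2 fold change| cutoff."""
    return {
        g for g, (padj, lfc) in results.items()
        if padj is not None and padj < alpha and abs(lfc) >= min_abs_lfc
    }


def average_ranks(values):
    """1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(lfc_a, lfc_b):
    """Spearman correlation of log2 fold changes over genes present in both analyses."""
    shared = sorted(lfc_a.keys() & lfc_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = average_ranks([lfc_a[g] for g in shared])
    y = average_ranks([lfc_b[g] for g in shared])
    n = len(shared)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("need at least two distinct values in each analysis")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (padj, log2FC) pairs from two analyses.
deseq2 = {"g1": (0.01, 2.0), "g2": (0.01, 0.5), "g3": (0.20, 3.0), "g4": (None, -1.5)}
edge = {"g1": (0.02, 1.8), "g2": (0.03, 0.7), "g3": (0.04, 2.5), "g4": (0.01, -1.2)}
print(sorted(de_calls(deseq2)), sorted(de_calls(edge)))
print(spearman({g: v[1] for g, v in deseq2.items()}, {g: v[1] for g, v in edge.items()}))
```

      Here the two analyses disagree on which genes pass the filters, yet rank the fold changes identically, which is the kind of pattern this check is meant to expose.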

      Look at the MA plots from each analysis and see whether any of them shows a skew that might indicate a normalization bias.
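
      As a sketch of what an MA plot shows: for two samples (or two condition means) of normalized counts, M is the log2 ratio and A the mean log2 expression of each gene. Because most genes are usually not DE, the bulk of the M values should centre on 0; a median M clearly away from 0, or a trend of M with A, is the kind of skew that suggests a normalization bias. The pseudocount of 1 and the dict input format are assumptions, and the plotting itself is left out.

```python
import math


def ma_values(norm_a, norm_b, pseudocount=1.0):
    """Return {gene: (M, A)} for genes present in both normalized-count dicts."""
    out = {}
    for gene in norm_a.keys() & norm_b.keys():
        la = math.log2(norm_a[gene] + pseudocount)
        lb = math.log2(norm_b[gene] + pseudocount)
        out[gene] = (la - lb, (la + lb) / 2)  # M = log2 ratio, A = mean log2 level
    return out


def median_m(ma):
    """Median M value; should be close to 0 if normalization is adequate."""
    s = sorted(m for m, _ in ma.values())
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


print(ma_values({"g": 7}, {"g": 3}))  # {'g': (1.0, 2.5)}
```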
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.



      • #4
        Thanks for your answers and paper suggestion.

        Actually, I did both: first I selected the genes with an adjusted p-value below 0.05, and then sorted them into upregulated and downregulated genes. Comparing the top-ranked genes across the statistical approaches, 33% of genes were significantly differentially expressed in just one test, 22% in two tests and 5% in all three. That seemed quite inconsistent to me.
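
        The breakdown above (genes found by exactly one, two or all three tests) can be computed as follows. This sketch assumes each test's result is a set of gene IDs and uses the union of all calls as the denominator; the gene IDs are placeholders. With this denominator the fractions always sum to 1.

```python
from collections import Counter


def membership_profile(calls):
    """Fraction of all called genes that are DE in exactly k of the tests."""
    # How many tests call each gene DE.
    votes = Counter(g for genes in calls.values() for g in set(genes))
    # How many genes are shared by exactly k tests.
    per_k = Counter(votes.values())
    total = len(votes)
    if not total:
        return {}
    return {k: per_k[k] / total for k in range(1, len(calls) + 1)}


calls = {
    "DESeq2": {"A", "B", "C"},
    "EDGE": {"B", "C", "D"},
    "Baggerley": {"C", "D", "E"},
}
print(membership_profile(calls))  # {1: 0.4, 2: 0.4, 3: 0.2}
```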

        There is some chance that in the CLC Genomics Workbench the reads were mapped against Rnor_5.0 instead of UCSC rn4. Do you think this could introduce such a large inconsistency in the filtered/sorted gene lists among the 3 statistical approaches?

        I will also look at the MA plots in more detail.

        kind regards



        • #5
          Hello,

          Just for the record, I confirmed that the sequences were mapped against Rnor_5.0 in one dataset and against UCSC rn4 in the other analyses. Therefore, most of the inconsistencies I'm seeing are probably due to the different reference genome assemblies, so I will focus on a single genome assembly for all the statistical analyses.
          Regarding QC, in my opinion the MA plots of our dataset do not show normalization biases (see attached figure). On the other hand, the PCA plots (see attached figure) show separation between replicates in 4 of the experimental groups (red, green, pink and dark blue), while in the other 2 groups the replicates look quite acceptable to me. Moreover, there is an evident separation of the red, green and pink groups from the dark blue, light blue and light green ones. All of this is consistent with the observed differentially expressed genes.