  • Statistical tests for differential gene expression in RNA-Seq

    Dear all,

    I'm a beginner in the RNA-Seq world and recently got some results to analyse and process. The data were analysed by two pipelines in parallel: Tophat/Bowtie --> HTSeq-count --> DESeq2, and the CLC Genomics Workbench. So now I have 3 different outcomes from 3 statistical approaches: DESeq2, and the EDGE and Baggerley's tests from CLC Genomics Workbench. Then I tried to find agreement among them: I filtered the genes by adjusted p-value (with the same threshold) for each test and compared the filtered gene lists to see how similar they are.
    The results do not seem very consistent to me. DESeq2 reports around 1500 differentially expressed genes, EDGE around 2000, and Baggerley's test around 3000. I have read that DESeq2 and EDGE model the count data with a negative binomial distribution, while Baggerley's test assumes a beta-binomial distribution.

    Any clue why I got such large differences in the number of significantly differentially expressed genes among these 3 statistical approaches? Which one should I use?

    Thanks a lot in advance
    regards
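A minimal sketch of the comparison described in the question, assuming each tool's output has already been reduced to a dict mapping gene ID to adjusted p-value. The gene IDs and values are made up for illustration, and `significant_genes` and `jaccard` are hypothetical helper names, not functions from DESeq2 or CLC. The Jaccard index (shared genes divided by all genes called by either tool) gives a single number for how much two filtered lists agree.

```python
from itertools import combinations


def significant_genes(padj_by_gene, alpha=0.05):
    """Return the set of gene IDs whose adjusted p-value is below alpha.

    Genes with a missing adjusted p-value (None) are skipped; DESeq2, for example,
    reports NA for genes removed by independent filtering.
    """
    return {g for g, p in padj_by_gene.items() if p is not None and p < alpha}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-tool results: gene ID -> adjusted p-value (values are invented)
results = {
    "DESeq2": {"Gapdh": 0.8, "Cyp1a1": 0.001, "Mt1": 0.03, "Fos": None},
    "EDGE": {"Gapdh": 0.6, "Cyp1a1": 0.002, "Mt1": 0.04, "Fos": 0.01},
    "Baggerley": {"Gapdh": 0.04, "Cyp1a1": 0.0001, "Mt1": 0.2, "Fos": 0.02},
}

# Apply the same threshold to every tool, then compare each pair of lists
sig = {tool: significant_genes(padj) for tool, padj in results.items()}
for t1, t2 in combinations(sig, 2):
    shared = sig[t1] & sig[t2]
    print(f"{t1} vs {t2}: {len(shared)} shared, Jaccard = {jaccard(sig[t1], sig[t2]):.2f}")
```

Using one common threshold for all tools, as in the question, keeps the comparison fair; the Jaccard index is used here because raw overlap counts are hard to compare when the lists differ in size (1500 vs 2000 vs 3000 genes).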

  • #2
    I'm not sure there is even an answer for that, other than maybe running several tests, as you have, and taking the intersection of the genes, or keeping only genes that show up in the majority of the tests (so if gene X is DE in 2 of the 3 tools, keep it). Keep in mind that these tools try to avoid reporting false positives; however, their false-negative rates can be pretty bad. The following paper is relevant:

    Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.

    Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.

    Conclusions: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.


    Figures 3 and 4 are telling: even with 12 replicates per condition, their true positive rate is still below 50%.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
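The "keep genes that show up in the majority of the tests" rule above can be sketched as a simple vote count. This assumes each tool's output has already been filtered down to a set of significant gene IDs; `consensus_genes` is a hypothetical helper and the gene sets are invented. Setting `min_tools` to the number of tools gives the strict intersection instead.

```python
from collections import Counter


def consensus_genes(calls_by_tool, min_tools=2):
    """Return genes called differentially expressed by at least `min_tools` tools.

    calls_by_tool maps a tool name to the set of gene IDs it called significant.
    """
    # Count, for every gene, how many tools called it significant
    counts = Counter(g for calls in calls_by_tool.values() for g in calls)
    return {g for g, n in counts.items() if n >= min_tools}


# Hypothetical significant-gene sets from three tools
calls = {
    "DESeq2": {"Cyp1a1", "Mt1"},
    "EDGE": {"Cyp1a1", "Mt1", "Fos"},
    "Baggerley": {"Gapdh", "Cyp1a1", "Fos"},
}

print(sorted(consensus_genes(calls)))               # majority vote (2 of 3): ['Cyp1a1', 'Fos', 'Mt1']
print(sorted(consensus_genes(calls, min_tools=3)))  # strict intersection: ['Cyp1a1']
```

The majority rule is a compromise: the intersection is the most conservative choice and will lose even more true positives, which matters given how low the true positive rates in the cited paper already are.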



    • #3
      Did you select differentially expressed genes solely by a statistical threshold? What if you also add a fold-change threshold; do you get more consistent lists? You can look at the rank-order correlation of the fold changes to see how well they agree across the different analyses.
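A sketch of both suggestions, assuming each tool's results are available as a dict of gene ID -> (log2 fold change, adjusted p-value). The cutoff of |log2 fold change| >= 1 (a 2-fold change) is an illustrative assumption, not a recommendation from the thread, and the data are invented. For the rank-order correlation it uses Spearman's rho, one common choice, computed as the Pearson correlation of average ranks (ties share the mean of their ranks). `de_genes`, `average_ranks` and `spearman` are hypothetical helper names.

```python
def de_genes(table, alpha=0.05, min_abs_lfc=1.0):
    """Genes with padj < alpha AND |log2 fold change| >= min_abs_lfc.

    table maps gene ID -> (log2 fold change, adjusted p-value or None).
    """
    return {g for g, (lfc, padj) in table.items()
            if padj is not None and padj < alpha and abs(lfc) >= min_abs_lfc}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant (correlation undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-tool results: gene ID -> (log2 fold change, adjusted p-value)
deseq2 = {"Cyp1a1": (3.2, 0.001), "Mt1": (0.6, 0.03), "Fos": (1.5, None), "Gapdh": (0.1, 0.8)}
edge = {"Cyp1a1": (3.0, 0.002), "Mt1": (0.7, 0.04), "Fos": (1.4, 0.01), "Gapdh": (-0.2, 0.6)}

print(sorted(de_genes(deseq2)), sorted(de_genes(edge)))

# Correlate fold changes over all genes both tools reported, not only the significant ones
shared = sorted(deseq2.keys() & edge.keys())
rho = spearman([deseq2[g][0] for g in shared], [edge[g][0] for g in shared])
print(f"Spearman rho of log2 fold changes: {rho:.3f}")
```

In this toy data the fold-change ranks agree perfectly even though the significant-gene lists differ, which is the kind of pattern the suggestion is meant to reveal: tools can agree on effect sizes while disagreeing on which genes cross a p-value threshold.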

      Look at some MA plots from each analysis and see whether any of them shows a skew that might indicate a normalization bias.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.
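The coordinates behind an MA plot can be computed directly, which also allows a rough numeric check for the skew mentioned above. This sketch assumes two vectors of already-normalized counts for the same genes (for example, means per condition after size-factor normalization); the pseudocount of 0.5, the low/high-expression split, and the helper names `ma_values` and `median_m_by_expression` are illustrative assumptions. If most genes are not differentially expressed, M should center near 0 at every expression level; a clear gap between the low- and high-A medians hints at intensity-dependent bias.

```python
import math
import statistics


def ma_values(counts_a, counts_b, pseudocount=0.5):
    """Compute MA-plot coordinates for two normalized count vectors.

    M = log2(a) - log2(b) (log fold change); A = mean of log2(a) and log2(b)
    (average log expression). The pseudocount avoids log(0) for zero counts.
    """
    m_vals, a_vals = [], []
    for a, b in zip(counts_a, counts_b):
        la, lb = math.log2(a + pseudocount), math.log2(b + pseudocount)
        m_vals.append(la - lb)
        a_vals.append((la + lb) / 2)
    return m_vals, a_vals


def median_m_by_expression(m_vals, a_vals):
    """Median M in the lower and upper halves of A (needs at least 2 genes)."""
    pairs = sorted(zip(a_vals, m_vals))
    half = len(pairs) // 2
    low = [m for _, m in pairs[:half]]
    high = [m for _, m in pairs[half:]]
    return statistics.median(low), statistics.median(high)


# Hypothetical normalized counts for six genes in two conditions
cond_a = [120, 45, 3000, 8, 560, 75]
cond_b = [110, 50, 2900, 10, 600, 70]
m, avg = ma_values(cond_a, cond_b)
print(median_m_by_expression(m, avg))
```

This only summarizes what the plot shows; looking at the MA plots themselves, as suggested, remains the better check, since a median can hide a skew confined to a few genes.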



      • #4
        Thanks for your answers and paper suggestion.

        Actually, I did both: first I selected the genes with an adjusted p-value below 0.05, and afterwards sorted them into upregulated and downregulated. Comparing the top-ranked genes from each statistical approach, 33% of the genes were significantly differentially expressed in just one test, 22% in 2 tests, and 5% in all 3. That seemed quite inconsistent to me.

        There is some chance that in CLC Genomics Workbench the reads were aligned against Rnor_5.0 instead of UCSC rn4. Do you think this could introduce such a big inconsistency in the filtered/sorted gene lists among the 3 statistical approaches?

        I will also look at the MA plots in more detail.

        kind regards
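One way to compute overlap percentages like those above is to count, for every gene called by any tool, how many tools called it, and express each count relative to the union of all called genes so that the fractions sum to 1. This is a sketch under that assumption; `overlap_profile` is a hypothetical helper and the gene sets are invented.

```python
from collections import Counter


def overlap_profile(calls_by_tool):
    """Fraction of the union of called genes that is called by exactly k tools, k = 1..n_tools.

    calls_by_tool maps a tool name to the set of gene IDs it called significant.
    """
    counts = Counter(g for calls in calls_by_tool.values() for g in calls)
    total = len(counts)  # size of the union of all called genes
    n_tools = len(calls_by_tool)
    per_k = Counter(counts.values())  # k -> number of genes called by exactly k tools
    if total == 0:
        return {k: 0.0 for k in range(1, n_tools + 1)}
    return {k: per_k[k] / total for k in range(1, n_tools + 1)}


# Hypothetical significant-gene sets from three tools
calls = {
    "DESeq2": {"Cyp1a1", "Mt1"},
    "EDGE": {"Cyp1a1", "Mt1", "Fos"},
    "Baggerley": {"Gapdh", "Cyp1a1", "Fos"},
}

print(overlap_profile(calls))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Fixing the denominator to the union makes the "1 test / 2 tests / all 3 tests" fractions directly comparable across different thresholds or gene subsets.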



        • #5
          Hello,

          Just for the record, I confirmed that the reads were aligned against Rnor_5.0 in one dataset and UCSC rn4 for the other analyses. Therefore it is quite likely that most of the inconsistencies I'm seeing are due to the different reference genome assemblies, so I will use a single genome assembly for all the statistical analyses.
          Regarding QC, in my opinion the MA plots of our dataset don't show normalization biases (see attached figure). On the other hand, the PCA plots (see attached figure) show a separation between replicates in 4 of the experimental groups (red, green, pink and dark blue), while in the other 2 groups it seems quite acceptable to me. Moreover, there is an evident separation of the red, green and pink groups from the dark blue, light blue and light green ones. All of this is in line with the observed differentially expressed genes.