
**sdriscoll** · 11-12-2014, 11:56 AM

I'm not sure there is even an answer to that, other than perhaps running several tests, as you have, and either taking the intersection of the gene lists or keeping only genes that show up in the majority of the tests (so if gene X is DE in 2 of the 3 tools, keep it). Keep in mind that these tools try hard to avoid reporting false positives; however, their false-negative rates can be pretty bad. The following paper is relevant:

Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing - BMC Genomics

http://www.biomedcentral.com/1471-2164/13/484

**Background.** RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.

**Results.** Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.

**Conclusions.** This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.

Figures 3 and 4 are kind of telling in that even with 12 replicates per condition their true positive rate is still less than 50%.
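A minimal Python sketch of the intersection and majority-vote filters described above, assuming each tool's output has already been reduced to a set of gene IDs passing that tool's own cutoff. The tool labels, gene IDs and the `consensus_de_genes` helper are hypothetical placeholders, not part of any package.

```python
from collections import Counter


def consensus_de_genes(calls, min_tools):
    """Return genes called DE by at least `min_tools` of the tools in `calls`.

    calls: mapping of tool name -> iterable of gene IDs that tool called DE.
    """
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    return {gene for gene, n in votes.items() if n >= min_tools}


# Hypothetical DE calls (e.g. padj < 0.05) from three tools.
calls = {
    "tool_1": {"GeneA", "GeneB", "GeneC"},
    "tool_2": {"GeneA", "GeneC", "GeneD"},
    "tool_3": {"GeneA", "GeneE"},
}

intersection = consensus_de_genes(calls, min_tools=len(calls))  # all 3 tools
majority = consensus_de_genes(calls, min_tools=2)               # at least 2 of 3

print(sorted(intersection))  # ['GeneA']
print(sorted(majority))      # ['GeneA', 'GeneC']
```

The intersection is by definition the more conservative of the two rules, so given the low true-positive rates discussed above it will discard more genuinely DE genes than the majority rule.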

**mbblack** · 11-12-2014, 12:05 PM

Did you select differentially expressed genes solely by a statistical threshold? What if you also add a fold-change threshold: do you get more consistent lists? You can also look at the rank-order correlation of the fold changes to see how consistently they behave across the different analyses.
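To make that suggestion concrete, the Python sketch below applies a combined adjusted-p-value and fold-change filter, then computes the Spearman (rank-order) correlation of log2 fold changes over the genes two analyses share. The cutoffs (padj < 0.05, |log2FC| ≥ 1), the `(log2FC, padj)` tuple layout and all data are illustrative assumptions; `scipy.stats.spearmanr` computes the same statistic if SciPy is available.

```python
import math


def select_de(results, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Genes passing BOTH an adjusted p-value cutoff and a |log2 fold change| cutoff.

    results: mapping of gene ID -> (log2 fold change, adjusted p-value).
    """
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc
    }


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical results: gene -> (log2 fold change, adjusted p-value).
analysis_1 = {"GeneA": (2.1, 0.001), "GeneB": (0.4, 0.01), "GeneC": (-1.5, 0.03), "GeneD": (-0.2, 0.6)}
analysis_2 = {"GeneA": (1.8, 0.002), "GeneB": (0.1, 0.2), "GeneC": (-1.2, 0.04), "GeneD": (0.6, 0.8)}

print(sorted(select_de(analysis_1)), sorted(select_de(analysis_2)))  # ['GeneA', 'GeneC'] ['GeneA', 'GeneC']

shared = sorted(analysis_1.keys() & analysis_2.keys())
rho = spearman([analysis_1[g][0] for g in shared], [analysis_2[g][0] for g in shared])
print(round(rho, 3))  # 0.8
```

Note that GeneB in `analysis_1` is significant by adjusted p-value alone but is dropped by the fold-change cutoff, which is the kind of small-effect call that tends to differ between tools.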

Look at some MA plots from each analysis and see whether any of them shows a skew that might indicate a normalization bias.
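As a rough numeric companion to eyeballing MA plots, this sketch computes M (log2 ratio) and A (mean log2 level) per gene and reports the median M within bins of increasing A. It assumes per-condition mean normalized counts and a pseudocount of 1, both illustrative choices. If most genes are not DE, medians well away from 0, or a trend across bins, would be consistent with a normalization bias.

```python
import math
from statistics import median


def ma_values(control, treated, pseudocount=1.0):
    """Per-gene M = log2(treated) - log2(control) and A = mean of the two log2 levels."""
    m, a = [], []
    for c, t in zip(control, treated):
        lc = math.log2(c + pseudocount)
        lt = math.log2(t + pseudocount)
        m.append(lt - lc)
        a.append((lc + lt) / 2)
    return m, a


def binned_median_m(m, a, n_bins=3):
    """Median M within roughly equal-sized bins of genes ordered by A (low to high)."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    return [median(m[i] for i in order[k:k + size]) for k in range(0, len(order), size)]


# Hypothetical mean normalized counts where 'treated' is ~2x 'control' for every gene,
# as could happen if library sizes were not normalized.
control = [7, 15, 31, 63, 127, 255]
treated = [15, 31, 63, 127, 255, 511]

m, a = ma_values(control, treated)
print(binned_median_m(m, a))  # [1.0, 1.0, 1.0]
```

A constant median M of about 1 across all A bins is what an uncorrected global 2x scaling would look like; a median that drifts with A would instead suggest an intensity-dependent bias.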

**LauGP** · 11-12-2014, 01:47 PM

Thanks for your answers and paper suggestion.

Actually, I did both: first I selected the genes with an adjusted p-value below 0.05, and then sorted them to find the upregulated and downregulated ones. Comparing the top-ranked genes from each statistical approach, 33% were significantly differentially expressed in just one test, 22% in two tests and 5% in all three approaches. That seemed quite inconsistent to me.
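An overlap breakdown like this can be computed as below; this is a hypothetical sketch assuming each approach's top-ranked genes are available as sets. Here the denominator is the union of genes called by any approach, so the fractions sum to 1; the percentages quoted above may have used a different denominator.

```python
from collections import Counter


def overlap_profile(calls):
    """Fraction of the union of DE genes that was called by exactly k approaches (k = 1..n)."""
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    if not votes:
        return {k: 0.0 for k in range(1, len(calls) + 1)}
    by_k = Counter(votes.values())  # how many genes received exactly k votes
    total = len(votes)
    return {k: by_k[k] / total for k in range(1, len(calls) + 1)}


# Hypothetical top-ranked DE genes from three statistical approaches.
calls = {
    "approach_1": {"GeneA", "GeneB", "GeneC"},
    "approach_2": {"GeneA", "GeneB", "GeneD"},
    "approach_3": {"GeneA", "GeneE"},
}

for k, frac in overlap_profile(calls).items():
    print(f"called by exactly {k}: {frac:.0%}")
# called by exactly 1: 60%
# called by exactly 2: 20%
# called by exactly 3: 20%
```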

There is some chance that in CLC Genomics Workbench the reads were aligned against the Rnor_5.0 assembly instead of UCSC rn4. Do you think this could introduce such a big inconsistency in the filtered/sorted gene lists among the three statistical approaches?

I will also look at the MA plots in more detail.

Kind regards

**LauGP** · 11-28-2014, 05:00 AM

Hello,

Just for the record, I confirmed that the reads in one dataset were aligned against Rnor_5.0, while UCSC rn4 was used for the other analyses. It is therefore quite likely that most of the inconsistencies I'm seeing are due to the different reference genome assemblies, so I will use a single genome assembly for all the statistical analyses.
Regarding QC, in my opinion the MA plots of our dataset don't show normalization biases (see attached figure). On the other hand, the PCA plots (see attached figure) show separation between replicates in 4 of the experimental groups (red, green, pink and dark blue), while in the other 2 groups it looks acceptable to me. Moreover, there is an evident separation between the red, green and pink groups and the dark blue, light blue and light green ones. All of this is consistent with the observed differentially expressed genes.
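A minimal sketch of such a PCA check, assuming NumPy is available and using log2(count + 1) of normalized counts (an illustrative transform; variance-stabilizing alternatives exist). It also computes, for each group, the mean distance of replicates to the group centroid in PC space, which puts a number on the visual impression of replicate separation. The counts and group labels are hypothetical.

```python
import numpy as np


def pca_scores(counts, n_components=2, pseudocount=1.0):
    """PCA of samples on log2-transformed counts.

    counts: genes x samples array of normalized counts.
    Returns (samples x n_components scores, fraction of variance per component).
    """
    x = np.log2(np.asarray(counts, dtype=float) + pseudocount).T  # samples x genes
    x = x - x.mean(axis=0)                                         # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


def replicate_spread(scores, groups):
    """Mean distance of each group's replicates to that group's centroid in PC space."""
    groups = np.asarray(groups)
    spread = {}
    for g in sorted(set(groups.tolist())):
        pts = scores[groups == g]
        spread[g] = float(np.mean(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))
    return spread


# Hypothetical normalized counts: 4 genes x 4 samples, two groups of two replicates.
counts = np.array([
    [100, 110, 400, 420],
    [50, 55, 10, 12],
    [200, 190, 210, 205],
    [30, 28, 31, 29],
])
groups = ["ctrl", "ctrl", "treat", "treat"]

scores, explained = pca_scores(counts)
print(np.round(explained, 3))
print(replicate_spread(scores, groups))
```

Groups whose replicate spread approaches the distance between group centroids are the ones where replicates do not cluster tightly; the sign of each principal component is arbitrary, so only relative positions are meaningful.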



Statistical tests for differential gene expression in RNA-Seq
