I'm trying to get an overview of limma, limma-voom, edgeR, DESeq2, NPEBseq, etc., and I'm getting the feeling that the task of differential gene expression analysis is being over-complicated?
I'm currently looking at a count matrix derived from 95 RNA-seq samples sequenced on an Illumina HiSeq 2000 (Illumina TruSeq stranded kit). Raw reads were mapped to hg19 using STAR and then counted using HTSeq.
The result is a count matrix with 25369 rows and 95 columns, and I have two groups in a classic case (n=15) vs. control (n=80) design. I then perform the following steps:
1. Use the edgeR package to perform TMM normalisation of the raw counts
2. For each gene, do a case vs. control t-test and a Wilcoxon test on the TMM values
3. Apply FDR correction
4. Sort by ascending FDR value for the t-test and use the Wilcoxon p-value to get an idea of whether the difference is "outlier-driven"
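To make steps 3 and 4 concrete, here is a minimal sketch in plain Python. It assumes the FDR correction is Benjamini-Hochberg, which is not stated above. The function names, gene labels and p-values are made up for illustration, and the TMM normalisation and the per-gene tests themselves are not shown.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def rank_genes(genes, t_pvalues, wilcoxon_pvalues):
    """Sort genes by ascending t-test FDR, keeping the raw Wilcoxon p-value
    alongside as a rough check for outlier-driven differences."""
    fdr = bh_adjust(t_pvalues)
    rows = list(zip(genes, fdr, wilcoxon_pvalues))
    return sorted(rows, key=lambda row: row[1])


# Placeholder inputs, not real results.
genes = ["geneA", "geneB", "geneC", "geneD"]
t_p = [0.01, 0.04, 0.03, 0.005]
w_p = [0.02, 0.30, 0.05, 0.01]

for gene, fdr, wilcoxon_p in rank_genes(genes, t_p, w_p):
    print(f"{gene}\tFDR={fdr:.3f}\tWilcoxon p={wilcoxon_p:.3f}")
```

A gene with a small t-test FDR but a large Wilcoxon p-value (geneB in the placeholder data would be the pattern to look for) is the kind of case step 4 is meant to flag.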
Please enlighten me as to why this simple approach is not sufficient?
Cheers,
Leon