SEQanswers

10-10-2014, 10:38 AM   #1
LeonDK
Member
 
Location: Denmark

Join Date: Sep 2014
Posts: 69
Am I completely missing the point?

So... I'm trying to get an overview of limma, limma-voom, edgeR, DESeq2, NPEBseq etc., and I'm getting the feeling that the task of differential gene expression analysis is being over-complicated...?

I'm currently looking at a count matrix derived from 95 RNA-seq samples sequenced on an Illumina HiSeq 2000 (Illumina TruSeq stranded kit). Raw reads were mapped to hg19 using STAR and then counted using HTSeq.

The result is a count matrix with 25369 rows (genes) and 95 columns (samples). The samples fall into two groups in a classic case (n=15) / control (n=80) design. I then perform the following steps:

1. Use the edgeR package to perform TMM normalisation of the raw counts
2. For each gene, run a case vs. control t-test and a Wilcoxon test on the TMM-normalised values
3. Apply FDR correction
4. Sort by ascending FDR value for the t-test and use the Wilcoxon p-value to get an idea of whether the difference is "outlier-driven"
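
Steps 3 and 4 above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the post does not say which FDR procedure is used, so Benjamini-Hochberg is assumed here, and the function names `benjamini_hochberg` and `rank_genes` are hypothetical. The per-gene p-values are taken as given, e.g. from the t-test and Wilcoxon test in step 2.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is the FDR procedure; the original post does not name one.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(gene_ids, t_pvalues, wilcoxon_pvalues):
    """Sort genes by ascending t-test FDR, keeping the Wilcoxon p-value alongside."""
    fdr = benjamini_hochberg(t_pvalues)
    rows = list(zip(gene_ids, fdr, wilcoxon_pvalues))
    return sorted(rows, key=lambda row: row[1])
```

A gene with a small t-test FDR but a large Wilcoxon p-value in the resulting table would be a candidate for an "outlier-driven" difference.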

Please enlighten me as to why this simple approach is not sufficient?

Cheers,
Leon

10-10-2014, 01:42 PM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

It may be sufficient - after all, you have quite a few samples. With a small number of samples it can be hard to achieve the necessary statistical power without "borrowing variance across genes".

Or you could use SAMseq, which is very simple to use and understand. It's based on non-parametric statistics.
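
To make "borrowing variance across genes" concrete, here is a minimal sketch of the general idea: each gene's variance estimate is pulled toward a value shared across genes, weighted by degrees of freedom. The weighted-average form is the one used by moderated-variance approaches such as limma. The function name `moderated_variance` and the prior values below are made up for illustration; real packages estimate the prior from the data.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a per-gene variance toward a prior variance shared across genes.

    The result is a weighted average, with the degrees of freedom as weights.
    prior_var and prior_df are assumed to be given (hypothetical values here).
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Hypothetical prior: variance 0.5 with 4 degrees of freedom.
few_replicates = moderated_variance(0.1, df=2, prior_var=0.5, prior_df=4)
many_replicates = moderated_variance(0.1, df=93, prior_var=0.5, prior_df=4)
print(few_replicates, many_replicates)
```

With only a few residual degrees of freedom the estimate moves a long way toward the prior; with 93 (as in a 95-sample, two-group design) it stays close to the gene's own variance, which is why the simple approach may be sufficient here.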

10-13-2014, 10:32 AM   #3
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333

To echo kopi-o, each of these methods states its motivation fairly early on in the corresponding paper:

edgeR

"Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small."

DESeq

"Typically, the number of replicates is small, and further modelling assumptions need to be made in order to obtain useful estimates."

Voom

"Borrowing information between genes is a crucial feature of the genome-wide statistical methods, as it allows for gene-specific variation while still providing reliable inference with small sample sizes."

I'd also recommend checking out the SAMseq paper and method.

10-27-2014, 10:17 PM   #4
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Run your statistical tests on log2 values. That's all I have to add. With those sample sizes you could even do permutation tests and avoid any distribution assumptions altogether.
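
A rough sketch of both suggestions for a single gene, in plain Python. The pseudocount of 1 (to avoid taking log2 of zero), the absolute difference in means as the test statistic, and the function names are assumptions made for illustration, not something specified in this thread.

```python
import math
import random


def log2_transform(values, pseudocount=1.0):
    """log2-transform normalised values; the pseudocount is an assumed choice."""
    return [math.log2(v + pseudocount) for v in values]


def permutation_test(case, control, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(case) - mean(control))
    pooled = list(case) + list(control)
    n_case = len(case)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

For each gene, one would call `permutation_test(log2_transform(case_values), log2_transform(control_values))` and then apply FDR correction across genes. Note that the smallest attainable p-value is limited by the number of permutations, which matters once roughly 25000 genes are corrected for multiple testing.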
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */