
02-12-2016, 06:38 AM  #1 
Member
Location: Germany Join Date: Jan 2014
Posts: 39

Statistical models for DE
Hey guys,
I am having trouble understanding why statistical models are needed for differential expression (DE) analysis in RNA-Seq. I have already used tools like DESeq2 and NOISeq, but I would also like to understand, at least partially, what these tools are doing. Unfortunately, I don't have a strong statistical background, and I have found no tutorial that explains the use of statistical models in a way I can follow. I think it would be best to explain it for the Poisson model, as it seems easier to understand than the negative binomial (NB).

What I know so far: after sequencing, I align my reads to the reference genome and then generate read counts for each annotated gene. I cannot use these counts directly to test for DE because of different library sizes as well as technical and biological variation. What I read most of the time is that people fit statistical models to the count data, for example a Poisson model (as this one accounts for the technical variance).

Question 1: Is the model fitted to the read counts of all genes, or does each gene get its own model?
Question 2: In the case of the Poisson model, where do I get the lambda from? Is it calculated from my count data?
Question 3: Once I have constructed my Poisson model, what is it used for? Do I use it to change my count data? Is it used in the statistical test? This is the step where I have absolutely no clue what is going on.

I tried to read different publications, including the DESeq publications and, for the Poisson case, the Marioni paper from 2008. But with my limited statistical knowledge I do not get the key idea of these statistical models and how they help when dealing with DE in RNA-Seq. I really hope someone can explain the general concept in a simple way so I can understand it.

Cheers
Mario
02-13-2016, 02:09 AM  #2 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,473

N.B., I'm going to completely ignore the empirical Bayes parts of this for the sake of simplicity.
1. The model is fit to each gene, one at a time. The actual model used is the same for all of them.
2. The lambda is part of the fit. Note that there is a lambda per group.
3. The model is used for a statistical test, which is typically of the form, "Do groups A and B have different lambdas?"
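
To make the three points above concrete, here is a minimal sketch in Python. It assumes equal sequencing depth in every sample and uses a likelihood-ratio test, which is only one possible choice of test; it is not what DESeq2 does internally (DESeq2 uses a negative binomial model). The gene names, the counts, and the function names are made up for illustration.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of the observed counts under Poisson(lam)."""
    if lam == 0:
        return 0.0 if all(c == 0 for c in counts) else float("-inf")
    return sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)

def lrt_two_groups(counts_a, counts_b):
    """Fit one lambda per group and test 'same lambda' vs 'different lambdas'."""
    counts_a, counts_b = list(counts_a), list(counts_b)
    # The lambda is "part of the fit": the best-fitting value is the group mean.
    lam_a = sum(counts_a) / len(counts_a)
    lam_b = sum(counts_b) / len(counts_b)
    # Null hypothesis: both groups share a single lambda.
    pooled = counts_a + counts_b
    lam_0 = sum(pooled) / len(pooled)
    ll_alt = poisson_loglik(counts_a, lam_a) + poisson_loglik(counts_b, lam_b)
    ll_null = poisson_loglik(pooled, lam_0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return lam_a, lam_b, stat, p_value

# Hypothetical counts: gene -> (group A samples, group B samples)
genes = {
    "geneX": ([10, 12, 9], [30, 28, 35]),
    "geneY": ([20, 22, 19], [21, 18, 23]),
}

# Same model for every gene, but fitted to one gene at a time.
results = {name: lrt_two_groups(a, b) for name, (a, b) in genes.items()}
for name, (lam_a, lam_b, stat, p_value) in results.items():
    print(f"{name}: lambda_A={lam_a:.2f} lambda_B={lam_b:.2f} p={p_value:.3g}")
```

In this toy data, geneX has clearly different group lambdas and gets a very small p-value, while geneY does not.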
02-14-2016, 10:16 PM  #3 
Member
Location: Germany Join Date: Jan 2014
Posts: 39

First of all, thanks for your quick answer, dpryan.
But I still do not get where the lambda is coming from and what you mean by "group". 
02-14-2016, 11:43 PM  #4 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,473

A group is just a group ("eine Gruppe" in German); it has no special meaning in this context.
Regarding lambda, each gene has some sort of expression count associated with it, normally in the form of counts per sample. These counts are then used to estimate lambda. 
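
As a sketch of how the counts are "used to estimate lambda" in the simplest case: assume one gene, n samples in one group with counts k_1, ..., k_n, and equal sequencing depth in all samples (these symbols are introduced here for illustration). The maximum-likelihood estimate is then just the mean count.

```latex
\begin{align*}
P(k_i \mid \lambda) &= \frac{\lambda^{k_i} e^{-\lambda}}{k_i!} \\
\ell(\lambda) &= \sum_{i=1}^{n} \left( k_i \log\lambda - \lambda - \log k_i! \right) \\
\frac{d\ell}{d\lambda} &= \frac{1}{\lambda}\sum_{i=1}^{n} k_i - n = 0
\quad\Rightarrow\quad
\hat\lambda = \frac{1}{n}\sum_{i=1}^{n} k_i
\end{align*}
```

For example, made-up counts of 10, 12 and 9 in three samples give an estimated lambda of 31/3, roughly 10.3.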
02-15-2016, 12:09 AM  #5 
Member
Location: Germany Join Date: Jan 2014
Posts: 39

This is actually where I have a problem. I estimate lambda from the read counts of a gene (lambda = read count) and then I test the null hypothesis that conditions A and B have the same lambda. So why do I use the Poisson model and not just test whether A and B have the same read count?
Does anyone know a tutorial or lecture with examples? 
02-15-2016, 12:59 AM  #6 
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,473

You essentially are testing whether A and B have the same read count. The question is simply how you test that. One option is assuming Poisson variance, which requires estimating lambda and then doing a test. In most real cases, you'd have multiple groups of samples, so you couldn't just compare two numbers, but would need to come up with group estimates, likely accounting for differences in sequencing depth for each sample.
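
A minimal sketch of such depth-aware group estimates, under assumptions that are not from this thread: each sample's count is modelled as Poisson with mean (library size × group rate), library sizes are given in millions of reads, and the groups are compared with a likelihood-ratio test. All names and numbers are hypothetical.

```python
import math

def rate_mle(counts, lib_sizes):
    """Group rate (expected reads per million): total counts / total depth."""
    return sum(counts) / sum(lib_sizes)

def loglik(counts, lib_sizes, rate):
    """Poisson log-likelihood where sample i has mean lib_sizes[i] * rate."""
    total = 0.0
    for k, n in zip(counts, lib_sizes):
        mu = n * rate
        if mu == 0:
            if k > 0:
                return float("-inf")
            continue
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total

def depth_aware_lrt(counts_a, libs_a, counts_b, libs_b):
    """Test whether two groups share one rate, accounting for depth."""
    rate_a = rate_mle(counts_a, libs_a)
    rate_b = rate_mle(counts_b, libs_b)
    all_counts = list(counts_a) + list(counts_b)
    all_libs = list(libs_a) + list(libs_b)
    rate_0 = rate_mle(all_counts, all_libs)
    ll_alt = loglik(counts_a, libs_a, rate_a) + loglik(counts_b, libs_b, rate_b)
    ll_null = loglik(all_counts, all_libs, rate_0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return rate_a, rate_b, p_value

# Group B was sequenced twice as deeply, so its raw counts are twice as high,
# but the per-million rate is the same in both groups.
counts_a, libs_a = [100, 110], [1.0, 1.1]
counts_b, libs_b = [200, 220], [2.0, 2.2]

rate_a, rate_b, p_value = depth_aware_lrt(counts_a, libs_a, counts_b, libs_b)
print(f"rate_A={rate_a:.1f} rate_B={rate_b:.1f} p={p_value:.3g}")
```

Comparing the raw counts directly would suggest a two-fold difference here; once depth enters the model, the two groups have the same estimated rate and the test finds no evidence of a difference.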

02-26-2016, 01:17 AM  #7 
Member
Location: Germany Join Date: Jan 2014
Posts: 39

Is there anyone out there who can explain the lambda estimation in more detail (ideally with an example) or who knows a nice tutorial?


