SEQanswers

02-12-2016, 07:38 AM   #1
Mchicken
Member
 
Location: Germany

Join Date: Jan 2014
Posts: 39
Statistical models for DE

Hey guys,

I have some trouble understanding why statistical models are needed for differential expression in RNA-Seq. Of course I have already used tools like DESeq2 and NOISeq. Nevertheless, I also want to understand, at least partially, what these tools are doing. Unfortunately I don't have a strong statistical background, and I have found no tutorial that explains the use of statistical models in a way I can follow. I think it would be best if someone could explain it for the Poisson model, as this one seems easier to understand than the negative binomial (NB).

So what I know is that after sequencing I align my reads to the reference genome and then generate read counts for each annotated gene. Of course I cannot use these counts directly to test for DE because of different library sizes as well as technical and biological variation.

What I read most of the time is that people fit statistical models to the count data, for example a Poisson model (as this one accounts for the technical variance).

Question 1: Is the model fitted to the read counts of all genes together, or does each gene get its own model?

Question 2: In the case of the Poisson model, where do I get the lambda from? Is it calculated from my count data?

Question 3: Once I have constructed my Poisson model, what is it used for? Do I use it to change my count data? Is it used in the statistical test? This is the step where I have absolutely no clue what is going on.

I tried to read different publications, including the DESeq publications and, for the Poisson case, the Marioni paper from 2008. But with my limited statistical knowledge I do not get the key idea of these statistical models or how they help when dealing with DE in RNA-Seq.

I really hope someone can explain the general concept in a simple way so that I can understand it.

Cheers
Mario

02-13-2016, 03:09 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476

N.B., I'm going to completely ignore the empirical Bayes parts of this for the sake of simplicity.

1. The model is fit to each gene, one at a time. The actual model used is the same for all of them.
2. The lambda is part of the fit. Note that there is a lambda per group.
3. The model is used for a statistical test, which is typically of the form, "Do groups A and B have different lambdas?"

02-14-2016, 11:16 PM   #3
Mchicken

First of all, thanks for your quick answer, dpryan.

But I still do not understand where the lambda comes from or what you mean by "group".

02-15-2016, 12:43 AM   #4
dpryan

A group is a group (eine "Gruppe" auf Deutsch); it has no special meaning in this context.

Regarding lambda, each gene has some sort of expression count associated with it, normally in the form of counts per sample. These counts are then used to estimate lambda.
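
As a rough illustration of "these counts are then used to estimate lambda": for a plain Poisson model with equal sequencing depth, the maximum-likelihood estimate of lambda is simply the mean of the counts in the group. The sketch below uses made-up counts and a hypothetical helper name, and checks the mean against a brute-force search over candidate lambdas.

```python
import math

# Hypothetical counts for one gene in the three samples of one group.
counts = [18, 22, 20]

def poisson_loglik(ys, lam):
    """Log-likelihood of the counts ys under a Poisson(lam) model."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in ys)

# Maximum-likelihood estimate: for a Poisson model this is the sample mean.
lambda_hat = sum(counts) / len(counts)

# Brute-force check: score candidate lambdas 10.0, 10.1, ..., 30.0
# and keep the one under which the observed counts are most likely.
grid = [x / 10 for x in range(100, 301)]
best_on_grid = max(grid, key=lambda lam: poisson_loglik(counts, lam))

print(lambda_hat, best_on_grid)
```

"Fitting" the model means nothing more than this: picking the lambda that makes the observed counts most likely. Real tools additionally correct for sequencing depth and use a more flexible distribution.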

02-15-2016, 01:09 AM   #5
Mchicken

This is actually where I have a problem. I estimate lambda from the read counts of a gene (lambda = read count) and then test the null hypothesis that conditions A and B have the same lambda. So why do I use the Poisson model rather than just testing whether A and B have the same read count?

Does anyone know a tutorial or lecture with examples?

02-15-2016, 01:59 AM   #6
dpryan

You essentially are testing whether A and B have the same read count. The question is simply how you test that. One option is to assume Poisson variance, which requires estimating lambda and then doing a test. In most real cases you'd have groups made up of multiple samples, so you couldn't just compare two numbers; you would need to come up with an estimate for each group, likely accounting for differences in sequencing depth between samples.
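
A minimal sketch of what "group estimates accounting for sequencing depth" could look like, assuming each count follows Poisson(depth_i * lambda), where depth_i is a per-sample size factor. The counts, size factors, and function names are all hypothetical, and real tools estimate the size factors from the data rather than taking them as given.

```python
import math

# Hypothetical data for one gene: raw counts plus per-sample size factors
# (relative sequencing depth; 2.0 means twice as deeply sequenced).
counts_a, depth_a = [20, 42, 31], [1.0, 2.0, 1.5]
counts_b, depth_b = [45, 88, 62], [1.0, 2.0, 1.5]

def fit_lambda(counts, depth):
    """Maximum-likelihood lambda when count_i ~ Poisson(depth_i * lambda)."""
    return sum(counts) / sum(depth)

def loglik(counts, depth, lam):
    """Poisson log-likelihood with a per-sample depth scaling."""
    return sum(y * math.log(d * lam) - d * lam - math.lgamma(y + 1)
               for y, d in zip(counts, depth))

lam_a = fit_lambda(counts_a, depth_a)   # group A estimate
lam_b = fit_lambda(counts_b, depth_b)   # group B estimate
lam_0 = fit_lambda(counts_a + counts_b, depth_a + depth_b)  # shared estimate

# Likelihood-ratio test: does one lambda per group fit better than a shared one?
ll_alt = loglik(counts_a, depth_a, lam_a) + loglik(counts_b, depth_b, lam_b)
ll_null = loglik(counts_a + counts_b, depth_a + depth_b, lam_0)
stat = max(0.0, 2.0 * (ll_alt - ll_null))
# Tail probability of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(stat / 2.0))

print(f"lambda_A={lam_a:.2f} lambda_B={lam_b:.2f} p={p_value:.3g}")
```

Note that the raw count 42 in group A looks almost the same as the raw count 45 in group B, but the first comes from a sample sequenced twice as deeply. Comparing raw numbers would hide the difference, which is why the model carries the depth along.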

02-26-2016, 02:17 AM   #7
Mchicken

Is there anyone out there who can explain the lambda estimation in more detail (preferably with an example) or who knows a nice tutorial?