SEQanswers

03-07-2014, 11:33 PM   #1
ysnapus
Member
 
Location: Mid-Atlantic

Join Date: Jun 2013
Posts: 22
Understanding DESeq2 dispersion estimates

I constructed a toy dataset:
Code:
> m
      Control  Case
GeneA    3891  4591
GeneB   69543 72122
> colData
             cc
Case       Case
Control Control
> d <- DESeqDataSetFromMatrix(m, colData=colData, design =~cc)
Then, I get the gene-wise dispersion
Code:
> d <- estimateSizeFactors(d)
> d <- estimateDispersionsGeneEst(d)
> mcols(d)
DataFrame with 2 rows and 5 columns
   baseMean     baseVar   allZero dispGeneEst dispGeneEstConv
  <numeric>   <numeric> <logical>   <numeric>       <logical>
1  4228.732    37181.63     FALSE 0.001842478            TRUE
2 70857.604 10439534.06     FALSE 0.002065126            TRUE
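
For anyone who wants to reproduce the baseMean and baseVar columns without DESeq2 installed, here is a minimal base R sketch. It assumes the size factors are the usual median-of-ratios values (per-sample median of each count divided by the per-gene geometric mean), and that baseMean and baseVar are the row mean and sample variance of the normalized counts. The variable names are my own, not DESeq2's.

```r
# Toy count matrix from the post (genes in rows, samples in columns)
m <- matrix(c(3891, 69543, 4591, 72122), nrow = 2,
            dimnames = list(c("GeneA", "GeneB"), c("Control", "Case")))

# Median-of-ratios size factors (assumed to mirror estimateSizeFactors)
log_geo_means <- rowMeans(log(m))
size_factors <- apply(m, 2, function(k) exp(median(log(k) - log_geo_means)))

# Normalized counts: divide each column by its size factor
norm_counts <- sweep(m, 2, size_factors, "/")

base_mean <- rowMeans(norm_counts)
base_var  <- apply(norm_counts, 1, var)  # sample variance, n - 1 denominator

print(size_factors)
print(cbind(base_mean, base_var))
```

Note that the normalized counts are not integers, which is why they had to be rounded before being passed to fitdistr.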
Now, I want to check whether the dispersion estimates are simply the result of fitting a negative binomial to each gene's counts (pooled across conditions).
Code:
> library(MASS)
> fitdistr(as.vector(round(counts(d, normalized=TRUE)[1,])),"negative binomial")
     size         mu    
  1207.3898   4228.5000 
 (1543.6506) (  97.5636)
> fitdistr(as.vector(round(counts(d, normalized=TRUE)[2,])),"negative binomial")
      size          mu    
    970.5831   70857.5000 
 (  982.6550) ( 1616.2617)

The dispersion is the inverse of size, so I would expect it to be:
Code:
> c(1/1207.3898, 1/970.5831)
[1] 0.0008282329 0.0010303085
But this differs from the DESeq2 estimates:
Code:
> mcols(d)$dispGeneEst
[1] 0.001842478 0.002065126
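
As a rough cross-check on these numbers: under the negative binomial parameterization used by fitdistr, the variance is mu + mu^2/size, so with dispersion alpha = 1/size it is mu + alpha * mu^2. A moment-based estimate of alpha can therefore be computed directly from the baseMean and baseVar values printed above. This is only a sketch for comparing the printed values, not a description of how DESeq2 computes dispGeneEst.

```r
# Values copied from the mcols(d) output in the post
base_mean <- c(GeneA = 4228.732, GeneB = 70857.604)
base_var  <- c(GeneA = 37181.63, GeneB = 10439534.06)

# NB variance: var = mu + alpha * mu^2, so alpha = (var - mu) / mu^2
alpha_moment <- (base_var - base_mean) / base_mean^2

# Dispersions implied by the fitdistr() size estimates in the post
alpha_fitdistr <- 1 / c(GeneA = 1207.3898, GeneB = 970.5831)

print(rbind(alpha_moment, alpha_fitdistr))
```

With these inputs the moment estimates come out near 0.0018 and 0.0021, close to dispGeneEst, while the 1/size values are roughly half as large. baseVar uses the n - 1 denominator, which with only two samples makes a large difference.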
03-08-2014, 03:16 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

No, it does not work like that. First of all, DESeq2's model is that the raw counts are NB-distributed, with the size factors entering through the mean. You have instead fit an NB distribution to the rounded normalized counts. Why would you do that? Also, with one sample per condition and a design of ~cc, you don't have any residual degrees of freedom, so your example is ill-defined anyway.

Maybe have a look at our preprint, with all the mathematical details in the Methods section: http://doi.org/10.1101/002832
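
The point about residual degrees of freedom can be checked in base R. The sketch below rebuilds the sample table from the first post and counts samples against the rank of the design matrix for ~cc. The object names other than colData and cc are hypothetical.

```r
# Sample table from the post: one Case sample and one Control sample
colData <- data.frame(cc = factor(c("Case", "Control")),
                      row.names = c("Case", "Control"))

# Design matrix for ~ cc: an intercept plus one condition coefficient
X <- model.matrix(~ cc, data = colData)

# Residual degrees of freedom = number of samples - rank of the design
resid_df <- nrow(X) - qr(X)$rank

print(X)
print(resid_df)
```

Two samples and two coefficients leave nothing over to estimate within-condition variability, so any dispersion value from this design has to come from treating the two conditions as if they were replicates.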
03-10-2014, 04:00 PM   #3
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 166

Last month, Gordon Smyth wrote on the Bioconductor list that rounded counts would be acceptable input for an edgeR analysis. EBSeq, however, handles this kind of data without rounding.
03-10-2014, 04:07 PM   #4
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Of course, all these tools will accept any kind of integer data, no matter whether it is real counts or some other data that has been rounded to make it look like counts. Whether the latter makes any sense depends a lot on circumstances. (By definition of the English word, a "count" is an integer. If you need to round it first, it was not a count in the first place. Hence, "rounded counts" is a contradiction in terms -- what Gordon meant is simply that if you have non-integer data which are for some reason more or less close to the counts that you should have if you had counted correctly, the rounding is fine.)

But what does all this have to do with the OP's question, and with my answer, neither of which was about using rounded something as input?
03-10-2014, 05:00 PM   #5
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 166

Thank you for clarifying that. I hadn't read the code.
Tags
deseq2, differential expression, dispersion, negative binomial
