#1 |
Member
Location: Mid-Atlantic Join Date: Jun 2013
Posts: 22
I construct a toy dataset with
Code:
    > m
          Control  Case
    GeneA    3891  4591
    GeneB   69543 72122
    > colData
                 cc
    Case       Case
    Control Control
    > d <- DESeqDataSetFromMatrix(m, colData=colData, design =~cc)
Code:
    > d <- estimateSizeFactors(d)
    > d <- estimateDispersionsGeneEst(d)
    > mcols(d)
    DataFrame with 2 rows and 5 columns
       baseMean     baseVar   allZero dispGeneEst dispGeneEstConv
      <numeric>   <numeric> <logical>   <numeric>       <logical>
    1  4228.732    37181.63     FALSE 0.001842478            TRUE
    2 70857.604 10439534.06     FALSE 0.002065126            TRUE
Code:
    > fitdistr(as.vector(round(counts(d, normalize=T)[1,])),"negative binomial")
          size          mu
      1207.3898    4228.5000
     (1543.6506)  (  97.5636)
    > fitdistr(as.vector(round(counts(d, normalize=T)[2,])),"negative binomial")
          size           mu
        970.5831   70857.5000
     (  982.6550) ( 1616.2617)
Now the dispersion is the inverse of size, so it should be
Code:
    > c(1/1207.3898, 1/970.5831)
    [1] 0.0008282329 0.0010303085
Code:
    > mcols(d)$dispGeneEst
    [1] 0.001842478 0.002065126
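
As an aside on the comparison above: the step from `size` to dispersion rests on the NB variance identity Var = mu + mu^2/size, so that dispersion = 1/size. The base-R sketch below first checks that identity numerically for the fitted GeneA values, then does plain arithmetic on the posted `baseMean` and `baseVar`. The moment-style formula `(var - mean) / mean^2` and all variable names are illustrative assumptions, not a description of what `estimateDispersionsGeneEst` does internally.

```r
# Identity behind "dispersion is the inverse of size":
# for NB(size, mu), Var = mu + mu^2 / size = mu + alpha * mu^2 with alpha = 1 / size.
mu   <- 4228.5      # fitted mean for GeneA, copied from the fitdistr output
size <- 1207.3898   # fitted size for GeneA, copied from the fitdistr output

x <- 0:20000                                # wide enough to hold essentially all the mass
p <- dnbinom(x, size = size, mu = mu)
var_numeric <- sum(p * x^2) - sum(p * x)^2  # variance computed from the pmf
var_formula <- mu + mu^2 / size             # variance from the identity
print(c(numeric = var_numeric, formula = var_formula))

# Plain arithmetic on the posted mcols(d) columns (hypothetical variable names).
base_mean <- c(GeneA = 4228.732, GeneB = 70857.604)
base_var  <- c(GeneA = 37181.63, GeneB = 10439534.06)

# Moment-style estimate using the sample variance (divisor n - 1) as posted.
alpha_sample_var <- (base_var - base_mean) / base_mean^2

# Same formula with the divisor-n variance, which is what a maximum-likelihood
# fit tends towards; for n = 2 that variance is half the sample variance.
n <- 2
alpha_divisor_n <- (base_var * (n - 1) / n - base_mean) / base_mean^2

print(signif(alpha_sample_var, 4))
print(signif(alpha_divisor_n, 4))
```

On these numbers the first vector lands close to the posted `dispGeneEst` values and the second in the neighbourhood of `1/size` from `fitdistr`. With only two samples, the choice of divisor alone changes the variance by a factor of two, so neither set of estimates is well determined.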
#2 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
No, it does not work like that. First of all, DESeq2's model is that the counts are NB-distributed. You have fit an NB distribution to the rounded normalized counts instead. Why would you do that? Also, you don't have any residual degrees of freedom in your setting, so your example is ill-defined anyway.
Maybe have a look at our preprint, with all the mathematical details in the Methods section: http://doi.org/10.1101/002832
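
Both points in this reply can be illustrated in base R without DESeq2. The sketch below rebuilds the toy matrix, computes size factors by a median-of-ratios rule (assumed here to mirror what `estimateSizeFactors` does by default), and counts the residual degrees of freedom for the design `~cc`. Object names such as `size_factors` and `residual_df` are made up for illustration.

```r
# Toy counts as posted: raw integer counts, genes in rows, samples in columns.
m <- matrix(c(3891, 69543, 4591, 72122), nrow = 2,
            dimnames = list(c("GeneA", "GeneB"), c("Control", "Case")))

# Median-of-ratios size factors (assumption: default DESeq2-style normalisation).
log_geo_means <- rowMeans(log(m))
size_factors  <- apply(m, 2, function(k) exp(median(log(k) - log_geo_means)))

# Normalised counts are raw counts divided by size factors. They are no longer
# integers, so fitting an NB to rounded normalised counts is not the same as
# modelling the raw counts.
norm_counts <- sweep(m, 2, size_factors, "/")
print(norm_counts)
print(rowMeans(norm_counts))  # compare with baseMean in the mcols(d) output

# Design ~cc with one sample per group: intercept plus one group coefficient.
colData <- data.frame(cc = factor(c("Case", "Control")),
                      row.names = c("Case", "Control"))
X <- model.matrix(~ cc, data = colData)

# Residual degrees of freedom = number of samples minus rank of the design.
residual_df <- nrow(X) - qr(X)$rank
print(residual_df)
```

With zero residual degrees of freedom there is no replicate information left from which to estimate a per-gene dispersion, which is the sense in which the example is ill-defined.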
#3 |
Senior Member
Location: Sydney, Australia Join Date: Jun 2011
Posts: 166
Last month, Gordon Smyth wrote on the Bioconductor list that rounded counts would be acceptable input for an edgeR analysis. EBSeq, however, handles this kind of data without rounding.
#4 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
Of course, all these tools will accept any kind of integer data, no matter whether it is real counts or some other data that has been rounded to make it look like counts. Whether the latter makes any sense depends a lot on circumstances. (By definition of the English word, a "count" is an integer. If you need to round it first, it was not a count in the first place. Hence, "rounded counts" is a contradiction in terms -- what Gordon meant is simply that if you have non-integer data which are for some reason more or less close to the counts that you should have if you had counted correctly, the rounding is fine.)
But what does all this have to do with the OP's question, and with my answer, neither of which was about using rounded something as input?
#5 |
Senior Member
Location: Sydney, Australia Join Date: Jun 2011
Posts: 166
Thank you for clarifying that. I hadn't read the code.
Tags: deseq2, differential expression, dispersion, negative binomial