I have normalized my RNA-seq read counts in edgeR. To look at the normalized data, I compute CPM values:
total <- cpm(w, normalized.lib.sizes=TRUE)
Specifically, these are the CPM values for the gene Hspa5 in my cell line 1 (triplicates):
total["Hspa5",1:3]
7492.944 6750.397 5727.190
If I compute the fold change between cell line 1 and my control, cell line 2, I get:
total["Hspa5",1:3]/total["Hspa5",27:29]
3.409239 3.399253 2.910913
So by this very basic test, the fold change between my two cell lines is about 3.3 (the replicates range from roughly 2.9 to 3.4).
Then I use edgeR to test the same comparison:
et <- exactTest(w, pair=c(13,1))
et["Hspa5",]
logFC logCPM PValue FDR
Hspa5 1.80454 12.11638 2.519341e-65 4.262793e-63
FC = exp(1.80454) = 6.077175
CPM = exp(12.11638) = 182842
So after edgeR has made the comparison, the CPM value is suddenly about 27x higher and (more importantly for me) the fold change is about 2x higher. Can anyone explain this difference?
I have read this thread (http://seqanswers.com/forums/showthread.php?t=23722), which states that logCPM takes the estimated dispersions and the library sizes into account. But a 27x difference? And why the 2x difference in the fold change?
Which of the two results would you report in a table of RNA-seq results in a publication?
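One check that may be relevant here: the back-transformation above uses exp(), which assumes natural logs. The sketch below redoes the arithmetic both ways using only the numbers printed in this post. It assumes the logFC and logCPM columns are on a log2 scale, which is how the edgeR documentation describes them; the variable names are made up for illustration.

```r
# Values copied from the exactTest output shown in the post.
logFC  <- 1.80454
logCPM <- 12.11638

# Back-transform assuming natural logs (what the post did).
fc_natural  <- exp(logFC)
cpm_natural <- exp(logCPM)

# Back-transform assuming log2 (assumption: the scale edgeR reports).
fc_log2  <- 2^logFC
cpm_log2 <- 2^logCPM

# Manual per-replicate ratios from the post, for comparison.
manual_fc <- c(3.409239, 3.399253, 2.910913)

print(c(natural = fc_natural, log2 = fc_log2, manual_mean = mean(manual_fc)))
print(c(natural = cpm_natural, log2 = cpm_log2))
```

If the log2 assumption holds, the back-transformed fold change lands close to the manual ratios. The back-transformed logCPM would then be in the same range as the per-sample CPM values, bearing in mind that, as far as I understand, logCPM is an average over all libraries rather than the value for one cell line.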