I have a question regarding to analyze the RNAseq data.
My data is exactly looks like this.(total 12 samples)
disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone
disaese2 patient4 pre-hormone
disaese2 patient4 post-hormone
dissease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone..
I would like to find the genes that are affected after hormone treatment ( such as differentially expressed genes).
Since, it is not easy to handle multiple diseases, multiple patients within disease, and hormone, I could not create the design matrix for GLM analysis.
The easist way to handle this might separate the data per each disease.
(e.g. disease1, disease2, disease3)
Then, I made three different experiment set (for disease 1, disease2, disease3) and test with patient and hormone factor. Does it make sense?? Three different experiment set is called data1(disease1), data2(disease2), data3(disease3).
data1
disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone
data2
disaese2 patient4 pre-hormone
disaese2 patient4 post-hormone
data3
dissease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone..
so, to test the differentially expressed genes after hormone treatment for disease 2 with DESeq
d_hunter <- estimateDispersions(data2, method="blind", sharingMode="fit-only") // I put this one since I only have one sample for data2(disease2)
plotDispEsts(data2)
plotPCA(varianceStabilizingTransformation(data2), intgroup=c("patient", "hormone"))
dh_fit1 = fitNbinomGLMs(data2, count ~ patient + hormone)
dh_fit0 = fitNbinomGLMs(data2, count ~ patient)
dh_pvalsGLM = nbinomGLMTest( dh_fit1, dh_fit0 )
dh_padjGLM = p.adjust( dh_pvalsGLM, method="BH" )
===============================
Same for EdgeR
dg_edesign <- model.matrix(~patient+hormone)
dg_e <- DGEList(counts=data2)
dg_e <- calcNormFactors(dg_e)
dg_e <- estimateGLMCommonDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTrendedDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTagwiseDisp(dg_e, dg_edesign)
dg_efit <- glmFit(dg_e, dg_edesign)
dg_efit <- glmLRT(dg_efit, coef=3)
==================================
This code is right???????????? I tried to do the same thing for DESeq and EdgeR. So, I expect these two codes did the same thing... even though the result could be different!!
=================================================
In addition I woud like to consider all samples at the same time instead of separating the data by disease, how can I do it?
design2 <- model.matrix (~disease+diseaseatient+disease:hormone)
would be a one way.. or
design2 <- model.matrix (~0+disease+diseaseatient+disease:hormone)
Does it make sense?
Could you please somebody give comments?
Thanks in advance
My data is exactly looks like this.(total 12 samples)
disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone
disaese2 patient4 pre-hormone
disaese2 patient4 post-hormone
dissease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone..
I would like to find the genes that are affected after hormone treatment ( such as differentially expressed genes).
Since, it is not easy to handle multiple diseases, multiple patients within disease, and hormone, I could not create the design matrix for GLM analysis.
The easist way to handle this might separate the data per each disease.
(e.g. disease1, disease2, disease3)
Then, I made three different experiment set (for disease 1, disease2, disease3) and test with patient and hormone factor. Does it make sense?? Three different experiment set is called data1(disease1), data2(disease2), data3(disease3).
data1
disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone
data2
disaese2 patient4 pre-hormone
disaese2 patient4 post-hormone
data3
dissease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone..
so, to test the differentially expressed genes after hormone treatment for disease 2 with DESeq
d_hunter <- estimateDispersions(data2, method="blind", sharingMode="fit-only") // I put this one since I only have one sample for data2(disease2)
plotDispEsts(data2)
plotPCA(varianceStabilizingTransformation(data2), intgroup=c("patient", "hormone"))
dh_fit1 = fitNbinomGLMs(data2, count ~ patient + hormone)
dh_fit0 = fitNbinomGLMs(data2, count ~ patient)
dh_pvalsGLM = nbinomGLMTest( dh_fit1, dh_fit0 )
dh_padjGLM = p.adjust( dh_pvalsGLM, method="BH" )
===============================
Same for EdgeR
dg_edesign <- model.matrix(~patient+hormone)
dg_e <- DGEList(counts=data2)
dg_e <- calcNormFactors(dg_e)
dg_e <- estimateGLMCommonDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTrendedDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTagwiseDisp(dg_e, dg_edesign)
dg_efit <- glmFit(dg_e, dg_edesign)
dg_efit <- glmLRT(dg_efit, coef=3)
==================================
This code is right???????????? I tried to do the same thing for DESeq and EdgeR. So, I expect these two codes did the same thing... even though the result could be different!!
=================================================
In addition I woud like to consider all samples at the same time instead of separating the data by disease, how can I do it?
design2 <- model.matrix (~disease+diseaseatient+disease:hormone)
would be a one way.. or
design2 <- model.matrix (~0+disease+diseaseatient+disease:hormone)
Does it make sense?
Could you please somebody give comments?
Thanks in advance