View Single Post
Old 06-30-2015, 01:16 AM   #3
EVELE
Junior Member
 
Location: France

Join Date: Jun 2015
Posts: 9
Default

Hi Michael,

First of all many thanks for your prompt answer.
I tried to apply your suggestions.
As I am not sure for my understanding and the interpretations I give to my results, I expose (in bold) some of my considerations and questions arising through the steps.
I would appreciate it a lot , if you could answer some of them.

# times*: 0,1,2 represent stages*: vegetative, early, late respectively (early and late in a sexual cycle)
> times=factor(c(rep(1,2), rep(2,3), 1, 0, 1, rep(2,3), 1, 2, 0, 1, rep(2,2), 1, rep(2,2), 1, rep(0,2), 1, rep(2,5), 0), levels=c(0,1,2))

> condition = factor(c(rep("A", 14), rep("Controls",9), rep("B",7)), levels=c("Controls","A","B"))

Consideration 1
I consider that the log2 Fold Change results will be*: treatment A over Controls and B over Controls* (the first element of the vector levels=c("Controls","A","B") is always the denominator)...


> colData = data.frame( condition=condition, times=times)

# quarteto is my data frame containing the counts
> rownames(colData)=colnames(quarteto)
> dds<-DESeqDataSetFromMatrix(countData= as.matrix(quarteto), colData= colData ,design=formula(~times+condition+condition:times))
> colData
condition times
ND7.T0 A 1
ND7.T10 A 1
ND7.T20 A 2
ND7.T30 A 2
ND7.T40 A 2
ND7.T5 A 1
ND7.V A 0
ICL7.T0 A 1
ICL7.T10 A 2
ICL7.T20 A 2
ICL7.T35 A 2
ICL7.T5 A 1
ICL7.T50 A 2
ICL7.veg A 0
T0.2 Controls 1
T20.2 Controls 2
T20.3 Controls 2
T3 Controls 1
T30 Controls 2
T45 Controls 2
T5.2 Controls 1
V1.2 Controls 0
VK1 Controls 0
EZL1.T0 B 1
EZL1.T10 B 2
EZL1.T20 B 2
EZL1.T35 B 2
EZL1.T5 B 2
EZL1.T50 B 2
EZL1.veg B 0
>
>
> full.mm <- model.matrix(~ times + condition + condition:times, data=colData(dds))
> head(full.mm)
(Intercept) times1 times2 conditionA conditionB times1:conditionA
ND7.T0 1 1 0 1 0 1
ND7.T10 1 1 0 1 0 1
ND7.T20 1 0 1 1 0 0
ND7.T30 1 0 1 1 0 0
ND7.T40 1 0 1 1 0 0
ND7.T5 1 1 0 1 0 1
times2:conditionA times1:conditionB times2:conditionB
ND7.T0 0 0 0
ND7.T10 0 0 0
ND7.T20 1 0 0
ND7.T30 1 0 0
ND7.T40 1 0 0
ND7.T5 0 0 0

> # contruction of 2 reduced matrices for treatment A and treatment B, by removing the columns for the interaction terms which contain treatment A and B respectively.

Consideration 2
Do I consider right that the only columns concerning the interaction terms of conditionA are*:
times1:conditionA and times2:conditionA*?


> reducedA.mm <- full.mm[ ,!colnames(full.mm) %in% c("times1:conditionA", "times2:conditionA")]

> reducedB.mm <- full.mm[ ,!colnames(full.mm) %in% c("times1:conditionB", "times2:conditionB")]

> head(reducedA.mm)
(Intercept) times1 times2 conditionA conditionB times1:conditionB
ND7.T0 1 1 0 1 0 0
ND7.T10 1 1 0 1 0 0
ND7.T20 1 0 1 1 0 0
ND7.T30 1 0 1 1 0 0
ND7.T40 1 0 1 1 0 0
ND7.T5 1 1 0 1 0 0
times2:conditionB
ND7.T0 0
ND7.T10 0
ND7.T20 0
ND7.T30 0
ND7.T40 0
ND7.T5 0

> # treated_B vs Controls
> ddsB <- DESeq(dds, full=full.mm, reduced=reducedB.mm, test="LRT")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 4444 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
>
> resultsNames(ddsB)
[1] "Intercept" "times1" "times2"
[4] "conditionA" "conditionB" "times1.conditionA"
[7] "times2.conditionA" "times1.conditionB" "times2.conditionB"
>
> resB_1 <- results(ddsB, name="times1.conditionB")
> head(resB_1)
log2 fold change (MLE): times1.conditionB
LRT p-value: full vs reduced
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
PTET.51.1.G0010001 146.5136783 -0.62466508 0.7898300 0.6319728 0.7290694
PTET.51.1.G0010002 122.5083339 -0.68161198 0.8059814 1.1202526 0.5711369
PTET.51.1.G0010003 100.0616624 -0.53628223 0.5763633 2.2757097 0.3205058
PTET.51.1.G0010004 0.2590626 -5.89587066 10.5174621 0.4181308 0.8113422
PTET.51.1.G0010005 111.5651704 -0.13174595 0.6950362 1.1823154 0.5536859
PTET.51.1.G0010006 115.1214038 0.07501774 0.9922750 0.9200433 0.6312700
padj
<numeric>
PTET.51.1.G0010001 0.99998
PTET.51.1.G0010002 0.99998
PTET.51.1.G0010003 0.99998
PTET.51.1.G0010004 NA
PTET.51.1.G0010005 0.99998
PTET.51.1.G0010006 0.99998
>

Question 1
resB_1 <- results(ddsB, name="times1.conditionB")
How are the results resB_1 interpreted*?
Is it the differential expression from time 0 to time 1 specific to conditionB ??

Question 2
How is the *log2 fold change (MLE): times1.conditionB* interpreted at the results resB_1? How is it calculated (what are the 2 parts of comparison)*?
Is it something like*: times1.conditionB vs (Controls+times1.conditionA)*?


> # COUNTS PLOTS #
>
> data1 <- plotCounts(ddsB, which.min(resB_1$padj), intgroup=c("times","condition"), returnData=TRUE)
> pdf("counts_over_time.pdf")
> ggplot(data1, aes(x=times, y=count, color=condition, group=condition)) + geom_point() + stat_smooth(se=FALSE,method="loess") + scale_y_log10()
Error in predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, :
NA/NaN/Inf dans un appel une fonction externe (argument 5)
Il y a eu 16 avis (utilisez warnings() pour les visionner)

> dev.off()
null device
1
> The plot is Attached

Question 3
Why error*? Is there a problem with the loess method*?


> resB_1 <- results(ddsB, name="times1.conditionB")
> resB_2 <- results(ddsB, name="times2.conditionB")
>
> min_fold_change=2
>
> ####
> resB_1 <- resB_1[!is.na(resB_1$padj) & resB_1$padj <=0.05,]
> resB_1 <- resB_1[resB_1$log2FoldChange >= log2(min_fold_change) | resB_1$log2FoldChange <= log2(1/min_fold_change),]
>
> resB_2 <- resB_2[!is.na(resB_2$padj) & resB_2$padj <=0.05,]
> resB_2 <- resB_2[resB_2$log2FoldChange >= log2(min_fold_change) | resB_2$log2FoldChange <= log2(1/min_fold_change),]
>
> list_of_differentially_expressed_genes_for_B_vs_Controls<-unique(c(rownames(resB_1),rownames(resB_2)))
> length(list_of_differentially_expressed_genes_for_B_vs_Controls)
[1] 33

Question 3
Is this the right way*:
list_of_differentially_expressed_genes_for_B_vs_Controls<-unique(c(rownames(resB_1),rownames(resB_2)))
to obtain the list of all the differentially expressed genes for treatment B vs Control*?
Attached Files
File Type: pdf counts_over_time.pdf (4.9 KB, 9 views)
EVELE is offline   Reply With Quote