Dear NGS community,
I am analysing RNA-seq data with DESeq2 and I would truly appreciate any feedback regarding my current design formulas.
I have 3 biological replicates for each one of the following 7 experiments:
UND = Undifferentiated epidermal stem cells
DIF = Differentiated epidermal stem cells
HPLC = HAT family inhibitor
DMSO = DMSO dilution used as control against HPLC
siRNA1 = Targeting the same protein
siRNA2 = Targeting the same protein
siCTRL = siRNA Control
stage    treatment
UND      siCTRL
UND      DMSO
DIF      siCTRL
DIF      DMSO
DIF      siRNA1
DIF      siRNA2
DIF      HPLC
1. DE genes in differentiation
I'll be using only these 4 experiments to test this:
stage    treatment
UND      siCTRL
UND      DMSO
DIF      siCTRL
DIF      DMSO
Code:
library(DESeq2)

# Sample sheet: 3 replicates per stage/treatment combination
design1 <- data.frame(
  experiment = colnames(data1),
  stage      = rep(c("UND", "DIF"), each = 6),
  treatment  = rep(rep(c("siCTRL", "DMSO"), each = 3), times = 2)
)

dLRT <- DESeqDataSetFromMatrix(
  countData = data1,
  colData   = design1,
  design    = ~ treatment + stage:treatment + stage
)

# Likelihood ratio test: full model vs. a reduced model without any stage term
dLRT <- DESeq(
  dLRT,
  test    = "LRT",
  full    = ~ treatment + stage:treatment + stage,
  reduced = ~ treatment
)

dLRT_res <- results(dLRT)
# Flip the sign to have positive L2FC values for DIF
dLRT_res$log2FoldChange <- dLRT_res$log2FoldChange * -1
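It is my understanding that here the genes with small padj values will correspond to genes changing expression because of differentiation and not because of the treatment. Is my design correct to address the question of which genes are DE during differentiation?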
Should I remove the "stage:treatment" from the design formula? Or completely change my design formula?
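One way to see what this LRT actually compares is to build the full and reduced model matrices in base R (DESeq2 is not needed for this check). This is only a minimal sketch: the sample sheet below is a hypothetical stand-in for design1, and setting the factor levels explicitly is an assumption of mine, not part of the analysis above.

```r
# Hypothetical sample sheet mirroring design1 (3 replicates per group).
coldata <- data.frame(
  stage     = rep(c("UND", "DIF"), each = 6),
  treatment = rep(rep(c("siCTRL", "DMSO"), each = 3), times = 2)
)

# Assumption: set the reference levels explicitly. By default R orders
# factor levels alphabetically, which would make DIF the reference stage.
coldata$stage     <- factor(coldata$stage, levels = c("UND", "DIF"))
coldata$treatment <- factor(coldata$treatment, levels = c("siCTRL", "DMSO"))

full    <- model.matrix(~ treatment + stage + stage:treatment, coldata)
reduced <- model.matrix(~ treatment, coldata)

# Coefficients present in the full model but absent from the reduced one;
# these are the terms the likelihood ratio test evaluates jointly.
tested <- setdiff(colnames(full), colnames(reduced))
print(tested)
```

The two coefficients listed are the stage main effect and the stage-by-treatment interaction, so a small padj here can come from either of them. With UND set as the reference level, the stage coefficient should already point in the DIF direction, which may make the manual sign flip unnecessary.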
2. DE genes in differentiated cells after siRNA knockdown
Null hypothesis: no changes in gene expression after treatment with siRNA in differentiated cells.
So far I've been using these experiments:
condition
DIF siCTRL
DIF siRNA1
DIF siRNA2
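Code:
design2 <- data.frame(
  experiment = colnames(data2),
  # Recycled across the columns of data2, so the samples must be ordered
  # siCTRL, siRNA1, siRNA2, siCTRL, siRNA1, siRNA2, ...
  condition  = c("siCTRL", "siRNA1", "siRNA2")
)

dLRT <- DESeqDataSetFromMatrix(
  countData = data2,
  colData   = design2,
  design    = ~ condition
)

# LRT against the intercept-only model: any difference between the 3 conditions
dLRT <- DESeq(dLRT, test = "LRT", reduced = ~ 1)
dLRT_res <- results(dLRT)

# Pairwise log2 fold changes, extracted from the same fitted object
dDif_siRNA1  <- results(dLRT, contrast = c("condition", "siRNA1", "siCTRL"))
dDif_siRNA2  <- results(dLRT, contrast = c("condition", "siRNA2", "siCTRL"))
dDif_siRNAvs <- results(dLRT, contrast = c("condition", "siRNA1", "siRNA2"))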
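At the end, to select for differentially expressed genes that are downregulated when treated with the siRNAs, I use:
Code:
select <- which(
  dLRT_res$padj < 0.01 &
  dDif_siRNA1$log2FoldChange < -1 &
  dDif_siRNA2$log2FoldChange < -1 &
  abs(dDif_siRNAvs$log2FoldChange) < 1
)
I would like to know your thoughts on whether this approach is right for finding which genes are DE after knocking down with siRNAs, or whether there is a better way to address this question.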
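To make the selection logic above easier to check, here is a small base R sketch on made-up values. The data frame `toy` and its column names are hypothetical stand-ins for the DESeq2 result columns, and the numbers are invented purely for illustration.

```r
# Hypothetical per-gene values standing in for the DESeq2 result columns.
toy <- data.frame(
  gene       = c("g1", "g2", "g3", "g4"),
  padj       = c(0.001, 0.001, NA, 0.2),
  lfc_siRNA1 = c(-2.0, -2.0, -3.0, -2.5),
  lfc_siRNA2 = c(-1.8, -0.5, -2.9, -2.4),
  lfc_1_vs_2 = c(-0.2, -1.5, -0.1, -0.1),
  stringsAsFactors = FALSE
)

# Same filter as above: significant overall, down by more than 2-fold with
# both siRNAs, and the two siRNAs within 2-fold of each other.
keep <- which(
  toy$padj < 0.01 &
  toy$lfc_siRNA1 < -1 &
  toy$lfc_siRNA2 < -1 &
  abs(toy$lfc_1_vs_2) < 1
)
selected <- toy$gene[keep]
print(selected)
```

Only g1 passes all four conditions. Note that g3, whose padj is NA, is dropped silently because which() ignores NA values; genes for which no adjusted p-value is reported would be excluded by this filter in the same way.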
Also, is it possible to use DIF DMSO as a control as well, for example by adding another column?
condition     treatmentGroup
DIF DMSO      CNT
DIF siCTRL    CNT
DIF siRNA1    TRT
DIF siRNA2    TRT
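And then try something like "full = ~ condition + treatmentGroup, reduced = ~ condition" to also use DIF DMSO in the control group.
But there is the problem of linear combination, so I thought of using the edgeR trick "individuals in nested groups" from the DESeq2 vignette to bypass this. In the end I decided it was better to ask for help.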
I am not convinced about this last design as, intuitively, it makes no sense to compare siRNAs vs DMSO treatment. But in the end my goal is to compare the DE genes from siBRD4 treatment against the DE genes in differentiation, and perhaps there is a better way to make this comparison than just overlapping gene names from 2 independent tests (for example, by involving the other dataset in differentiated cells: DIF DMSO).
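The linear-combination problem mentioned above can be reproduced in base R without DESeq2. The sketch below assumes three replicates per condition (a hypothetical sample sheet), shows that ~ condition + treatmentGroup is rank deficient, and then derives a numeric contrast for "mean of the two siRNAs vs. mean of the two controls" under the simpler design ~ condition. Whether averaging DMSO with siCTRL is biologically sensible is a separate question that this sketch does not settle.

```r
# Hypothetical sample sheet: 4 conditions in DIF cells, 3 replicates each.
coldata <- data.frame(
  condition = factor(rep(c("DMSO", "siCTRL", "siRNA1", "siRNA2"), each = 3))
)
coldata$treatmentGroup <- factor(
  ifelse(coldata$condition %in% c("siRNA1", "siRNA2"), "TRT", "CNT")
)

# treatmentGroup is fully determined by condition, so its column is a
# linear combination of the condition columns.
mm <- model.matrix(~ condition + treatmentGroup, coldata)
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")

# Alternative sketch: keep ~ condition and express the group comparison as
# a numeric contrast (average TRT row minus average CNT row).
mm1 <- model.matrix(~ condition, coldata)
is_trt <- coldata$treatmentGroup == "TRT"
contrast_vec <- colMeans(mm1[is_trt, ]) - colMeans(mm1[!is_trt, ])
print(contrast_vec)
```

The model matrix has 5 columns but rank 4, which is the kind of design DESeq2 rejects as not full rank. A numeric vector like contrast_vec, with one entry per coefficient of the fitted model, is the kind of value that the contrast argument of results() accepts, so a grouped comparison may not need the extra treatmentGroup column at all.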
Thanks a lot in advance for the help and attention! Cheers!
Rob TM