SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem: DESeq2 analysis with very unbalanced design Fischer Bioinformatics 15 09-16-2015 03:08 AM
DESeq2 experimental design/analysis crh Bioinformatics 1 05-05-2015 11:30 PM
Unbalanced block design analysis with DESeq2 youssefd Bioinformatics 6 03-13-2015 03:43 AM
Paired design versus unpaired design in DESeq2 KristenC RNA Sequencing 1 05-29-2014 11:05 AM
DESeq high withing group dispersion wdt RNA Sequencing 0 07-16-2012 03:54 PM

Reply
 
Thread Tools
Old 10-01-2015, 07:33 AM   #1
EVELE
Junior Member
 
Location: France

Join Date: Jun 2015
Posts: 9
Default DESeq2*: Dispersion in a DESeq analysis with design=~group

Hello,

I make a differential expression analysis between treated samples (factors) and untreated (controls) at different states (vegetative, early, inter, late).
The problem is that I lack replicates.

-For the vegetative state I have no replicates at all.
Contrast*:
EZL1.veg (factor) vs ICL7.veg (control)

-For the early state I have partially replicates (only 2 for the control)
Contrast*:
EZL1.T0 (factor) vs ICL7.T0 & ICL7.T5 (control)

-For the inter state I have partially replicates (only 2 for the factor)
Contrast*:
EZL1.T10 & EZL1.T20 vs ICL7.T20

-For the late state I have partially replicates (only 2 for the factor)
Contrast*:
EZL1.T35 & EZL1.T50 vs ICL7.T35

My questions are :
1) How Deseq calculates the dispersion in the case of the vegetative state where I have no replicates at all*? (given that I apply a DESEq analysis with design=~group and then I ask results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0")))

2) How Deseq calculates the dispersion in all other cases where I have partially replicates*?


Thank you in advance for any ideas*!

Here is the code*:

# 0:Vegetative, 1:Early, 2:Inter, 3:Late
> times=factor(c(1, rep(2,2),rep(3,2),0,rep(1,2), rep(2,1),rep(3,1),0), levels=c(0,1,2,3))
>
> strain = factor(c(rep("factors", 6), rep("controls",5)), levels=c("controls","factors"))
>
> colData = data.frame( strain= strain, times=times)
> rownames(colData)=colnames(duo)
> colData$group <- factor(paste0(colData$strain,colData$times))
> colData
strain times group
EZL1.T0 factors 1 factors1
EZL1.T10 factors 2 factors2
EZL1.T20 factors 2 factors2
EZL1.T35 factors 3 factors3
EZL1.T50 factors 3 factors3
EZL1.veg factors 0 factors0
ICL7.T0 controls 1 controls1
ICL7.T5 controls 1 controls1
ICL7.T20 controls 2 controls2
ICL7.T35 controls 3 controls3
ICL7.veg controls 0 controls0
>
> ddsEZL1_ICL7 <-DESeqDataSetFromMatrix(countData= as.matrix(duo), colData= colData, design=~group)
> ddsEZL1_ICL7 <-DESeq(ddsEZL1_ICL7)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
>
>
> resVEG<-results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0"))
> resVEG<- resVEG[!is.na(resVEG$padj) & resVEG$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG<-rownames(resVEG)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG)
[1] 15

>
> resEARLY <- results(ddsEZL1_ICL7, contrast=c("group", "controls1", "factors1"))
> resEARLY <- resEARLY[!is.na(resEARLY$padj) & resEARLY$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY<-rownames(resEARLY)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY)
[1] 412
>
> resINTER <- results(ddsEZL1_ICL7, contrast=c("group", "controls2", "factors2"))
> resINTER <- resINTER[!is.na(resINTER$padj) & resINTER$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER<-rownames(resINTER)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER)
[1] 1087
>
> resLATE <- results(ddsEZL1_ICL7, contrast= c("group", "controls3", "factors3"))
> resLATE <- resLATE[!is.na(resLATE$padj) & resLATE$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE<-rownames(resLATE)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE)
[1] 2921

Note*:
When I do a simple DESeq just for the vegetative state DESeq ignores the condition labels (factor and control) and estimates the variance by treating all samples (EZL1.veg and ICL7.veg) as if they were replicates of the same
condition, in order to estimate the dispersions. But this method gives me different results at the end (much more conservative which is normal):

f <- list("~/EZL1_results/EZL1-veg_mRNA_GCCAAT.TOPHAT.pt_51.pe.sorted.bam_fragments.csv",
"~/ICL7_results/ICL7-veg_RNA_ATCACG.TOPHAT.pt_51.pe.sorted.bam_fragments.csv")
.
.

> str(df_points_EARLY)
'data.frame': 41533 obs. of 2 variables:
$ EZL1.veg: int 104 105 44 0 58 84 111 22 0 105 ...
$ ICL7.veg: int 280 235 106 0 85 446 185 64 1 168 ...

> strain = factor(c(rep("EZL1",1), rep("ICL7",1)), levels=c("EZL1","ICL7"))
> colData = data.frame( strain= strain )
> rownames(colData)=colnames(df_points_EARLY)
> dds<-DESeqDataSetFromMatrix(countData= as.matrix(df_points_EARLY), colData= colData, design=~strain)
> dds <-DESeq(dds)
> resultsNames(dds)
[1] "Intercept" "strainEZL1" "strainICL7"
> resEZL1_ICL7 <- results(dds, contrast=list("strainICL7","strainEZL1"))
> resEZL1_ICL7<- resEZL1_ICL7[!is.na(resEZL1_ICL7$padj) & resEZL1_ICL7$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1_ICL7_VFG<-rownames(resEZL1_ICL7)
> length(list_of_differentially_expressed_genes_for_EZL1_ICL7_VEG)
[1] 0
EVELE is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:57 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO