DESeq2*: Dispersion in a DESeq analysis with design=~group

EVELE

Junior Member

Join Date: Jun 2015

Posts: 9
- Share
- Tweet
#1

DESeq2*: Dispersion in a DESeq analysis with design=~group

10-01-2015, 07:33 AM

Hello,

I make a differential expression analysis between treated samples (factors) and untreated (controls) at different states (vegetative, early, inter, late).
The problem is that I lack replicates.

-For the vegetative state I have no replicates at all.
Contrast*:
EZL1.veg (factor) vs ICL7.veg (control)

-For the early state I have partially replicates (only 2 for the control)
Contrast*:
EZL1.T0 (factor) vs ICL7.T0 & ICL7.T5 (control)

-For the inter state I have partially replicates (only 2 for the factor)
Contrast*:
EZL1.T10 & EZL1.T20 vs ICL7.T20

-For the late state I have partially replicates (only 2 for the factor)
Contrast*:
EZL1.T35 & EZL1.T50 vs ICL7.T35

My questions are :
1) How Deseq calculates the dispersion in the case of the vegetative state where I have no replicates at all*? (given that I apply a DESEq analysis with design=~group and then I ask results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0")))

2) How Deseq calculates the dispersion in all other cases where I have partially replicates*?

Thank you in advance for any ideas*!

Here is the code*:

# 0:Vegetative, 1:Early, 2:Inter, 3:Late
> times=factor(c(1, rep(2,2),rep(3,2),0,rep(1,2), rep(2,1),rep(3,1),0), levels=c(0,1,2,3))
>
> strain = factor(c(rep("factors", 6), rep("controls",5)), levels=c("controls","factors"))
>
> colData = data.frame( strain= strain, times=times)
> rownames(colData)=colnames(duo)
> colData$group <- factor(paste0(colData$strain,colData$times))
> colData
strain times group
EZL1.T0 factors 1 factors1
EZL1.T10 factors 2 factors2
EZL1.T20 factors 2 factors2
EZL1.T35 factors 3 factors3
EZL1.T50 factors 3 factors3
EZL1.veg factors 0 factors0
ICL7.T0 controls 1 controls1
ICL7.T5 controls 1 controls1
ICL7.T20 controls 2 controls2
ICL7.T35 controls 3 controls3
ICL7.veg controls 0 controls0
>
> ddsEZL1_ICL7 <-DESeqDataSetFromMatrix(countData= as.matrix(duo), colData= colData, design=~group)
> ddsEZL1_ICL7 <-DESeq(ddsEZL1_ICL7)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
>
>
> resVEG<-results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0"))
> resVEG<- resVEG[!is.na(resVEG$padj) & resVEG$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG<-rownames(resVEG)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG)
[1] 15
>
> resEARLY <- results(ddsEZL1_ICL7, contrast=c("group", "controls1", "factors1"))
> resEARLY <- resEARLY[!is.na(resEARLY$padj) & resEARLY$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY<-rownames(resEARLY)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY)
[1] 412
>
> resINTER <- results(ddsEZL1_ICL7, contrast=c("group", "controls2", "factors2"))
> resINTER <- resINTER[!is.na(resINTER$padj) & resINTER$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER<-rownames(resINTER)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER)
[1] 1087
>
> resLATE <- results(ddsEZL1_ICL7, contrast= c("group", "controls3", "factors3"))
> resLATE <- resLATE[!is.na(resLATE$padj) & resLATE$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE<-rownames(resLATE)
> length(list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE)
[1] 2921

Note*:
When I do a simple DESeq just for the vegetative state DESeq ignores the condition labels (factor and control) and estimates the variance by treating all samples (EZL1.veg and ICL7.veg) as if they were replicates of the same
condition, in order to estimate the dispersions. But this method gives me different results at the end (much more conservative which is normal):

f <- list("~/EZL1_results/EZL1-veg_mRNA_GCCAAT.TOPHAT.pt_51.pe.sorted.bam_fragments.csv",
"~/ICL7_results/ICL7-veg_RNA_ATCACG.TOPHAT.pt_51.pe.sorted.bam_fragments.csv")
….
….

> str(df_points_EARLY)
'data.frame': 41533 obs. of 2 variables:
$ EZL1.veg: int 104 105 44 0 58 84 111 22 0 105 ...
$ ICL7.veg: int 280 235 106 0 85 446 185 64 1 168 ...

> strain = factor(c(rep("EZL1",1), rep("ICL7",1)), levels=c("EZL1","ICL7"))
> colData = data.frame( strain= strain )
> rownames(colData)=colnames(df_points_EARLY)
> dds<-DESeqDataSetFromMatrix(countData= as.matrix(df_points_EARLY), colData= colData, design=~strain)
> dds <-DESeq(dds)
> resultsNames(dds)
[1] "Intercept" "strainEZL1" "strainICL7"
> resEZL1_ICL7 <- results(dds, contrast=list("strainICL7","strainEZL1"))
> resEZL1_ICL7<- resEZL1_ICL7[!is.na(resEZL1_ICL7$padj) & resEZL1_ICL7$padj <= 0.05,]
> list_of_differentially_expressed_genes_for_EZL1_ICL7_VFG<-rownames(resEZL1_ICL7)
> length(list_of_differentially_expressed_genes_for_EZL1_ICL7_VEG)
[1] 0
Tags: None

Previous template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
- Channel: Articles
Yesterday, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 55 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 51 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 45 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 55 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

DESeq2*: Dispersion in a DESeq analysis with design=~group

Latest Articles

ad_right_rmr

News