Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2*: Dispersion in a DESeq analysis with design=~group

    Hello,

    I make a differential expression analysis between treated samples (factors) and untreated (controls) at different states (vegetative, early, inter, late).
    The problem is that I lack replicates.

    -For the vegetative state I have no replicates at all.
    Contrast*:
    EZL1.veg (factor) vs ICL7.veg (control)

    -For the early state I have partially replicates (only 2 for the control)
    Contrast*:
    EZL1.T0 (factor) vs ICL7.T0 & ICL7.T5 (control)

    -For the inter state I have partially replicates (only 2 for the factor)
    Contrast*:
    EZL1.T10 & EZL1.T20 vs ICL7.T20

    -For the late state I have partially replicates (only 2 for the factor)
    Contrast*:
    EZL1.T35 & EZL1.T50 vs ICL7.T35

    My questions are :
    1) How Deseq calculates the dispersion in the case of the vegetative state where I have no replicates at all*? (given that I apply a DESEq analysis with design=~group and then I ask results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0")))

    2) How Deseq calculates the dispersion in all other cases where I have partially replicates*?


    Thank you in advance for any ideas*!

    Here is the code*:

    # 0:Vegetative, 1:Early, 2:Inter, 3:Late
    > times=factor(c(1, rep(2,2),rep(3,2),0,rep(1,2), rep(2,1),rep(3,1),0), levels=c(0,1,2,3))
    >
    > strain = factor(c(rep("factors", 6), rep("controls",5)), levels=c("controls","factors"))
    >
    > colData = data.frame( strain= strain, times=times)
    > rownames(colData)=colnames(duo)
    > colData$group <- factor(paste0(colData$strain,colData$times))
    > colData
    strain times group
    EZL1.T0 factors 1 factors1
    EZL1.T10 factors 2 factors2
    EZL1.T20 factors 2 factors2
    EZL1.T35 factors 3 factors3
    EZL1.T50 factors 3 factors3
    EZL1.veg factors 0 factors0
    ICL7.T0 controls 1 controls1
    ICL7.T5 controls 1 controls1
    ICL7.T20 controls 2 controls2
    ICL7.T35 controls 3 controls3
    ICL7.veg controls 0 controls0
    >
    > ddsEZL1_ICL7 <-DESeqDataSetFromMatrix(countData= as.matrix(duo), colData= colData, design=~group)
    > ddsEZL1_ICL7 <-DESeq(ddsEZL1_ICL7)
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    >
    >
    > resVEG<-results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0"))
    > resVEG<- resVEG[!is.na(resVEG$padj) & resVEG$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG<-rownames(resVEG)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG)
    [1] 15

    >
    > resEARLY <- results(ddsEZL1_ICL7, contrast=c("group", "controls1", "factors1"))
    > resEARLY <- resEARLY[!is.na(resEARLY$padj) & resEARLY$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY<-rownames(resEARLY)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY)
    [1] 412
    >
    > resINTER <- results(ddsEZL1_ICL7, contrast=c("group", "controls2", "factors2"))
    > resINTER <- resINTER[!is.na(resINTER$padj) & resINTER$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER<-rownames(resINTER)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER)
    [1] 1087
    >
    > resLATE <- results(ddsEZL1_ICL7, contrast= c("group", "controls3", "factors3"))
    > resLATE <- resLATE[!is.na(resLATE$padj) & resLATE$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE<-rownames(resLATE)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE)
    [1] 2921

    Note*:
    When I do a simple DESeq just for the vegetative state DESeq ignores the condition labels (factor and control) and estimates the variance by treating all samples (EZL1.veg and ICL7.veg) as if they were replicates of the same
    condition, in order to estimate the dispersions. But this method gives me different results at the end (much more conservative which is normal):

    f <- list("~/EZL1_results/EZL1-veg_mRNA_GCCAAT.TOPHAT.pt_51.pe.sorted.bam_fragments.csv",
    "~/ICL7_results/ICL7-veg_RNA_ATCACG.TOPHAT.pt_51.pe.sorted.bam_fragments.csv")
    ….
    ….

    > str(df_points_EARLY)
    'data.frame': 41533 obs. of 2 variables:
    $ EZL1.veg: int 104 105 44 0 58 84 111 22 0 105 ...
    $ ICL7.veg: int 280 235 106 0 85 446 185 64 1 168 ...

    > strain = factor(c(rep("EZL1",1), rep("ICL7",1)), levels=c("EZL1","ICL7"))
    > colData = data.frame( strain= strain )
    > rownames(colData)=colnames(df_points_EARLY)
    > dds<-DESeqDataSetFromMatrix(countData= as.matrix(df_points_EARLY), colData= colData, design=~strain)
    > dds <-DESeq(dds)
    > resultsNames(dds)
    [1] "Intercept" "strainEZL1" "strainICL7"
    > resEZL1_ICL7 <- results(dds, contrast=list("strainICL7","strainEZL1"))
    > resEZL1_ICL7<- resEZL1_ICL7[!is.na(resEZL1_ICL7$padj) & resEZL1_ICL7$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1_ICL7_VFG<-rownames(resEZL1_ICL7)
    > length(list_of_differentially_expressed_genes_for_EZL1_ICL7_VEG)
    [1] 0

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin


    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
    Yesterday, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
55 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
51 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
45 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
55 views
0 likes
Last Post seqadmin  
Working...
X