Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Question about different results generated by DESeq.

    Hi,

    I have problem on DESeq, it generates different results using same dataset (without replicate), and almost same code.


    Here is code for the 1st test:

    Code:
    conds <- factor( c( "c0","c6","c12","c24","c48","c72","c96","c120","t0","t6","t12","t24","t48","t72","t96","t120") )
    cds<-newCountDataSet(countsTable, conds)
    cds <- estimateSizeFactors( cds )
    sizeFactors( cds )
    cds <- estimateDispersions( cds, method="blind", sharingMode="fit-only" )
    test0 <- nbinomTest( cds, "c24", "t24" )
    write.table( test0, sep="\t", file="test0.txt" )
    it generates 225 differentially expressed genes with adjust pvalue < 0.05.

    But when i change the code for the 2nd test:

    Code:
    conds <- factor( c( "c0","c6","c12","c24","c48","c72","c96","c120","t0","t6","t12","t24","t48","t72","t96","t120") )
    cds<-newCountDataSet(countsTable, conds)
    cds <- estimateSizeFactors( cds )
    sizeFactors( cds )
    cds2 <- cds[ ,c( "c24", "t24" ) ]
    cds2 <- estimateDispersions( cds2, method="blind", sharingMode="fit-only" )
    test1 <- nbinomTest( cds2, "c24", "t24" )
    write.table( test1, sep="\t", file="test1.txt" )
    it just generates 84 differentially expressed genes.

    The 2nd test just estimate the two samples related to the comparison, but the 1st test estimate all the samples.
    I do not know why they are different, and I want to know which one is more reliable.

    Thanks a lot.

    By the way, before import dataset into DESeq, I usually to remove the genes without expression in each conditon.
    Here is a example.

    cond1 cond2 cond3 treat1 treat2 treat3
    gene1 1 2 3 2 2 0
    gene2 0 0 1 0 0 0
    gene3 0 0 0 0 0 0 (removed)

    For the dataset before, there are 225 DE genes identified at 1st test.
    After removing none-expressed genes, 220 DE genes were identified with adjust pvalue < 0.05.

    I want to know which method is correct.

    Thanks again

  • #2
    Dear Kentnf,

    in a nutshell, you are asking two different questions and so you get two different answers. In the first case, you estimate the dispersions from all samples, in the second, only from the two samples c24 and t24. The dispersions are parameters that DESeq estimates for each gene and that represent the variance that DESeq's error model expects between replicate measurements. It seems that in test0, the dispersions are estimated to be smaller than in test1 - thus slightly smaller fold changes are already deemed significant, leading to more genes.

    Which method is correct? The first point to note is that you are comparing between just two single samples, without replicates, so it is expected that the results are highly volatile. With sufficient number of replicates and good-quality data the differences should be less drastic. So, don't overinterprete your result from just these two samples, and try it also out for other samples and more replicates.

    Second, you are asking two different questions, and it is up to you to decide which is the one you care about. This is not a technical question, it is one that only you can decide.

    Third, ideally it should not matter much. Check the scatterplot of the -log10 (p-values) from the two tests against each other. I'd expect that they broadly agree (on the high values), and that much of the difference in number of genes that you see would be an effect of arbitrary thresholding (i.e. genes that just make it below p_adj<0.05 in one case might be just above in the other).

    About the filtering of genes with overall low counts (i.e. sum of counts across all samples is so small that the gene anyway could never be significantly differently expressed): this is good practice. See Section 5 of the DESeq vignette.
    Wolfgang Huber
    EMBL

    Comment


    • #3
      Hi, Wolfgang Huber

      I make a comparison between test0 and test1. All the DE genes in test1 are included in test0. I think that is the differnet variances on different estimations you metioned. Because our experiment has problem on design. So we can not add more replicates. Base on your suggestions and the current results, I'd like to use test1 code to perform analysis for it show more reliable than test0. And I should remove the genes without expression.

      If there is any problem, please give me some advice.

      Thanks again for your help.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM
      • seqadmin
        Techniques and Challenges in Conservation Genomics
        by seqadmin



        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

        Avian Conservation
        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
        03-08-2024, 10:41 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Yesterday, 06:37 PM
      0 responses
      11 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, Yesterday, 06:07 PM
      0 responses
      10 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-22-2024, 10:03 AM
      0 responses
      51 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-21-2024, 07:32 AM
      0 responses
      68 views
      0 likes
      Last Post seqadmin  
      Working...
      X