  • asuav
    Junior Member
    • May 2014
    • 2

    DESeq2 vs EdgeR

    Hello

    I am working with RNA-seq data and comparing the results from DESeq2 and edgeR. The number of genes with a significant difference in expression level between the two conditions is 143 for edgeR and 183 for DESeq2, and 67 of these genes are found by both packages. I don't know which results are better for continuing with the functional enrichment analysis. Should I continue with the 67 genes found by both packages, or should I select the result from one of the two? Why are we obtaining these different results? Can anyone suggest a solution?

    Thank you in advance.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Somewhat different algorithms will always produce somewhat different results (especially at the margins). To find which is modeling your data better you need to validate some of the candidates in independent samples.


    • mbblack
      Senior Member
      • Aug 2009
      • 245

      #3
      The main aspect driving the differences in statistical results will be differences in the normalizations. Different normalizations will always give different statistical results, even when analyzed with the exact same statistical test.
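To make the point about normalization concrete, here is a minimal Python sketch (not taken from either package) that compares two scaling schemes on made-up counts: median-of-ratios size factors, which is the scheme DESeq2 uses by default, and plain library-size scaling as a stand-in for "a different normalization". edgeR's default TMM method is not implemented here, and the function names and toy counts are illustrative assumptions.

```python
import math
from statistics import median


def median_of_ratios(counts):
    """Size factors by the median-of-ratios scheme.

    counts: list of rows (genes), each row a list of per-sample counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    factors = []
    for j in range(n_samples):
        # log(count / geometric mean of the gene across samples)
        log_ratios = [
            math.log(row[j]) - sum(math.log(c) for c in row) / n_samples
            for row in usable
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def library_size_factors(counts):
    """Scaling factors proportional to each sample's total count."""
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


# Toy data (hypothetical): 3 genes x 2 samples, one gene dominating sample 2.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 1000],
]

mor = median_of_ratios(toy_counts)
lib = library_size_factors(toy_counts)
print("median-of-ratios, sample2/sample1:", mor[1] / mor[0])
print("library size,     sample2/sample1:", lib[1] / lib[0])
```

In this toy case the single dominant gene pushes the library-size ratio to about 10.6, while the median-of-ratios factors differ only by a factor of 2, so the same counts would look very different after each normalization.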

      Personally, I see this as the single biggest issue with RNA-seq differential gene analysis right now - what exactly are the best or most appropriate normalizations amongst all the possible choices? I have no answer to that myself but such best practices take time to mature as they are based on the data and experiences of different researchers accumulated over time.

      If you really want to know, as dpryan says, you will need to independently validate some or all of those genes to actually know which algorithm made the best call with your data.

      And keep in mind too that they both in fact may be correct, just that one missed identifying significance for some genes that the other algorithm did detect, and vice versa (and they both independently have some probability of including false positives).
      Last edited by mbblack; 05-07-2014, 12:22 PM.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.


      • Yvone
        Junior Member
        • Oct 2013
        • 6

        #4
        Just to compare notes, for me the two packages give quite similar results. DESeq2 gives more DE genes than edgeR, but most of the DE genes found by edgeR were also found by DESeq2 (e.g. 187 DEGs by DESeq2 and 128 by edgeR, with 114 shared by the two packages).
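The overlap figures quoted in this thread can be put on a common scale with a little arithmetic. The Python sketch below is only an illustration: the helper name `overlap_summary` is hypothetical, and the inputs are the gene counts reported in the posts above.

```python
def overlap_summary(n_deseq2, n_edger, n_shared):
    """Summarize the agreement between two DE gene lists from their sizes."""
    union = n_deseq2 + n_edger - n_shared
    return {
        "only_deseq2": n_deseq2 - n_shared,
        "only_edger": n_edger - n_shared,
        "union": union,
        # Jaccard index: shared genes over all genes called by either tool
        "jaccard": n_shared / union,
        # fraction of the edgeR calls that DESeq2 also makes
        "edger_also_in_deseq2": n_shared / n_edger,
    }


# Counts reported in the first post: 183 (DESeq2), 143 (edgeR), 67 shared.
first_post = overlap_summary(183, 143, 67)
# Counts reported in post #4: 187 (DESeq2), 128 (edgeR), 114 shared.
post_four = overlap_summary(187, 128, 114)

for label, summary in (("first post", first_post), ("post #4", post_four)):
    print(label, summary)
```

With these inputs, about 47% of the edgeR calls in the first post are also made by DESeq2, against about 89% in post #4, so the first data set shows considerably less agreement between the two tools.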


        • asuav
          Junior Member
          • May 2014
          • 2

          #5
          I looked at how the genes that are significant only in DESeq2 appear in the edgeR results, and how the genes that are significant only in edgeR appear in the DESeq2 results, and obtained the following tables:

          For the genes that are significant only in DESeq2, the edgeR results (table below) show a p-value under 0.05, but the corrected p-value (FDR) does not pass the threshold (the p-value correction in edgeR could be stricter than in DESeq2, and the normalization is not the same).

          edgeR results for the genes significant only in DESeq2 (only the first rows are shown):

          "logFC" "logCPM" "LR" "PValue" "FDR"
          "ABCB11" 2.70009501398543 -0.812950846181629 9.28826577686121 0.0023062639239043 0.120703796922402
          "ABHD3" 1.80726603821143 3.87569902849993 10.5103047792729 0.00118710674081598 0.0898251203253952
          "ACVR1B" 1.95071588040721 4.79012263716339 12.3942652054322 0.000430654427765324 0.0555072848762002
          "AP3M2" 2.09214292097354 3.54274479838376 12.5198574900624 0.000402649599527972 0.0533105203233444
          "APOBEC1" -4.42510474269716 -0.538114289215075 12.6757609252969 0.000370426433341501 0.0521556391770558
          "BSN" 3.08519024535684 -1.48696620016766 8.96423129320631 0.00275316513444176 0.130634155930371
          "CD22" 1.74415634003026 2.78254401675942 12.5164766543508 0.000403378852790388 0.0533105203233444
          "CDKN2C" -2.00314888159426 0.363422409035713 11.161939694721 0.000834925818871797 0.0767036283316015
          "CGREF1" 1.55829922963731 0.756009605568283 8.12893646242891 0.00435642941567818 0.159262899208925
          "CHST1" 2.95569845631565 0.947331235690645 10.1473146292035 0.00144511599622538 0.0964473130238196
          "CHST2" 1.84409597853766 1.69094471995424 9.14165529890621 0.00249854956440236 0.125689171970819
          "CHST7" 1.57683443348859 3.02322437030117 11.8155213765039 0.000587389814943334 0.0642088942436159
          "CLSPN" -1.63160079616297 2.71048287120957 12.0868460117261 0.000507788000662539 0.0594640460838832
          "CSF2RB" 1.78551118880298 12.4432622524187 10.4854578044528 0.00120317801089876 0.0906125951405643
          "CXCL12" 4.63394665930228 -1.80734480258569 12.1495005948364 0.000491010105153921 0.0588135770064421

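A raw p-value well under 0.05 combined with an FDR above the cutoff is expected when many genes are tested at once. To my knowledge both packages apply the Benjamini-Hochberg adjustment by default, so the adjusted value depends on how many genes enter the correction and on where a gene ranks among them. The Python sketch below is a minimal re-implementation for illustration only (the function name and the toy p-values are assumptions, not data from this thread).

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        candidate = pvalues[idx] * n / rank
        # Enforce monotonicity: an adjusted value never exceeds the one
        # belonging to the next larger p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Small worked example.
small = bh_adjust([0.01, 0.04, 0.03, 0.005])

# Toy illustration: one small p-value among 99 unremarkable ones.
many = bh_adjust([0.002] + [0.5] * 99)
print("adjusted value for p = 0.002 among 100 tests:", many[0])
```

In the toy case a p-value of 0.002 ends up with an adjusted value of about 0.2, which would fail a 0.05 or 0.1 FDR threshold even though the raw p-value looks convincing.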

          For the genes that are significant only in edgeR, the DESeq2 results (table below) look a little strange, because I don't know why the padj value is NA for many of them. Can someone explain this to me?

          DESeq2 results for the genes significant only in edgeR (only the first rows are shown):

          baseMean log2FoldChange lfcSE stat pvalue padj
          BRCA2 1131,936674 -1,286625954 0,302162213 -4,258063714 2,06E-05 0,006165843
          COLEC11 2,97265546 5,792016822 1,694515916 3,418095262 0,00063061 NA
          CPLX2 14,14063581 7,571127774 1,986647905 3,811006346 NA NA
          CRISP3 439,2054731 5,183023709 1,244040615 4,166281748 NA NA
          CSF3R 2817,505736 5,246506906 1,157429328 4,532896115 NA NA
          CYP4F3 1463,288138 4,840302667 1,146021976 4,223568805 NA NA
          ENSOARG00000001857 5,329378382 7,092612267 1,862899117 3,807298099 NA NA
          ENSOARG00000002444 1861,652753 5,003687627 1,104419178 4,530605521 NA NA
          ENSOARG00000004774 5107,615576 4,872729512 1,136706275 4,286709436 NA NA
          ENSOARG00000006661 2988,708644 3,814853671 0,916926496 4,16047926 NA NA
          ENSOARG00000018404 5,707612456 -5,500830354 1,658048887 -3,317652693 0,000907773 0,053489179
          ENSOARG00000019477 14233,96538 5,568951629 1,175487519 4,737567638 NA NA
          ENSOARG00000019492 2079,134735 5,686778178 1,178773231 4,824319071 NA NA
          ENSOARG00000022100 6,736323581 5,403701239 1,397843402 3,865741493 NA NA
          EPS8 589,3540111 -1,166056327 0,280259158 -4,160635942 3,17E-05 0,007544404
          F2R 691,7335564 1,31670219 0,224258476 5,871359762 4,32E-09 2,16E-05
          FAM166A 745,7706859 -1,152588328 0,278839623 -4,133517026 3,57E-05 0,0080606


          • Michael Love
            Senior Member
            • Jul 2013
            • 333

            #6
            "because I dont know why they put a NA in the padj value"

            This is answered in the Frequently Asked Questions section of the DESeq2 vignette.

            See section 1.4.2:

            "Note that some values in the results table can be set to NA, for either one of the following reasons..."

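As I understand that section of the vignette, the pattern of NA values indicates the reason: a baseMean of zero means all counts were zero, an NA p-value (with NA padj) marks a row with an extreme count outlier detected by Cook's distance, and an NA padj alone marks a row removed by independent filtering for a low mean count. The Python sketch below applies these rules to three rows copied from the table in post #5; the helper names are hypothetical, and the rules are my paraphrase, so check them against the vignette for your DESeq2 version.

```python
def parse_field(text):
    """Convert a table cell to float; 'NA' becomes None. Accepts decimal commas."""
    return None if text == "NA" else float(text.replace(",", "."))


def na_reason(base_mean, pvalue, padj):
    """Guess why DESeq2 reported NA, from the pattern of missing values."""
    if base_mean == 0:
        return "all counts zero"
    if pvalue is None:
        return "count outlier (Cook's distance)"
    if padj is None:
        return "removed by independent filtering (low mean count)"
    return "tested and adjusted"


# (baseMean, pvalue, padj) cells copied from the table in post #5.
rows = {
    "BRCA2": ("1131,936674", "2,06E-05", "0,006165843"),
    "COLEC11": ("2,97265546", "0,00063061", "NA"),
    "CPLX2": ("14,14063581", "NA", "NA"),
}

reasons = {
    gene: na_reason(*(parse_field(cell) for cell in cells))
    for gene, cells in rows.items()
}
for gene, reason in reasons.items():
    print(gene, "->", reason)
```

Under these rules most of the NA rows in the table above fall into the outlier category, since both the p-value and padj are missing even though the mean counts are high.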