  • DESeq2 vs EdgeR

    Hello

    I am working with RNA-seq data and comparing the results from the DESeq2 and EdgeR programs. The number of genes showing significant differences in expression level between the two conditions is 143 for EdgeR and 183 for DESeq2. Of these genes, 67 are found by both packages. My problem is that I don't know which results are better for continuing with the functional enrichment analysis. Should I continue with the 67 genes found by both packages, or should I select the results from one of the two packages? Why are we obtaining these different results? Can anyone suggest a solution?

    Thank you in advance.

  • #2
    Somewhat different algorithms will always produce somewhat different results (especially at the margins). To find which is modeling your data better you need to validate some of the candidates in independent samples.


    • #3
      The main aspect driving the differences in statistical results will be differences in the normalizations. Different normalizations will always give different statistical results, even when analyzed with the exact same statistical test.

      Personally, I see this as the single biggest issue with RNA-seq differential gene analysis right now - what exactly are the best or most appropriate normalizations amongst all the possible choices? I have no answer to that myself but such best practices take time to mature as they are based on the data and experiences of different researchers accumulated over time.

      If you really want to know, as dpryan says, you will need to independently validate some or all of those genes to actually know which algorithm made the best call with your data.

      And keep in mind too that they both in fact may be correct, just that one missed identifying significance for some genes that the other algorithm did detect, and vice versa (and they both independently have some probability of including false positives).
      Last edited by mbblack; 05-07-2014, 12:22 PM.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.


      • #4
        Just to compare notes, for me the two packages give quite similar results. DESeq2 gives more DE genes than EdgeR, but most of the DE genes found in EdgeR were also found in DESeq2 (e.g. 187 DEGs by DESeq2 and 128 by EdgeR, while 114 of them were shared by the two packages).
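To put numbers on the overlaps quoted in this thread, here is a minimal sketch that works from the list sizes alone. The function name `overlap_summary` and the choice of the Jaccard index as a summary are my own assumptions for illustration, not something either package reports.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarise the overlap of two DE gene lists from their sizes."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either list size")
    union = n_a + n_b - n_shared  # genes called by at least one package
    return {
        "union": union,
        "jaccard": n_shared / union,   # shared / union
        "frac_of_a": n_shared / n_a,   # share of list A that B also found
        "frac_of_b": n_shared / n_b,   # share of list B that A also found
    }


# Counts quoted in this thread; list A is EdgeR, list B is DESeq2.
first_post = overlap_summary(n_a=143, n_b=183, n_shared=67)
post_4 = overlap_summary(n_a=128, n_b=187, n_shared=114)

for label, s in (("first post", first_post), ("post #4", post_4)):
    print(
        f"{label}: union={s['union']}, Jaccard={s['jaccard']:.2f}, "
        f"shared/EdgeR={s['frac_of_a']:.2f}, shared/DESeq2={s['frac_of_b']:.2f}"
    )
```

With these counts, roughly 47% of the EdgeR calls in the first post are also DESeq2 calls, against roughly 89% in post #4, so the two datasets behave quite differently in how well the packages agree.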


        • #5
          I looked up the genes that are differentially expressed only in DESeq2 in the EdgeR results, and the genes that are differentially expressed only in EdgeR in the DESeq2 results, and I obtained the following tables:

          If we look at the DESeq2-only genes in the EdgeR results (first table below), we can see that for these genes the p-value is under 0.05 but the corrected p-value (FDR) doesn't pass the threshold (the p-value correction in EdgeR could be stricter than in DESeq2, and the normalization is not the same).

          EdgeR results for the genes differentially expressed only in DESeq2 (only the first rows are shown):

          "logFC" "logCPM" "LR" "PValue" "FDR"
          "ABCB11" 2.70009501398543 -0.812950846181629 9.28826577686121 0.0023062639239043 0.120703796922402
          "ABHD3" 1.80726603821143 3.87569902849993 10.5103047792729 0.00118710674081598 0.0898251203253952
          "ACVR1B" 1.95071588040721 4.79012263716339 12.3942652054322 0.000430654427765324 0.0555072848762002
          "AP3M2" 2.09214292097354 3.54274479838376 12.5198574900624 0.000402649599527972 0.0533105203233444
          "APOBEC1" -4.42510474269716 -0.538114289215075 12.6757609252969 0.000370426433341501 0.0521556391770558
          "BSN" 3.08519024535684 -1.48696620016766 8.96423129320631 0.00275316513444176 0.130634155930371
          "CD22" 1.74415634003026 2.78254401675942 12.5164766543508 0.000403378852790388 0.0533105203233444
          "CDKN2C" -2.00314888159426 0.363422409035713 11.161939694721 0.000834925818871797 0.0767036283316015
          "CGREF1" 1.55829922963731 0.756009605568283 8.12893646242891 0.00435642941567818 0.159262899208925
          "CHST1" 2.95569845631565 0.947331235690645 10.1473146292035 0.00144511599622538 0.0964473130238196
          "CHST2" 1.84409597853766 1.69094471995424 9.14165529890621 0.00249854956440236 0.125689171970819
          "CHST7" 1.57683443348859 3.02322437030117 11.8155213765039 0.000587389814943334 0.0642088942436159
          "CLSPN" -1.63160079616297 2.71048287120957 12.0868460117261 0.000507788000662539 0.0594640460838832
          "CSF2RB" 1.78551118880298 12.4432622524187 10.4854578044528 0.00120317801089876 0.0906125951405643
          "CXCL12" 4.63394665930228 -1.80734480258569 12.1495005948364 0.000491010105153921 0.0588135770064421
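          To illustrate how a raw p-value under 0.05 can end up with an FDR above 0.05, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. As far as I know this is the default correction in both packages, but the code below is only an illustration on made-up p-values, not either package's implementation, and the name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (NOT taken from the tables in this thread).
toy_p = [0.001, 0.004, 0.03, 0.04, 0.2, 0.5, 0.7, 0.9]
adj = benjamini_hochberg(toy_p)

for p, q in zip(toy_p, adj):
    flag = "raw < 0.05 but adjusted >= 0.05" if p < 0.05 <= q else ""
    print(f"p={p:<6} adjusted={q:.3f} {flag}")
```

The adjusted value depends on the rank of the p-value and on the total number of tests, so the same raw p-value can pass in one analysis and fail in another if the number of genes entering the correction differs.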


          If we look at the EdgeR-only genes in the DESeq2 results (table below), the results are a little strange, because I don't know why there is an NA in the padj value. Can someone explain this to me?

          DESeq2 results for the genes differentially expressed only in EdgeR (only the first rows are shown):

          baseMean log2FoldChange lfcSE stat pvalue padj
          BRCA2 1131,936674 -1,286625954 0,302162213 -4,258063714 2,06E-05 0,006165843
          COLEC11 2,97265546 5,792016822 1,694515916 3,418095262 0,00063061 NA
          CPLX2 14,14063581 7,571127774 1,986647905 3,811006346 NA NA
          CRISP3 439,2054731 5,183023709 1,244040615 4,166281748 NA NA
          CSF3R 2817,505736 5,246506906 1,157429328 4,532896115 NA NA
          CYP4F3 1463,288138 4,840302667 1,146021976 4,223568805 NA NA
          ENSOARG00000001857 5,329378382 7,092612267 1,862899117 3,807298099 NA NA
          ENSOARG00000002444 1861,652753 5,003687627 1,104419178 4,530605521 NA NA
          ENSOARG00000004774 5107,615576 4,872729512 1,136706275 4,286709436 NA NA
          ENSOARG00000006661 2988,708644 3,814853671 0,916926496 4,16047926 NA NA
          ENSOARG00000018404 5,707612456 -5,500830354 1,658048887 -3,317652693 0,000907773 0,053489179
          ENSOARG00000019477 14233,96538 5,568951629 1,175487519 4,737567638 NA NA
          ENSOARG00000019492 2079,134735 5,686778178 1,178773231 4,824319071 NA NA
          ENSOARG00000022100 6,736323581 5,403701239 1,397843402 3,865741493 NA NA
          EPS8 589,3540111 -1,166056327 0,280259158 -4,160635942 3,17E-05 0,007544404
          F2R 691,7335564 1,31670219 0,224258476 5,871359762 4,32E-09 2,16E-05
          FAM166A 745,7706859 -1,152588328 0,278839623 -4,133517026 3,57E-05 0,0080606


          • #6
            "because I dont know why they put a NA in the padj value"
            This is answered in the Frequently Asked Questions in the DESeq2 vignette:


            See section 1.4.2:

            "Note that some values in the results table can be set to NA, for either one of the following reasons..."
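            As I understand the DESeq2 documentation, the NA pattern in a row tells you which case applies: all-zero counts give a baseMean of zero, a flagged count outlier sets both pvalue and padj to NA, and independent filtering of low-count genes sets only padj to NA. The sketch below applies that reading to three rows copied from the table in post #5. The helper names `parse_number` and `explain_na` are hypothetical, and the mapping is my summary, so check it against the vignette.

```python
import math


def parse_number(text):
    """Parse a field that may be 'NA' or use a decimal comma."""
    if text == "NA":
        return math.nan
    return float(text.replace(",", "."))


def explain_na(base_mean, pvalue, padj):
    """Guess why DESeq2 reported NA, based on which columns are missing."""
    if base_mean == 0:
        return "all samples have zero counts"
    if math.isnan(pvalue):
        return "count outlier flagged: pvalue and padj are NA"
    if math.isnan(padj):
        return "independent filtering: only padj is NA"
    return "tested and adjusted"


# Rows copied from the DESeq2 table in post #5 (gene, baseMean, pvalue, padj).
rows = [
    ("BRCA2", "1131,936674", "2,06E-05", "0,006165843"),
    ("COLEC11", "2,97265546", "0,00063061", "NA"),
    ("CPLX2", "14,14063581", "NA", "NA"),
]

reasons = {
    gene: explain_na(parse_number(bm), parse_number(p), parse_number(q))
    for gene, bm, p, q in rows
}
for gene, reason in reasons.items():
    print(f"{gene}: {reason}")
```

Under this reading, COLEC11 (baseMean about 3) looks like an independent-filtering case, while the rows with NA in both columns look like outlier cases.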
