Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • EdgeR question

    Hi all,

    I was using EdgeR to perform a DE analysis and following the document http://cgrlucb.wikispaces.com/file/v...R_Tutorial.pdf.

    Essentially edgeR tutorial.pdf.

    I did an exact test on my samples using cmn, tgw and poi.

    de.poi <- exactTest( cds , dispersion = 1e-06 , pair = c( "Control" , "Infected" ) )

    I have also created a toptags table for all three, cmn , tgw and poi.

    > resultsTbl.cmn <- topTags( de.cmn , n = nrow( de.cmn$table ) )$table
    > resultsTbl.tgw <- topTags( de.tgw , n = nrow( de.tgw$table ) )$table
    > resultsTbl.poi <- topTags( de.poi , n = nrow( de.poi$table ) )$table

    Here is my question or rather questions:

    1. My exact test values for cmn and tgw are exactly the same !!!!! I plotted a variance plot and the

    plot for cmn (solid blue line) and tgw (lt.blue dots) are exactly on each other (as if superimposed). So I believe I dont have a tagwise dispersion. Anyone has had this experience???

    2. when I try to perform a significance level matching to 0.05 I get nothing !!!

    > de.genes.cmn <- rownames( resultsTbl.cmn )[ resultsTbl.cmn$adj.P.Val <= 0.05 ]
    > de.genes.tgw <- rownames( resultsTbl.tgw )[ resultsTbl.tgw$adj.P.Val <= 0.05 ]
    > de.genes.poi <- rownames( resultsTbl.poi )[ resultsTbl.poi$adj.P.Val <= 0.05 ]
    > head(de.genes.cmn)
    character(0)
    > de.genes.cmn
    character(0)
    ### I do have values in my resultsTbl, see below !!!!! Also you can see how my resultsTbl values for cmn and tgw are exactly the same !! as discussed in Q1.
    > head(resultsTbl.cmn)
    logFC logCPM PValue FDR
    ENSBTAG00000029982 5.80 7.59 5.66e-11 1.30e-08
    ENSBTAG00000036418 7.88 5.63 1.56e-06 1.78e-04
    ENSBTAG00000036410 1.34 16.34 7.29e-06 5.57e-04
    ENSBTAG00000036423 2.29 10.97 7.59e-05 4.34e-03
    ENSBTAG00000029762 1.68 10.61 2.66e-04 1.22e-02
    ENSBTAG00000037319 -1.92 4.41 3.48e-04 1.33e-02
    > de.genes.tgw
    character(0)
    > head(resultsTbl.tgw)
    logFC logCPM PValue FDR
    ENSBTAG00000029982 5.80 7.59 5.66e-11 1.30e-08
    ENSBTAG00000036418 7.88 5.63 1.56e-06 1.78e-04
    ENSBTAG00000036410 1.34 16.34 7.29e-06 5.57e-04
    ENSBTAG00000036423 2.29 10.97 7.59e-05 4.34e-03
    ENSBTAG00000029762 1.68 10.61 2.66e-04 1.22e-02
    ENSBTAG00000037319 -1.92 4.41 3.48e-04 1.33e-02
    > head(resultsTbl.poi)
    logFC logCPM PValue FDR
    ENSBTAG00000036423 1.9671 11.0 0 0
    ENSBTAG00000029957 1.1314 15.6 0 0
    ENSBTAG00000036410 1.1289 16.3 0 0
    ENSBTAG00000029797 0.7093 15.7 0 0
    ENSBTAG00000029897 0.3867 15.5 0 0
    ENSBTAG00000029804 0.0894 18.3 0 0
    Can anyone throw some light on this please?

    Much appreciated,

    geneart.

  • #2
    1. Maybe a typo?
    2. There is no adj.P.Val column in your resultsTbl.cmn

    Comment


    • #3
      Cross-posted on biostars, where my reply was exactly the same as TiborNagy's (great minds think alike, as they say).

      Comment


      • #4
        dpryan ......you are correct
        I did post it on BioStars as well. Here is the same reply

        >group <- c(rep("Control", 7), rep("Infected", 8))

        > cds <- DGEList( counts , group = group )

        > cds <- estimateCommonDisp( cds )

        ## To estimate the gene-wise or tagwise dispersions:
        ## 50/(#samples - #groups) for me it will be 50/(15-2)=3.85
        ## None of the tagwise prior.n values work ! Error is unused argument(prior.n). So ran it without any prior.n

        > cds <- estimateTagwiseDisp( cds )

        ## Above the cds called after estimateCommonDisp is the exactly the same as called after tagwise disp. when I gave it as cdst <- estimateTagwiseDisp( cds ) and called cdst it is still the same. The cds is

        >de.cmn <- exactTest( cds , pair = c( "Control" , "Infected" ) )

        > de.tgw <- exactTest( cds , pair = c( "Control" , "Infected" ) )

        #thi sis what edgeR.Tutorial document says = " the codes below gives full tables of the adjusted p-values taking into the FDR into consideration". but the exact test table I have generated as above ha sno adj-pvalue column.
        > resultsTbl.cmn <- topTags( de.cmn , n = nrow( de.cmn$table ) )$table
        > resultsTbl.tgw <- topTags( de.tgw , n = nrow( de.tgw$table ) )$table
        > resultsTbl.poi <- topTags( de.poi , n = nrow( de.poi$table ) )$table

        So as per the suggestion this is what I did:

        > de.genes.cmn <- rownames( resultsTbl.cmn )[ resultsTbl.cmn$FDR <= 0.05 ] # command a

        > de.genes.tgw <- rownames( resultsTbl.tgw )[ resultsTbl.tgw$FDR <= 0.05 ]# command b

        I get 9 genes names whci is exactly the same in both of the above a and b commands.

        When I dont give a FDR cutoff I get just the first gene name but reapeated over 11 times.

        Any suggestions?

        Thanks for the quick reply !

        geneart

        Comment


        • #5
          dpryan from BioStars reply

          Sorry dpryan, my posting limit exceeded on BioStars. hence posting on seqanswers. Pertaining to your previous comment on BioStars here is what Idid:
          Yes I checked it again:

          > cds$tagwise.dispersion
          [1] 0.3405 0.1987 0.2989 0.3842 0.2163 0.4643 1.0963 0.1417 0.5641 0.1650 0.6199 0.1217 0.2758 0.2864 0.2829 0.3963 0.6174 0.8162 0.2958 0.9271 0.0795 0.1182 0.5734
          [24] 0.2117 0.3894 0.1424 0.1387 0.2847 0.2087 0.3717 0.1219 0.1863 0.2097 0.1347 0.8008 0.7646 0.4709 0.1363 0.6349 0.2105 0.1794 0.6508 0.6073 0.1820 0.1976 0.3433
          [47] 0.1194 0.2371 0.1393 0.2510 0.3435 0.3500 0.2621 0.1364 0.1260 0.1752 0.4128 0.1982 0.0961 0.5742 0.8530 0.2668 0.1359 0.1633 0.2440 0.9827 0.2126 0.1233 0.2120
          [70] 0.5806 0.2625 0.2943 0.3761 0.7025 0.1424 0.2013 0.5551 0.2082 0.3417 0.3503 0.1326 0.2509 0.3540 0.8389 0.3465 0.0977 0.2410 0.3152 0.1930 0.1794 0.1508 0.1216
          [93] 0.2277 0.2091 0.1532 0.1557 0.4655 0.6894 0.1791 0.1657 0.3632 0.2460 0.4759 0.5413 1.1285 0.1907 0.9906 0.2538 0.4302 0.3361 0.2570 0.6049 1.9337 0.1939 0.6012
          [116] 0.1012 0.3754 0.1280 0.2109 1.1705 0.1423 0.2056 0.1245 0.3114 0.6784 0.4208 0.1177 0.1848 0.3889 0.3704 0.3906 0.1103 0.5318 0.4882 0.0990 0.3076 0.4613 0.1877
          [139] 0.1805 0.3438 0.3041 0.2196 0.4297 1.2599 0.4450 0.1208 0.3876 0.1750 1.2860 0.5807 0.4078 0.1837 0.5938 0.1830 1.1755 0.8721 0.1360 0.1828 0.1683 0.1546 1.1303
          [162] 0.8419 0.2644 0.1508 0.1451 0.1587 0.5744 0.2952 0.5174 0.2671 0.9109 1.1452 0.4556 0.2462 0.5903 0.1165 0.4633 0.3236 0.1999 0.7068 1.0595 0.6913 0.2623 0.4903
          [185] 0.7383 0.3196 0.2542 0.6282 0.1982 0.7881 0.6785 0.2566 0.2859 0.3139 0.4506 0.4187 0.4028 1.6156 1.4601 0.2744 0.3683 0.4396 0.9248 0.2180 0.3605 0.5359 0.8855
          [208] 0.2104 1.4975 1.9078 0.6379 0.1108 0.3028 0.1635 0.2775 0.3327 0.6321 0.2295 1.6538 0.7243 0.2548 0.2582 0.7662 0.7847 0.8835 0.3193 0.8189 0.3797 0.6885
          > cds$common.dispersion
          [1] 0.272

          so it looks like it is around the common dispersion. I dont see any HUGE jump of any sort in the value except for a few whcih are 1.05 1.17 etc which accounts to 11 of them being > 1.1 value.

          However i had also tried doing the code below, but got an error.Hence took off the common.disp=F and reran the same.

          > de.tgw <- exactTest( cdst , common.disp = FALSE , pair = c( "Control" , "Infected" ) )
          Error in exactTest(cds, common.disp = FALSE, pair = c("Control", "Infected")) :
          unused argument (common.disp = FALSE)

          So it does look like it is using the common.disp estimate. Am I worng?

          geneart

          Comment


          • #6
            Compare the results of:
            Code:
            res.cds <- exactTest(cds, dispersion="common")
            res.tgw <- exactTest(cds)
            If they're the same then you know that the dispersion estimate method isn't having a big effect.

            Comment


            • #7
              Ok. I just checked . They are exactly the same.
              > res.cds <- exactTest(cds, dispersion="common")
              > res.cds
              An object of class "DGEExact"
              $table
              logFC logCPM PValue
              ENSBTAG00000029762 1.678 10.61 4.87e-05
              ENSBTAG00000029763 -0.360 12.59 3.57e-01
              ENSBTAG00000029764 0.539 6.88 3.28e-01
              ENSBTAG00000029765 0.394 2.64 4.19e-01
              ENSBTAG00000029768 1.187 11.53 3.31e-03
              224 more rows ...

              $comparison
              [1] "Control" "Infected"

              $genes
              NULL

              > res.tgw <- exactTest(cds)
              > res.tgw
              An object of class "DGEExact"
              $table
              logFC logCPM PValue
              ENSBTAG00000029762 1.681 10.61 0.000266
              ENSBTAG00000029763 -0.360 12.59 0.281573
              ENSBTAG00000029764 0.533 6.88 0.331116
              ENSBTAG00000029765 0.504 2.64 0.431863
              ENSBTAG00000029768 1.186 11.53 0.000991
              224 more rows ...

              $comparison
              [1] "Control" "Infected"

              $genes
              NULL
              if that is the case why di dI get only 8 gene names to be same earlier when I ran the command I have previously uploaded.
              Also does this mean that for my dataset I cannot use classic edgeR method? has this happened to anyone before? I am trying to understand the concepts here , so please bear with me
              Thanks
              geneart

              Comment


              • #8
                Note the p-values, they're quite different, as one would expect! Thus, the dispersion has a notable effect, which again one would expect. Now whether this translates to an effect at the adjusted p-value level is another matter...

                By "classic edgeR" I assume you mean exactTest() vs. using the GLM-based methods. With a 2-group design it'd make more sense to just use exactTest().

                Comment


                • #9
                  Ok thanks very much for all your advice and help ! Appreciate it !
                  geneart.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Essential Discoveries and Tools in Epitranscriptomics
                    by seqadmin




                    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                    04-22-2024, 07:01 AM
                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 11:49 AM
                  0 responses
                  13 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-24-2024, 08:47 AM
                  0 responses
                  16 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  61 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  60 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X