Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How does cuffdiff call a difference "significant"?

    I have run cuffdiff with FDR of 5% and a lot (~7%) of genes (in gene_exp.diff) whose "p_value" column is < 0.05 have the "significant" column set to "no".

    Can somebody explain the discrepancy and how does cufflink call a difference "significant" if not based on p value?

    Thanx

  • #2
    Originally posted by marcora View Post
    I have run cuffdiff with FDR of 5% and a lot (~7%) of genes (in gene_exp.diff) whose "p_value" column is < 0.05 have the "significant" column set to "no".

    Can somebody explain the discrepancy and how does cufflink call a difference "significant" if not based on p value?

    Thanx
    Never mind. By re-reading cuffdiff documentation I realize that the "p_value" column is not adjusted for multiple testing.

    I am not sure what the "test_stat" column represent, but it would nice if cuffdiff could also output a "p_adj" column with the p value adjusted for multiple testing (the same way DESeq does). That way the FDR parameter of cuffdiff becomes obsolete and one could compute significance at various FDRs without having to rerun cuffdiff (which can take a long time with multiple replicates).

    Thanx!

    Comment


    • #3
      Originally posted by marcora View Post
      That way the FDR parameter of cuffdiff becomes obsolete and one could compute significance at various FDRs without having to rerun cuffdiff (which can take a long time with multiple replicates).
      You could, of course, read in cuffdiff's output file with R (with the read.table function) and then run p.adjust (with method="BH") on the p value column. (I don't have a cuffdiff output file at hand, so I can't tell you the exact commands, but it should be only two lines.)

      If you do that and compare the output of DESeq and cuffdiff from your data set, I'd be interested to hear about your findings.

      Simon

      Comment


      • #4
        Thanx Simon, I am still a newbie to R!

        I will apply the p.adjust function to cuffdiff output and let you know. So far (by leaving cuffdiff FDR param to its default of 0.05 and comparing its output with deseq output where significant is padj < 0.05) it appears that deseq is either more sensitive (or more sloppy) at assigning significance... since deseq significantly DE genes are about double those found significant by cuffdiff.

        I will get back to you asap once I get to the padj in cuffdiff!

        FYI: I am comparing WT vs KO with 4 biological replicates each, one run for each replica at 25-30 million reads each

        Comment


        • #5
          Originally posted by Simon Anders View Post
          If you do that and compare the output of DESeq and cuffdiff from your data set, I'd be interested to hear about your findings.

          Simon
          Ok, here is a portion of the output from my experiment (in this case comparing cortical and striatal tissues).

          Columns named ds.something are from deseq and columns named cd.something are from cuffdiff. log2 means "log2 of fold change" and is_de means "is differentially expressed?". For cd.is_de is derived from the "significant" column in cuffdiff output (FDR = 0.05) whereas ds.is_de is set to TRUE if padj < 0.05. As I said before, deseq is picking up 5247 DE genes whereas cuffdiff "only" 2581. The problem now is... which one do I believe?

          p.s.: count tables were generated using htseq-count with the default params (except stranded was set to no). I am mapping using refseq GTF.

          Code:
                   gene_id sample_1 sample_2 ds.count_mean   ds.count_1   ds.count_2      ds.log2      ds.pval      ds.padj ds.is_de cd.rpkm_mean   cd.rpkm_1   cd.rpkm_2     cd.log2     cd.pval      cd.padj cd.is_de cd.status
          1  0610007C21Rik   WT_CTX   WT_STR    935.699817  912.9658495  958.4337850  0.070117869 6.769093e-01 8.434674e-01    FALSE  286.0375000  282.264000 289.8110000  0.03806722 7.12039e-01 1.000000e+00    FALSE        OK
          2  0610007L01Rik   WT_CTX   WT_STR   1077.903466 1126.8688274 1028.9381054 -0.131163388 3.868356e-01 6.107844e-01    FALSE   83.1274500   86.692100  79.5628000 -0.12380648 5.80477e-01 1.000000e+00    FALSE        OK
          3  0610007P08Rik   WT_CTX   WT_STR    361.737434  310.8019255  412.6729433  0.409003410 1.100391e-02 4.048507e-02     TRUE   18.0155000   15.706700  20.3243000  0.37182556 4.43234e-01 1.000000e+00    FALSE        OK
          4  0610007P14Rik   WT_CTX   WT_STR    538.174203  521.8831629  554.4652434  0.087370170 6.063084e-01 7.952604e-01    FALSE  129.7845000  125.109000 134.4600000  0.10399148 5.61750e-01 1.000000e+00    FALSE        OK
          5  0610007P22Rik   WT_CTX   WT_STR    255.162151  242.2781506  268.0461508  0.145817114 4.424584e-01 6.661605e-01    FALSE   65.9967000   64.061600  67.9318000  0.08462725 6.02477e-01 1.000000e+00    FALSE        OK
          6  0610008F07Rik   WT_CTX   WT_STR      0.000000    0.0000000    0.0000000          NaN           NA           NA       NA    0.0000000    0.000000   0.0000000         NaN 1.00000e+00 1.000000e+00    FALSE    NOTEST
          7  0610009B22Rik   WT_CTX   WT_STR    287.940410  269.6218230  306.2589960  0.183814950 2.882917e-01 5.029754e-01    FALSE  105.1805000   99.531000 110.8300000  0.15513061 4.36171e-01 1.000000e+00    FALSE        OK
          8  0610009D07Rik   WT_CTX   WT_STR    552.125775  574.8647477  529.3868027 -0.118900334 4.069566e-01 6.315958e-01    FALSE  131.2285000  136.009000 126.4480000 -0.10515790 5.55169e-01 1.000000e+00    FALSE        OK
          9  0610009O20Rik   WT_CTX   WT_STR    620.064119  631.2186041  608.9096334 -0.051911589 7.072760e-01 8.625523e-01    FALSE   75.0930500   78.360800  71.8253000 -0.12564001 5.75646e-01 1.000000e+00    FALSE        OK
          10 0610010B08Rik   WT_CTX   WT_STR    153.249086  177.7929930  128.7051785 -0.466128365 1.905782e-02 6.407251e-02    FALSE    0.1193041    0.154184   0.0844243 -0.86892284 3.12403e-01 9.433862e-01    FALSE        OK
          11 0610010F05Rik   WT_CTX   WT_STR    553.428073  649.6641055  457.1920414 -0.506893707 8.304265e-04 4.341557e-03     TRUE   37.3380500   43.441700  31.2344000 -0.47594474 1.59650e-01 6.200598e-01    FALSE        OK
          12 0610010K06Rik   WT_CTX   WT_STR      0.000000    0.0000000    0.0000000          NaN           NA           NA       NA    0.0000000    0.000000   0.0000000         NaN 1.00000e+00 1.000000e+00    FALSE    NOTEST
          13 0610010K14Rik   WT_CTX   WT_STR    424.381595  393.6500420  455.1131489  0.209311636 2.079545e-01 4.008593e-01    FALSE  154.6805000  147.346000 162.0150000  0.13691949 4.04846e-01 1.000000e+00    FALSE        OK
          14 0610010O12Rik   WT_CTX   WT_STR    326.885047  235.0416320  418.7284612  0.833098664 1.029962e-06 9.647450e-06     TRUE  166.2965000  124.019000 208.5740000  0.74999817 4.54720e-06 5.504784e-05     TRUE        OK
          15 0610011F06Rik   WT_CTX   WT_STR    284.057229  325.3490038  242.7654545 -0.422424995 1.389533e-02 4.926620e-02     TRUE   84.6365500   99.345800  69.9273000 -0.50660317 2.44772e-02 1.448273e-01    FALSE        OK
          16 0610011L14Rik   WT_CTX   WT_STR    808.657660  775.6992835  841.6160366  0.117664726 4.245830e-01 6.487025e-01    FALSE   65.6863500   63.549800  67.8229000  0.09388489 7.09336e-01 1.000000e+00    FALSE        OK
          17 0610012G03Rik   WT_CTX   WT_STR    779.810482  831.9280844  727.6928788 -0.193129128 1.844250e-01 3.692676e-01    FALSE  153.7870000  165.151000 142.4230000 -0.21360356 1.95402e-01 7.102127e-01    FALSE        OK
          18 0610012H03Rik   WT_CTX   WT_STR      0.695231    0.8933491    0.4971129 -0.845650475 7.758772e-01 9.064772e-01    FALSE    0.0000000    0.000000   0.0000000         NaN 1.00000e+00 1.000000e+00    FALSE    NOTEST
          19 0610030E20Rik   WT_CTX   WT_STR    310.523656  277.3059886  343.7413225  0.309844522 6.718424e-02 1.744929e-01    FALSE   18.1972000   16.034600  20.3598000  0.34453502 4.74456e-01 1.000000e+00    FALSE        OK
          20 0610031J06Rik   WT_CTX   WT_STR    700.901890  694.8920664  706.9117126  0.024741136 8.712990e-01 9.675598e-01    FALSE  130.7375000  129.688000 131.7870000  0.02316307 8.96752e-01 1.000000e+00    FALSE        OK
          21 0610037L13Rik   WT_CTX   WT_STR    487.195039  526.9240726  447.4660056 -0.235817007 1.236183e-01 2.758822e-01    FALSE   86.2563500   94.855100  77.6576000 -0.28859822 1.91155e-01 7.009124e-01    FALSE        OK
          22 0610037P05Rik   WT_CTX   WT_STR    584.447956  614.6863103  554.2096011 -0.149418651 3.179981e-01 5.364983e-01    FALSE  117.7430000  123.556000 111.9300000 -0.14256831 4.48852e-01 1.000000e+00    FALSE        OK
          23 0610038B21Rik   WT_CTX   WT_STR     12.200731   14.3718580   10.0296031 -0.518982068 2.017784e-01 3.930761e-01    FALSE    4.1970300    4.104560   4.2895000  0.06358192 7.10849e-01 1.000000e+00    FALSE        OK
          24 0610039K10Rik   WT_CTX   WT_STR      2.605257    2.6718016    2.5387117 -0.073716344 1.000000e+00 1.000000e+00    FALSE    2.7197650    2.585730   2.8538000  0.14231259 5.31952e-03 3.906288e-02     TRUE        OK
          25 0610040B10Rik   WT_CTX   WT_STR     28.625780   27.7157017   29.5358574  0.091763956 8.204038e-01 9.351362e-01    FALSE   18.7165000   18.332400  19.1006000  0.05922229 9.00091e-01 1.000000e+00    FALSE    NOTEST
          26 0610040J01Rik   WT_CTX   WT_STR     73.746550   76.0438579   71.4492426 -0.089913007 7.063682e-01 8.620646e-01    FALSE   11.1230000   11.760400  10.4856000 -0.16552771 7.87046e-01 1.000000e+00    FALSE    NOTEST
          27 0910001L09Rik   WT_CTX   WT_STR    622.792970  624.5168912  621.0690494 -0.007986919 9.604603e-01 1.000000e+00    FALSE  199.9800000  203.704000 196.2560000 -0.05373755 7.09578e-01 1.000000e+00    FALSE        OK
          28     100043387   WT_CTX   WT_STR      0.000000    0.0000000    0.0000000          NaN           NA           NA       NA    3.1342550    3.984580   2.2839300 -0.80290923 1.00267e-01 4.438044e-01    FALSE        OK
          29 1100001G20Rik   WT_CTX   WT_STR      0.000000    0.0000000    0.0000000          NaN           NA           NA       NA    0.0000000    0.000000   0.0000000         NaN 1.00000e+00 1.000000e+00    FALSE    NOTEST
          30 1110001A16Rik   WT_CTX   WT_STR    108.268572  106.1078410  110.4293022  0.057591769 8.087825e-01 9.269120e-01    FALSE   33.5556000   33.307700  33.8035000  0.02131690 8.88356e-01 1.000000e+00    FALSE        OK
          31 1110001J03Rik   WT_CTX   WT_STR    380.915474  384.6957999  377.1351472 -0.028636469 8.432276e-01 9.499539e-01    FALSE  271.8160000  279.029000 264.6030000 -0.07658565 5.36175e-01 1.000000e+00    FALSE        OK
          32 1110002B05Rik   WT_CTX   WT_STR    403.212109  340.7319387  465.6922791  0.450739778 5.357559e-03 2.194162e-02     TRUE   97.0667500   80.364500 113.7690000  0.50147725 1.70380e-02 1.068005e-01    FALSE        OK
          33 1110002L01Rik   WT_CTX   WT_STR     55.458279   49.6295732   61.2869855  0.304380699 2.101707e-01 4.042917e-01    FALSE    9.3329900    9.067020   9.5989600  0.08224964 3.85805e-01 1.000000e+00    FALSE        OK
          34 1110002N22Rik   WT_CTX   WT_STR     82.380559   70.5097791   94.2513397  0.418689765 7.379767e-02 1.881942e-01    FALSE   17.7519000   15.137900  20.3659000  0.42799049 3.82018e-01 1.000000e+00    FALSE    NOTEST
          35 1110003E01Rik   WT_CTX   WT_STR   1789.809936 1608.3440643 1971.2758077  0.293553575 3.691183e-02 1.097126e-01    FALSE  236.3740000  210.537000 262.2110000  0.31665441 1.76991e-02 1.102483e-01    FALSE        OK
          36 1110004E09Rik   WT_CTX   WT_STR    497.160323  519.9127593  474.4078860 -0.132141570 3.439250e-01 5.632470e-01    FALSE  123.2995000  131.067000 115.5320000 -0.18201199 3.22852e-01 9.607464e-01    FALSE        OK
          37 1110004F10Rik   WT_CTX   WT_STR   1392.103405 1387.2404634 1396.9663464  0.010079381 9.789078e-01 1.000000e+00    FALSE  246.9975000  245.782000 248.2130000  0.01419941 9.12898e-01 1.000000e+00    FALSE        OK
          38 1110005A03Rik   WT_CTX   WT_STR    213.408256  190.6752001  236.1413124  0.308533247 9.350721e-02 2.247199e-01    FALSE   54.6171000   49.285500  59.9487000  0.28256521 3.08383e-01 9.374689e-01    FALSE    NOTEST
          39 1110006O24Rik   WT_CTX   WT_STR     12.761912   12.3603136   13.1635104  0.090828926 9.489838e-01 1.000000e+00    FALSE    4.9958700    4.896020   5.0957200  0.05767656 9.49631e-01 1.000000e+00    FALSE    NOTEST
          40 1110007A13Rik   WT_CTX   WT_STR    513.407035  463.1737657  563.6403051  0.283221240 6.890213e-02 1.780701e-01    FALSE   33.9980500   30.795500  37.2006000  0.27260634 4.37987e-01 1.000000e+00    FALSE        OK
          41 1110007C09Rik   WT_CTX   WT_STR     98.760385   78.4237556  119.0970150  0.602774618 7.425269e-03 2.895403e-02     TRUE   32.9244000   26.300200  39.5486000  0.58855285 1.04939e-01 4.589295e-01    FALSE    NOTEST
          42 1110008F13Rik   WT_CTX   WT_STR    282.709685  306.5951622  258.8242068 -0.244362380 1.500948e-01 3.175511e-01    FALSE  227.0865000  251.782000 202.3910000 -0.31503001 1.95787e-02 1.202755e-01    FALSE        OK
          43 1110008J03Rik   WT_CTX   WT_STR    292.881715  332.6363462  253.1270830 -0.394083941 1.961378e-02 6.565629e-02    FALSE   51.4404000   60.036700  42.8441000 -0.48674813 9.16539e-02 4.145877e-01    FALSE        OK
          44 1110008L16Rik   WT_CTX   WT_STR    127.199331  121.4666759  132.9319864  0.130127723 5.150700e-01 7.199462e-01    FALSE   11.6279500   11.282100  11.9738000  0.08584545 7.94210e-01 1.000000e+00    FALSE        OK
          45 1110008P14Rik   WT_CTX   WT_STR   1563.010576 2457.9200664  668.1010858 -1.879299690 6.665296e-41 6.156280e-39     TRUE  727.4040000 1150.210000 304.5980000 -1.91691891 0.00000e+00 0.000000e+00     TRUE        OK
          46 1110012D08Rik   WT_CTX   WT_STR    122.782896  105.5299740  140.0358170  0.408143043 5.517697e-02 1.496728e-01    FALSE    0.0000000    0.000000   0.0000000         NaN 1.00000e+00 1.000000e+00    FALSE      FAIL
          47 1110012J17Rik   WT_CTX   WT_STR   3075.319187 2437.0331756 3713.6051975  0.607694556 7.142308e-06 5.838703e-05     TRUE  113.3399500   92.707900 133.9720000  0.53116732 6.42377e-03 4.619319e-02     TRUE        OK
          48 1110012L19Rik   WT_CTX   WT_STR    125.530325  135.6056124  115.4550383 -0.232085758 2.490602e-01 4.554857e-01    FALSE   31.8994500   34.773000  29.0259000 -0.26062674 4.72427e-01 1.000000e+00    FALSE    NOTEST
          49 1110014N23Rik   WT_CTX   WT_STR   1240.024162 1284.7308428 1195.3174809 -0.104072285 4.551265e-01 6.782383e-01    FALSE  131.4685000  136.553000 126.3840000 -0.11164718 4.97922e-01 1.000000e+00    FALSE        OK
          50 1110017D15Rik   WT_CTX   WT_STR     43.093988   28.0077011   58.1802758  1.054706564 2.199708e-04 1.333690e-03     TRUE   16.7687500   10.869500  22.6680000  1.06037153 4.65028e-02 2.447505e-01    FALSE    NOTEST

          Comment


          • #6
            Originally posted by marcora View Post
            The problem now is... which one do I believe?
            As a test, I ran cuffdiff and deseq comparing, e.g., cortical tissue to cortical tissue (instead of cortical tissue to striatal tissue like in the aforementioned experiment). Interestingly, deseq is more parsimonious than cuffdiff, finding only 30 differentially expressed genes vs the 300 reported by cuffdiff.
            In this case, of course, I would expect very few differences (due to biological variability and false positives) to be reported.

            Comment


            • #7
              tophat, cufflink, cuffcompare, and cuffdiff

              Originally posted by marcora View Post
              Thanx Simon, I am still a newbie to R!

              FYI: I am comparing WT vs KO with 4 biological replicates each, one run for each replica at 25-30 million reads each
              I have similar samples as yours. I don't know how to deal with the replicates with Tophat and Cufflink. Can you share your detail command lines for the tophat, cufflink, cuffcompare, and cuffdiff?

              Thank you for the help
              R

              Comment


              • #8
                Originally posted by Robin View Post
                I have similar samples as yours. I don't know how to deal with the replicates with Tophat and Cufflink. Can you share your detail command lines for the tophat, cufflink, cuffcompare, and cuffdiff?
                The only tool where you can input replicates is cuffdiff. I use the following bash script to run it (but it is build based on my personal convention of naming and organizing the input files). In brief, you just need to input replicates as paths to the corresponding 'accepted_hits.bam' files (generated by tophat) separated by commas. You can also specify labels (with the -L optional argument, for example, WT for wild-type and MU for mutant) to name your replicate groups. The labels too need to be separated by commas. All the options etc are well explained in the cuffdiff manual.

                Code:
                #!/bin/bash
                 
                GENOME=mm9
                 
                WT=$(find . -name 'accepted_hits.bam' | grep WT_ | tr '\n' ',' | sed 's/,$//g')
                MU=$(find . -name 'accepted_hits.bam' | grep MU_ | tr '\n' ',' | sed 's/,$//g')
                 
                mkdir cuffdiff
                cuffdiff -p 8 --library-type fr-unstranded -N -r $BOWTIE_INDEXES/$GENOME.fa cuffcompare.combined.gtf -L WT,MU $WT $MU -o cuffdiff

                Comment


                • #9
                  tophat, cufflink, cuffcompare, and cuffdiff

                  Hi Marcora:

                  Thanks for your reply.
                  I understand that only cuffdiff can take the replicates. You still run the Tophat to genereate about 8 accepted_hits.bam files for WT and KO of the four biological replicates. Then, you run the cufflink to generated 8 .gtf files, and the cuffcompare to generate a combined.gft file. Am I right?

                  Did you using the UCSC reflat table for the supplied reference annotation GTF file? I noticed your output with gene_id.

                  R

                  Comment


                  • #10
                    Originally posted by Robin View Post
                    Hi Marcora:

                    Thanks for your reply.
                    I understand that only cuffdiff can take the replicates. You still run the Tophat to genereate about 8 accepted_hits.bam files for WT and KO of the four biological replicates. Then, you run the cufflink to generated 8 .gtf files, and the cuffcompare to generate a combined.gft file. Am I right?
                    You are correct. I use the transcripts.gtf files (only one of them really, since I don't do de-novo transcript discovery but only use a reference annotation) from cufflinks to generate a cuffcompare.combined.gtf file, which I then feed to cuffdiff to detect differential expression. However, I also use the DESeq R package to detect differential expression between WT and KO.

                    Originally posted by Robin View Post
                    Did you using the UCSC reflat table for the supplied reference annotation GTF file? I noticed your output with gene_id.
                    R
                    I initially use the UCSC refFlat table because I was having troubles making tophat swallow the ENSEMBL annotation. Please refer to this thread for a solution to those problems.

                    Hope this helps!

                    Comment


                    • #11
                      Hi Marcora;

                      I would also like to use DESeq package to detect differential expression between my treated and control samples. However, I am a bit confused as to how to convert the FKPM values in the Cuffcompare output into raw read counts. Could you please explain how you did this?

                      Many thanks,

                      Seanna

                      Comment


                      • #12
                        Originally posted by SMcTaggart View Post
                        Hi Marcora;

                        I would also like to use DESeq package to detect differential expression between my treated and control samples. However, I am a bit confused as to how to convert the FKPM values in the Cuffcompare output into raw read counts. Could you please explain how you did this?
                        You don't need cufflinks/cuffcompare output to calculate the raw read counts, just tophat (or another read mapper) output and an annotation file (containing the features you want to apply the count on). Just use htseq-count or bedtools. I personally used htseq-count, which you can find at http://www-huber.embl.de/users/ander...doc/count.html, and I feed it the accepted_hits.bam from tophat converted into sam (unfortunately htseq-count at the moment doesn't handle bam files as far as I know) and the ensembl genome annotation in GTF format.

                        Hope this helps!

                        Comment


                        • #13
                          Hi Marcora;

                          Thank you very much for your rapid reply- I really appreciate it. Your suggested strategy is great!

                          Cheers,

                          Seanna

                          Comment


                          • #14
                            Hi guys,
                            I ran the Cuffdiff to test for DE genes. I have 2 samples with 6 replicates in total. Now I want to run DESeq to compare the result. But I dont know how to generate a count table input file for DEseq. I got 6 BAM file from TopHat output. How can I count mapped read using Htseq-count? Should I count each BAM file separately and then merge them into 1 file for DESeq's input. And how to do that?
                            Thanx so much
                            Thanh

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Recent Advances in Sequencing Analysis Tools
                              by seqadmin


                              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                              05-06-2024, 07:48 AM
                            • seqadmin
                              Essential Discoveries and Tools in Epitranscriptomics
                              by seqadmin




                              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                              04-22-2024, 07:01 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Today, 07:03 AM
                            0 responses
                            10 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-10-2024, 06:35 AM
                            0 responses
                            31 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-09-2024, 02:46 PM
                            0 responses
                            41 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-07-2024, 06:57 AM
                            0 responses
                            33 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X