  • skipping cufflinks-->cuffcompare ... straight to cuffdiff?

    I see that most workflows include tophat --> cufflinks --> cuffcompare --> cuffdiff.

    If I want to perform differential expression analysis on RNA-seq samples based on a known annotation (e.g. an Ensembl GTF), can I simply do tophat --> cuffdiff (with the known GTF)?

    What would be the difference if I were to do tophat --> cufflinks --> cuffcompare and use that output gtf in cuffdiff?

  • #2
    depends on your experimental design

    I compared several methods for DE with a 12 vs 12 paired dataset and found cuffdiff to produce by far the fewest significant genes.

    In ascending order of the number of significant genes, the methods were:
    cuffdiff
    NOISeq
    DESeq
    baySeq
    edgeR
    npSeq
    SAMseq
    PoissonSeq

    Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...


    • #3
      Originally posted by dietmar13
      I compared several methods for DE with a 12 vs 12 paired dataset and found cuffdiff to produce by far the fewest significant genes.

      In ascending order of the number of significant genes, the methods were:
      cuffdiff
      NOISeq
      DESeq
      baySeq
      edgeR
      npSeq
      SAMseq
      PoissonSeq

      Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...

      Just because something produces a shorter list of genes doesn't mean it is a worse approach, surely.


      • #4
        Of course,

        but I analysed the same biological question, 12 vs 12 paired (colon cancer vs. normal tissue), with microarrays and got ~6000 significant genes.

        I would say 2 significant genes (as I got with cuffdiff) are a little too few and useless for further examination, and thus worse than the other approaches.

        I also compared the gene lists derived from the other approaches with the gene list I got from microarrays, and there was no big difference in overlap (and I know that microarray is not the truth).

        Furthermore, I estimated the robustness of the obtained lists with bootstrap validation and got acceptable validations (although the values decreased as the number of significant genes increased).

        Therefore, I would say all gene lists are more or less plausible, also with regard to the expected expression differences between cancer and normal tissue.

        The only way to validate all genes for sure would be to run RT-qPCR on the same samples for all genes...


        • #5
          Thanks for elaborating. Without figures or a reference to a microarray experiment, it was rather hard to take on faith.


          • #6
            I'm a bit skeptical about that Cuffdiff run - we've compared the lists produced by Cuffdiff against the lists produced by arrays run on *exactly* the same RNA, and found that not only does Cuffdiff return a superset of the genes returned by the array analysis, the Cuffdiff lists are highly concordant with DESeq and edgeR (like 90% overlap). Are you sure you're running Cuffdiff correctly, and are you using a recent version?
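The two checks mentioned in this post (is one gene list a superset of another, and how large is the overlap) can be sketched in a few lines of Python. This is only an illustration: the function name `concordance` is hypothetical, and since the post does not say how "overlap" was computed, dividing the shared count by the size of the smaller list is an assumption.

```python
def concordance(list_a, list_b):
    """Compare two lists of gene identifiers.

    The overlap definition (shared genes divided by the size of the
    smaller list) is an assumption; the thread does not define it.
    """
    a, b = set(list_a), set(list_b)
    shared = a & b
    smaller = min(len(a), len(b))
    return {
        "shared": len(shared),
        # True when every gene in list_b also appears in list_a
        "a_is_superset_of_b": b <= a,
        "fraction_of_smaller": len(shared) / smaller if smaller else 0.0,
    }
```

Called as `concordance(cuffdiff_genes, array_genes)`, the `a_is_superset_of_b` entry says whether every array gene also appears in the Cuffdiff list.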


            • #7
              used syntax for cuffdiff

              I used the cuffdiff that comes with Cufflinks 1.3.

              $GTF holds a GTF prepared the following way
              ($genome is a link to the chromosomes):

              cuffcompare -s $genome -CG -r Homo_sapiens.hg19.gtf Homo_sapiens.hg19.gtf

              cuffdiff -o $outdir -p 12 -N -u $GTF $DIR/SRR317086.sam,$DIR/SRR317087.sam,$DIR/SRR317088.sam,$DIR/SRR317089.sam,$DIR/SRR317090.sam,$DIR/SRR317091.sam,$DIR/SRR317092.sam,$DIR/SRR317093.sam,$DIR/SRR317094.sam,$DIR/SRR317095.sam,$DIR/SRR317096.sam,$DIR/SRR317097.sam $DIR/SRR317098.sam,$DIR/SRR317099.sam,$DIR/SRR317100.sam,$DIR/SRR317101.sam,$DIR/SRR317102.sam,$DIR/SRR317103.sam,$DIR/SRR317104.sam,$DIR/SRR317105.sam,$DIR/SRR317106.sam,$DIR/SRR317107.sam,$DIR/SRR317108.sam,$DIR/SRR317109.sam
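Because the two replicate groups are consecutive accession ranges, the long command above can also be assembled programmatically, which makes it easier to check that each group really contains 12 files. The following is a minimal sketch: the helper names `sam_list` and `cuffdiff_command` and the example paths are hypothetical, and only the flags that appear in the post (`-o`, `-p`, `-N`, `-u`) are used.

```python
import shlex

def sam_list(directory, first, last, prefix="SRR"):
    """Comma-separated SAM paths for accessions first..last (inclusive)."""
    return ",".join(f"{directory}/{prefix}{n}.sam" for n in range(first, last + 1))

def cuffdiff_command(outdir, gtf, directory, threads=12):
    """Build the argument list for the cuffdiff call shown in the post."""
    group_1 = sam_list(directory, 317086, 317097)  # first 12 replicates
    group_2 = sam_list(directory, 317098, 317109)  # second 12 replicates
    # Flags are copied from the post; nothing else is added.
    return ["cuffdiff", "-o", outdir, "-p", str(threads), "-N", "-u",
            gtf, group_1, group_2]

if __name__ == "__main__":
    # Placeholder paths, standing in for $outdir, $GTF and $DIR.
    print(shlex.join(cuffdiff_command("out", "genes.gtf", "sams")))
```

Passing the resulting list to `subprocess.run` would avoid shell quoting problems; the sketch only prints the command so it can be inspected first.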


              • #8
                Hmm, it certainly looks OK. Do you have a lot of genes with status FAIL? Where'd you get the GTF?

                Also, how did you map the reads?


                • #9
                  I created the GTF from an Ensembl GTF (to make it hg19 compatible, I added "chr" in front of the chromosome names).

                  gawk 'BEGIN { FS = "\t"; OFS="\t" } ; $1 ~ /^([0-9]+|X|Y|MT)$/ { print "chr" $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 }' $in > cuffdiff/${out}.tmp

                  sed 's/chrMT/chrM/' cuffdiff/${out}.tmp > cuffdiff/${out}

                  rm cuffdiff/${out}.tmp
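For readers who prefer to avoid the temporary file, the same renaming can be sketched in Python. This is an approximate equivalent under stated assumptions, not the poster's method: the function name `to_ucsc_names` is hypothetical, input lines are assumed to be well-formed nine-column GTF records, and `MT` is mapped to `chrM` only in the first column (the `sed` step above would rewrite the first `chrMT` found anywhere on the line).

```python
import re

# Same pattern as the gawk filter: numbered chromosomes, X, Y and MT only.
ENSEMBL_CHROM = re.compile(r"^([0-9]+|X|Y|MT)$")

def to_ucsc_names(lines):
    """Yield GTF lines with 'chr'-prefixed chromosome names.

    Lines whose first field is not a primary chromosome (scaffolds,
    patches, comment lines) are dropped, as in the gawk command.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not ENSEMBL_CHROM.match(fields[0]):
            continue
        chrom = "chrM" if fields[0] == "MT" else "chr" + fields[0]
        # Keep the first nine columns, like the gawk print statement.
        yield "\t".join([chrom] + fields[1:9])
```

A typical use would be to open the input GTF, iterate over `to_ucsc_names(handle)`, and write each returned line plus a newline to the output file.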

                  I mapped the reads with TopHat and also used these mapped reads for the analysis with the other R packages for DE (HTSeq-count unique).

                  gene_exp.diff
                  11050 FAIL
                  27 HIDATA
                  75 LOWDATA
                  35526 NOTEST
                  2995 OK


                  Why does it run so many tests (is this on a gene basis or on a transcript basis)?
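A status summary like the one above can be reproduced with a short script. The sketch below assumes that gene_exp.diff is tab-separated with a header row containing a column named `status`; that column name is an assumption about the file rather than something stated in the thread, and `tally_status` is a hypothetical helper name.

```python
import csv
from collections import Counter

def tally_status(lines, column="status"):
    """Count the values of one column in a tab-separated table with a header.

    `lines` can be an open file handle or any iterable of text lines.
    The default column name is an assumption about gene_exp.diff.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)

if __name__ == "__main__":
    with open("gene_exp.diff", newline="") as handle:
        for status, count in tally_status(handle).most_common():
            print(count, status)
```

The same function can count another column by passing its name as the second argument, provided that column exists in the file.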


                  • #10
                    Hi Cole,
                    I see something similar with my data too. cuffdiff produces very few significant (Significant: Yes) genes although I have not compared with other methods. I am using cufflinks v1.3.0 but the Tophat version I used was v1.1.4.

                    I ran Tophat more than a year ago, and I know that Tophat has evolved a lot since then. Do you think I would see a significant difference between the latest version of Tophat and v1.1.4?

                    gene_exp.diff
                    ---------------------------------
                    248975 FAIL
                    811 HIDATA
                    22270 LOWDATA
                    141410 NOTEST
                    134335 OK
                    80 YES (SIGNIFICANT)
                    Last edited by DineshCyanam; 03-01-2012, 09:34 AM.


                    • #11
                      cuffdiff 2.0.2 beta

                      I have now repeated my analysis with cuffdiff 2.0.2 beta and got zero significant transcripts or genes (CDS). cuffdiff 1.3 found two genes (see above).

                      The design was a 12 versus 12 matched-pairs experiment (normal colon mucosa vs. colon cancer tissue) with a median of only 2.5 million reads per sample.

                      Mapper: TopHat.

                      cuffdiff was provided with an Ensembl GTF file, and the analysis ran without any errors.

                      SAMseq, edgeR, and limma/voom found > 4,000 genes, and DESeq > 2,500, using the raw digital count data (HTSeq-count).

                      I think cuffdiff (1.3 and 2.0.2 beta) is not the right choice for statistical analysis of experimental designs with many dispersed biological replicates and low read depth.
