  • Checking Cuffdiff

    I am using an interesting dataset to "test" differential isoform expression programs.

    Unfortunately, I am not an expert in every (any?) program, so I could use some sanity checking.

    I have three separate tissues: A, B, and C. I want to use cuffdiff (in this case) to identify isoforms that are uniquely expressed in A, B, or C, since I can use other "ground truth" runs to verify these claims.

    I ran the program as follows, rotating which of A, B, and C is the tissue compared against the other two:
    Code:
     cuffdiff -p 8 -c 10 <ucsc.gtf> A1,A2,A3 B1,B2,B3,C1,C2,C3 -o outdir
    I'm not using a cufflinks-derived gtf or (exclusively) tophat-mapped reads. I imagine I'm doing it all wrong. I have two main questions:

    1) Can I get away with not using the entire cufflinks pathway here? (If not, why doesn't the program complain?)
    2) Am I properly comparing the 3 tissues? Does A vs B,C return transcripts DE in only A, as I intend it to?

  • #2
    Hello jparsons,

    I used cufflinks and cuffdiff with GSNAP alignments and it worked fine, so you do not necessarily need to stick to TopHat, as long as the SAM/BAM files have all the required columns.
    However, I used the cufflinks -> cuffmerge -> cuffdiff variant to check my genes, since that route was suggested by the authors (although it was not very successful for me).

    After following some discussions in this forum, see
    http://seqanswers.com/forums/showthread.php?t=20702
    and
    http://seqanswers.com/forums/showthread.php?t=16528

    I concluded that cufflinks/cuffdiff have a problem in their correction for variance. For my analysis, the bigger my sample groups were, the fewer genes were found significantly DE until none were left. Therefore I assume that pooling group B and C will result in a similar problem due to high variance between both groups.
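
To illustrate the pooling concern, here is a minimal sketch using a plain Welch t statistic. This is an assumption made for illustration only: it is not cuffdiff's variance model, and the expression values are invented. It only shows how merging two groups with different means inflates the within-group variance and shrinks the test statistic.

```python
from statistics import mean, variance

# Hypothetical expression values for one transcript (invented, not real data).
a = [50.0, 52.0, 48.0]
b = [10.0, 11.0, 9.0]
c = [30.0, 31.0, 29.0]


def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    standard_error = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / standard_error


# Pooling B and C treats their between-tissue difference as replicate noise.
pooled_bc = b + c

t_a_vs_b = welch_t(a, b)
t_a_vs_c = welch_t(a, c)
t_a_vs_bc = welch_t(a, pooled_bc)

print(f"A vs B:   t = {t_a_vs_b:.2f}")
print(f"A vs C:   t = {t_a_vs_c:.2f}")
print(f"A vs B+C: t = {t_a_vs_bc:.2f}")
```

With these made-up numbers, the pooled comparison gives a smaller statistic than either pairwise comparison, even though A differs clearly from both B and C.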

    Besides that, your command looks fine, so please keep us posted on your progress.

    • #3
      Rboettcher,

      Thanks for the response. I eventually compared the output from tophat->cufflinks->cuffmerge->cuffdiff to that from only cuffdiff and found that they were (mostly) identical. I am content using cuffdiff without going through the entire pipeline.

      I got results for cuffdiff and finally managed to get RSEM to like me for long enough to spit out quantitations. When compared to the "truth" set (sadly only available on the gene level for now), the RSEM/cuffdiff lists are 'decent' individually, coming close to the expected ratio on average, but having numerous outliers. Taking the overlap set of genes called by both RSEM and cuffdiff makes for a much cleaner picture, with far less deviation from the ratio, and fewer false positives.
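
The overlap step described above can be sketched with plain set operations. The function name and gene identifiers below are hypothetical placeholders; the real inputs would be the gene lists called by each program.

```python
def consensus_calls(cuffdiff_genes, rsem_genes):
    """Split two lists of called genes into shared and tool-specific sets."""
    cuffdiff_set = set(cuffdiff_genes)
    rsem_set = set(rsem_genes)
    return {
        "both": cuffdiff_set & rsem_set,
        "cuffdiff_only": cuffdiff_set - rsem_set,
        "rsem_only": rsem_set - cuffdiff_set,
    }


# Hypothetical gene identifiers, for illustration only.
calls = consensus_calls(
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneC", "geneD"],
)
for label, genes in calls.items():
    print(label, sorted(genes))
```

Keeping the tool-specific sets alongside the intersection makes it easy to check whether the outliers are concentrated among genes called by only one program.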

      I'm still working on making metrics that make sense, so 'decent' and 'cleaner' are the best I can offer for now. I imagine I will develop permissive and restrictive "true positive" lists at each ratio and then generate ROC curves for each algorithm I can successfully test.
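
A minimal sketch of the ROC idea, assuming each algorithm yields one score per gene where a higher score means a stronger call (how to derive that score from each program's output is left open here). The gene names, scores, and truth lists are invented, and ties between scores are not handled specially.

```python
def roc_points(scores, truth):
    """Return (FPR, TPR) points for genes ranked by descending score.

    scores: dict mapping gene -> score (higher = stronger call).
    truth:  set of genes treated as true positives.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(gene in truth for gene in ranked)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true positive and one true negative")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for gene in ranked:
        if gene in truth:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores and truth lists, for illustration only.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.4, "g4": 0.1}
permissive_truth = {"g1", "g2", "g4"}
restrictive_truth = {"g1"}

auc_permissive = auc(roc_points(scores, permissive_truth))
auc_restrictive = auc(roc_points(scores, restrictive_truth))
print(f"permissive AUC:  {auc_permissive:.3f}")
print(f"restrictive AUC: {auc_restrictive:.3f}")
```

Running the same scores against both truth lists shows how much the apparent performance depends on which "true positive" definition is used.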

      I'm currently worried about algorithms making calls for downregulated genes or calling them as differentially expressed in cases where the assumption that "A>>B+C or A<<B+C" doesn't hold. I don't know how to handle that yet, and it may be the source of the outliers I mentioned before.
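
One way to flag such cases before scoring, sketched under an assumed reading of "A>>B+C or A<<B+C": A must be at least `min_fold` times above both B and C, or at least `min_fold` times below both. The function name, the threshold, and the expression values are hypothetical.

```python
def classify_a_pattern(a, b, c, min_fold=2.0):
    """Label one gene by where A sits relative to both B and C.

    Returns "A_high" if A >= min_fold * max(B, C),
            "A_low"  if A * min_fold <= min(B, C),
            "ambiguous" otherwise (the assumption does not hold).
    """
    if a >= min_fold * max(b, c):
        return "A_high"
    if a * min_fold <= min(b, c):
        return "A_low"
    return "ambiguous"


# Hypothetical mean expression values per tissue, for illustration only.
examples = {
    "gene_up": (100.0, 10.0, 20.0),
    "gene_down": (1.0, 10.0, 20.0),
    "gene_between": (15.0, 10.0, 20.0),
}
labels = {gene: classify_a_pattern(*values) for gene, values in examples.items()}
print(labels)
```

Genes labeled "ambiguous" could then be scored separately or excluded, to test whether they account for the outliers.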

      Overall, I am actually impressed with cuffdiff's performance, given how much grief it gets here. Neither algorithm is even remotely perfect, and neither is obviously superior.
