  • Cuffcompare with a single experiment

    I am running RNA-seq analysis on a paired-end deep sequencing data set with no replicates. We are interested in finding novel gene and transcript isoforms in addition to variant info. Grooming and TopHat alignment went well, and I’ve processed the .bam output through Cufflinks in RABT mode with --GTF-guide. I then take the .gtf output from this and run cuffcompare with the reference .gtf and .fasta.
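    A minimal sketch of the two steps described above, assuming Python 3.8+ with `cufflinks` and `cuffcompare` on the PATH. The file names (`accepted_hits.bam`, `genes.gtf`, `genome.fa`) and the output prefix `cuffcmp` are placeholders, not taken from the post. It only prints the command lines unless `dry_run=False`.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def rabt_commands(bam: str, ref_gtf: str, genome_fa: str,
                  outdir: str = "cufflinks_out",
                  cmp_prefix: str = "cuffcmp") -> list[list[str]]:
    """Build the Cufflinks (RABT) and cuffcompare command lines; names are placeholders."""
    # RABT mode: the reference GTF guides assembly, but novel transcripts can
    # still be reported. Cufflinks writes <outdir>/transcripts.gtf.
    cufflinks = ["cufflinks", "--GTF-guide", ref_gtf, "-o", outdir, bam]
    assembled = str(Path(outdir) / "transcripts.gtf")
    # cuffcompare: -r reference annotation, -s genome sequence,
    # -o prefix for <prefix>.tracking, <prefix>.combined.gtf, etc.
    cuffcompare = ["cuffcompare", "-r", ref_gtf, "-s", genome_fa,
                   "-o", cmp_prefix, assembled]
    return [cufflinks, cuffcompare]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(rabt_commands("accepted_hits.bam", "genes.gtf", "genome.fa"))
```

    Keeping the commands as argument lists (rather than one shell string) avoids quoting problems with paths that contain spaces.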

    I am experiencing confusion about the last step and was hoping that somebody with more experience than I have could help clarify a few things.

    Firstly, most of the references I have read regarding cuffcompare indicate that it is used for multiple replicates or experiments: “Used to Track Cufflinks transcripts across multiple experiments (e.g. across a time course)”. Is it common to use cuffcompare on a single experiment in order to find novel isoforms?

    Secondly, there are some entries in the output from cuffcompare that aren’t making sense to me. What does it mean when I see an "=" class code with a zero FMI? How about a "j" class code with a FMI of 100? Based on the definition of FMI (fraction of major isoform), these scenarios don't seem possible.

    Thirdly, if I want an fpkm score for a known gene, is it common to sum all transcript fpkms belonging to that gene with an "=" class code?

    Thanks so much for any help, and let me know if I can/should provide more information!

    -Jeremy

  • #2
    Originally posted by reventropy
    Firstly, most of the references I have read regarding cuffcompare indicate that it is used for multiple replicates or experiments: “Used to Track Cufflinks transcripts across multiple experiments (e.g. across a time course)”. Is it common to use cuffcompare on a single experiment in order to find novel isoforms?
    Depends on your definition of "common". There's no technical reason you can't (I certainly have). Usually people use the cuffcompare output as the guide file for cuffdiff. The former gives you the union set of transcripts; the latter then looks for differential expression in those transcripts.

    Secondly, there are some entries in the output from cuffcompare that aren’t making sense to me. What does it mean when I see an "=" class code with a zero FMI? How about a "j" class code with a FMI of 100? Based on the definition of FMI (fraction of major isoform), these scenarios don't seem possible.
    cuffcompare outputs every transcript it finds, as well as every transcript it is told is real (i.e. present in the guide file). "=" transcripts exist in the guide file, so they are output even if there's no read support for them. It's not clear why you think a "j" class transcript cannot have an FMI of 100.
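    To check this behaviour in your own output, a minimal sketch that parses rows of cuffcompare's `.tracking` file. It assumes the column layout described in the Cufflinks manual (transfrag ID, locus ID, reference `gene|transcript` or `-`, class code, then one `qN:gene|transcript|FMI|FPKM|conf_lo|conf_hi|cov|len` field per input); the example rows are made up. With a single Cufflinks run there is exactly one sample column.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleHit:
    gene_id: str
    transcript_id: str
    fmi: float   # fraction of major isoform, reported as a percentage (0-100)
    fpkm: float


@dataclass
class TrackingRow:
    transfrag_id: str
    locus_id: str
    ref: Optional[str]        # "gene|transcript" of the reference match, or None
    class_code: str
    samples: list[Optional[SampleHit]]


def parse_sample(field: str) -> Optional[SampleHit]:
    """Parse one 'qN:gene|transcript|FMI|FPKM|...' field ('-' means absent)."""
    if field == "-":
        return None
    _, payload = field.split(":", 1)  # drop the 'qN' prefix
    parts = payload.split("|")
    return SampleHit(parts[0], parts[1], float(parts[2]), float(parts[3]))


def parse_tracking_line(line: str) -> TrackingRow:
    cols = line.rstrip("\n").split("\t")
    ref = None if cols[2] == "-" else cols[2]
    return TrackingRow(cols[0], cols[1], ref, cols[3],
                       [parse_sample(c) for c in cols[4:]])


def unsupported_reference(rows: list[TrackingRow]) -> list[TrackingRow]:
    """'=' rows with zero FPKM (or no entry) in every sample: guide-only transcripts."""
    return [r for r in rows
            if r.class_code == "="
            and all(s is None or s.fpkm == 0 for s in r.samples)]


example = ("TCONS_00000001\tXLOC_000001\tGENE1|NM_000001\t=\t"
           "q1:CUFF.1|CUFF.1.1|0|0.000000|0.000000|0.000000|0.000000|1500")
row = parse_tracking_line(example)
print(row.class_code, row.samples[0].fmi, row.samples[0].fpkm)  # = 0.0 0.0
```

    An "=" row with FMI 0 and FPKM 0 is exactly the case the reply describes: the transcript is reported because it is in the guide, not because reads support it.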

    Thirdly, if I want an fpkm score for a known gene, is it common to sum all transcript fpkms belonging to that gene with an "=" class code?
    Summing fpkms is fine, but you should include novel transcripts, or rerun cufflinks without novel transcript finding.
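    A minimal sketch of per-gene summing along the lines of this reply, reading cuffcompare `.tracking` rows (same column layout assumption as the Cufflinks manual, single sample column; rows are made up). Which class codes to count is an assumption: the default includes "=" and "j" so that novel isoforms of a known gene contribute, and leaves out codes such as "x" (exonic overlap with a reference on the opposite strand). The other option in the reply, rerunning Cufflinks without novel transcript finding, corresponds to quantifying against the reference with `-G`/`--GTF` instead of `-g`/`--GTF-guide`.

```python
from collections import defaultdict


def gene_fpkm(tracking_lines, class_codes=("=", "j"), sample_col=4):
    """Sum FPKM per reference gene over rows whose class code is in class_codes."""
    totals = defaultdict(float)
    for line in tracking_lines:
        cols = line.rstrip("\n").split("\t")
        ref, code, sample = cols[2], cols[3], cols[sample_col]
        if ref == "-" or sample == "-" or code not in class_codes:
            continue  # no reference gene, absent in sample, or code not counted
        gene = ref.split("|")[0]                              # 'gene|transcript' -> gene
        fpkm = float(sample.split(":", 1)[1].split("|")[3])   # 4th '|' field is FPKM
        totals[gene] += fpkm
    return dict(totals)


rows = [
    "TCONS_1\tXLOC_1\tGENE1|NM_1\t=\tq1:CUFF.1|CUFF.1.1|100|10.0|8.0|12.0|30.0|-",
    "TCONS_2\tXLOC_1\tGENE1|NM_1\tj\tq1:CUFF.1|CUFF.1.2|50|5.0|3.0|7.0|15.0|-",
    "TCONS_3\tXLOC_1\tGENE1|NM_1\tx\tq1:CUFF.2|CUFF.2.1|100|3.0|1.0|5.0|9.0|-",
    "TCONS_4\tXLOC_2\tGENE2|NM_2\t=\tq1:CUFF.3|CUFF.3.1|100|2.5|2.0|3.0|8.0|-",
    "TCONS_5\tXLOC_3\t-\tu\tq1:CUFF.4|CUFF.4.1|100|7.0|5.0|9.0|20.0|-",
]
print(gene_fpkm(rows))  # {'GENE1': 15.0, 'GENE2': 2.5}
```

    Passing `class_codes=("=",)` reproduces the "known transcripts only" sum from the question, which here drops the novel "j" isoform's 5.0 FPKM from GENE1.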

    • #3
      Thanks a lot mikep!

      It's not clear why you think a j class transcript cannot have an FMI of 100.
      This is probably owing to my flawed reasoning.

      I was operating under the assumption that major isoforms come from the annotation file and cannot be novel. If I see an FMI of 100 and a "j" class code, should I assume that Cufflinks identified the main isoform as being novel, i.e., a novel gene?

      Thanks again for addressing my questions so that I can proceed with more confidence.

      -Jeremy

      • #4
        Originally posted by reventropy
        If I see an FMI of 100 and a "j" class code then should I assume that Cufflinks identified the main isoform as being novel, i.e., a novel gene?

        -Jeremy
        Your interpretation is correct.
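        Rows matching this description can be pulled out of the `.tracking` file with a short filter. A minimal sketch, assuming the Cufflinks manual's `.tracking` layout and a single sample column; the rows are made up. Per the manual's class-code table, "j" marks a potentially novel isoform sharing at least one splice junction with a reference transcript, while "u" marks an unknown, intergenic transcript, so the sketch reports the two separately.

```python
def novel_major_isoforms(tracking_lines, sample_col=4):
    """Return (j_hits, u_hits): transfrag IDs with FMI == 100 in the given sample."""
    j_hits, u_hits = [], []
    for line in tracking_lines:
        cols = line.rstrip("\n").split("\t")
        code, sample = cols[3], cols[sample_col]
        if sample == "-" or code not in ("j", "u"):
            continue
        fmi = float(sample.split(":", 1)[1].split("|")[2])  # 3rd '|' field is FMI
        if fmi == 100:  # reported as the major isoform of its Cufflinks gene
            (j_hits if code == "j" else u_hits).append(cols[0])
    return j_hits, u_hits


rows = [
    "TCONS_1\tXLOC_1\tGENE1|NM_1\tj\tq1:CUFF.1|CUFF.1.1|100|20.0|15.0|25.0|40.0|-",
    "TCONS_2\tXLOC_1\tGENE1|NM_1\t=\tq1:CUFF.1|CUFF.1.2|35|7.0|5.0|9.0|14.0|-",
    "TCONS_3\tXLOC_2\t-\tu\tq1:CUFF.2|CUFF.2.1|100|4.0|2.0|6.0|8.0|-",
]
print(novel_major_isoforms(rows))  # (['TCONS_1'], ['TCONS_3'])
```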

        Thanks again for addressing my questions so that I can proceed with more confidence.
        I would be very careful about being confident in novel isoforms from cufflinks; it has a pretty high error rate. You haven't mentioned which organism you are working with, but if it is human or one of the model organisms you might be better off just using the existing annotation. If it is a few genes you care about, I'd load the cufflinks output & BAM file into a genome viewer and have a look at the actual reads.
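        For the manual check suggested here, a minimal sketch that writes an IGV batch script loading the BAM and the Cufflinks GTF and taking a snapshot at each locus of interest. It assumes IGV's batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto` and `snapshot`; the genome ID, file names and locus are placeholders. The resulting text can be saved to a file and run from IGV's Tools > Run Batch Script menu.

```python
def igv_batch_script(genome, files, loci, snapshot_dir="igv_snapshots"):
    """Build an IGV batch script that snapshots each locus (e.g. 'chr1:1000-2000')."""
    lines = ["new", f"genome {genome}"]
    lines += [f"load {path}" for path in files]      # e.g. BAM plus transcripts.gtf
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for i, locus in enumerate(loci, start=1):
        lines.append(f"goto {locus}")
        lines.append(f"snapshot locus_{i}.png")
    return "\n".join(lines) + "\n"


script = igv_batch_script("hg19",
                          ["accepted_hits.bam", "cufflinks_out/transcripts.gtf"],
                          ["chr1:1000000-1050000"])
print(script)
```

    Snapshots are no substitute for scrolling through the reads, but they make it easy to review many candidate loci side by side.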

        • #5
          I would be very careful about being confident in novel isoforms from cufflinks; it has a pretty high error rate. You haven't mentioned which organism you are working with, but if it is human or one of the model organisms you might be better off just using the existing annotation. If it is a few genes you care about, I'd load the cufflinks output & BAM file into a genome viewer and have a look at the actual reads.
          I'll definitely keep that in the front of my mind. The sequencing is human. I have been using IGV, but am still training my eye. We're only interested in coding genes, so I will be filtering the cuffdiff output, but we would like to catch any novel transcripts or gene isoforms in this subset.

          -Jeremy
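          A minimal sketch of a coding-gene filter for cuffdiff's `gene_exp.diff`, assuming an annotation GTF that carries a `gene_biotype` (Ensembl) or `gene_type` (GENCODE) attribute alongside `gene_name`. Some annotations, such as UCSC-based GTFs, may have no biotype attribute, in which case the coding gene list has to come from elsewhere. The GTF lines and the truncated `gene_exp.diff` table below are made up.

```python
import csv
import io
import re

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')  # key "value" pairs in GTF column 9


def coding_gene_names(gtf_lines):
    """Collect gene_name values whose biotype attribute is protein_coding."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        biotype = attrs.get("gene_biotype") or attrs.get("gene_type")
        if biotype == "protein_coding" and "gene_name" in attrs:
            names.add(attrs["gene_name"])
    return names


def filter_coding(diff_text, coding):
    """Keep gene_exp.diff rows where any name in the 'gene' column is coding."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    # a locus spanning several genes can list comma-separated names
    return [row for row in reader
            if any(name in coding for name in row["gene"].split(","))]


gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1"; gene_biotype "protein_coding";',
    'chr1\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "G2"; gene_name "GENE2"; gene_biotype "lincRNA";',
]
diff = ("test_id\tgene_id\tgene\tlocus\n"
        "XLOC_1\tXLOC_1\tGENE1\tchr1:99-200\n"
        "XLOC_2\tXLOC_2\tGENE2\tchr1:499-600\n"
        "XLOC_3\tXLOC_3\t-\tchr2:1-50\n")
kept = filter_coding(diff, coding_gene_names(gtf))
print([row["test_id"] for row in kept])  # ['XLOC_1']
```

          Novel loci with no gene name ("-") are dropped by this filter, so "j" isoforms are kept only when cuffdiff reports them under a known gene name.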
