  • cuffmerge produces too many isoforms

    Hey all,

    I'll try to quickly explain my problem.
    I am working on rice, which is poorly annotated, and I am following the Tuxedo Nat Meth paper from Cole.
    I have 120 RNAseq samples (which have been ribotreated), and I use cuffmerge to produce a single merged GTF file.

    The problem is that cuffmerge produces too many isoforms, which results in a very low number of differentially expressed genes after cuffdiff.

    (quote from Cole: In general, the more isoforms a gene has, the more uncertainty there will be in assigning reads to each isoform, and the more uncertainty there will be in the overall gene expression level. That means more variance, so if you have a ton of isoforms (possibly because of a bad assembly), you'll see very few differentially expressed genes.)


    Do you have any solutions for that, apart from using the poorly annotated original GTF file?

    Thanks a lot.

    David

  • #2
    Hi David,

    So you're doing the RABT assembly, I assume? In my experience the RABT assembly is far too aggressive and shouldn't be relied on as the only source of gene annotation, especially when you have so many samples.

    Something that would be fairly easy is to pass the cuffmerge file through to Maker2, along with the reference annotations, and create a new genome annotation file that way. Then just pass your external annotation file through cuffmerge for cuffdiff.



    • #3
      Hi Wallysb01,

      Thanks for the tip. I'll try Maker2, and I'll also try reducing the number of samples used for the RABT assembly; hopefully that will decrease the number of isoforms assembled.

      Cheers

      David



      • #4
        Actually, just passing it through the cufflinks2gff3 script for use with Maker will decrease the number of isoforms and genes. If you actually want to run Maker itself, you'll likely get rid of all the isoforms, or be left with maybe 1.25 per gene, depending on how you run it. If you're concerned about keeping as many real isoforms as you can, you'll probably have to do some de novo transcriptome assembly and use PASA. Or you'll have to rein in the RABT assembly considerably. I remember the -j option was fairly critical for me, and I used .35 instead of .1.
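A minimal sketch of how the stricter setting mentioned above could be passed to the assembly step. It only builds the argument list and runs nothing. It assumes the flags are given to cufflinks as `-g` (RABT guide annotation), `-j` (pre-mRNA fraction), `-F` (minimum isoform fraction), `-o` and `-p`; check `cufflinks --help` for your version. The helper name `cufflinks_command` and all file names are hypothetical.

```python
def cufflinks_command(bam, guide_gtf, out_dir,
                      pre_mrna_fraction=0.35,
                      min_isoform_fraction=None,
                      threads=1):
    """Build (but do not run) a cufflinks RABT command line.

    pre_mrna_fraction is passed as -j; 0.35 is the value reported in
    this thread, not a recommendation. min_isoform_fraction is passed
    as -F only when it is given, otherwise the tool default applies.
    """
    cmd = [
        "cufflinks",
        "-p", str(threads),
        "-o", out_dir,
        "-g", guide_gtf,
        "-j", str(pre_mrna_fraction),
    ]
    if min_isoform_fraction is not None:
        cmd += ["-F", str(min_isoform_fraction)]
    cmd.append(bam)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(cufflinks_command("accepted_hits.bam", "rice.gtf", "sample1_out")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.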



        • #5
          Hi Wallysb01,
          I am having similar issues with Cuffmerge.
          I am looking at changing the -F and -j options. Was -j 0.35 optimal for your data? Did you try a range of values? Did you also play with the default -F option?

          Any tips greatly appreciated.



          • #6
            I never did parameter sweeps with -j. I simply increased it the one time, compared the result to the original run on the defaults, and was happy with it. I think increasing -F could be a good option too, but I'm not sure I would go higher than around .25 myself, though I never tried it.
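One way to make the comparison between a default run and a stricter run concrete is to count transcripts per gene in each merged GTF. The sketch below is an assumption about how that could be done, not something posted in the thread: it expects standard GTF lines with `gene_id` and `transcript_id` in the ninth column, and the function names and file names are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def isoforms_per_gene(gtf_lines):
    """Return a dict mapping gene_id to the set of its transcript_ids."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(_ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id and transcript_id:
            genes[gene_id].add(transcript_id)
    return genes


def mean_isoforms_per_gene(gtf_lines):
    """Average number of distinct transcripts per gene (0.0 if no genes)."""
    genes = isoforms_per_gene(gtf_lines)
    if not genes:
        return 0.0
    return sum(len(tx) for tx in genes.values()) / len(genes)


if __name__ == "__main__":
    # Hypothetical file names for the two runs being compared.
    for path in ("merged_default.gtf", "merged_strict.gtf"):
        with open(path) as handle:
            print(path, round(mean_isoforms_per_gene(handle), 2))
```

A lower mean for the stricter run only shows that fewer isoforms were assembled; whether the removed isoforms were spurious still has to be judged separately.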
