  • cufflinks filter output gtf

    I have been running cufflinks on a number of RNA-seq files to find novel transcripts/isoforms. I sequenced fairly deeply and it seems to be calling a ton of novel, short, single exon transcripts, a number of which I think are junk. Do people usually filter these out before running cuffmerge and cuffcompare? Do you just write a small script to manually filter these?
    Thanks
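A small script of the kind asked about might look like the following. It is a minimal sketch, not an established tool: the 500 bp cutoff, the function name `filter_single_exon`, and the choice to keep every multi-exon transcript are hypothetical and would need tuning on real data. It reads a cufflinks GTF (e.g. transcripts.gtf), groups exon lines by `transcript_id`, and drops transcripts that consist of a single short exon.

```python
import re
from collections import defaultdict

# Hypothetical cutoff: single-exon transcripts shorter than this (bp) are dropped.
MIN_SINGLE_EXON_LENGTH = 500

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_single_exon(gtf_lines, min_length=MIN_SINGLE_EXON_LENGTH):
    """Return the GTF lines, minus every line belonging to a transcript that
    has exactly one exon shorter than min_length bp.

    Multi-exon transcripts, comment lines and malformed lines are kept.
    """
    records = []               # (transcript_id or None, original line)
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            records.append((None, line))  # pass through untouched
            continue
        tid = dict(ATTR_RE.findall(cols[8])).get("transcript_id")
        records.append((tid, line))
        if cols[2] == "exon":
            exons[tid].append((int(cols[3]), int(cols[4])))

    drop = set()
    for tid, spans in exons.items():
        if len(spans) == 1:
            start, end = spans[0]
            # GTF coordinates are 1-based and inclusive.
            if end - start + 1 < min_length:
                drop.add(tid)
    return [line for tid, line in records if tid is None or tid not in drop]
```

For example, `sys.stdout.writelines(filter_single_exon(open("transcripts.gtf")))` would write the filtered GTF, which could then be passed on to cuffmerge or cuffcompare.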

  • #2
    I have the same question. Waiting for answers...



    • #3
      Have you tried constraining your cufflinks assembly parameters? You could increase the -F and -j options (--min-isoform-fraction and --pre-mrna-fraction). But it’s been my general experience that if you’re looking for novel isoforms with just the cufflinks RABT assembly, you’re going to have to filter through a lot of junk no matter what you do with cufflinks.
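To make the parameter change concrete, here is a hedged Python sketch that only builds the cufflinks argument list. The flags -o, -F, -j and -g are real cufflinks options, and 0.10 / 0.15 are the documented defaults for -F and -j; the helper name `cufflinks_command`, the file names and the stricter values 0.2 / 0.3 are hypothetical placeholders, not recommended settings.

```python
import shlex


def cufflinks_command(bam, out_dir, guide_gtf=None,
                      min_isoform_fraction=0.10, pre_mrna_fraction=0.15):
    """Build a cufflinks argument list with explicit -F / -j values.

    The keyword defaults mirror the cufflinks defaults (0.10 and 0.15);
    raising them makes the assembler stricter.
    """
    cmd = [
        "cufflinks",
        "-o", out_dir,                    # --output-dir
        "-F", str(min_isoform_fraction),  # --min-isoform-fraction
        "-j", str(pre_mrna_fraction),     # --pre-mrna-fraction
    ]
    if guide_gtf is not None:
        cmd += ["-g", guide_gtf]          # --GTF-guide (RABT assembly)
    cmd.append(bam)
    return cmd


# Hypothetical stricter settings; tune them against your own data.
cmd = cufflinks_command("sample.bam", "cuff_strict", guide_gtf="genes.gtf",
                        min_isoform_fraction=0.2, pre_mrna_fraction=0.3)
print(shlex.join(cmd))
# prints: cufflinks -o cuff_strict -F 0.2 -j 0.3 -g genes.gtf sample.bam
# To actually run it: subprocess.run(cmd, check=True)
```

Building the list separately from running it makes it easy to log exactly which thresholds each assembly used when comparing runs.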

      You can also do something a little more sophisticated with MAKER. It has a nice way of handling cufflinks assemblies and potentially creating a “reannotation” with them. Check out: http://gmod.org/wiki/MAKER_Tutorial

      Another option is the PASA pipeline: http://pasa.sourceforge.net. Neither MAKER nor PASA is particularly easy to get up and running for the casual bioinformatics tool user, but they aren’t that bad either.



      • #4
        I generally only keep the new transcripts that have class_code "j" (potentially novel isoforms that share at least one splice junction with a reference transcript). There might be something real in the other classes, but too many of them do not look real.
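Keeping only class_code "j" can be done with a few lines of Python. This is a sketch assuming a GTF that carries a `class_code` attribute, such as the combined GTF written by cuffcompare; the function name `keep_class_codes` is hypothetical, and lines without a class_code are simply dropped.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def keep_class_codes(gtf_lines, codes=("j",)):
    """Yield only the lines whose class_code attribute is in `codes`.

    Assumes a GTF with a class_code attribute (e.g. a cuffcompare
    combined GTF). Comment lines and lines without one are dropped.
    """
    wanted = set(codes)
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        if dict(ATTR_RE.findall(cols[8])).get("class_code") in wanted:
            yield line
```

Passing `codes=("=", "j")` would also keep the transcripts that match the reference exactly, which is often what you want as input for quantification.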

        Concerning the -F and -j options: I tested those quite extensively, but in my experience you just get more pre-mRNA in the transcripts.

        I have found that it matters how you filter your alignments before assembly. If you have paired-end reads, only use the fragments where both reads map when building the transcripts.
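Keeping only fragments where both reads map comes down to the SAM FLAG bits: the read must be paired (0x1), and neither the read (0x4) nor its mate (0x8) may be unmapped. With samtools this is roughly `samtools view -b -f 1 -F 12 in.bam > pairs.bam` (use `-f 3` instead of `-f 1` to keep only properly paired reads). The pure-Python sketch below applies the same FLAG test to SAM text lines; the function names are hypothetical, and it does not check mate positions or proper-pair status.

```python
# SAM FLAG bits (as defined in the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def both_mates_mapped(flag):
    """True if the read is paired and neither it nor its mate is unmapped."""
    return bool(flag & PAIRED) and not flag & (UNMAPPED | MATE_UNMAPPED)


def filter_sam(lines):
    """Yield SAM header lines plus alignments whose mates are both mapped."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd SAM column
        if both_mates_mapped(flag):
            yield line
```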

        I have no experience with MAKER or PASA, but they look interesting. Thanks for the pointer.



        • #5
          Originally posted by jake13:
          > I have been running cufflinks on a number of RNA-seq files to find novel transcripts/isoforms. I sequenced fairly deeply and it seems to be calling a ton of novel, short, single exon transcripts, a number of which I think are junk. Do people usually filter these out before running cuffmerge and cuffcompare? Do you just write a small script to manually filter these?
          > Thanks

          Hi, do you have any idea about why cufflinks gives a large number of predicted transcripts and how to filter the result now? Thank you.



          • #6
            Originally posted by 11xinqi:
            > Hi, do you have any idea about why cufflinks gives a large number of predicted transcripts and how to filter the result now? Thank you.

            I don't really know why this happens. (It does in my data as well; I suspect it is influenced by the library construction method and by any adapter or rRNA contamination that may be present.) But here is one way I have seen people deal with it: http://www.ncbi.nlm.nih.gov/pubmed/23237380.

            Basically, they look at the FPKM distributions of transcripts with the "c" and "=" class codes. Their hypothesis is that artifacts (an unknown subset of the j, o, x, u, etc. classes) and partially assembled transcripts (c) have a different FPKM distribution, with generally lower values, than perfectly assembled transcripts (=). On that basis they build a simple classifier that labels a transcript as an artifact if its FPKM is below a certain threshold, and the output GTF from cufflinks is then filtered using this threshold as a cutoff.
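The thresholding idea can be sketched in a few lines of Python. This is not necessarily the cited paper's procedure: it is a hypothetical stand-in that treats the FPKMs of "=" transcripts as the real class and the FPKMs of suspected artifacts or partial assemblies (e.g. "c") as the other class, then picks the cutoff that maximizes Youden's J (TPR minus FPR). The function names, and the assumption that each line's FPKM is read from the `FPKM` attribute in the cufflinks transcripts.gtf, are assumptions.

```python
import re
from bisect import bisect_left

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def choose_fpkm_threshold(real_fpkms, artifact_fpkms):
    """Hypothetical helper: pick the FPKM cutoff t maximizing TPR - FPR
    (Youden's J), where a transcript is called real when FPKM >= t.

    real_fpkms:     FPKMs of transcripts assumed real (e.g. class "=").
    artifact_fpkms: FPKMs of transcripts assumed to be artifacts/partial.
    """
    real, art = sorted(real_fpkms), sorted(artifact_fpkms)
    n_real, n_art = len(real), len(art)
    best_t, best_score = None, None
    for t in sorted(set(real) | set(art)):
        tp = n_real - bisect_left(real, t)  # real transcripts with FPKM >= t
        fp = n_art - bisect_left(art, t)    # artifacts with FPKM >= t
        # (TPR - FPR) scaled by n_real * n_art, so comparisons stay exact.
        score = tp * n_art - fp * n_real
        if best_score is None or score > best_score:
            best_t, best_score = t, score
    return best_t


def filter_gtf_by_fpkm(gtf_lines, threshold):
    """Yield GTF lines whose FPKM attribute is >= threshold.

    Comment lines and lines without an FPKM attribute are passed through.
    """
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            yield line
            continue
        fpkm = dict(ATTR_RE.findall(cols[8])).get("FPKM")
        if fpkm is None or float(fpkm) >= threshold:
            yield line
```

Youden's J is just one simple way to place a single cutoff between two overlapping distributions; a different criterion (e.g. a fixed false-positive rate) would fit the same structure.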

            Another approach is FRFE, which according to the authors performs better on single-exon transcripts: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882232/.

            If any of you know any other approaches, I am certainly interested in them.
