  • Cufflinks Dropping Transcripts

    Hello all,

    I am currently using cufflinks version 2.0.2 to analyze a set of pig RNA-seq data. I am running cufflinks with the -g option, using the Ensembl gtf for the latest build. When I count the unique gene names in the Ensembl gtf, there are 25,009 genes. When I run the data through cufflinks with the default settings plus the -g option and check the output gtf, I only get 23,917 unique Ensembl gene names, so 1,092 genes are missing.
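
To see exactly which genes are missing, rather than only comparing totals, you can diff the gene ID sets of the two gtf files. Below is a minimal Python sketch, assuming both files carry the Ensembl ID in the `gene_id` attribute; if the cufflinks output stores it under a different key, pass that attribute name instead (for example `gene_name`). The function names are hypothetical helpers, not part of any cufflinks tool.

```python
import re
from typing import Iterable, Set

# Matches GTF attribute pairs such as: gene_id "ENSSSCG00000000001";
# Unquoted values (e.g. some exon_number fields) are deliberately ignored.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_ids(gtf_lines: Iterable[str], attribute: str = "gene_id") -> Set[str]:
    """Collect the unique values of one attribute across all GTF records."""
    ids: Set[str] = set()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full 9-column GTF record
        for key, value in ATTR_RE.findall(fields[8]):
            if key == attribute:
                ids.add(value)
    return ids


def missing_genes(reference_lines: Iterable[str], output_lines: Iterable[str],
                  attribute: str = "gene_id") -> Set[str]:
    """Genes present in the reference annotation but absent from the output."""
    return gene_ids(reference_lines, attribute) - gene_ids(output_lines, attribute)
```

For example, `with open("reference.gtf") as ref, open("transcripts.gtf") as out: print(len(missing_genes(ref, out)))`, where both file names are placeholders.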

    The documentation for the -g option says that cufflinks will include all reference transcripts in the output:
    "Tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled."
    My question is this: why would cufflinks be dropping these genes? Are there built-in settings in cufflinks that would cause it to drop genes from certain regions?

    To make sure that these genes did indeed have reads aligning to their regions, I ran htseq-count on the tophat output with the original Ensembl gtf file and got read counts for those genes.
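
To check specifically whether the dropped genes have reads, the htseq-count table can be cross-referenced with the set of missing gene IDs. A minimal sketch, assuming htseq-count's default two-column output (gene ID, tab, count) with its summary rows prefixed by `__`; `dropped` is whatever set of missing IDs you already have, and `min_reads` is an arbitrary, hypothetical cutoff.

```python
from typing import Dict, Iterable, Set


def read_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse htseq-count output (gene_id<TAB>count), skipping '__' summary rows."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("__"):
            continue  # e.g. __no_feature, __ambiguous
        gene, count = line.split("\t")
        counts[gene] = int(count)
    return counts


def dropped_but_expressed(counts: Dict[str, int], dropped: Set[str],
                          min_reads: int = 1) -> Dict[str, int]:
    """Dropped genes that htseq-count still assigned at least min_reads reads."""
    return {g: counts[g] for g in sorted(dropped)
            if counts.get(g, 0) >= min_reads}
```

Genes that come back with large counts are the ones worth inspecting, since they were clearly aligned by tophat but never reported by cufflinks.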

  • #2
    We have found, in human as well as several other model systems, that Cufflinks will "trip" and not assemble transcripts (and, surprisingly, not even report the ones in the reference!) when there are too many reads in a region. Basically, I think it just gives up and falls over when trying to untangle the fragment overlap graph it builds for that locus.

    I work around this by adding the reference gtf at the cuffmerge stage, using alternative tools for DE, and visually checking any calls cufflinks makes in regions I am interested in, comparing the cufflinks output against the wiggle / bamToBed tracks of the actual reads.

    I would also like to know WTF it is doing and why, and I am leaving this comment so you know you're not alone.



    • #3
      Thanks for the info. We had assumed that for some reason, there were too many reads in the area and *cufflinks* was choking.

      Edit: Sorry, I was writing in a hurry and said tophat when I meant cufflinks. We know it's cufflinks choking because we have run other analyses and know that there are millions of reads in the particular region we were looking at, so tophat did align reads to that region.
      Last edited by ercfrtz; 11-12-2012, 06:33 AM.



      • #4
        Originally posted by ercfrtz
        tophat was choking.
        An easy way to see whether it's tophat or cufflinks that is choking is to visualize the data, either with bamToBed or bamToWiggle, to see how many reads are being mapped to the offending loci. I do this with bedtools or RSeQC + UCSC tools.
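
Counting reads at a locus from bamToBed output comes down to an interval overlap test on BED's 0-based, half-open coordinates. A minimal sketch, assuming one alignment per BED line (the default bamToBed output, so each mate is counted separately); the function name is hypothetical. On an indexed BAM, `samtools view -c file.bam chr:start-end` gives a comparable count without the intermediate BED file.

```python
from typing import Iterable


def reads_in_locus(bed_lines: Iterable[str], chrom: str, start: int, end: int) -> int:
    """Count BED records (e.g. reads from bamToBed) overlapping [start, end) on chrom.

    BED coordinates are 0-based and half-open, so a read overlaps the locus
    when read_start < end and read_end > start.
    """
    n = 0
    for line in bed_lines:
        # Skip blank, comment and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if fields[0] != chrom:
            continue
        r_start, r_end = int(fields[1]), int(fields[2])
        if r_start < end and r_end > start:
            n += 1
    return n
```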



        • #5
          Originally posted by dvanic
          visualize the data by either a bamToBed or bamToWiggle to see how many reads are being mapped to the offending loci. I do this with bedtools or RSeQC + ucsc tools.
          Or, more simply, drag and drop the BAM file into IGV browser and look at the locus.



          • #6
            drag and drop the BAM file into IGV browser and look at the locus.
            Yup, also works. I just upload both the tophat wiggles and the cufflinks gtfs and compare them visually in UCSC with known annotations + features. You could do this in IGV as well.



            • #7
              Check out the --max-bundle-frags option for cufflinks. I don't know why anyone would need an option to skip loci due to excessive coverage, but they put it in there. Maybe raising that value will fix the issue... or maybe cufflinks will explode. The documentation says that skipped loci are reported in the skipped.gtf file, but I've never seen anything in that file even though every time I run it, it skips at least one bundle.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
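
To get a feel for which loci might hit a --max-bundle-frags style cutoff, you can cluster overlapping fragments into bundles and count them. This is a rough approximation, not cufflinks' actual bundling code: it assumes fragments are given as (chrom, start, end) intervals, merges only intervals that strictly overlap, and ignores any reference transcripts cufflinks may fold into a bundle when run with -g. The helper names and the threshold you pass in are hypothetical.

```python
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int]      # (chrom, start, end), 0-based half-open
Bundle = Tuple[str, int, int, int]   # (chrom, start, end, n_fragments)


def bundles(fragments: Iterable[Fragment]) -> List[Bundle]:
    """Merge overlapping fragments into clusters and count fragments in each."""
    out: List[Bundle] = []
    cur = None  # current open bundle as [chrom, start, end, count]
    for chrom, start, end in sorted(fragments):
        if cur is not None and chrom == cur[0] and start < cur[2]:
            cur[2] = max(cur[2], end)  # extend the current bundle
            cur[3] += 1
        else:
            if cur is not None:
                out.append(tuple(cur))
            cur = [chrom, start, end, 1]
    if cur is not None:
        out.append(tuple(cur))
    return out


def oversized(bundle_list: List[Bundle], max_frags: int) -> List[Bundle]:
    """Bundles with more than max_frags fragments, i.e. ones such a cutoff would skip."""
    return [b for b in bundle_list if b[3] > max_frags]
```

Loci flagged this way are candidates for the genes that disappear from the output; rerunning cufflinks with a larger --max-bundle-frags value is the obvious experiment, likely at some cost in memory and run time.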

