  • Tracking_id in cufflinks

    Hi everyone,

    I have run the "Tuxedo" RNA-seq analysis pipeline on rice data, and everything seemed to work well. But I am not sure how the "tracking_id" was generated, or how the coordinates were assigned to each tracking_id. Does the "tracking_id" come from the transcriptome assembly built from the RNA-seq data?

    Thanks to the gene annotation, I can find the "gene_short_name" for most of the "tracking_id" values in the file "genes.fpkm_tracking". But when I checked the coordinates for each "tracking_id" in "genes.fpkm_tracking", they seemed to differ from those of the genes annotated by the genome sequencing consortium. Why does this happen? Does it mean that the original gene annotation was incorrect? How do you resolve this kind of discrepancy?

    I really hope to get answers from cufflinks experts. Thank you for your patience!
    Last edited by 413fei; 10-13-2014, 05:59 PM.
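A minimal sketch for quantifying the boundary differences described above. It assumes the "locus" column of genes.fpkm_tracking has the form chromosome:start-end (the example string "Chr1:900-5200" is hypothetical) and that the annotated gene coordinates use the same coordinate convention; parse_locus and boundary_shift are hypothetical helper names. Check whether both sources are 0- or 1-based before reading anything into one-base shifts.

```python
import re
from typing import NamedTuple, Tuple

class Interval(NamedTuple):
    chrom: str
    start: int
    end: int

# Assumed "locus" format, e.g. "Chr1:1000-5000" (chromosome:start-end).
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

def parse_locus(locus: str) -> Interval:
    """Parse a chromosome:start-end string into an Interval."""
    m = LOCUS_RE.match(locus.strip())
    if m is None:
        raise ValueError(f"unrecognised locus string: {locus!r}")
    return Interval(m.group("chrom"), int(m.group("start")), int(m.group("end")))

def boundary_shift(cufflinks: Interval, annotated: Interval) -> Tuple[int, int]:
    """Return (start shift, end shift) of the cufflinks locus vs. the annotation.

    Positive values mean the cufflinks boundary lies to the right
    (higher coordinate) of the annotated one; negative, to the left.
    """
    if cufflinks.chrom != annotated.chrom:
        raise ValueError("intervals are on different chromosomes")
    return (cufflinks.start - annotated.start, cufflinks.end - annotated.end)

# Hypothetical example: a cufflinks locus vs. an annotated gene on the same chromosome.
shift = boundary_shift(parse_locus("Chr1:900-5200"), Interval("Chr1", 1000, 5000))
print(shift)  # (-100, 200)
```

Here the cufflinks locus starts 100 bp upstream of the annotated start and ends 200 bp past the annotated end.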

  • #2
    I can't be sure this is related to the issue you mention, but cufflinks has to guess at the ends of isoforms, so when it assembles a de novo isoform, the positions of its 5' and 3' ends are not really biologically meaningful. They are estimated from pileup height and an arbitrary threshold (set in the run-time options). In addition, when cufflinks groups isoforms into loci for the gene-level summary files, it does so based on which bundles of reads are evaluated together. Sometimes that means adjacent and partially overlapping gene loci are merged into a single locus, resulting in some unexpected genomic boundaries.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
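The locus-merging behaviour described in this reply can be illustrated with a simplified sketch. It is not cufflinks' actual bundling code, which groups aligned reads rather than finished transcript coordinates; it only shows how merging overlapping or directly adjacent spans makes a locus boundary the union of its members. The function name merge_into_loci is hypothetical, and all spans are assumed to lie on one chromosome and strand.

```python
from typing import List, Tuple

# (start, end), inclusive; one chromosome and strand assumed.
Span = Tuple[int, int]

def merge_into_loci(spans: List[Span]) -> List[Span]:
    """Merge overlapping or directly adjacent spans; each result is one 'locus'."""
    loci: List[Span] = []
    for start, end in sorted(spans):
        if loci and start <= loci[-1][1] + 1:
            # Overlaps or abuts the previous locus: extend that locus's boundary.
            prev_start, prev_end = loci[-1]
            loci[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap before this span: it starts a new locus.
            loci.append((start, end))
    return loci

print(merge_into_loci([(100, 500), (450, 900), (2000, 2500)]))
# [(100, 900), (2000, 2500)]
```

The first two spans partially overlap, so they are reported as one locus spanning 100-900, a boundary that matches neither original span.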

    • #3
      Hi Shawn, thank you for the reply! Your answer helps me understand how cufflinks generates the boundaries of each de novo isoform.

      I have another related question. How should I calculate FPKM values for an arbitrary region whose boundaries differ from the ones determined by cufflinks? I guess in many cases we need to calculate FPKM values using our own coordinates. Is cufflinks, or another tool, flexible enough to calculate FPKM values for a given region from the cufflinks output files?

      • #4
        Not sure if this helps, but you can run cufflinks with -G and give it a GTF-format gene annotation; cufflinks will then run in quantification-only mode instead of the default combination of de novo assembly and quantification. With -G, the gene boundaries will match those in your annotation.
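A minimal sketch of building such a GTF from your own list of regions. It assumes one single-exon transcript per region, 1-based inclusive coordinates (the GTF convention), and that each region name can serve as both gene_id and transcript_id; regions_to_gtf, region1 and Chr1 are hypothetical names.

```python
from typing import Iterable, Tuple

# (region name, chromosome, 1-based start, 1-based inclusive end, strand)
Region = Tuple[str, str, int, int, str]

def regions_to_gtf(regions: Iterable[Region], source: str = "custom") -> str:
    """Write each region as one GTF 'exon' line that is its own gene and transcript."""
    lines = []
    for name, chrom, start, end, strand in regions:
        if start < 1 or end < start:
            raise ValueError(f"bad coordinates for {name}: {start}-{end}")
        if strand not in {"+", "-", "."}:
            raise ValueError(f"bad strand for {name}: {strand!r}")
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        # Nine tab-separated columns: seqname, source, feature, start, end,
        # score, strand, frame, attributes.
        fields = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attrs]
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)

print(regions_to_gtf([("region1", "Chr1", 1001, 2500, "+")]), end="")
```

Saved to a file (hypothetically regions.gtf), the output would be passed to cufflinks as, for example, `cufflinks -G regions.gtf -o out accepted_hits.bam`, where the BAM name is also illustrative.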

        In general, you can always calculate the FPKM of anything, provided you know the region's length in nucleotides, the number of reads mapped to that region, and the total number of reads aligned. cufflinks reports neither the read count nor the total number of mapped reads in its output, so you cannot, in any straightforward way, use the cufflinks outputs to calculate FPKM values for arbitrary regions. You're better off making a GTF of the regions you want to quantify and running cufflinks with -G.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */
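The calculation described in this reply, as a sketch: FPKM = (fragments in region × 10^9) / (region length in bp × total mapped fragments). The counts are assumed to come from elsewhere (for example, counted from the BAM file); for single-end data, fragments are simply reads. The function name fpkm is hypothetical. Values computed this way will generally not match cufflinks' own FPKMs exactly, since cufflinks corrects for effective transcript length and distributes fragments among overlapping isoforms.

```python
def fpkm(fragments_in_region: int, region_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of region per million mapped fragments."""
    if region_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("region length and total mapped fragments must be positive")
    # count / (length / 1e3) / (total / 1e6), combined into one expression
    return fragments_in_region * 1e9 / (region_length_bp * total_mapped_fragments)

# 500 fragments on a 2 kb region, 25 million mapped fragments in total.
print(fpkm(500, 2000, 25_000_000))  # 10.0
```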

        • #5
          That will probably solve my problem! I'll create a GTF file and then run cufflinks with the -G option. Thank you very much for your help!

          • #6
            Hi Shawn, I tried to run cufflinks with the -G option using a GTF file that I created, but with a "." in the strand column because I don't know the strand of those loci. The output files "genes.fpkm_tracking" and "isoforms.fpkm_tracking" are both empty. Do you know why?

            I also tried filling in "+" or "-" in the strand column for all coordinates, but it still generated empty files. Do you know what's wrong? Is strand information in the GTF file required for running cufflinks?

            Here is the log:
            [20:02:45] Inspecting reads and determining fragment length distribution.
            > Processed 0 loci. [*************************] 100%
            > Map Properties:
            > Normalized Map Mass: 26534197.87
            > Raw Map Mass: 26534197.87
            > Fragment Length Distribution: Truncated Gaussian (default)
            > Default Mean: 200
            > Default Std Dev: 80
            [20:04:25] Estimating transcript abundances.
            > Processed 0 loci. [*************************] 100%
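The thread does not show what caused the empty output, though "Processed 0 loci" may indicate that no transcripts from the GTF were matched to the alignments. A sketch of a few basic checks worth ruling out, assuming a small GTF like the one described: nine tab-separated columns, at least one exon feature, gene_id and transcript_id attributes, valid coordinates and strand, and sequence names that match the reference names in the BAM header (which you would collect separately, e.g. from the @SQ lines printed by `samtools view -H`). check_gtf_lines is a hypothetical helper, not part of cufflinks.

```python
import re
from typing import Iterable, List, Set

# Matches attribute pairs such as: gene_id "region1"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def check_gtf_lines(lines: Iterable[str], ref_names: Set[str]) -> List[str]:
    """Return human-readable problems found in GTF lines (empty list if none)."""
    problems: List[str] = []
    exon_count = 0
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
            continue
        chrom, _, feature, start, end, _, strand, _, attrs = cols
        if feature == "exon":
            exon_count += 1
        if chrom not in ref_names:
            problems.append(f"line {n}: sequence name {chrom!r} not among the BAM reference names")
        if strand not in {"+", "-", "."}:
            problems.append(f"line {n}: invalid strand {strand!r}")
        if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
            problems.append(f"line {n}: invalid coordinates {start}-{end}")
        keys = {key for key, _ in ATTR_RE.findall(attrs)}
        for required in ("gene_id", "transcript_id"):
            if required not in keys:
                problems.append(f"line {n}: missing {required} attribute")
    if exon_count == 0:
        problems.append("no 'exon' features found")
    return problems

# Hypothetical example: the GTF says "chr1" but the BAM header uses "Chr1".
gtf = ['chr1\tcustom\texon\t1001\t2500\t.\t.\t.\tgene_id "r1"; transcript_id "r1";']
for problem in check_gtf_lines(gtf, {"Chr1", "Chr2"}):
    print(problem)
```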
