
  • Transcript expression estimation using cuffdiff by providing a reference annotation

    I want to estimate the expression level of the transcripts in each gene.
    Rather than using the isoforms generated by the TopHat-Cufflinks pipeline, I want to use the known annotations.
    When I ran cuffdiff, I provided the mapping results in SAM format and the Ensembl annotation as a GTF file.
    When I checked the cuffdiff results, the gene boundaries used for the test looked odd.

    For example, the structure of gene ENSMUSG00000029019 is stored in the GTF file as shown below.

    Code:
    chr4    mm9_ensGene start_codon 147360085   147360087   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147360085   147360210   0.000000    +   0   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147360009   147360210   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147360405   147360627   0.000000    +   0   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147360405   147360627   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147361071   147361087   0.000000    +   2   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147361071   147361306   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    Therefore, the gene spans 147360009 to 147361306 (1-based positions).
    But in the cuffdiff result, genes.fpkm_tracking, the locus for the gene is much larger than the annotated one, from 147326657 to 147416061.
    Code:
    tracking_id class_code  nearest_ref_id  gene_short_name tss_id  locus   q0_FPKM q0_conf_lo  q0_conf_hi  q1_FPKM q1_conf_lo  q1_conf_hi
    ENSMUSG00000029019  -   -   -   -   chr4:147326657-147416061    72.4927 68.7873 76.1981 47.0939 43.9784 50.2093
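The comparison described here can be reproduced with a short script. The following is a minimal sketch and not part of cuffdiff: the helper names `annotated_span` and `parse_locus` are hypothetical, and it assumes tab-separated GTF records with `exon` in the feature column and a locus string of the form `chrom:start-end`.

```python
import re

# Exon records for ENSMUSG00000029019, copied from the GTF excerpt above
# and reduced to five columns: chrom, source, feature, start, end.
GTF_EXONS = (
    "chr4\tmm9_ensGene\texon\t147360009\t147360210\n"
    "chr4\tmm9_ensGene\texon\t147360405\t147360627\n"
    "chr4\tmm9_ensGene\texon\t147361071\t147361306\n"
)


def annotated_span(gtf_text):
    """Return (min exon start, max exon end) over all exon records (hypothetical helper)."""
    starts, ends = [], []
    for line in gtf_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 5 and fields[2] == "exon":
            starts.append(int(fields[3]))
            ends.append(int(fields[4]))
    if not starts:
        raise ValueError("no exon records found")
    return min(starts), max(ends)


def parse_locus(locus):
    """Split a 'chrom:start-end' locus string into its parts (hypothetical helper)."""
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", locus)
    if match is None:
        raise ValueError("unexpected locus format: " + locus)
    return match.group(1), int(match.group(2)), int(match.group(3))


gene_start, gene_end = annotated_span(GTF_EXONS)
chrom, locus_start, locus_end = parse_locus("chr4:147326657-147416061")

# Report the annotated span, the reported locus, and whether one contains the other.
print("annotated:", gene_start, gene_end)
print("locus:    ", locus_start, locus_end)
print("locus contains annotated gene:",
      locus_start <= gene_start and gene_end <= locus_end)
```

The script only restates the numbers in this post; it shows that the reported locus contains the annotated gene, not why the two differ.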
    Does this mean that cuffdiff sets a new gene locus (or boundaries) based on the supplied short-read data, and that the provided gene annotation (e.g. the Ensembl GTF file) is used only as guidance?
    If so, is there any way to estimate expression using the exact gene structures provided by the user rather than the Cufflinks definition?

    Thanks in advance for any comments.

  • #2
    This behavior changed in 0.8.2. What goes in that locus tag is really just a tag that you can copy into a browser window. It's NOT meant to define the boundaries of the object being tested precisely. It's just there so that you can grab a line out of your file and pop open a browser window to see not only that record, but all the records that cuffdiff processed simultaneously.

    I implemented this behavior to reflect the way I use cuffdiff: I see an isoform-level record, for example, and I immediately want to see a UCSC browser shot of not only that isoform, but the whole gene it lives in, along with whatever else is in the neighborhood.

    I will update the manual to describe in more detail what this locus tag means.
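As an illustration of using the locus tag this way, the sketch below pulls the locus column out of the tracking line quoted earlier and builds a UCSC Genome Browser link from it. The function name `browser_url` is hypothetical, and the `hgTracks` URL pattern with `db=mm9` is an assumption based on the mm9 annotation used in the first post; adjust both for your own browser or assembly.

```python
from urllib.parse import urlencode

# Header and one data row from genes.fpkm_tracking, as quoted in the first post.
# The sample is split on whitespace here for brevity; this sketch assumes the
# real file is tab-separated and would be split on "\t" instead.
HEADER = ("tracking_id class_code nearest_ref_id gene_short_name tss_id locus "
          "q0_FPKM q0_conf_lo q0_conf_hi q1_FPKM q1_conf_lo q1_conf_hi").split()
ROW = ("ENSMUSG00000029019 - - - - chr4:147326657-147416061 "
       "72.4927 68.7873 76.1981 47.0939 43.9784 50.2093").split()


def browser_url(locus, db="mm9"):
    """Build a genome browser link for a locus string (assumed URL pattern)."""
    query = urlencode({"db": db, "position": locus})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


record = dict(zip(HEADER, ROW))
print(record["tracking_id"], record["locus"])
print(browser_url(record["locus"]))
```

The same locus string can also simply be pasted into the browser's position box, which is the use described above.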


    • #3
      It makes sense. Thanks for the explanation of the tag.


      • #4
        I have a question about cuffdiff output. When I run cuffcompare, I use the reference annotation downloaded from UCSC:
        Code:
        ./cuffcompare -o sample1_4 -r hg18_ref.gtf -R ../sample1/transcripts.gtf /../sample4/transcripts.gtf

        Then I run cuffdiff as follows:
        Code:
        ./cuffdiff -m 200 sample1_4.combined.gtf s1_accepted_hits.sam s4_accepted_hits.sam

        In my results, 0_1_cds.diff contains nothing except the header line.

        Is this expected?
        Thanks in advance.


        • #5
          UCSC gtf files?

          Originally posted by sunnyvu
          I have a question about cuffdiff output. When I run cuffcompare, I use the reference annotation downloaded from UCSC.
          In my results, 0_1_cds.diff contains nothing except the header line.
          I had the same problem. Is it possible that you have to use the GTF files from Ensembl? E.g. ftp://ftp.ensembl.org/pub/release-58/gtf/
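One way to check how the UCSC and Ensembl files differ is to tally the feature types (third GTF column) in each. The sketch below is only a diagnostic aid, assuming tab-separated, nine-column GTF input; `count_features` is a hypothetical helper, and the sample records are the Ensembl lines quoted in the first post. It does not establish why the CDS output is empty.

```python
from collections import Counter


def count_features(gtf_lines):
    """Count how often each feature type (column 3) occurs (hypothetical helper)."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Rebuild the seven records quoted in the first post as tab-separated GTF lines.
ATTRS = 'gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";'
RECORDS = [
    ("start_codon", 147360085, 147360087, "."),
    ("CDS", 147360085, 147360210, "0"),
    ("exon", 147360009, 147360210, "."),
    ("CDS", 147360405, 147360627, "0"),
    ("exon", 147360405, 147360627, "."),
    ("CDS", 147361071, 147361087, "2"),
    ("exon", 147361071, 147361306, "."),
]
SAMPLE = [
    "\t".join(["chr4", "mm9_ensGene", feature, str(start), str(end),
               "0.000000", "+", frame, ATTRS])
    for feature, start, end, frame in RECORDS
]

for feature, n in sorted(count_features(SAMPLE).items()):
    print(feature, n)
```

To run it on a real file, pass an open file handle, for example `count_features(open("hg18_ref.gtf"))`, and compare the tallies for the two annotation sources.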
