  • inconsistent gene names in genes.expr - Cufflinks

    Hi,

    I have discovered some inconsistencies while browsing through the Cufflinks output files transcripts.expr, transcripts.tmap and genes.expr: multiple transcripts belonging to the same gene are not named consistently across the different files. Here are two examples to illustrate:

    In transcripts.expr we have these two transcripts:
    CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
    CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
    But the gene "CUFF.8800" does not exist in the genes.expr file, only "CUFF.8799".
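A quick way to spot such orphans is to derive the expected gene ID from each transcript ID by dropping the trailing ".N" isoform suffix (an assumption based only on the naming seen in this thread) and to compare the result with the IDs listed in genes.expr. Below is a minimal Python sketch with the two rows pasted inline and genes.expr reduced to the single ID reported above. The helper names are illustrative, not part of Cufflinks.

```python
# Rows from transcripts.expr, whitespace-separated; column 1 is the transcript ID.
TRANSCRIPTS_EXPR = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
"""

# Gene IDs present in genes.expr for this locus (only CUFF.8799, as reported above).
GENES_EXPR_IDS = {"CUFF.8799"}


def gene_id_from_transcript(transcript_id: str) -> str:
    """Hypothetical helper: assumes transcript IDs look like '<gene_id>.<isoform number>'."""
    return transcript_id.rsplit(".", 1)[0]


def orphan_gene_ids(expr_text: str, known_gene_ids: set[str]) -> set[str]:
    """Hypothetical helper: gene IDs implied by transcript IDs but absent from genes.expr."""
    implied = {
        gene_id_from_transcript(line.split()[0])
        for line in expr_text.splitlines()
        if line.strip()
    }
    return implied - known_gene_ids


print(orphan_gene_ids(TRANSCRIPTS_EXPR, GENES_EXPR_IDS))  # {'CUFF.8800'}
```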

    In the transcripts.gtf file, however, both "genes" are present:
    chr1 Cufflinks transcript 47611250 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks exon 47611250 47611366 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks exon 47613168 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "2"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks transcript 47611335 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
    chr1 Cufflinks exon 47611335 47611366 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "1"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
    chr1 Cufflinks exon 47613456 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
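The mismatch can be made explicit by parsing the GTF attribute column: the two transcript records overlap on the same strand of chr1, yet they carry different gene_id values. The Python sketch below uses a simple regex for the `key "value";` pairs. That is adequate for these lines but is an assumption, not a full GTF parser, and the function names are illustrative.

```python
import re

# The two transcript-level records from transcripts.gtf (attributes shortened).
GTF_LINES = [
    'chr1 Cufflinks transcript 47611250 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1";',
    'chr1 Cufflinks transcript 47611335 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2";',
]

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line: str) -> dict:
    """Hypothetical helper: the first 8 columns are fixed, the 9th holds the attributes."""
    fields = line.split(None, 8)
    record = {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
    }
    record.update(ATTR_RE.findall(fields[8]))  # adds gene_id, transcript_id, ...
    return record


def overlaps(a: dict, b: dict) -> bool:
    """Same chromosome and strand, and the closed intervals [start, end] intersect."""
    return (a["seqname"] == b["seqname"] and a["strand"] == b["strand"]
            and a["start"] <= b["end"] and b["start"] <= a["end"])


t1, t2 = (parse_gtf_line(line) for line in GTF_LINES)
if overlaps(t1, t2) and t1["gene_id"] != t2["gene_id"]:
    # prints: overlapping transcripts with different gene_id: CUFF.8799 vs CUFF.8800
    print(f"overlapping transcripts with different gene_id: {t1['gene_id']} vs {t2['gene_id']}")
```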
    A picture from the UCSC browser with my data is attached ("locus.8799.8800.gif"); these transcripts were called in the first sample, "Y1". It clearly shows that the two transcripts come from the same locus, which the FMI and frac values in the transcripts.expr file also indicate.
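The FMI/frac argument can be checked per bundle. This sketch assumes the transcripts.expr columns are trans_id, bundle_id, chr, left, right, FPKM, FMI, frac, conf_lo, conf_hi, coverage, length. The values line up with the FPKM, frac, conf_lo, conf_hi and cov attributes in transcripts.gtf, and the last column equals the summed exon lengths (117 + 38 = 155), but the names themselves are an assumption. If both transcripts belong to one locus, their frac values within the shared bundle_id should sum to about 1, and the most abundant isoform should have FMI 1.

```python
from collections import defaultdict

# Assumed column names for Cufflinks transcripts.expr (matched against the GTF values above).
COLUMNS = ["trans_id", "bundle_id", "chr", "left", "right", "FPKM",
           "FMI", "frac", "conf_lo", "conf_hi", "coverage", "length"]

ROWS = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
"""


def parse_expr(text: str) -> list[dict]:
    """Hypothetical helper: one dict per non-empty row, numeric fields used below as floats."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = dict(zip(COLUMNS, line.split()))
        for key in ("FPKM", "FMI", "frac"):
            rec[key] = float(rec[key])
        records.append(rec)
    return records


def bundle_summary(records: list[dict]) -> dict:
    """Per bundle_id: summed frac and the transcript(s) with FMI == 1."""
    by_bundle = defaultdict(list)
    for rec in records:
        by_bundle[rec["bundle_id"]].append(rec)
    return {
        bundle_id: {
            "frac_total": sum(r["frac"] for r in recs),
            "major": [r["trans_id"] for r in recs if r["FMI"] == 1.0],
        }
        for bundle_id, recs in by_bundle.items()
    }


for bundle_id, info in bundle_summary(parse_expr(ROWS)).items():
    print(bundle_id, round(info["frac_total"], 6), info["major"])  # 170650 1.0 ['CUFF.8799.1']
```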

    A somewhat more complicated example is a locus with three different transcripts deemed to be present (see the attached picture "loci30602.30603.30604.gif"). Analogously, the three transcripts have different gene names:
    CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
    CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
    CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
    Only one of them, CUFF.30602, is present in the genes.expr file, whereas the tmap file annotates the three transcripts as belonging to three separate genes: CUFF.30604, CUFF.30603 and CUFF.30602.
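Because all three rows share bundle_id 198908 (column 2), the bundle can be used to flag loci whose transcripts have been split across several gene IDs. This is a Python sketch. It assumes, as in both examples here, that a bundle corresponds to one locus and that transcript IDs look like '<gene_id>.<isoform number>'. The function name is illustrative.

```python
from collections import defaultdict

ROWS = """\
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
"""


def gene_ids_per_bundle(text: str) -> dict[str, set[str]]:
    """Hypothetical helper: map bundle_id (column 2) to the gene IDs implied by column 1."""
    bundles = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if fields:
            bundles[fields[1]].add(fields[0].rsplit(".", 1)[0])
    return dict(bundles)


for bundle_id, genes in gene_ids_per_bundle(ROWS).items():
    if len(genes) > 1:
        # prints: bundle 198908 is split across 3 gene IDs: ['CUFF.30602', 'CUFF.30603', 'CUFF.30604']
        print(f"bundle {bundle_id} is split across {len(genes)} gene IDs: {sorted(genes)}")
```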

    The FPKM value in the genes.expr file seems to be the total over all isoforms, but the naming and cross-referencing between the files is confusing.
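If genes.expr does report the sum over isoforms, a locus-level value can be rebuilt from transcripts.expr alone by summing FPKM (column 6) per bundle_id, which sidesteps the inconsistent gene names. The Python sketch below treats a bundle as one locus, which is an assumption. Whether these totals exactly match the genes.expr values has not been established in this thread.

```python
from collections import defaultdict

# All five transcripts.expr rows quoted above.
ROWS = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
"""


def fpkm_per_bundle(text: str) -> dict[str, float]:
    """Hypothetical helper: sum the FPKM column over all transcripts sharing a bundle_id."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if fields:
            totals[fields[1]] += float(fields[5])
    return dict(totals)


for bundle_id, total in fpkm_per_bundle(ROWS).items():
    print(bundle_id, round(total, 4))
```

For the two loci above this gives about 153.84 (CUFF.8799 + CUFF.8800) and 387.53 (CUFF.30602 + CUFF.30603 + CUFF.30604), which can then be compared with the single gene entry that genes.expr lists for each locus.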

    Now you know.

    Boel
    Last edited by Boel; 04-14-2010, 05:06 AM.

  • #2
    Your figures are really hard to see! Can you put up a better quality image?



    • #3
      link to the illustrations

      Yup, apparently they got tiny when I attached them. They can be viewed here: