  • inconsistent gene names in genes.expr - Cufflinks

    Hi,

    I have discovered some inconsistencies when browsing through the Cufflinks files transcripts.expr, transcripts.tmap and genes.expr. Multiple transcripts belonging to the same gene are not named consistently across the different files. Here are two examples to illustrate:

    In transcripts.expr we have the two transcripts:
    CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
    CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
    But the gene "CUFF.8800" does not exist in the genes.expr file, only "CUFF.8799".

    In the transcripts.gtf file, however, both "genes" are present:
    chr1 Cufflinks transcript 47611250 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks exon 47611250 47611366 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks exon 47613168 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "2"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
    chr1 Cufflinks transcript 47611335 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
    chr1 Cufflinks exon 47611335 47611366 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "1"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
    chr1 Cufflinks exon 47613456 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
    A picture from the UCSC browser with my data is attached ("locus.8799.8800.gif"); these transcripts were called in the first sample, "Y1". Here you can clearly see that the two transcripts come from the same locus, which the FMI and frac values in the transcripts.expr file also indicate.

    A somewhat more complicated example is a locus with three different transcripts deemed to be present; see the attached picture "loci30602.30603.30604.gif". Analogously, the three transcripts have different gene names:
    CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
    CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
    CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
    Only one of them, CUFF.30602, is present in the genes.expr file, whereas in the tmap file the three transcripts are annotated as belonging to three separate genes: CUFF.30604, CUFF.30603 and CUFF.30602.

    The FPKM value in the genes.expr file seems to be the total of all isoforms, but the naming and referencing are confusing.
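
A minimal sketch of how the check described above could be scripted. It assumes that the second column of transcripts.expr is an identifier shared by all transcripts of one locus (as it is in the rows quoted above) and that a gene id is the transcript id minus its last dot-separated part. The helper names and the GENES_PRESENT set are hypothetical; the set only encodes what this post reports about genes.expr and is not read from a real file.

```python
from collections import defaultdict

# Rows copied from the transcripts.expr excerpts quoted in this post.
TRANSCRIPT_ROWS = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
"""

# Gene ids this post reports as present in genes.expr (assumption: typed in
# by hand here; in practice they would be read from the first column).
GENES_PRESENT = {"CUFF.8799", "CUFF.30602"}


def gene_prefix(transcript_id):
    """Derive a gene id by dropping the last dot-separated part (assumption)."""
    return transcript_id.rsplit(".", 1)[0]


def group_by_locus(rows_text):
    """Map the second column (assumed shared locus id) to transcript ids."""
    groups = defaultdict(list)
    for line in rows_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[1]].append(fields[0])
    return dict(groups)


def missing_gene_ids(rows_text, genes_present):
    """For each locus, list derived gene ids that are absent from genes_present."""
    report = {}
    for locus, transcript_ids in group_by_locus(rows_text).items():
        missing = sorted({gene_prefix(t) for t in transcript_ids} - genes_present)
        if missing:
            report[locus] = missing
    return report


report = missing_gene_ids(TRANSCRIPT_ROWS, GENES_PRESENT)
for locus, missing in report.items():
    print(locus, missing)
```

Run against full files, the same grouping would list every locus where transcript names point to a gene id that genes.expr does not contain.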

    Now you know.

    Boel
    Last edited by Boel; 04-14-2010, 05:06 AM.

  • #2
    Your figures are really hard to see! Can you put up a better quality image?



    • #3
      link to the illustrations

      Yup, they got tiny when I attached them, apparently. They can be viewed here:
