Hi,
I have found some inconsistencies while browsing the Cufflinks output files transcripts.expr, transcripts.tmap and genes.expr. Multiple transcripts belonging to the same gene are not named consistently across the files. Here are two examples to illustrate.
In transcripts.expr we have the two transcripts:
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
But the gene "CUFF.8800" does not exist in the genes.expr file, only "CUFF.8799".
In the transcripts.gtf file, however, both "genes" are present:
chr1 Cufflinks transcript 47611250 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks exon 47611250 47611366 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "1"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks exon 47613168 47613205 1000 + . gene_id "CUFF.8799"; transcript_id "CUFF.8799.1"; exon_number "2"; FPKM "107.2658604057"; frac "0.749757"; conf_lo "71.099184"; conf_hi "143.432537"; cov "60.514078";
chr1 Cufflinks transcript 47611335 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
chr1 Cufflinks exon 47611335 47611366 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "1"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
chr1 Cufflinks exon 47613456 47613500 434 + . gene_id "CUFF.8800"; transcript_id "CUFF.8800.2"; exon_number "2"; FPKM "46.5777479715"; frac "0.250243"; conf_lo "0.000000"; conf_hi "112.333534"; cov "26.276855";
A picture from the UCSC browser with my data is attached ("locus.8799.8800.gif"); these transcripts were called in the first sample, "Y1". It clearly shows that the two transcripts come from the same locus, which the FMI and frac values in transcripts.expr also imply.
A somewhat more complicated example is a locus with three different transcripts deemed to be present; see the attached picture "loci30602.30603.30604.gif". Analogously, the three transcripts have different gene names:
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
Only one of them, CUFF.30602, is present in the genes.expr file, yet in the tmap file the three transcripts are annotated as belonging to three separate genes: CUFF.30604, CUFF.30603 and CUFF.30602.
The FPKM value in the genes.expr file seems to be the total over all isoforms, but the naming and cross-referencing are confusing.
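For anyone who wants to check their own output for the same problem, here is a minimal sketch in Python. It groups the transcripts.expr rows quoted in this post by their second column and reports the gene-name prefixes, the summed FPKM and the summed frac for each group. It assumes the column order is trans_id, bundle_id, chr, left, right, FPKM, FMI, frac, ... and that transcripts sharing a bundle_id belong to one locus; the function name `summarize` is only illustrative. The summed FPKM can then be compared by hand against the corresponding entry in genes.expr.

```python
from collections import defaultdict

# Rows copied from the transcripts.expr excerpts in this post.
# Assumed column order: trans_id, bundle_id, chr, left, right, FPKM, FMI, frac, ...
ROWS = """\
CUFF.8799.1 170650 chr1 47611249 47613205 107.266 1 0.749757 71.0992 143.433 60.5141 155
CUFF.8800.2 170650 chr1 47611334 47613500 46.5777 0.434227 0.250243 0 112.334 26.2769 77
CUFF.30602.1 198908 chr10 69761823 69768943 92.9518 0.424581 0.286601 50.5627 135.341 54.5443 491
CUFF.30603.2 198908 chr10 69761823 69769315 218.926 1 0.596452 174.379 263.473 128.466 510
CUFF.30604.3 198908 chr10 69768551 69769315 75.6472 0.345538 0.116947 60.7094 90.5849 44.3899 551
"""


def summarize(text):
    """Group rows by bundle_id; collect gene-name prefixes and sum FPKM and frac."""
    loci = defaultdict(lambda: {"genes": [], "fpkm": 0.0, "frac": 0.0})
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        trans_id, bundle_id = fields[0], fields[1]
        gene = trans_id.rsplit(".", 1)[0]  # "CUFF.8800.2" -> "CUFF.8800"
        locus = loci[bundle_id]
        if gene not in locus["genes"]:
            locus["genes"].append(gene)
        locus["fpkm"] += float(fields[5])
        locus["frac"] += float(fields[7])
    return dict(loci)


summary = summarize(ROWS)
for bundle_id, info in summary.items():
    # More than one gene-name prefix within a bundle is the inconsistency described above.
    flag = "INCONSISTENT" if len(info["genes"]) > 1 else "ok"
    print(bundle_id, info["genes"], round(info["fpkm"], 4), round(info["frac"], 6), flag)
```

With the rows above, both bundles are flagged, and the frac values within each bundle add up to 1, which fits the transcripts being isoforms of a single locus.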
Now you know.
Boel