Hello,
I am using Cufflinks to estimate expression levels for genes and, in particular, for isoforms.
I have a GFF file (an excerpt is in the first code listing further down).
I converted it to a GTF file (second code listing) and ran Cufflinks with that file.
First question: should I keep the mRNA/gene lines in the GTF file?
The GTF file contains 9864 lines, i.e. 9864 exons.
Cufflinks produces its outputs (genes.expr, transcripts.expr and transcripts.gtf) for each lane. Excerpts for one lane are listed further down.
- genes.expr:
First problem: some genes (but not all!) appear several times, apparently because exons and genes are mixed up.
- transcripts.expr:
My main problem is in this transcripts.expr file: it reports expression levels per exon instead of per isoform.
Moreover, I should find the same number of exons as in my GTF file, but this is not the case: 583 exons are missing!
Has anyone managed to use Cufflinks to estimate gene and isoform expression levels? Which GTF file did you use?
Any help would be appreciated, I'm confused ...
Code (GFF file):
DS571145 alternativeSplicer gene 3484 4143 . - . ID=EHI_151170;Name=hypothetical protein;
DS571145 alternativeSplicer mRNA 3484 4143 . - . ID=EHI_151170.ref;Name=EHI_151170.ref;Parent=EHI_151170;completeORF=yes
DS571145 alternativeSplicer exon 3484 4143 . - . ID=exon_EHI_151170.ref-1;Name=exon;Parent=EHI_151170.ref;
DS571145 alternativeSplicer mRNA 3484 4143 . - . ID=EHI_151170.alt1;Name=EHI_151170.alt1;Parent=EHI_151170;completeORF=no
DS571145 alternativeSplicer exon 3484 3943 . - . ID=exon_EHI_151170.alt1-1;Name=exon;Parent=EHI_151170.alt1;
DS571145 alternativeSplicer exon 3997 4143 . - . ID=exon_EHI_151170.alt1-2;Name=exon;Parent=EHI_151170.alt1;
DS571145 alternativeSplicer gene 4256 5944 . + . ID=EHI_151180;Name=calcineurin catalytic subunit A putative;
DS571145 alternativeSplicer mRNA 4256 5944 . + . ID=EHI_151180.ref;Name=EHI_151180.ref;Parent=EHI_151180;completeORF=yes
DS571145 alternativeSplicer exon 4256 5944 . + . ID=exon_EHI_151180.ref-1;Name=exon;Parent=EHI_151180.ref;
DS571145 alternativeSplicer gene 9060 10194 . - . ID=EHI_151210.1;Name=TBC domain containing protein;
DS571145 alternativeSplicer mRNA 9060 10194 . - . ID=EHI_151210.1.ref;Name=EHI_151210.1.ref;Parent=EHI_151210.1;completeORF=yes
DS571145 alternativeSplicer exon 9060 9961 . - . ID=exon_EHI_151210.1.ref-1;Name=exon;Parent=EHI_151210.1.ref;
DS571145 alternativeSplicer exon 10053 10194 . - . ID=exon_EHI_151210.1.ref-2;Name=exon;Parent=EHI_151210.1.ref;
Code (GTF file):
DS571145 alternativeSplicer exon 3484 4143 . - . gene_id "EHI_151170"; transcript_id "exon_EHI_151170.ref-1";
DS571145 alternativeSplicer exon 3484 3943 . - . gene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-1";
DS571145 alternativeSplicer exon 3997 4143 . - . gene_id "EHI_151170"; transcript_id "exon_EHI_151170.alt1-2";
DS571145 alternativeSplicer exon 4256 5944 . + . gene_id "EHI_151180"; transcript_id "exon_EHI_151180.ref-1";
DS571145 alternativeSplicer exon 9060 9961 . - . gene_id "EHI_151210.1"; transcript_id "exon_EHI_151210.1.ref-1";
DS571145 alternativeSplicer exon 10053 10194 . - . gene_id "EHI_151210.1"; transcript_id "exon_EHI_151210.1.ref-2";
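For comparison, here is a minimal Python sketch of a GFF-to-GTF conversion. It is not the script that produced the GTF listing above, where each exon ID is used as the transcript_id. The sketch assumes tab-separated GFF3 columns and follows the GTF convention that all exons of one transcript share the same transcript_id: it takes each exon's Parent (the mRNA ID) as transcript_id and that mRNA's Parent as gene_id. The function names `parse_attributes` and `gff_to_gtf` are hypothetical, chosen only for illustration.

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_to_gtf(gff_lines):
    """Turn GFF exon records into GTF lines grouped by their parent mRNA.

    Assumption: columns are tab-separated and every exon has a Parent
    attribute naming an mRNA record that appears in the same input.
    """
    records = [line.rstrip("\n").split("\t") for line in gff_lines
               if line.strip() and not line.startswith("#")]

    # First pass: remember which gene each mRNA belongs to.
    gene_of = {}
    for cols in records:
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            gene_of[attrs["ID"]] = attrs["Parent"]

    # Second pass: one GTF line per exon, labelled with its mRNA and gene.
    gtf = []
    for cols in records:
        if len(cols) == 9 and cols[2] == "exon":
            transcript = parse_attributes(cols[8])["Parent"]
            # Fall back to the transcript ID if the mRNA record is missing.
            gene = gene_of.get(transcript, transcript)
            attributes = f'gene_id "{gene}"; transcript_id "{transcript}";'
            gtf.append("\t".join(cols[:8] + [attributes]))
    return gtf
```

With this grouping, the two exons of EHI_151170.alt1 would carry the same transcript_id instead of two different ones.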
- genes.expr:
Code:
EHI_151170 7186 DS571145 3483 4143 27.9499 15.5065 40.3932 OK
EHI_151180 7187 DS571145 4255 5944 18.3523 9.7844 26.9202 OK
EHI_151210.1 7190 DS571145 9059 9961 33.9837 22.3246 45.6428 OK
EHI_151210.1 7191 DS571145 10052 10194 43.9839 30.7198 57.248 OK
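The repeated gene IDs in the genes.expr listing above can be listed with a few lines of Python. This is only a sketch: it assumes the gene ID is the first whitespace-separated column, as in the excerpt, and the function name `repeated_ids` is hypothetical.

```python
from collections import Counter


def repeated_ids(expr_lines):
    """Return the first-column IDs that occur on more than one line."""
    counts = Counter(line.split()[0] for line in expr_lines if line.strip())
    return {gene: n for gene, n in counts.items() if n > 1}
```

A header line, if the file has one, is harmless here because its first field occurs only once.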
- transcripts.expr:
Code:
exon_EHI_151170.alt1-1 7186 DS571145 3483 3943 0.267399 0.00180684 0.00108507 0 2.90168 0.112715 460 361 OK
exon_EHI_151170.ref-1 7186 DS571145 3483 4143 147.993 1 0.933245 122.898 173.088 62.3827 660 561 OK
exon_EHI_151170.alt1-2 7186 DS571145 3996 4143 121.712 0.822414 0.0656696 49.073 194.35 51.3044 147 48 OK
exon_EHI_151180.ref-1 7187 DS571145 4255 5944 176.861 1 1 150.263 203.459 78.6164 1689 1590 OK
exon_EHI_151210.1.ref-1 7190 DS571145 9059 9961 327.5 1 1 291.306 363.694 147.572 902 803 OK
exon_EHI_151210.1.ref-2 7191 DS571145 10052 10194 423.872 1 1 382.696 465.049 102.326 142 43 OK
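To see which GTF entries are absent from transcripts.expr, the two sets of IDs can be compared. The sketch below assumes that the first column of transcripts.expr holds the transcript_id taken from the GTF file, as the excerpt above suggests; the function names `gtf_transcript_ids` and `missing_from_expr` are hypothetical.

```python
import re

# Matches the transcript_id attribute of a GTF line, e.g. transcript_id "abc";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect every distinct transcript_id found in the GTF lines."""
    ids = set()
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_expr(gtf_lines, expr_lines):
    """Return the GTF transcript_ids that never appear in column 1 of the .expr file."""
    reported = {line.split()[0] for line in expr_lines if line.strip()}
    return sorted(gtf_transcript_ids(gtf_lines) - reported)
```

Both functions take any iterable of lines, so open file handles for the GTF file and for transcripts.expr can be passed directly.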