Hi all,
I am discovering the pleasure of working with GFF files, and I have a question about the human GFF file provided by Ensembl.
More particularly, if I look at this transcript:
FOPNL-007
Region: chromosome:GRCh37:16:15961195:15982482:1 Transcript: ENST00000575073 (FOPNL-007)
16 Ensembl_havana Exon 15961195 15961373 . - 2 gene_id=ENSG00000133393; gene_name=FOPNL; transcript_id=ENST00000575073; transcript_name=FOPNL-007; exon_id=ENSE00002640477; gene_type=KNOWN_protein_coding
16 Ensembl_havana Exon 15973661 15973745 . - 1 gene_id=ENSG00000133393; gene_name=FOPNL; transcript_id=ENST00000575073; transcript_name=FOPNL-007; exon_id=ENSE00003662092; gene_type=KNOWN_protein_coding
16 Ensembl_havana Exon 15977865 15978062 . - 2 gene_id=ENSG00000133393; gene_name=FOPNL; transcript_id=ENST00000575073; transcript_name=FOPNL-007; exon_id=ENSE00000909153; gene_type=KNOWN_protein_coding
16 Ensembl_havana Exon 15982415 15982482 . - . gene_id=ENSG00000133393; gene_name=FOPNL; transcript_id=ENST00000575073; transcript_name=FOPNL-007; exon_id=ENSE00002635299; gene_type=KNOWN_protein_coding
and the same transcript in the Ensembl GTF file:
ftp://ftp.ensembl.org/pub/release-75...Ch37.75.gtf.gz
16 protein_coding transcript 15961195 15982482 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana";
16 protein_coding exon 15982415 15982482 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "1"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; exon_id "ENSE00002635299";
16 protein_coding CDS 15982415 15982442 . - 0 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "1"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; protein_id "ENSP00000459804";
16 protein_coding start_codon 15982440 15982442 . - 0 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "1"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana";
16 protein_coding exon 15977865 15978062 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "2"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; exon_id "ENSE00000909153";
16 protein_coding CDS 15977865 15978062 . - 2 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "2"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; protein_id "ENSP00000459804";
16 protein_coding exon 15973661 15973745 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "3"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; exon_id "ENSE00003662092";
16 protein_coding CDS 15973661 15973745 . - 2 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "3"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; protein_id "ENSP00000459804";
16 protein_coding exon 15961195 15961373 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "4"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; exon_id "ENSE00002640477";
16 protein_coding CDS 15961373 15961373 . - 1 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "4"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana"; protein_id "ENSP00000459804";
16 protein_coding stop_codon 15961370 15961372 . - 0 gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; exon_number "4"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana";
16 protein_coding UTR 15982443 15982482 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana";
16 protein_coding UTR 15961195 15961369 . - . gene_id "ENSG00000133393"; transcript_id "ENST00000575073"; gene_name "FOPNL"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "FOPNL-007"; transcript_source "havana";
The exons are fine, but is it normal that the last CDS (15961373 to 15961373) has the same start and end coordinate, i.e. end minus start is 0?
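In case it helps, here is a minimal sketch of how the CDS lengths can be computed from the rows above. It assumes that GTF coordinates are 1-based and inclusive at both ends, so length = end - start + 1; `feature_length` and `cds_rows` are just illustrative names, not part of any Ensembl tool, and the coordinates are copied by hand from the excerpt.

```python
# Hypothetical helper: lengths of the CDS rows copied from the GTF excerpt,
# assuming 1-based coordinates with both start and end inclusive.

def feature_length(start: int, end: int) -> int:
    """Length in bp of a feature whose start and end are both inclusive."""
    return end - start + 1

# (start, end, frame) of the four CDS rows of ENST00000575073,
# listed in transcript order (minus strand, so descending coordinates).
cds_rows = [
    (15982415, 15982442, 0),
    (15977865, 15978062, 2),
    (15973661, 15973745, 2),
    (15961373, 15961373, 1),
]

cds_lengths = [feature_length(start, end) for start, end, _ in cds_rows]
total_cds = sum(cds_lengths)

for (start, end, frame), length in zip(cds_rows, cds_lengths):
    print(f"CDS {start}-{end} frame={frame} length={length} bp")
print(f"total CDS length: {total_cds} bp")
```

If the inclusive convention is right, a row with start equal to end describes a single base rather than an empty feature, and the four CDS rows add up to a multiple of 3 (the stop_codon row is listed separately from the CDS).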
Thanks!