**steven** · 09-17-2009, 09:47 AM

A gtf file of what?
If it's for gene annotation, maybe try TAIR or TIGR.

**SOLiD_User** · 09-17-2009, 10:06 AM

Yes, a gtf for gene annotation. I checked TAIR, TIGR, EMBL, etc and so far have been unable to locate a gtf file. I can only find gff files for arabidopsis. It seems EMBL has gtf files for everything except plants. I looked at gbrowse and the UCSC genome browser and I don't see a way to export as a gtf file. I have spent much time with google and I haven't found anything useful.

**steven** · 09-17-2009, 10:18 AM

Why isn't gff ok? Are you looking for a specific field like "transcript_id:" or so?

**SOLiD_User** · 09-17-2009, 11:08 AM

I'm trying to get the AB whole transcriptome pipeline working and it requires the transcript_id and gene_id fields in the gtf file. I tried the gff and it didn't work. I don't have the programming skills to create a perl script so I was hoping I could download a gtf file or find an application that could create one.

**steven** · 09-17-2009, 11:38 AM

could you show me a few lines of the gff file(s) you have, just in case the information is available and easy to convert?

**SOLiD_User** · 09-17-2009, 12:12 PM

The gff files look like this. I think the major challenge in converting gff to gtf is counting the exons for each transcript.
--
Chr1 TAIR9 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR9 exon 3996 4276 . + . Parent=AT1G01010.1
Chr1 TAIR9 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR9 exon 4486 4605 . + . Parent=AT1G01010.1
Chr1 TAIR9 CDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR9 exon 4706 5095 . + . Parent=AT1G01010.1
Chr1 TAIR9 CDS 4706 5095 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR9 exon 5174 5326 . + . Parent=AT1G01010.1
Chr1 TAIR9 CDS 5174 5326 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR9 exon 5439 5899 . + . Parent=AT1G01010.1
Chr1 TAIR9 CDS 5439 5630 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;

the gtf file has to be in this format

supercont1.1 protein_coding CDS 2191663 2191958 . - 1 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "4"; protein_id "AAEL000037-PA";
supercont1.1 protein_coding exon 2191201 2191600 . - . gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
supercont1.1 protein_coding CDS 2191299 2191600 . - 2 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5"; protein_id "AAEL000037-PA";
supercont1.1 protein_coding stop_codon 2191296 2191298 . - 0 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
supercont1.1 protein_coding exon 2207362 2207580 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
supercont1.1 protein_coding CDS 2207362 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1"; protein_id "AAEL000086-PA";
supercont1.1 protein_coding start_codon 2207578 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
supercont1.1 protein_coding exon 2207263 2207299 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "2";

**andreas.sjodin** · 09-21-2009, 06:27 AM

I would recommend taking a look at the Python GFF parsers developed by Brad Chapman. They can be downloaded from GitHub (http://github.com/chapmanb/bcbb/tree/master/gff/). Those scripts can convert between different GFF versions. More information about them can be found in some of his blog posts (http://bcbio.wordpress.com).

**SOLiD_User** · 09-22-2009, 04:40 AM

Thanks Andreas. I'll have a look at the GFF parser.

**knc** · 12-03-2009, 10:37 AM

gtf for arabidopsis

Hi,

Did you solve your gtf problem? You can reuse the first 8 fields of the gff as they are; only the last field needs to be changed so that it includes the gene_id, transcript_id, and exon number.
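
As a rough illustration of that last-field rewrite, here is a minimal Python sketch. It is not the script anyone in this thread used. The function names are made up for the example, and it assumes that the first `Parent=` entry is the transcript and that the gene ID is the transcript ID without its trailing `.N` (as in `AT1G01010.1`). The exon number is left out here.

```python
import re

def tair_parent_to_gtf_attrs(attr_field):
    """Turn a TAIR GFF attribute such as
    'Parent=AT1G01010.1,AT1G01010.1-Protein;' into a GTF attribute string."""
    match = re.search(r"Parent=([^;]+)", attr_field)
    if match is None:
        raise ValueError(f"no Parent= tag in {attr_field!r}")
    # Assumption: the first parent is the transcript; later entries
    # (for example the '-Protein' one) are ignored.
    transcript_id = match.group(1).split(",")[0]
    # Assumption: the gene ID is the transcript ID without its '.N' suffix.
    gene_id = transcript_id.rsplit(".", 1)[0]
    return f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'

def convert_line(gff_line):
    """Keep the first 8 tab-separated fields and rewrite only the 9th."""
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, found {len(fields)}")
    fields[8] = tair_parent_to_gtf_attrs(fields[8])
    return "\t".join(fields)
```

Note that the real files are tab-separated; lines copied from a forum post usually have the tabs turned into spaces and would be rejected by `convert_line`.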

**dsidote** · 12-03-2009, 11:58 AM

We were able to get what we needed. Thanks!

**knc** · 12-03-2009, 02:27 PM

Quick question:
How did you deal with transcripts that have different stop codons?

**dsidote** · 12-03-2009, 06:09 PM

I'm not sure I understand what you're asking. Are you referring to splice variants? If so, we wrote a perl script that reads the file line by line and counts the exons for each transcript. I can send it to you if it would help. In its current form it only works for the gff file from TAIR.
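
For readers without the perl script, the exon counting could be sketched in Python roughly as follows. This is an assumption about how such counting might work, not a port of that script. The `number_exons` name and the tuple layout are invented for the example. Exons are numbered in transcript order, so coordinates run upward on the `+` strand and downward on the `-` strand, which matches the gtf excerpt earlier in the thread. A CDS takes the number of the exon that contains it.

```python
from collections import defaultdict

def number_exons(records):
    """records: list of (transcript_id, feature, start, end, strand) tuples.
    Returns a list with one exon number (or None) per record, in input order."""
    by_transcript = defaultdict(list)
    for tid, feature, start, end, strand in records:
        if feature == "exon":
            by_transcript[tid].append((start, end, strand))

    ranked = {}
    for tid, spans in by_transcript.items():
        # On the minus strand the first exon has the highest coordinates.
        minus = spans[0][2] == "-"
        spans.sort(key=lambda span: span[0], reverse=minus)
        ranked[tid] = [(s, e, n) for n, (s, e, _) in enumerate(spans, start=1)]

    numbers = []
    for tid, feature, start, end, strand in records:
        number = None
        for exon_start, exon_end, n in ranked.get(tid, []):
            # A feature belongs to the exon that fully contains it.
            if exon_start <= start and end <= exon_end:
                number = n
                break
        numbers.append(number)
    return numbers

records = [
    ("AT1G01010.1", "exon", 3996, 4276, "+"),
    ("AT1G01010.1", "CDS", 3996, 4276, "+"),
    ("AT1G01010.1", "exon", 5439, 5899, "+"),
    ("AT1G01010.1", "CDS", 5439, 5630, "+"),
    ("AAEL000086-RA", "exon", 2207263, 2207299, "-"),
    ("AAEL000086-RA", "exon", 2207362, 2207580, "-"),
]
print(number_exons(records))  # [1, 1, 2, 2, 2, 1]
```

The numbers only reflect the records passed in, so the whole file (or at least every exon of a transcript) has to be read before numbering. Each splice variant has its own transcript ID and is therefore counted separately.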

**knc** · 12-10-2009, 02:08 PM

Sure that would be great. We are having issues with our gtf file... The file format you refer to seems a bit different from the description on the cufflinks site (http://mblab.wustl.edu/GTF22.html). Did you validate your gtf file? We get errors when we do. Thanks for your help!
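
Before running a full validator, a quick sanity check can catch the most common conversion mistakes. The Python sketch below is only a rough pre-check and not a replacement for a real GTF validator. The `check_gtf_line` name is made up, and the check covers just three things: nine tab-separated fields, start not greater than end, and the presence of `gene_id` and `transcript_id` in the last field.

```python
import re

REQUIRED_ATTRIBUTES = ("gene_id", "transcript_id")

def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, found {len(fields)}"]

    problems = []
    start, end, attrs = fields[3], fields[4], fields[8]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")

    # Attributes are expected as: key "value"; key "value"; ...
    found = dict(re.findall(r'(\w+) "([^"]*)";', attrs))
    for key in REQUIRED_ATTRIBUTES:
        if key not in found:
            problems.append(f"missing {key} attribute")
    return problems
```

An unconverted gff line, whose last field still reads `Parent=...`, would be reported as missing both attributes.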

**saha** · 05-03-2010, 12:36 AM

Dear all,

I am facing a similar problem. I badly need a gtf file for the TIGR rice genome v6.0 but have not been able to find one yet. I want to use this gtf file as the refgene list to load into the Broad Institute's IGV browser.
Any help is appreciated.

Regards,
Saha

gtf file for arabidopsis