#1
Member
Location: Kansas City
Join Date: Oct 2009
Posts: 88

Hello everyone,

I have a bunch of GFF files that I would like to convert to GTF format to provide annotation for Cufflinks. Can anyone recommend a tab-delimited file editor I could use for this? I'm not a programmer, so if any coding is necessary it would have to be very basic. I've tried using Galaxy, but it changes the data I enter (mainly: "").

Thanks,
Brandon
#2
Senior Member
Location: Kansas City
Join Date: Mar 2008
Posts: 197

How about this?
http://www.sequenceontology.org/cgi-bin/converter.cgi

Oh, sorry. I read it wrong. You want to go in the other direction.

Last edited by mgogol; 03-29-2010 at 01:34 PM.
#3
Member
Location: Cardiff
Join Date: Mar 2010
Posts: 23

Hi Brandon,

Did you find a simple way to convert GFF to GTF? I want to do exactly the same thing, and I am also not a programmer.

Thanks,
J
#4
Senior Member
Location: Kansas City
Join Date: Mar 2008
Posts: 197

You can try my Perl script. I used it with a FlyBase GFF file. Note that if you want to represent ncRNAs, tRNAs, rRNAs, snRNAs, or miRNAs, you'll have to manually change them to "mRNA" in the GFF file or modify the script.

The script also expects all the mRNA entries to come before the exons. If your GFF file isn't ordered like that, you can grep out the mRNA lines, then the exon lines, and cat them together.

Code:

    #!/usr/bin/env perl
    ###############################
    # gff2gtf.pl
    #
    # parses mRNA, exon lines from a gff file and prints gtf lines (for cufflinks)
    # 5/2010
    #
    #############################
    use strict;
    use warnings;
    use Bio::Tools::GFF;

    my $parser = Bio::Tools::GFF->new(-file => $ARGV[0], -gff_version => 3);
    my %hash;    # transcript (mRNA) ID => parent gene ID

    while ( my $result = $parser->next_feature ) {
        my $type = $result->primary_tag();
        next unless $type eq "mRNA" || $type eq "exon";

        my $seq_id = $result->seq_id();
        my $strand = $result->strand();    # BioPerl reports -1, 0 or 1
        $strand =~ s/-1/-/g;
        $strand =~ s/1/+/g;
        my $start = $result->start();
        my $end   = $result->end();

        if ( $type eq "mRNA" ) {
            # get_tag_values dies if the tag is absent, so check first
            next unless $result->has_tag("ID") && $result->has_tag("Parent");
            my ($id)     = $result->get_tag_values("ID");
            my ($parent) = $result->get_tag_values("Parent");
            $hash{$id} = $parent;
        }
        if ( $type eq "exon" ) {
            # find out transcript (parent) and gene for THIS exon
            next unless $result->has_tag("Parent");
            my ($transcript) = $result->get_tag_values("Parent");
            # empty gene_id if the mRNA line was not seen before this exon
            my $gene = $hash{$transcript} // "";
            print "$seq_id\tFlyBase\t$type\t$start\t$end\t.\t$strand\t.\tgene_id \"$gene\"; transcript_id \"$transcript\";\n";
        }
    }
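The reordering suggested in the post above (pull out the mRNA lines, then the exon lines, and concatenate them) can also be sketched in Python for readers who would rather not chain grep and cat. This is a minimal sketch, not part of the original script: the function name `mrna_then_exons` and the sample records are made up for illustration, and it assumes a tab-delimited GFF with the feature type in column 3.

```python
def mrna_then_exons(lines):
    """Keep only mRNA and exon feature lines, with all mRNA lines first."""
    mrnas, exons = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line (e.g. embedded FASTA)
        if cols[2] == "mRNA":
            mrnas.append(line)
        elif cols[2] == "exon":
            exons.append(line)
    return mrnas + exons


# Hypothetical input in which an exon precedes its mRNA.
gff = [
    "##gff-version 3\n",
    "chr1\tsrc\texon\t1\t50\t.\t+\t.\tID=e1;Parent=t1\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
]
print("".join(mrna_then_exons(gff)), end="")
```

To apply it to a real file, pass an open file handle instead of the list and write the returned lines to a new file. Matching on column 3 rather than searching the whole line avoids picking up lines that merely mention "mRNA" or "exon" in their attributes.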
#5
Member
Location: Cardiff
Join Date: Mar 2010
Posts: 23

Thanks for that. Sorry, I am a newbie to this and to Perl. How would I go about changing the script to suit my data?

This is my data:

Quote:

Quote:
James$ perl gff2gtf.pl chrm1_mRNA_exon.gff > chrm1.gtf
------------- EXCEPTION -------------
MSG: asking for tag value that does not exist ID
STACK Bio::SeqFeature::Generic::get_tag_values Bio/SeqFeature/Generic.pm:517
STACK toplevel gff2gtf.pl:16
-------------------------------------

Thanks a lot,
James

Last edited by James; 07-31-2010 at 12:43 PM. Reason: edit details
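The exception above says the script asked for an ID tag on a feature that has none. One way to find the offending lines before running the converter is to scan column 9 yourself. The following is a rough Python sketch under the assumption that attributes are written GFF3-style as `key=value` pairs separated by semicolons; the helper names and the two sample records are hypothetical, since the quoted data did not survive in the thread.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def features_missing_tag(lines, tag="ID"):
    """Return (line_number, feature_type) for feature lines lacking `tag`."""
    missing = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comment or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if tag not in parse_attributes(cols[8]):
            missing.append((number, cols[2]))
    return missing


# Hypothetical records: the exon has a Parent but no ID.
example = [
    "chrm1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
    "chrm1\tsrc\texon\t1\t50\t.\t+\t.\tParent=t1\n",
]
print(features_missing_tag(example))  # [(2, 'exon')]
```

If the list is not empty, either those lines need an ID added or the converter has to stop requesting ID on feature types that do not carry one.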
#6
Member
Location: Cardiff
Join Date: Mar 2010
Posts: 23

Oh, DDB0232428 is chrm1. I'll change that to chrm1 with sed.
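A blanket search-and-replace would also rewrite the old name wherever it appears inside the attribute column. As an alternative, here is a small Python sketch that renames the sequence only in column 1. The function name `rename_seqid` and the sample record are invented for illustration; directive lines starting with `#` are passed through unchanged, so a `##sequence-region` line, if present, would need separate handling.

```python
def rename_seqid(lines, old, new):
    """Replace `old` with `new` in column 1 only; other columns are untouched."""
    renamed = []
    for line in lines:
        if line.startswith("#"):
            renamed.append(line)  # leave comments and directives as they are
            continue
        cols = line.split("\t")
        if cols[0] == old:
            cols[0] = new
        renamed.append("\t".join(cols))
    return renamed


# Hypothetical record whose ID happens to contain the old sequence name.
record = "DDB0232428\tsrc\texon\t1\t5\t.\t+\t.\tID=DDB0232428_e1\n"
print(rename_seqid([record], "DDB0232428", "chrm1")[0], end="")
```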
#7
Junior Member
Location: La Jolla, CA
Join Date: Sep 2010
Posts: 2

Hi!

I'm experiencing a similar problem. I have a .gff file for my organism (Anabaena sp. strain 7120) and would like to convert it to a .gtf file to use with the Cufflinks software. My current format looks like this:

    ##gff-version 3
    #!gff-spec-version 1.14
    #!source-version NCBI C++ formatter 0.2
    ##Type DNA BA000019.2
    BA000019.2 DDBJ source 1 6413771 . + . organism=Nostoc sp. PCC 7120;mol_type=genomic DNA;strain=PCC 7120;db_xref=taxon:103690;note=synonym: Anabaena sp. PCC 7120
    BA000019.2 DDBJ gene 1 918 . - . ID=BA000019.2:all0001
    BA000019.2 DDBJ gene 6413460 6413771 . - . ID=BA000019.2:all0001
    BA000019.2 DDBJ CDS 1 918 . - 0 note=all0001%3B ORF_ID:all0001%3B%0Aunknown protein;transl_table=11;protein_id=BAB77525.1;db_xref=GI:55420319;exon_number=1
    BA000019.2 DDBJ CDS 6413463 6413771 . - 0 note=all0001%3B ORF_ID:all0001%3B%0Aunknown protein;transl_table=11;protein_id=BAB77525.1;db_xref=GI:55420319;exon_number=2
    BA000019.2 DDBJ start_codon 916 918 . - 0 note=all0001%3B ORF_ID:all0001%3B%0Aunknown protein;transl_table=11;protein_id=BAB77525.1;db_xref=GI:55420319;exon_number=1

and I need this:

    AB000381 Twinscan CDS 380 401 . + 0 gene_id "001"; transcript_id "001.1";
    AB000381 Twinscan CDS 501 650 . + 2 gene_id "001"; transcript_id "001.1";
    AB000381 Twinscan CDS 700 707 . + 2 gene_id "001"; transcript_id "001.1";
    AB000381 Twinscan start_codon 380 382 . + 0 gene_id "001"; transcript_id "001.1";
    AB000381 Twinscan stop_codon 708 710 . + 0 gene_id "001"; transcript_id "001.1";

I tried a couple of GFF-to-GTF Perl converters like this one, but the ninth column never comes out right. Any help would be great.

Thanks!
Britt
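For a file shaped like the one in the post above, the ninth column has to be rebuilt rather than copied, because the CDS lines carry no ID or Parent tags. The Python sketch below shows one possible approach and is not a general converter. It assumes (and this is only a guess from the sample lines) that the locus name can be taken from the `ORF_ID:` entry inside the URL-encoded `note` attribute, and that each gene has a single transcript, named here by appending `.1` in the style of the target example. The function name and the `KEEP` set are hypothetical.

```python
import re
from urllib.parse import unquote

# Feature types copied to the GTF; everything else (source, gene) is dropped.
KEEP = {"CDS", "start_codon", "stop_codon", "exon"}


def gff3_line_to_gtf(line):
    """Return a GTF line for one GFF3 feature line, or None if it is skipped."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9 or cols[2] not in KEEP:
        return None
    # Split on ';' before decoding, so encoded %3B inside values is safe.
    attrs = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    note = unquote(attrs.get("note", ""))
    match = re.search(r"ORF_ID:([^;\s]+)", note)  # assumption: locus lives here
    if match is None:
        return None  # nothing to build gene_id / transcript_id from
    gene_id = match.group(1)
    cols[8] = f'gene_id "{gene_id}"; transcript_id "{gene_id}.1";'
    return "\t".join(cols)


cds = (
    "BA000019.2\tDDBJ\tCDS\t1\t918\t.\t-\t0\t"
    "note=all0001%3B ORF_ID:all0001%3B%0Aunknown protein;transl_table=11;"
    "protein_id=BAB77525.1;db_xref=GI:55420319;exon_number=1"
)
print(gff3_line_to_gtf(cds))
```

Lines whose note lacks an `ORF_ID:` entry are skipped silently in this sketch; a real run should probably report them so that no genes go missing unnoticed.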
#8
Junior Member
Location: La Jolla, CA
Join Date: Sep 2010
Posts: 2

Any help would be great!
#9
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285
#10
Junior Member
Location: Torrance, CA
Join Date: Oct 2010
Posts: 3

Hi,

I am also trying to convert a GFF file to GTF, and am using the gff2gtf.pl script. However, I'm getting an error about the length of each line in the file:

Quote:

Quote:

Thank you!

Last edited by jbittner; 10-18-2010 at 03:32 PM. Reason: :D made a smiley face when posted
#11
Senior Member
Location: Kansas City
Join Date: Mar 2008
Posts: 197

Maybe you can get rid of some of the irrelevant lines? You could grep for mRNA and exon and make a new file containing only those lines. If you put your file up somewhere, maybe I could take a look at it.

The same goes for other people having problems. The errors come from BioPerl, so I'm having trouble figuring out what they mean; I'd have to do more testing with the script.
#12
Junior Member
Location: Torrance, CA
Join Date: Oct 2010
Posts: 3

Thank you for the idea. I am sort of new to this, so any advice really helps.

I got the GFF file from the Sanger FTP site; it's for the parasite Leishmania braziliensis. It's too big to upload to the forum even when I compress it. Is there another way I can get it to you? Here is the link to where I got it: ftp://ftp.sanger.ac.uk/pub/pathogens/L_braziliensis/ (I connected as "guest", then found it through the folders Datasets/GFF).
#13
Senior Member
Location: Kansas City
Join Date: Mar 2008
Posts: 197

That GFF file doesn't have exon entries, and the last column doesn't have an ID tag... Do you have a source for exon-level information?

If you don't, you could try running without a GTF file and just letting Cufflinks define its own transcripts.
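Whether a GFF file contains exon entries at all can be checked up front by tallying the feature types in column 3. This is a minimal Python sketch of that check; the function name and the sample records are placeholders, not taken from the Sanger file.

```python
from collections import Counter


def feature_type_counts(lines):
    """Count how often each feature type (column 3) occurs in a GFF file."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # comment or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


# Hypothetical file with genes and CDS records but no exons.
sample = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tCDS\t1\t50\t.\t+\t0\tParent=g1\n",
    "chr1\tsrc\tCDS\t60\t100\t.\t+\t1\tParent=g1\n",
]
counts = feature_type_counts(sample)
print(dict(counts))
print("exon lines:", counts["exon"])
```

A count of zero for `exon` (and for `mRNA`) means an exon-based converter such as the script earlier in this thread has nothing to work with, whatever else is wrong with the file.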
#14
Junior Member
Location: Torrance, CA
Join Date: Oct 2010
Posts: 3

Unfortunately, the only exon-level information we have found is in a .cds file, and I haven't found any way to convert it to GFF or GTF. I don't even know what that file extension means. (I found it on the same FTP site.)

Also, we are ultimately trying to get a refFlat file to use with DEGseq, so converting our GFF file to GTF was just an intermediate step in that process. I really appreciate your help.
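Once exon-level GTF lines exist, the step from GTF to refFlat is mostly bookkeeping. The Python sketch below is illustrative only. It assumes the UCSC refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based starts, and it sets cdsStart and cdsEnd to txEnd because CDS records are ignored here; whether DEGseq accepts that placeholder has not been checked. The function name and sample lines are made up.

```python
import re
from collections import defaultdict


def gtf_exons_to_refflat(lines):
    """Group GTF exon lines by transcript and emit one refFlat row each."""
    exons = defaultdict(list)  # (gene, transcript, chrom, strand) -> [(start, end)]
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        gene = re.search(r'gene_id "([^"]+)"', cols[8])
        tx = re.search(r'transcript_id "([^"]+)"', cols[8])
        if gene is None or tx is None:
            continue
        key = (gene.group(1), tx.group(1), cols[0], cols[6])
        # GTF coordinates are 1-based inclusive; refFlat starts are 0-based.
        exons[key].append((int(cols[3]) - 1, int(cols[4])))

    rows = []
    for (gene_id, tx_id, chrom, strand), spans in exons.items():
        spans.sort()
        tx_start = spans[0][0]
        tx_end = max(end for _, end in spans)
        starts = "".join(f"{start}," for start, _ in spans)
        ends = "".join(f"{end}," for _, end in spans)
        # Assumption: no CDS information, so cdsStart = cdsEnd = txEnd.
        fields = [gene_id, tx_id, chrom, strand, tx_start, tx_end,
                  tx_end, tx_end, len(spans), starts, ends]
        rows.append("\t".join(str(field) for field in fields))
    return rows


gtf = [
    'chr1\tsrc\texon\t31\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t11\t20\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
print("\n".join(gtf_exons_to_refflat(gtf)))
```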
#15
Senior Member
Location: Kansas City
Join Date: Mar 2008
Posts: 197

Um. I don't know either. The cds file doesn't seem to have exon information. I've got to get back to my own work now... Good luck.
#16
Junior Member
Location: japan
Join Date: May 2011
Posts: 6

Does anyone know of a script that really converts GFF3 to GTF2.2? None of the ones I have tried so far produced the correct format.

Any suggestions would be great.

Best,
MS
#17
Junior Member
Location: Boston, MA
Join Date: Jan 2011
Posts: 8

Check out the gffread utility, which is part of the Cufflinks programs. One of its options allows you to read GFF3 and convert it to GTF. See the info at http://cufflinks.cbcb.umd.edu/gff.html

-Bob
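For anyone scripting the gffread conversion mentioned above, here is a small Python wrapper, kept in Python only for consistency with the other sketches in this thread. It relies on gffread's `-T` option (write GTF instead of GFF3) and `-o` option (output file); the function names and file names are placeholders, and gffread must already be installed and on the PATH for the second function to do anything.

```python
import shutil
import subprocess


def gffread_gtf_command(gff_path, gtf_path):
    """Build the gffread argument list for a GFF3 -> GTF conversion."""
    return ["gffread", gff_path, "-T", "-o", gtf_path]


def convert_with_gffread(gff_path, gtf_path):
    """Run gffread; raises if it is not installed or exits with an error."""
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread not found on PATH")
    subprocess.run(gffread_gtf_command(gff_path, gtf_path), check=True)


# Placeholder file names; this only prints the command that would be run.
print(" ".join(gffread_gtf_command("annotation.gff3", "annotation.gtf")))
```

Typed directly at a shell prompt, the same command is `gffread annotation.gff3 -T -o annotation.gtf`.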