  • converting GFF to GTF

    I want to convert a GFF file from FlyBase into a GTF file, and I'm having a lot of trouble figuring out how to do this. Here is a description of the difference between GTF and GFF:



    Does anyone know of an easy way to convert GFF to GTF? There seems to be a Perl script for this, but I was unable to install a module it requires, the script looks a bit complex, and I saw comments claiming that it is not dependable.

  • #2
    I sent you a private email, but I suggest getting GTFs from Ensembl: ftp://ftp.ensembl.org/pub/current_gtf

    Or if you are doing Illumina work then iGenomes is a good resource.

    The GFF to GTF conversion is a pain.



    • #3
      Originally posted by westerman
      I sent you private email but I suggest getting GTFs from Ensembl. ftp://ftp.ensembl.org/pub/current_gtf

      Or if you are doing Illumina work then iGenomes is a good resource.

      The GFF to GTF conversion is a pain.
      Thanks very much. I see that Ensembl has a Drosophila GTF file. It's release 5.25 rather than 5.41, but I assume that as long as it's 5.x it will be compatible with my other files.

      Thanks again for the help.

      Eric



      • #4
        Use the Perl script below for converting GFF to GTF:

        #!/usr/bin/perl
        # Convert a GFF3 file to GTF (GFF version 2.5) with BioPerl's Bio::FeatureIO.
        use strict;
        use warnings;

        use File::Basename;
        use Bio::FeatureIO;

        my $inFile = shift or die "Usage: $0 input.gff\n";
        # Strip a .gff or .gff3 suffix to build the output name (input.gtf).
        my ($name, $path, $suffix) = fileparse($inFile, qr/\.gff3?/);
        my $outFile = $path . $name . ".gtf";

        my $inGFF  = Bio::FeatureIO->new( '-file'    => $inFile,
                                          '-format'  => 'GFF',
                                          '-version' => 3 );
        my $outGTF = Bio::FeatureIO->new( '-file'    => ">$outFile",
                                          '-format'  => 'GFF',
                                          '-version' => 2.5 );   # 2.5 = GTF

        # Copy every feature from the GFF3 reader to the GTF writer.
        while ( my $feature = $inGFF->next_feature() ) {
            $outGTF->write_feature($feature);
        }



        • #5
          Originally posted by upendra_35
          Use the Perl script below for converting GFF to GTF

          While it looks good on paper, in my experience this has rarely worked, almost always because of some non-standard formatting of the attributes column (column 9) of the input GFF. But, as always, YMMV.
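To make the column-9 point concrete: a strict GFF3 parser expects "key=value" pairs separated by semicolons, while GFF2/GTF-style files write 'key "value"' instead. The sketch below is plain Perl with no BioPerl; the function name parse_gff3_attributes is made up for illustration. It splits a column-9 string and reports every field that is not "key=value", which is roughly the kind of input that makes a converter choke. It assumes single-valued attributes and keeps something like "Parent=a,b" as one string.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF3 column-9 string ("key=value;key=value")
# into a hash. Returns (\%attrs, \@bad); @bad collects fields that are not
# key=value, e.g. GFF2/GTF-style 'gene_id "g1"' entries.
# Multi-valued attributes such as "Parent=a,b" are kept as one string.
sub parse_gff3_attributes {
    my ($col9) = @_;
    my (%attrs, @bad);
    for my $field (split /;/, $col9) {
        $field =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        next if $field eq '';                # tolerate empty fields
        my ($key, $value) = split /=/, $field, 2;
        if (!defined $value) {               # no '=': not valid GFF3
            push @bad, $field;
            next;
        }
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes
        $attrs{$key} = $value;
    }
    return (\%attrs, \@bad);
}

my ($ok, $bad) = parse_gff3_attributes('ID=gene1;Name=geneA;');
print "ID=$ok->{ID} bad=", scalar(@$bad), "\n";          # prints "ID=gene1 bad=0"

my (undef, $bad2) = parse_gff3_attributes('gene_id "g1"; transcript_id "t1";');
print "non-GFF3 fields: @$bad2\n";
```

Splitting on ";" before decoding %XX escapes matters, because an escaped semicolon (%3B) inside a value must not be treated as a field separator.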



          • #6
            Originally posted by upendra_35
            Use the Perl script below for converting GFF to GTF

            Hi Upendra,

            Thanks for the suggestion. I just tried to run this and got this error message:


            ------------- EXCEPTION -------------
            MSG: don't know what do do with directive: '##species'
            STACK Bio::FeatureIO::gff::_handle_directive /Library/Perl/5.10.0/Bio/FeatureIO/gff.pm:537
            STACK Bio::FeatureIO::gff::_initialize /Library/Perl/5.10.0/Bio/FeatureIO/gff.pm:115
            STACK Bio::FeatureIO::new /Library/Perl/5.10.0/Bio/FeatureIO.pm:277
            STACK Bio::FeatureIO::new /Library/Perl/5.10.0/Bio/FeatureIO.pm:297
            STACK toplevel /Users/efoss/sequencing/Aida/RNAseq/test_GFF_to_GTF_script_102611_3.pl:16
            -------------------------------------

            I imagine that this is related to kmcarr's comment. I find it a bit odd that it is so hard to convert from GFF to GTF, since GTF only changes the 9th column of the GFF format; if the GFF file has all the necessary information, it seems like converting it to GTF should be trivial. Is all the information GTF needs necessarily included in the GFF format?

            Eric
            Seattle
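One possible workaround for the '##species' exception is to drop the header directives the BioPerl GFF reader rejects before running the conversion. The sketch below assumes that unknown "##" directives are the only problem; kmcarr's column-9 caveat may still bite afterwards. It keeps "##gff-version", passes ordinary "#" comments and feature lines through, and stops at a "##FASTA" section, since GTF does not carry sequence. The allow-list %KEEP and the function name filter_directives are assumptions for illustration; extend %KEEP if your parser accepts more directives.

```perl
use strict;
use warnings;

# Directives to keep; every other line starting with '##' is dropped.
# ASSUMPTION: which directives a given parser accepts varies; extend as needed.
my %KEEP = map { $_ => 1 } qw(gff-version);

# Return the input lines minus unwanted '##' directives.
# Everything from a '##FASTA' line onward is discarded.
sub filter_directives {
    my @out;
    for my $line (@_) {
        if ($line =~ /^##(\S+)/) {
            my $name = $1;
            last if $name eq 'FASTA';     # sequence section: not needed for GTF
            next unless $KEEP{$name};     # e.g. '##species ...' is dropped
        }
        push @out, $line;
    }
    return @out;
}

# Only run as a filter when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print filter_directives(<$in>);
    close $in;
}
```

Usage would be something like "perl strip_directives.pl input.gff > input.clean.gff" (the script name is hypothetical), followed by running the conversion on the cleaned file.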



            • #7
              I imagine that this is related to kmcarr's comment. I find it a bit weird that it is so hard to convert from GFF to GTF, since the GTF format is only changing the 9th column of the GFF format, and it seems like (if you have all the necessary information in the GFF file) that it should be trivial to convert it into the GTF format. Is all the information necessary for the GTF format necessarily included in the GFF format?
              The whole point of the GTF format was to standardise certain aspects that are left open in GFF. Hence, there are many different valid ways to encode the same information in a valid GFF format, and any parser or converter needs to be written specifically for the choices the author of the GFF file made. For example, a GTF file requires the gene ID attribute to be called "gene_id", while in GFF files, it may be "ID", "Gene", something different, or completely missing. Hence, a general GFF-to-GTF converter (as opposed to one converting only GFF files from a very specific source) needs to guess this from the data, which is non-trivial.
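The point about source-specific converters can be illustrated with one written for a single layout. The sketch below assumes a GFF3 file in which gene, mRNA and exon/CDS features are linked through ID and Parent attributes (check whether your file really looks like this). It builds the gene_id/transcript_id pair GTF needs by following each exon's Parent to its mRNA and that mRNA's Parent to the gene. Features whose parents are not mRNAs (ncRNAs, pseudogene transcripts and so on) are silently skipped, which is exactly the kind of decision a general converter would have to guess. The function name gff3_to_gtf is hypothetical.

```perl
use strict;
use warnings;

# Minimal GFF3 -> GTF sketch for ONE assumed layout: gene -> mRNA -> exon/CDS,
# linked through ID/Parent attributes. Other sources need other rules.
sub gff3_to_gtf {
    my (%tx_gene, @rows);
    for my $line (@_) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip directives/comments/blanks
        chomp(my $copy = $line);
        my @f = split /\t/, $copy;
        next unless @f == 9;                      # skip FASTA or malformed lines
        my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
                   grep { /=/ } split /;\s*/, $f[8];
        push @rows, [ \@f, \%attr ];
        # Remember which gene each mRNA belongs to.
        $tx_gene{ $attr{ID} } = $attr{Parent}
            if $f[2] eq 'mRNA' && defined $attr{ID} && defined $attr{Parent};
    }
    my @out;
    for my $row (@rows) {
        my ($f, $attr) = @$row;
        next unless $f->[2] eq 'exon' || $f->[2] eq 'CDS';
        # An exon may belong to several transcripts ("Parent=t1,t2").
        for my $tx (split /,/, $attr->{Parent} // '') {
            my $gene = $tx_gene{$tx} or next;     # parent is not a known mRNA
            push @out, join("\t", @{$f}[0 .. 7],
                            qq{gene_id "$gene"; transcript_id "$tx";}) . "\n";
        }
    }
    return @out;
}
```

Two passes are used because a GFF3 file is not guaranteed to list an mRNA before its exons. The sketch does not emit start_codon/stop_codon lines or decode %XX escapes, so treat its output as a starting point rather than a complete GTF.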

              Comment


              • #8
                Originally posted by Simon Anders
                The whole point of the GTF format was to standardise certain aspects that are left open in GFF. Hence, there are many different valid ways to encode the same information in a valid GFF format, and any parser or converter needs to be written specifically for the choices the author of the GFF file made. For example, a GTF file requires the gene ID attribute to be called "gene_id", while in GFF files, it may be "ID", "Gene", something different, or completely missing. Hence, a general GFF-to-GTF converter (as opposed to one converting only GFF files from a very specific source) needs to guess this from the data, which is non-trivial.
                Thank you. That's an excellent explanation.

                Eric



                • #9
                  Hello all,

                  I would like to have the genome coordinates of miRNAs in .gtf format. Is it possible to convert the .gff3 file from miRBase, and if so, how?


                  Thanks, and I look forward to your reply.
                  Last edited by unique379; 10-15-2013, 05:12 AM.
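For the miRBase question, a hedged sketch. It assumes each feature line's column 9 carries ID and Name and, for mature miRNAs, Derives_from (check this against your download). The script keeps columns 1-8 and rewrites column 9 GTF-style, using Name as gene_id and ID as transcript_id. Those mappings, the extra derives_from attribute and the function name mirbase_line_to_gtf are illustrative choices, not a miRBase convention.

```perl
use strict;
use warnings;

# Sketch: rewrite one miRBase-style GFF3 line as GTF.
# ASSUMPTION: column 9 looks like "ID=...;Alias=...;Name=...[;Derives_from=...]".
# Returns undef for comments, blank lines and lines without an ID.
sub mirbase_line_to_gtf {
    my ($line) = @_;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip comments/blank lines
    chomp $line;
    my @f = split /\t/, $line;
    return unless @f == 9;
    my %attr;
    for my $pair (split /;\s*/, $f[8]) {
        my ($k, $v) = split /=/, $pair, 2;
        $attr{$k} = $v if defined $v;
    }
    return unless defined $attr{ID};            # no ID: cannot build identifiers
    my $id   = $attr{ID};
    my $name = defined $attr{Name} ? $attr{Name} : $id;
    my $col9 = qq{gene_id "$name"; transcript_id "$id";};
    $col9 .= qq{ derives_from "$attr{Derives_from}";} if defined $attr{Derives_from};
    return join("\t", @f[0 .. 7], $col9) . "\n";
}

# Only convert a file when one is named on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$in>) {
        my $out = mirbase_line_to_gtf($line);
        print $out if defined $out;
    }
    close $in;
}
```

Usage would be something like "perl mirbase2gtf.pl hsa.gff3 > hsa.gtf" (script name hypothetical). Some downstream tools only count features of type "exon" by default (htseq-count, for example, unless given -t), so you may need to point them at "miRNA" or "miRNA_primary_transcript" instead.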
