  • SOLiD_User
    Junior Member
    • Aug 2009
    • 7

    gtf file for arabidopsis

    Does anyone know where I can find a gtf file for arabidopsis or a program that I can use to create one?

    Thanks!
  • steven
    Senior Member
    • Aug 2009
    • 269

    #2
    A gtf file of what?
    If you mean gene annotation, try TAIR or TIGR.

    • SOLiD_User
      Junior Member
      • Aug 2009
      • 7

      #3
      Yes, a gtf for gene annotation. I checked TAIR, TIGR, EMBL, etc and so far have been unable to locate a gtf file. I can only find gff files for arabidopsis. It seems EMBL has gtf files for everything except plants. I looked at gbrowse and the UCSC genome browser and I don't see a way to export as a gtf file. I have spent much time with google and I haven't found anything useful.

      • steven
        Senior Member
        • Aug 2009
        • 269

        #4
        Why isn't gff ok? Are you looking for a specific field like "transcript_id:" or so?

        • SOLiD_User
          Junior Member
          • Aug 2009
          • 7

          #5
          I'm trying to get the AB whole transcriptome pipeline working and it requires the transcript_id and gene_id fields in the gtf file. I tried the gff and it didn't work. I don't have the programming skills to create a perl script so I was hoping I could download a gtf file or find an application that could create one.

          • steven
            Senior Member
            • Aug 2009
            • 269

            #6
            Could you show me a few lines of the gff file(s) you have, in case the information is already there and easy to convert?

            • SOLiD_User
              Junior Member
              • Aug 2009
              • 7

              #7
              The gff files look like this. I think the major challenge in converting gff to gtf is counting the exons for each transcript.
              --
              Chr1 TAIR9 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 3996 4276 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4486 4605 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4706 5095 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4706 5095 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5174 5326 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5174 5326 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5439 5899 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5439 5630 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;

              the gtf file has to be in this format

              supercont1.1 protein_coding CDS 2191663 2191958 . - 1 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "4"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding exon 2191201 2191600 . - . gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding CDS 2191299 2191600 . - 2 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding stop_codon 2191296 2191298 . - 0 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding exon 2207362 2207580 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding CDS 2207362 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1"; protein_id "AAEL000086-PA";
              supercont1.1 protein_coding start_codon 2207578 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding exon 2207263 2207299 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "2";

              • andreas.sjodin
                Member
                • Apr 2009
                • 27

                #8
                I would recommend taking a look at the Python GFF parsers developed by Brad Chapman. They can be downloaded from GitHub (http://github.com/chapmanb/bcbb/tree/master/gff/). Those scripts can convert between different GFF versions. More information about them can be found in his blog posts (http://bcbio.wordpress.com).

                • SOLiD_User
                  Junior Member
                  • Aug 2009
                  • 7

                  #9
                  Thanks Andreas. I'll have a look at the GFF parser.

                  • knc
                    Junior Member
                    • Apr 2008
                    • 3

                    #10
                    gtf for arabidopsis

                    Hi,

                    Did you solve your gtf problem? You can use the gff for the first 8 fields, the last field needs to be changed to include the gene_id, transcript_id, and exon #.
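A minimal Python sketch of that conversion, for anyone who lands here later. It is not the script used by anyone in this thread, and it rests on a few assumptions: the function names `parse_parent` and `gff_to_gtf` are made up for illustration; the input is TAIR-style gff where exon and CDS lines carry a `Parent=` attribute; the gene ID is taken to be the transcript ID with its trailing `.N` removed (AT1G01010.1 becomes AT1G01010); and exons are numbered in transcription order, so on the minus strand the highest coordinates get exon 1, as in the gtf example in post #7. A CDS line receives the number of the exon that contains it.

```python
from collections import defaultdict


def parse_parent(attributes):
    """Return the transcript ID from a TAIR-style 'Parent=...' attribute."""
    for part in attributes.strip().rstrip(";").split(";"):
        key, _, value = part.partition("=")
        if key.strip() == "Parent":
            # CDS lines list 'transcript,transcript-Protein'; keep the first.
            return value.split(",")[0].strip()
    return None


def gff_to_gtf(gff_lines):
    """Convert exon/CDS gff lines into gtf lines (all other features are skipped)."""
    records = []
    exons = defaultdict(set)   # transcript_id -> {(start, end), ...}
    strands = {}               # transcript_id -> "+" or "-"

    # First pass: keep exon/CDS lines and collect the exons of each transcript.
    for line in gff_lines:
        fields = line.strip().split(None, 8)
        if len(fields) != 9 or fields[2] not in ("exon", "CDS"):
            continue
        tid = parse_parent(fields[8])
        if tid is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        records.append((fields, tid, start, end))
        if fields[2] == "exon":
            exons[tid].add((start, end))
            strands[tid] = fields[6]

    # Number exons in transcription order (descending coordinates on "-").
    numbers = {}
    for tid, spans in exons.items():
        ordered = sorted(spans, reverse=(strands[tid] == "-"))
        for n, span in enumerate(ordered, start=1):
            numbers[(tid, span)] = n

    # Second pass: keep the first 8 fields, rebuild the 9th.
    out = []
    for fields, tid, start, end in records:
        gene_id = tid.rsplit(".", 1)[0]  # assumption: gene = transcript minus ".N"
        attrs = 'gene_id "%s"; transcript_id "%s";' % (gene_id, tid)
        for s, e in exons.get(tid, ()):
            if s <= start and end <= e:  # the exon containing this feature
                attrs += ' exon_number "%d";' % numbers[(tid, (s, e))]
                break
        out.append("\t".join(fields[:8] + [attrs]))
    return out
```

To use it, read the gff file into a list of lines, pass the list to `gff_to_gtf`, and write the returned lines to a new file. start_codon and stop_codon lines, which appear in the gtf example in post #7, are not generated by this sketch.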

                    • dsidote
                      Member
                      • Aug 2009
                      • 23

                      #11
                      We were able to get what we needed. Thanks!

                      • knc
                        Junior Member
                        • Apr 2008
                        • 3

                        #12
                        Quick question:
                        How did you deal with transcripts that have different stop codons?

                        • dsidote
                          Member
                          • Aug 2009
                          • 23

                          #13
                          I'm not sure I understand what you're asking. Are you referring to splice variants? If so, we wrote a perl script that reads the file line by line and counts the exons for each transcript. I can send it to you if it would help. In its current form it only works for the gff file from TAIR.

                          • knc
                            Junior Member
                            • Apr 2008
                            • 3

                            #14
                            Sure that would be great. We are having issues with our gtf file... The file format you refer to seems a bit different from the description on the cufflinks site (http://mblab.wustl.edu/GTF22.html). Did you validate your gtf file? We get errors when we do. Thanks for your help!
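For a quick sanity check before running a full validator, something like the Python sketch below may help narrow down errors. It is only a rough check under my reading of the GTF2.2 description linked above (nine tab-separated fields, start not greater than end, and an attribute column that begins with gene_id followed by transcript_id); it does not replace a real validator, and the name `check_gtf_line` is hypothetical.

```python
import re

# Matches one attribute of the form: key "value";
ATTR_RE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')


def check_gtf_line(line):
    """Return a list of problems found in one gtf line (empty if none found)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated fields, found %d" % len(fields)]

    problems = []
    if not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("start and end must be integers")
    elif int(fields[3]) > int(fields[4]):
        problems.append("start is greater than end")

    # The first two attributes should be gene_id and transcript_id, in that order.
    keys = [m.group(1) for m in ATTR_RE.finditer(fields[8])]
    if keys[:2] != ["gene_id", "transcript_id"]:
        problems.append("attributes must begin with gene_id and transcript_id")
    return problems
```

Running each line of the file through `check_gtf_line` and printing the line number with any returned messages should point to the first malformed records. A common cause is a file whose columns are separated by spaces instead of tabs.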

                            • saha
                              Junior Member
                              • Jan 2010
                              • 5

                              #15
                              Dear all,

                              I am facing a similar problem. I very much need a gtf file for the TIGR rice genome v6.0 but have not been able to get one yet. I want to use this gtf file as a refgene list to upload to the Broad Institute's IGV browser.
                              Any help is appreciated.

                              Regards,
                              Saha
