  • SOLiD_User
    Junior Member
    • Aug 2009
    • 7

    gtf file for arabidopsis

    Does anyone know where I can find a gtf file for arabidopsis or a program that I can use to create one?

    Thanks!
  • steven
    Senior Member
    • Aug 2009
    • 269

    #2
    A gtf file of what?
    For gene annotation, maybe try TAIR or TIGR.

    Comment

    • SOLiD_User
      Junior Member
      • Aug 2009
      • 7

      #3
      Yes, a gtf for gene annotation. I checked TAIR, TIGR, EMBL, etc., and so far have been unable to locate a gtf file. I can only find gff files for arabidopsis. It seems EMBL has gtf files for everything except plants. I looked at GBrowse and the UCSC Genome Browser and I don't see a way to export as a gtf file. I have spent a lot of time with Google and haven't found anything useful.

      Comment

      • steven
        Senior Member
        • Aug 2009
        • 269

        #4
        Why isn't gff ok? Are you looking for a specific field like "transcript_id:" or so?

        Comment

        • SOLiD_User
          Junior Member
          • Aug 2009
          • 7

          #5
          I'm trying to get the AB whole transcriptome pipeline working and it requires the transcript_id and gene_id fields in the gtf file. I tried the gff and it didn't work. I don't have the programming skills to create a perl script so I was hoping I could download a gtf file or find an application that could create one.

          Comment

          • steven
            Senior Member
            • Aug 2009
            • 269

            #6
            Could you show me a few lines of the gff file(s) you have, in case the information is available and easy to convert?

            Comment

            • SOLiD_User
              Junior Member
              • Aug 2009
              • 7

              #7
              The gff files look like this. I think the major challenge in converting gff to gtf is counting the exons for each transcript.
              --
              Chr1 TAIR9 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 3996 4276 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4486 4605 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4706 5095 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4706 5095 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5174 5326 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5174 5326 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5439 5899 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5439 5630 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;

              the gtf file has to be in this format

              supercont1.1 protein_coding CDS 2191663 2191958 . - 1 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "4"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding exon 2191201 2191600 . - . gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding CDS 2191299 2191600 . - 2 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding stop_codon 2191296 2191298 . - 0 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding exon 2207362 2207580 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding CDS 2207362 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1"; protein_id "AAEL000086-PA";
              supercont1.1 protein_coding start_codon 2207578 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding exon 2207263 2207299 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "2";

              Comment

              • andreas.sjodin
                Member
                • Apr 2009
                • 27

                #8
                I would recommend taking a look at the Python GFF parsers developed by Brad Chapman. They can be downloaded from GitHub (http://github.com/chapmanb/bcbb/tree/master/gff/). Those scripts can convert between different GFF versions. More information about the scripts can be found in some of his blog posts (http://bcbio.wordpress.com).

                Comment

                • SOLiD_User
                  Junior Member
                  • Aug 2009
                  • 7

                  #9
                  Thanks Andreas. I'll have a look at the GFF parser.

                  Comment

                  • knc
                    Junior Member
                    • Apr 2008
                    • 3

                    #10
                    gtf for arabidopsis

                    Hi,

                    Did you solve your gtf problem? You can use the gff for the first 8 fields, the last field needs to be changed to include the gene_id, transcript_id, and exon #.
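The column-9 rewrite described above can be sketched in Python. This is a minimal illustration, not the script anyone in this thread used. It assumes TAIR-style lines like the ones posted earlier, where the first value of `Parent` is the transcript ID (e.g. `AT1G01010.1`) and the gene ID is that ID without the trailing `.N`. The function names `parse_gff3_attributes` and `gff_line_to_gtf` are made up for this example.

```python
import re


def parse_gff3_attributes(column9):
    """Parse a GFF3 'key=value;key=value' string into a dict of raw strings."""
    attrs = {}
    for part in column9.strip().strip(";").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_line_to_gtf(line, exon_number=None):
    """Keep the first 8 fields and rewrite field 9 as GTF attributes.

    Returns None for lines that have no Parent attribute or are malformed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Fallback for lines pasted with spaces instead of tabs.
        fields = line.split(None, 8)
    if len(fields) != 9:
        return None
    attrs = parse_gff3_attributes(fields[8])
    if "Parent" not in attrs:
        return None
    # Assumption: the first Parent value is the transcript ID.
    transcript_id = attrs["Parent"].split(",")[0]
    # Assumption: the gene ID is the transcript ID minus its ".N" suffix.
    gene_id = re.sub(r"\.\d+$", "", transcript_id)
    gtf_attrs = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
    if exon_number is not None:
        gtf_attrs += ' exon_number "%d";' % exon_number
    return "\t".join(fields[:8] + [gtf_attrs])


if __name__ == "__main__":
    example = "Chr1\tTAIR9\texon\t3996\t4276\t.\t+\t.\tParent=AT1G01010.1"
    print(gff_line_to_gtf(example, exon_number=1))
```

The exon number is passed in rather than computed here, because it depends on all exons of the transcript and not on a single line.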

                    Comment

                    • dsidote
                      Member
                      • Aug 2009
                      • 23

                      #11
                      We were able to get what we needed. Thanks!

                      Comment

                      • knc
                        Junior Member
                        • Apr 2008
                        • 3

                        #12
                        Quick question:
                        How did you deal with transcripts that have different stop codons?

                        Comment

                        • dsidote
                          Member
                          • Aug 2009
                          • 23

                          #13
                          I'm not sure I understand what you're asking. Are you referring to splice variants? If so, we wrote a Perl script that reads the file line by line and counts exons for each transcript. I can send it to you if it would help. In its current form it only works for the gff file from TAIR.
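The Perl script itself is not shown in this thread. As a rough Python sketch of the same idea, exons can be grouped per transcript and numbered in transcript order. The assumption here, taken from the example gtf lines posted earlier, is that exons on the minus strand are numbered from the highest coordinate down. The function name `number_exons` and the tuple layout are invented for this illustration.

```python
from collections import defaultdict


def number_exons(exons):
    """Assign exon numbers per transcript.

    exons: iterable of (transcript_id, start, end, strand) tuples.
    Returns a dict mapping (transcript_id, start, end) -> exon number.
    """
    by_transcript = defaultdict(list)
    for transcript_id, start, end, strand in exons:
        by_transcript[transcript_id].append((start, end, strand))

    numbers = {}
    for transcript_id, items in by_transcript.items():
        strand = items[0][2]
        # Plus strand: ascending coordinates. Minus strand: descending,
        # so exon 1 is the 5' end of the transcript (an assumption).
        ordered = sorted(set(items), reverse=(strand == "-"))
        for number, (start, end, _) in enumerate(ordered, start=1):
            numbers[(transcript_id, start, end)] = number
    return numbers


if __name__ == "__main__":
    exons = [
        ("AT1G01010.1", 3996, 4276, "+"),
        ("AT1G01010.1", 4486, 4605, "+"),
        ("AAEL000086-RA", 2207263, 2207299, "-"),
        ("AAEL000086-RA", 2207362, 2207580, "-"),
    ]
    for key, number in sorted(number_exons(exons).items()):
        print(key, number)
```

Because each splice variant has its own transcript ID, variants are numbered independently. A CDS line would then take the number of the exon that contains it.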

                          Comment

                          • knc
                            Junior Member
                            • Apr 2008
                            • 3

                            #14
                            Sure that would be great. We are having issues with our gtf file... The file format you refer to seems a bit different from the description on the cufflinks site (http://mblab.wustl.edu/GTF22.html). Did you validate your gtf file? We get errors when we do. Thanks for your help!
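For a quick look at why a gtf file might fail validation, a few basic checks can be scripted. The sketch below is not the official validator and covers only a small part of the GTF2.2 description: nine tab-separated fields, integer start and end coordinates, and the `gene_id` and `transcript_id` attributes. The name `check_gtf_line` is hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)";')


def check_gtf_line(line):
    """Return a list of problems found on one GTF line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated fields, found %d" % len(fields)]

    problems = []
    try:
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            problems.append("start is greater than end")
    except ValueError:
        problems.append("start/end are not integers")

    attrs = dict(ATTR_RE.findall(fields[8]))
    for required in ("gene_id", "transcript_id"):
        if required not in attrs:
            problems.append("missing %s attribute" % required)
    return problems


if __name__ == "__main__":
    line = ('supercont1.1\tprotein_coding\texon\t2207362\t2207580\t.\t-\t.\t'
            'gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";')
    print(check_gtf_line(line))
```

Running this over every non-comment line and printing the line number with each problem is usually enough to spot leftover gff-style attributes or space-separated columns.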

                            Comment

                            • saha
                              Junior Member
                              • Jan 2010
                              • 5

                              #15
                              Dear all,

                              I am facing a similar problem. I badly need a gtf file for the TIGR rice genome v6.0 but have not been able to get one yet. I want to use this gtf file as a refgene list to load into the Broad Institute's IGV browser.
                              Any help is appreciated.

                              regards,
                              Saha

                              Comment
