  • KeithD
    Junior Member
    • Oct 2010
    • 3

    Looking for Tophat GFF file (mm9)

    Hello,

    Does anyone know where I can download a GTF file that will work using Tophat and their provided mm9 build? I downloaded the version from ftp://ftp.ensembl.org/pub/current/gtf/mus_musculus/ and keep getting the following error:

    [Thu Oct 28 12:08:01 2010] Reading known junctions from GFF file
    Warning: TopHat did not find any junctions in GFF file

    I have even tried reformatting the file by adding "chr" in front of everything in the first column of each line (this changes the notation of X or 18 to chrX or chr18). At this point I would prefer downloading a GTF build that works with Tophat v1.1.1, but I can also try to modify the file I have now if someone knows what needs to be changed.

    A sample of one line of the GTF file:
    18 protein_coding CDS 30483176 30483260 . + 0 gene_id "ENSMUSG00000033628"; transcript_id "ENSMUST00000115811"; exon_number "20"; gene_name "Pik3c3"; transcript_name "Pik3c3-004"; protein_id "ENSMUSP00000111478";

    -Keith
  • RockChalkJayhawk
    Senior Member
    • Mar 2009
    • 192

    #2
    1) Go to the UCSC Table Browser
    2) Select the mouse genome assembly mm9
    3) Select "Genes and Gene Prediction Tracks" in the Group section
    4) Select the Ensembl Genes track
    5) Under output format, select GTF
    6) Give the output file a name
    7) Get output

    Comment

    • KeithD
      Junior Member
      • Oct 2010
      • 3

      #3
      I did this and got exactly the same error in output. The file I downloaded had this format:
      chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";


      Other information that might be helpful, versions of programs I am using:
      Tophat: 1.1.1
      Bowtie: 0.12.7
      cufflinks: 0.9.1
      myrna: 1.0.9
      samtools: 0.1.8

      Comment

      • RockChalkJayhawk
        Senior Member
        • Mar 2009
        • 192

        #4
        Can you post 10 lines of the GTF and the command you are putting into TopHat?

        I just followed those instructions and it worked fine.

        Comment

        • KeithD
          Junior Member
          • Oct 2010
          • 3

          #5
          The tophat command I used was:

          tophat -p 4 -o DMSO_tophat_test -G /home/lab/Downloads/ENSEMBLE.genes.gtf --no-novel-juncs /home/lab/Tools/bowtie-0.12.7/indexes/mm9 /home/lab/Data/DMSO_Run/s_6_sequence.fq

          and the first 10 lines of the GTF file are:

          chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";

          Comment

          • RockChalkJayhawk
            Senior Member
            • Mar 2009
            • 192

            #6
            Keith,

            I was able to reproduce your error using the GTF lines that you supplied.
            However, using my human data, the process I described above works just fine. Something else you can try is to use this GTF instead.

            Try it and post back your results.

            Comment

            • RockChalkJayhawk
              Senior Member
              • Mar 2009
              • 192

              #7
              Also, you will need to run this command to make it match your bowtie index:
              Code:
              awk '{print "chr"$0}' Homo_sapiens.GRCh37.59.gtf > ENSEMBLE.gtf
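For anyone who would rather not prefix every line blindly, here is a minimal Python sketch of the same renaming. It is an illustration under assumptions, not part of the original reply: the function name `add_chr_prefix` is hypothetical, and the `#` header line in the demo is made up. Unlike the awk one-liner, it leaves comment lines, blank lines, and records that already start with "chr" untouched. It does not handle names that differ by more than the prefix (Ensembl's MT versus UCSC's chrM, for example).

```python
import io


def add_chr_prefix(lines):
    """Yield GTF lines with 'chr' prepended to the seqname (first) column."""
    for line in lines:
        # Leave comments, blank lines and already-prefixed records untouched.
        if line.startswith("#") or not line.strip() or line.startswith("chr"):
            yield line
        else:
            yield "chr" + line


if __name__ == "__main__":
    # Hypothetical two-line input: one header comment, one Ensembl-style record.
    sample = io.StringIO(
        "#hypothetical header line\n"
        '18\tprotein_coding\tCDS\t30483176\t30483260\t.\t+\t0\tgene_id "ENSMUSG00000033628";\n'
    )
    for out in add_chr_prefix(sample):
        print(out, end="")
```

It is worth comparing the renamed seqnames against the names stored in the bowtie index before rerunning TopHat.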

              Comment

              • nkwuji
                Member
                • Mar 2010
                • 19

                #8
                I found that the GTF file built by the UCSC genome browser tends to have errors in its stop_codon coordinates, and is then rejected by cufflinks. The errors are caused by some spliced stop_codons. I have to write my own script to transform the UCSC data table into a GTF file.

                Comment

                • GKM
                  Member
                  • May 2009
                  • 45

                  #9
                  You are probably better off just giving it a junctions file of the simple chr / left / right / strand variety. Those always work, and it is relatively trivial to generate them from any annotation format. I have had GTF files rejected too, so I have switched to that format completely for all genomes I work with when mapping with TopHat.
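As a rough illustration of how such a junctions file could be derived from a GTF, here is a minimal Python sketch. The function name `gtf_to_juncs` is hypothetical. The coordinate convention is an assumption based on my reading of the TopHat manual for raw junction files: tab-delimited chrom, left, right, strand, with zero-based positions of the last base of the left exon and the first base of the right exon. Please check it against the manual for your TopHat version.

```python
from collections import defaultdict


def gtf_to_juncs(lines):
    """Return sorted (chrom, left, right, strand) tuples, one per intron."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end)]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Pull transcript_id "..." out of the attribute column.
        tid = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                tid = parts[1].strip('"')
        if tid is not None:
            exons[(chrom, strand, tid)].append((start, end))

    juncs = set()
    for (chrom, strand, _tid), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            # GTF is 1-based inclusive; shift both flanking bases to 0-based
            # (assumed convention, see the note above).
            juncs.add((chrom, left_end - 1, right_start - 1, strand))
    return sorted(juncs)


if __name__ == "__main__":
    # First two exons of the transcript posted earlier in this thread.
    attrs = 'gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";'
    demo = [
        "\t".join(["chr1", "mm9_ensGene", "exon", s, e, "0.000000", "+", ".", attrs])
        for s, e in [("134212703", "134213049"), ("134221530", "134221650")]
    ]
    for chrom, left, right, strand in gtf_to_juncs(demo):
        print(chrom, left, right, strand, sep="\t")
```

For the two exons in the demo this prints one junction, chr1 / 134213048 / 134221529 / +. Note that a real GTF must be tab-delimited for this to parse; the lines pasted in this thread show spaces only because the forum collapsed the tabs.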

                  Comment

                  • nkwuji
                    Member
                    • Mar 2010
                    • 19

                    #10
                    It is good practice to use a juncs file instead. But the last step in cufflinks, cuffdiff, also requires a good GTF to calculate the geneexp.diff, so it is hard to get around a bad GTF.

                    Comment

                    • Pawan Noel
                      Junior Member
                      • Nov 2010
                      • 4

                      #11
                      Does anyone have a quick list of the most-used SAMtools command lines?

                      I'm really new to using UNIX/Linux and I would greatly appreciate it if someone could share a pdf/doc of the SAMtools commands.

                      Thank you very much and have a nice day.

                      Pawan

                      Comment

                      • mbom777
                        Junior Member
                        • Oct 2010
                        • 4

                        #12
                        I had the same error message as the original post. In the logs subdirectory I found a file called "gtf_juncs.log" with the contents:

                        Code:
                        gtf_juncs v1.1.4 (1709)
                        ---------------------------
                        Error: duplicate GFF ID 'ENSMUST00000127664' (or exons too far apart)!
                        Removing the corresponding line in the Ensembl GTF file fixed it.
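To look for such records before running TopHat, something like the following Python sketch may help. It is only a diagnostic under assumptions: the function name `suspicious_transcripts` and the `max_span` threshold are hypothetical, and I do not know the exact rule gtf_juncs applies. It simply flags transcript IDs whose exon records sit on more than one chromosome or strand, or whose exons span an unusually large region.

```python
from collections import defaultdict


def suspicious_transcripts(lines, max_span=5_000_000):
    """Map transcript_id -> reason for IDs that look duplicated or overly long.

    max_span is an arbitrary illustrative threshold, not TopHat's own limit.
    """
    seen = defaultdict(list)  # transcript_id -> [(chrom, strand, start, end)]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                seen[parts[1].strip('"')].append(
                    (fields[0], fields[6], int(fields[3]), int(fields[4]))
                )

    report = {}
    for tid, recs in seen.items():
        places = {(chrom, strand) for chrom, strand, _, _ in recs}
        span = max(r[3] for r in recs) - min(r[2] for r in recs) + 1
        if len(places) > 1:
            report[tid] = "on several chromosomes/strands: %s" % sorted(places)
        elif span > max_span:
            report[tid] = "exons span %d bp" % span
    return report


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_gtf.py annotation.gtf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for tid, reason in sorted(suspicious_transcripts(handle).items()):
                print(tid, reason, sep="\t")
```

Any ID it reports is a candidate for the kind of line removal described above, but the report is a hint to inspect, not proof that TopHat will reject the record.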

                        Comment

                        • edge
                          Senior Member
                          • Sep 2009
                          • 199

                          #13
                          Hi KeithD,

                          Did you figure out the cause of the error message "Warning: TopHat did not find any junctions in GTF file"?
                          I'm facing the same error message as well.
                          Thanks for any advice.

                          Comment

                          • edge
                            Senior Member
                            • Sep 2009
                            • 199

                            #14
                            Hi nkwuji,
                            Would you mind sharing the script that you wrote to transform the UCSC data table into a GTF file?
                            I'm facing the same error message in Cufflinks as well.
                            Thanks in advance.

                            Comment

                            • edge
                              Senior Member
                              • Sep 2009
                              • 199

                              #15
                              Hi nkwuji,
                              Do we need to prepare the junction file from the annotated GTF file from Ensembl or UCSC?
                              Thanks.

                              Comment
