
  • Looking for Tophat GFF file (mm9)

    Hello,

    Does anyone know where I can download a GTF file that will work using Tophat and their provided mm9 build? I downloaded the version from ftp://ftp.ensembl.org/pub/current/gtf/mus_musculus/ and keep getting the following error:

    [Thu Oct 28 12:08:01 2010] Reading known junctions from GFF file
    Warning: TopHat did not find any junctions in GFF file

    I have even tried reformatting the file by adding "chr" in front of everything in the first column of each line (this changes the notation from X or 18 to chrX or chr18). At this point I would prefer to download a GTF build that works with Tophat v1.1.1, but I can also try to modify the file I have now if someone knows what needs to be changed.

    A sample of one line of the GTF file:
    18 protein_coding CDS 30483176 30483260 . + 0 gene_id "ENSMUSG00000033628"; transcript_id "ENSMUST00000115811"; exon_number "20"; gene_name "Pik3c3"; transcript_name "Pik3c3-004"; protein_id "ENSMUSP00000111478";

    -Keith
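
One thing worth ruling out first is a naming mismatch between the annotation and the Bowtie index. The sketch below is a rough Python check, not part of TopHat; `gtf_seqnames` and `compare_names` are made-up helper names. It collects the sequence names from column 1 of a GTF and compares them with a list of index names, which is assumed to be obtained separately (for example with `bowtie-inspect -n`, if your Bowtie build provides it). It splits on any whitespace, so it also copes with space-separated lines like the ones pasted in this thread.

```python
def gtf_seqnames(lines):
    """Return the set of sequence names (column 1) found in GTF lines."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if line.strip() and not line.startswith("#"):
            names.add(line.split(None, 1)[0])
    return names


def compare_names(gtf_names, index_names):
    """Report which sequence names are shared and which appear on one side only."""
    gtf_names, index_names = set(gtf_names), set(index_names)
    return {
        "shared": sorted(gtf_names & index_names),
        "only_in_gtf": sorted(gtf_names - index_names),
        "only_in_index": sorted(index_names - gtf_names),
    }


if __name__ == "__main__":
    sample = ['18 protein_coding CDS 30483176 30483260 . + 0 gene_id "ENSMUSG00000033628";']
    # The index names here are illustrative, not read from a real index.
    print(compare_names(gtf_seqnames(sample), ["chr18", "chrX"]))
```

If `shared` comes back empty, none of the annotated features can be placed on the indexed sequences. Matching names are necessary, but they may not be sufficient on their own.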

  • #2
    In reply to KeithD's original post:
    1) Go to the UCSC Table Browser.
    2) Select the mouse genome assembly mm9.
    3) Select "Genes and Gene Prediction Tracks" in the Group section.
    4) Select the Ensembl Genes track.
    5) Under output format, select GTF.
    6) Give the output file a name.
    7) Click "get output".

    • #3
      In reply to RockChalkJayhawk (#2):
      I did this and got exactly the same error in output. The file I downloaded had this format:
      chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";


      Other information that might be helpful, versions of programs I am using:
      Tophat: 1.1.1
      Bowtie: 0.12.7
      cufflinks: 0.9.1
      myrna: 1.0.9
      samtools: 0.1.8

      • #4
        In reply to KeithD (#3):
        Can you post 10 lines of the GTF and the command you are putting into TopHat?

        I just followed those instructions and it worked fine.

        • #5
          In reply to RockChalkJayhawk (#4):
          The tophat command I used was:

          tophat -p 4 -o DMSO_tophat_test -G /home/lab/Downloads/ENSEMBLE.genes.gtf --no-novel-juncs /home/lab/Tools/bowtie-0.12.7/indexes/mm9 /home/lab/Data/DMSO_Run/s_6_sequence.fq

          and the first 10 lines of the GTF file are:

          chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";

          • #6
            Keith,

            I was able to reproduce your error using the GTF lines that you supplied.
            However, using my human data, the process I described above works just fine. Something else you can try is to use this GTF instead.

            Try it and post back your results.

            • #7
              In reply to RockChalkJayhawk (#6):
              Also, you will need to run this command to make it match your bowtie index:
              Code:
              awk '{print "chr"$0}' Homo_sapiens.GRCh37.59.gtf > ENSEMBLE.gtf
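
Note that the awk one-liner prefixes every line, so it would also alter any comment lines and would turn an existing chr18 into chrchr18. A slightly more defensive sketch in Python follows. `add_chr_prefix` and the `rename` mapping are hypothetical names. The `MT` to `chrM` entry reflects my understanding of how Ensembl and UCSC name the mitochondrial sequence, so check it against your own index before relying on it.

```python
import re

# First whitespace-free token (the sequence name) and everything after it.
_FIRST_COLUMN = re.compile(r"^(\S+)(.*)$", re.S)


def add_chr_prefix(lines, rename=None):
    """Yield GTF lines with 'chr' added to column 1, leaving comments untouched.

    rename: optional dict of explicit name translations (e.g. {"MT": "chrM"}).
    """
    rename = rename or {}
    for line in lines:
        match = _FIRST_COLUMN.match(line)
        if line.startswith("#") or match is None:
            yield line  # comment, blank, or otherwise unparseable line
            continue
        name, rest = match.groups()
        if name in rename:
            name = rename[name]
        elif not name.startswith("chr"):
            name = "chr" + name
        yield name + rest


if __name__ == "__main__":
    demo = ["#comment\n", "18\tprotein_coding\tCDS\t30483176\t30483260\n", "MT\tx\texon\t1\t10\n"]
    for out in add_chr_prefix(demo, rename={"MT": "chrM"}):
        print(out, end="")
```

Sequences that exist only in the Ensembl file (unplaced scaffolds, for instance) will still get a `chr` prefix here and will not match anything in a UCSC-style index.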

              • #8
                In reply to RockChalkJayhawk (#2):
                I found that the GTF file built by the UCSC Genome Browser tends to have errors in the stop_codon coordinates and is then rejected by Cufflinks. The error is caused by some spliced stop codons. I had to write my own script to transform the UCSC data table into a GTF file.
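
The script mentioned above is not shown in the thread. As a rough way to find the records in question, the Python sketch below lists transcripts whose stop_codon records look unusual: either split over several lines (a spliced stop codon) or not adding up to 3 bases. `stop_codon_report` is a hypothetical helper, and whether these are exactly the records Cufflinks objects to is an assumption.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def stop_codon_report(lines):
    """Map transcript_id -> list of stop_codon segment lengths, for odd cases only."""
    segments = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        # Assumes the first eight GTF columns contain no internal whitespace.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "stop_codon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        segments[match.group(1)].append(int(fields[4]) - int(fields[3]) + 1)
    return {
        tid: lengths
        for tid, lengths in segments.items()
        if len(lengths) > 1 or sum(lengths) != 3
    }
```

Transcripts reported here can then be inspected by hand or dropped before running Cufflinks.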

                • #9
                  You are probably better off just giving it a junctions file of the simple chr / left / right / strand variety. Those always work, and it is relatively trivial to generate them from any annotation format. I have had GTF files rejected too, so I have switched to that format completely for all genomes I work with when mapping with TopHat.
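
A minimal sketch of that conversion in Python, for anyone who wants a starting point. The function names are made up. The coordinate convention is my reading of the TopHat manual's description of raw junction files (`-j/--raw-juncs`): zero-based positions of the last base of the left exon and the first base of the right exon. Please verify this against the manual for your TopHat version.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_juncs(lines):
    """Return sorted, de-duplicated (chrom, left, right, strand) junction tuples."""
    exons = defaultdict(set)  # (chrom, strand, transcript_id) -> {(start, end)}
    for line in lines:
        if line.startswith("#"):
            continue
        # Assumes the first eight GTF columns contain no internal whitespace.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].add((int(fields[3]), int(fields[4])))

    juncs = set()
    for (chrom, strand, _), spans in exons.items():
        ordered = sorted(spans)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
            # GTF is 1-based inclusive; subtract 1 for zero-based positions.
            juncs.add((chrom, left_end - 1, right_start - 1, strand))
    return sorted(juncs)


def format_juncs(juncs):
    """Render junction tuples as tab-delimited lines."""
    return "".join("\t".join(map(str, j)) + "\n" for j in juncs)
```

Feeding it the exon lines from post #5 gives junctions such as `chr1 134213048 134221529 +`. The output of `format_juncs` can be written to a file and passed to TopHat in place of the GTF.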

                  • #10
                    In reply to GKM (#9):
                    Using a juncs file instead is good practice. However, cuffdiff, the last step in the Cufflinks pipeline, also requires a good GTF to calculate geneexp.diff, so it is hard to get around a bad GTF.

                    • #11
                      Does anyone have a quick list of the most commonly used SAMtools command lines?

                      I'm really new to UNIX/Linux and would greatly appreciate it if someone could share a PDF/doc of the SAMtools commands.

                      Thank you very much and have a nice day.

                      Pawan

                      • #12
                        I had the same error message as the original post. In the logs subdirectory I found a file called "gtf_juncs.log" with the contents:

                        Code:
                        gtf_juncs v1.1.4 (1709)
                        ---------------------------
                        Error: duplicate GFF ID 'ENSMUST00000127664' (or exons too far apart)!
                        Removing the corresponding line in the Ensembl GTF file fixed it.
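
To locate such records without reading the whole file by eye, something like the Python sketch below may help. It flags transcript IDs whose exons appear on more than one chromosome or strand, or whose consecutive exons are separated by more than a chosen distance. I do not know the exact rule gtf_juncs applies, so `suspicious_transcripts` and its `max_gap` threshold are illustrative assumptions only.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def suspicious_transcripts(lines, max_gap=1_000_000):
    """Map transcript_id -> reason, for transcripts that look duplicated or stretched."""
    records = defaultdict(list)  # transcript_id -> [(chrom, strand, start, end)]
    for line in lines:
        if line.startswith("#"):
            continue
        # Assumes the first eight GTF columns contain no internal whitespace.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        records[match.group(1)].append(
            (fields[0], fields[6], int(fields[3]), int(fields[4]))
        )

    problems = {}
    for tid, recs in records.items():
        if len({(chrom, strand) for chrom, strand, _, _ in recs}) > 1:
            problems[tid] = "multiple chromosomes or strands"
            continue
        spans = sorted((start, end) for _, _, start, end in recs)
        # Distance from the end of one exon to the start of the next.
        if any(nxt[0] - cur[1] > max_gap for cur, nxt in zip(spans, spans[1:])):
            problems[tid] = "exons too far apart"
    return problems
```

Any ID it reports is only a candidate for manual inspection, since a legitimately long intron would also trip the `max_gap` check.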

                        • #13
                          Hi KeithD,

                          Did you figure out the cause of the "Warning: TopHat did not find any junctions in GTF file" message?
                          I'm facing the same error message as well.
                          Thanks for any advice.

                          • #14
                            Hi nkwuji,
                            Would you mind sharing the script that you wrote to transform the UCSC data table into a GTF file?
                            I'm facing the same error message in Cufflinks as well.
                            Thanks in advance.

                            • #15
                              Hi nkwuji,
                              Do we need to prepare the junctions file from the annotated GTF file from Ensembl or UCSC?
                              Thanks.
