  • Tophat - Error: gtf_to_fasta returned an error.

    Hi all,
    I am trying to run TopHat with an annotation file.
    I am working with the Zv9 annotations from UCSC.
    I edited the original GTF file so that its first column matches the reference sequence names in the Bowtie index.
    For example:
    GTF (first 2 lines):
    #chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
    chr1 + 50321633 50410568 50322024 50393582 11 50321633,50323684,50327722,50376641,50384688,50384995,50387281,50388021,50392530,50393547,50409289, 50322231,50323751,50327850,50376774,50384782,50385109,50387444,50388129,50392579,50393588,50410568, 0 lef1 cmplcmpl 0,0,1,0,1,2,2,0,0,1,-1,

    REFERENCE (first 2 lines):
    >chr1
    TTCTTCTGGGGAAAGTCTGATTTGATTTATTTCCCTTTTAAGATCAATATTATTAGCCCC

    When I run TopHat without the GTF, everything runs fine.
    With the GTF, I get this error:
    Error: gtf_to_fasta returned an error.

    My command:
    nohup ./tophat -r 430 -p 10 -z 0 -G ../annotation /mnt/FILE/index/zvgenome ../ex1/R1_001.fastq ../ex1/R2_001.fastq &

    Is anyone familiar with this?

    Best,
    Pap

  • #2
    This is all the output:

    [Thu Feb 9 06:59:47 2012] Beginning TopHat run (v1.4.0)
    -----------------------------------------------
    [Thu Feb 9 06:59:47 2012] Preparing output location ./tophat_out/
    [Thu Feb 9 06:59:47 2012] Checking for Bowtie index files
    [Thu Feb 9 06:59:47 2012] Checking for reference FASTA file
    [Thu Feb 9 06:59:47 2012] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Thu Feb 9 06:59:47 2012] Checking for Samtools
    Samtools Version: 0.1.18
    [Thu Feb 9 06:59:47 2012] Generating SAM header for /mnt/FILE/index/zvgenome
    format: fastq
    quality scale: phred33 (default)
    [Thu Feb 9 06:59:51 2012] Reading known junctions from GTF file
    Warning: TopHat did not find any junctions in GTF file
    [Thu Feb 9 06:59:51 2012] Preparing reads
    left reads: min. length=101, count=21379580
    right reads: min. length=101, count=21310206
    [Thu Feb 9 07:08:54 2012] Creating transcriptome data files..
    [FAILED]
    Error: gtf_to_fasta returned an error.



    • #3
      I get the same error, and I am also working with Zv9 (the zebrafish genome). Can anyone please help me with this?



      • #4
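        I don't think your GTF file is in the right format. The header you posted has 14 columns (chrom, strand, txStart, txEnd, ...), which looks like a table dump rather than GTF.

        According to UCSC, a GTF file contains 9 columns: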
        <seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes]
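
A quick way to test a file against the 9-column layout above is sketched here. It is only a rough sanity check, assuming a tab-separated file; the function name `find_non_gtf_lines` and the sample records are made up for illustration, and this is not how TopHat itself validates the file.

```python
import io

# seqname, source, feature, start, end, score, strand, frame, attributes
GTF_FIELD_COUNT = 9


def find_non_gtf_lines(handle, max_report=5):
    """Return (line_number, reason) pairs for lines that do not look like GTF."""
    problems = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) != GTF_FIELD_COUNT:
            problems.append(
                (line_number, f"expected 9 tab-separated fields, found {len(fields)}")
            )
        elif not (fields[3].isdigit() and fields[4].isdigit()):
            problems.append((line_number, "start/end are not integers"))
        elif fields[6] not in {"+", "-", "."}:
            problems.append((line_number, f"unexpected strand {fields[6]!r}"))
        if len(problems) >= max_report:
            break  # a few examples are enough to diagnose the format
    return problems


if __name__ == "__main__":
    # Hypothetical sample: one GTF-style record, then one table-style record
    sample = io.StringIO(
        'chr1\tucsc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
        "chr1\t+\t100\t200\t120\t180\t1\t100,\t200,\t0\tg1\n"
    )
    for line_number, reason in find_non_gtf_lines(sample):
        print(f"line {line_number}: {reason}")
```

To check a real file, open it and pass the handle, for example `find_non_gtf_lines(open("annotation.gtf"))`, where the file name is a placeholder.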





        • #5
          Genome file of Entamoeba in GTF format

          Hi,
          I am working with Entamoeba histolytica data and need the Entamoeba histolytica reference genome annotation in GTF format. I have the file in GenBank format but cannot find a GTF version. If anyone can point me to an appropriate link, I would be very grateful.



          • #6
            Hi,

            I had the same problem, but I think I have solved it now. I believe the error occurs because the FASTA file name differs from the base name of the index files and/or the GTF file. So if your index and GTF base name is Danio_rerio, then your FASTA file should be Danio_rerio.fa.
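
The naming rule suggested in this post can be checked with a small sketch like the one below. It assumes the FASTA file is expected next to the index as `<index_base>.fa`; the helper names are hypothetical and the check only looks at file names, not contents.

```python
from pathlib import Path


def expected_fasta_path(index_base):
    """Return the FASTA path that shares its base name with the Bowtie index."""
    # Append ".fa" instead of replacing a suffix, since a base name may contain dots.
    return Path(str(index_base) + ".fa")


def check_reference_naming(index_base):
    """Report whether a FASTA file with the matching base name exists."""
    fasta = expected_fasta_path(index_base)
    if fasta.is_file():
        return f"OK: found {fasta}"
    return f"Missing: expected a FASTA file at {fasta}"


if __name__ == "__main__":
    # Index base taken from the command in the first post
    print(check_reference_naming("/mnt/FILE/index/zvgenome"))
```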



            • #7
              Hi, I also have this problem. In my case the chromosome names are the same in the index and the GTF, and the index, FASTA, and GTF files all share the base name hg18_ref. Can anyone help me?
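
One way to confirm that the chromosome names really agree is to compare the FASTA header names with the first GTF column, as in this sketch. The function names and the tiny in-memory sample are illustrative only, and it assumes a tab-separated GTF.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_names(handle):
    """Collect the values in the first column of a tab-separated GTF."""
    names = set()
    for line in handle:
        if line.strip() and not line.startswith("#"):
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def names_missing_from_reference(fasta_handle, gtf_handle):
    """Return GTF sequence names that have no matching FASTA record."""
    return sorted(gtf_names(gtf_handle) - fasta_names(fasta_handle))


if __name__ == "__main__":
    # Hypothetical data: the GTF mixes "chr1" with an unprefixed "1"
    fasta = io.StringIO(">chr1\nACGT\n>chr2\nACGT\n")
    gtf = io.StringIO(
        'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "a";\n'
        '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "b";\n'
    )
    print(names_missing_from_reference(fasta, gtf))
```

An empty result means every name used in the GTF also appears in the reference.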



              • #8
                Tophat problem gtf to fasta

                Many have faced the same problem, and I just got past it. Follow these steps and see if they work for you too.
                1. Go to the following link and select the genome you want to download. In my case I downloaded the UCSC mm10 mouse genome. (http://cufflinks.cbcb.umd.edu/igenomes.html)
                2. Unzip the file. You will see mm10/Annotation and mm10/Sequence. These folders contain all the files required for the TopHat run. Just make sure the paths in the tophat command point to them.
                3. Here is the command I used:
                  tophat -p 8 --keep-fasta-order --no-coverage-search --library-type fr-firststrand -G Mus_musculus/UCSC/mm10/Annotation/Archives/archive-2014-05-23-16-05-10/Genes/genes.gtf --transcriptome-index Mus_musculus/UCSC/mm10/Annotation/Genes/transcriptome_index_bt2/genes -g 10 --output-dir shP1_4hr_n1 Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome *.fastq.gz


                In the above case the archive contains the UCSC genes.gtf file, which already has the chr prefix on chromosome names and includes gene names. Make sure you don't rename those files. The --transcriptome-index argument also has to be something like Mus_musculus/UCSC/mm10/Annotation/Genes/transcriptome_index_bt2/genes; I don't know why, but that worked. The index files are in the Sequence/Bowtie2Index folder (you can also use the Bowtie 1 index files). The last argument is the input reads.
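
To make the role of each argument in that command explicit, here is a small sketch that assembles the same invocation as an argument list. It uses only the flags quoted in this post; the function name `build_tophat_command` and the example read file names are made up, and the sketch only prints the command rather than running it.

```python
import shlex


def build_tophat_command(gtf, transcriptome_index, bowtie_index, reads,
                         output_dir, threads=8):
    """Return the TopHat invocation from this post as an argument list."""
    return [
        "tophat",
        "-p", str(threads),
        "--keep-fasta-order",
        "--no-coverage-search",
        "--library-type", "fr-firststrand",
        "-G", gtf,                                    # annotation file
        "--transcriptome-index", transcriptome_index,  # prefix for transcriptome index files
        "-g", "10",
        "--output-dir", output_dir,
        bowtie_index,                                  # base name of the genome index
        *reads,                                        # input read files come last
    ]


if __name__ == "__main__":
    root = "Mus_musculus/UCSC/mm10"
    command = build_tophat_command(
        gtf=f"{root}/Annotation/Archives/archive-2014-05-23-16-05-10/Genes/genes.gtf",
        transcriptome_index=f"{root}/Annotation/Genes/transcriptome_index_bt2/genes",
        bowtie_index=f"{root}/Sequence/Bowtie2Index/genome",
        reads=["sample_R1.fastq.gz"],  # hypothetical read file name
        output_dir="shP1_4hr_n1",
    )
    print(shlex.join(command))
```

Keeping the paths as named parameters makes it easier to see which one is wrong when TopHat reports a gtf_to_fasta error.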

                Hope this helps. If it doesn't let me know and I can help you further.

                Tulip.
