  • Tophat 1.4.0 RNA seq mapping

    I am using TopHat to map RNA-seq data, but I get the following error:

    [Wed Jan 18 16:02:44 2012] Beginning TopHat run (v1.4.0)
    -----------------------------------------------
    [Wed Jan 18 16:02:44 2012] Preparing output location /home/RNA/Type_EM/Sample1/tophatNew/
    [Wed Jan 18 16:02:44 2012] Checking for Bowtie index files
    [Wed Jan 18 16:02:44 2012] Checking for reference FASTA file
    [Wed Jan 18 16:02:44 2012] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Wed Jan 18 16:02:44 2012] Checking for Samtools
    Samtools Version: 0.1.18
    [Wed Jan 18 16:02:44 2012] Generating SAM header for /home/RefGenome/hg19/Homo_sapiens_assembly19_sorted
    format: fastq
    quality scale: phred33 (default)
    [Wed Jan 18 16:02:47 2012] Reading known junctions from GTF file
    [Wed Jan 18 16:03:00 2012] Preparing reads
    left reads: min. length=101, count=18736919
    right reads: min. length=101, count=18728487
    [Wed Jan 18 16:49:23 2012] Creating transcriptome data files..
    [Wed Jan 18 16:50:31 2012] Building Bowtie index from Homo_sapiens.GRCh37.65.fa
    [FAILED]
    Error: Couldn't build bowtie index with err = 1

    The reference genome I used is Homo_sapiens_assembly19_sorted.fasta.
    The transcriptome GTF file is Homo_sapiens.GRCh37.65.gtf.

    Why do I need a Bowtie index built from Homo_sapiens.GRCh37.65.fa? No such file exists. Or do the reference genome FASTA and the transcriptome GTF have to have the same name?

    Thanks for any help.

  • #2
    I think that is a new feature of TopHat 1.4.
    You don't need to provide the file "Homo_sapiens.GRCh37.65.fa"; it is generated from the genome index (in your case: /home/RefGenome/hg19/Homo_sapiens_assembly19_sorted) and from the GTF file (which does not need to have the same name as the Bowtie index).

    Did you check whether the chromosomes are labelled the same way in both files?
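A quick way to check this is to list the sequence names on both sides and compare them. Below is a minimal Python sketch (the function names are my own, not part of TopHat or Bowtie); it assumes the FASTA ID is the first word after ">" and that the GTF is tab-separated with the sequence name in column 1:

```python
import sys


def fasta_names(lines):
    """Return the set of sequence IDs in FASTA header lines.

    The ID is taken as the first whitespace-delimited word after '>'.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Return the set of sequence names in column 1 of a GTF/GFF file."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare(fasta_ids, gtf_ids):
    """Return (shared, only_in_fasta, only_in_gtf) as sorted lists."""
    return (sorted(fasta_ids & gtf_ids),
            sorted(fasta_ids - gtf_ids),
            sorted(gtf_ids - fasta_ids))


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
        shared, only_fa, only_gtf = compare(fasta_names(fa), gtf_names(gtf))
    print("names in both files:", len(shared))
    print("only in FASTA (first 10):", only_fa[:10])
    print("only in GTF (first 10):", only_gtf[:10])
```

Run it as "python compare_names.py genome.fa annotation.gtf" (script and file names hypothetical). If "names in both files" is 0 or very small, the labels do not match.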



    • #3
      Your GTF looks to be from Ensembl; I'm not sure about your genome, but my guess is UCSC. This often causes a problem, as UCSC uses chr1 while Ensembl uses just 1. Best not to mix and match data sources when you can avoid it.



      • #4
        Hi, I get the same error. I built the Bowtie index from the FASTA manually, and I have a GFF3 file as well. I used bowtie-inspect --names to get the names from the index and renamed all entries in the first column of the GFF3 file. I still get an error while the index is being built from the FASTA file. But why should it be built again when it is already built?
        The names in both files are in this format:
        gi|240254421|ref|NC_003070.9| Arabidopsis thaliana chromosome 1, complete sequence
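One thing worth checking here (an assumption on my part, not something confirmed in this thread): as far as I know, Bowtie 1 by default reports reference names only up to the first whitespace (its --fullref option keeps the full name), so the name that ends up in the alignments may be just "gi|240254421|ref|NC_003070.9|" rather than the full description. A minimal Python sketch that cuts column 1 of a GFF3/GTF file at its first whitespace (function names hypothetical):

```python
def short_id(name):
    """Return the part of a sequence name before the first whitespace."""
    parts = name.split()
    return parts[0] if parts else name


def shorten_gff_seqids(lines):
    """Yield GFF3/GTF lines with column 1 cut at its first whitespace.

    Comment lines and lines without a tab are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqid, rest = line.split("\t", 1)
        yield short_id(seqid) + "\t" + rest
```

To rewrite a file: with open("in.gff3") as src, open("out.gff3", "w") as dst: dst.writelines(shorten_gff_seqids(src)) (file names hypothetical).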



        • #5
          Hi again,
          I think it is an error in the TopHat script. It expects the FASTA file for indexing to be in the output directory, which doesn't make sense. Here is the line from run.log:

          /opt/bowtie-0.12.7/bowtie-build ./tophat_out/tmp/A_thaliana_rg.fa ./tophat_out/tmp/A_thaliana_rg



          • #6
            I am stuck here too, but I don't think it's an error in the script. Apparently, if you supply a GFF, TopHat calls Bowtie to re-index using a new FASTA file that it made from that GFF. The new file is placed in the temp folder. For some reason, bowtie-build can't open the FASTA file it just created for me. I guess I'll try running TopHat without the GFF and see if that works.
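To illustrate the idea described above, here is a simplified Python sketch of turning GTF exon records into transcript sequences. It is not TopHat's actual code; it assumes the genome is already loaded into a dict of {name: sequence}, 1-based inclusive GTF coordinates, and a transcript_id attribute on every exon line. It also reports GTF sequence names that are absent from the genome, which is the mismatch discussed in posts #2 and #3:

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_TX_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_sequences(gtf_lines, genome):
    """Return ({transcript_id: sequence}, set of GTF seqnames not in genome)."""
    exons = defaultdict(list)  # transcript_id -> [(seqname, start, end, strand)]
    missing = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        seqname, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if seqname not in genome:
            # A chromosome-name mismatch (e.g. "1" vs "chr1") ends up here.
            missing.add(seqname)
            continue
        match = _TX_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((seqname, start, end, strand))

    transcripts = {}
    for tx_id, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        # Convert 1-based inclusive [start, end] to a Python slice.
        seq = "".join(genome[s][b - 1:e] for s, b, e, _ in parts)
        if parts[0][3] == "-":
            # Minus-strand transcripts are reverse-complemented.
            seq = seq.translate(_COMPLEMENT)[::-1]
        transcripts[tx_id] = seq
    return transcripts, missing
```

If none of the GTF names match the genome, the transcript dict comes back empty; whether an empty or broken transcript FASTA is what made bowtie-build fail in these runs is only a guess.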



            • #7
              As Jon_Keats suggested, these identifiers have to be the same. I had the same issue, and it was apparently caused by the chromosome identifiers being "1" in my GTF file but "chr1" in my FASTA file. I made the corrections and it works fine now.
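For the Ensembl-to-UCSC case, here is a minimal Python sketch of that kind of correction, assuming the simple rule "prefix chr", with MT mapped to chrM. Unplaced scaffolds are named differently in the two sources and would need a proper mapping table passed as overrides; the function names are hypothetical:

```python
def ucsc_name(name, overrides=None):
    """Convert an Ensembl-style chromosome name ("1", "X", "MT") to UCSC style.

    Assumed rule: MT becomes chrM, names already starting with "chr" are kept,
    anything else gets a "chr" prefix. Entries in overrides take precedence.
    """
    mapping = {"MT": "chrM"}
    mapping.update(overrides or {})
    if name in mapping:
        return mapping[name]
    if name.startswith("chr"):
        return name
    return "chr" + name


def rename_gtf(lines, overrides=None):
    """Yield GTF lines with column 1 converted by ucsc_name()."""
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line  # header/comment lines pass through unchanged
            continue
        seqname, rest = line.split("\t", 1)
        yield ucsc_name(seqname, overrides) + "\t" + rest
```

Renaming the FASTA side instead would work just as well; what matters is that both files end up using the same names.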



              • #8
                Hello everyone, I am very new to TopHat, Bowtie and SAMtools.
                I read the TopHat manual, ran it on Ubuntu virtualized on Windows 7, and got this error:
                [2013-07-20 04:11:39] Beginning TopHat run (v2.0.9)
                -----------------------------------------------
                [2013-07-20 04:11:39] Checking for Bowtie
                Bowtie version: 2.1.0.0
                [2013-07-20 04:11:39] Checking for Samtools
                Samtools version: 0.1.19.0
                [2013-07-20 04:11:39] Checking for Bowtie index files (genome)..
                [2013-07-20 04:11:39] Checking for reference FASTA file
                Warning: Could not find FASTA file seq.fa
                [2013-07-20 04:11:39] Reconstituting reference FASTA file from Bowtie index
                Executing: /usr/bin/bowtie2-inspect seq > ./tophat_out/tmp/seq.fa
                [2013-07-20 04:11:45] Generating SAM header for seq
                format: fastq
                quality scale: phred33 (default)
                [2013-07-20 04:11:51] Preparing reads
                left reads: min. length=100, max. length=100, 63588062 kept reads (424 discarded)
                right reads: min. length=100, max. length=100, 63000645 kept reads (587841 discarded)
                [2013-07-20 05:17:15] Mapping left_kept_reads to genome seq with Bowtie2
                [2013-07-20 13:05:46] Mapping left_kept_reads_seg1 to genome seq with Bowtie2 (1/4)
                [2013-07-20 13:59:24] Mapping left_kept_reads_seg2 to genome seq with Bowtie2 (2/4)
                [2013-07-20 15:06:18] Mapping left_kept_reads_seg3 to genome seq with Bowtie2 (3/4)
                [2013-07-20 15:56:59] Mapping left_kept_reads_seg4 to genome seq with Bowtie2 (4/4)
                [2013-07-20 16:49:52] Mapping right_kept_reads to genome seq with Bowtie2
                [2013-07-20 23:01:03] Mapping right_kept_reads_seg1 to genome seq with Bowtie2 (1/4)
                [FAILED]
                Error running bowtie:
                Saw ASCII character -93 but expected 33-based Phred qual.
                terminate called after throwing an instance of 'int'


                Please, what should I do?
                Thanks



                • #9

                  What type of reads are you trying to align,
                  and what quality scale is used for the base qualities?



                  • #10
                    Sorry for leaving out this information.

                    They are paired-end reads from Illumina.
                    The quality scale is Phred.

                    Thanks a lot



                    • #11
                      Do you know which exact scale/encoding those Q-scores use? http://en.wikipedia.org/wiki/FASTQ_format (5 encoding types)

                      That is important, as Maria pointed out.
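A rough way to see which encoding a file uses is to look at the range of byte values in the quality lines. Here is a minimal Python sketch, assuming plain 4-line FASTQ records read in binary mode (so non-ASCII bytes don't raise decoding errors). The cut-offs follow the ranges listed on the Wikipedia page above, and the function names are hypothetical:

```python
def quality_range(fastq_lines, max_records=100000):
    """Return (lowest, highest) byte value in the quality lines of a FASTQ.

    Expects bytes lines (file opened in "rb" mode) in the plain 4-line layout.
    Only the first max_records records are scanned.
    """
    lo, hi = 255, 0
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_records:
            break
        if i % 4 == 3:  # the 4th line of each record is the quality string
            qual = line.rstrip(b"\r\n")
            if qual:
                lo = min(lo, min(qual))
                hi = max(hi, max(qual))
    return lo, hi


def guess_encoding(lo, hi):
    """Heuristic guess of the quality encoding from the observed byte range."""
    if lo > hi:
        return "no quality data"
    if lo < 33 or hi > 126:
        return "invalid: bytes outside printable ASCII 33-126"
    if lo < 59:
        return "Phred+33 (Sanger / Illumina 1.8+)"
    if lo < 64:
        return "Solexa+64 (early Solexa/Illumina)"
    return "Phred+64 (Illumina 1.3-1.7)"
```

For example: with open("reads_1.fq", "rb") as fh: print(guess_encoding(*quality_range(fh))) (file name hypothetical). This is only a heuristic: a Phred+33 sample in which every quality happens to be high can look like one of the +64 encodings.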



                      • #12
                        OK, according to the sequencing company's report (I don't know these people), it is greater than Q30,
                        and in the log I see "quality scale: phred33 (default)".
                        Thanks



                        • #13
                          Is that -93 you are seeing in the error, or just 93? It's hard to tell from the original post. If the scale is Sanger (Phred33), then your raw sequence Q-scores should not have that value.
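Assuming the program read the quality byte into a signed 8-bit char and printed it as a number (my guess at where a negative value comes from), -93 corresponds to the unsigned byte 256 - 93 = 163 (0xA3). That is outside the ASCII range altogether, so it would not be a valid quality character in any of the FASTQ encodings, which would point to corrupted or non-text data in the file rather than a wrong scale. In Python:

```python
def as_unsigned_byte(signed_value):
    """Map a signed 8-bit value (-128..127) back to its unsigned byte (0..255)."""
    return signed_value & 0xFF


byte = as_unsigned_byte(-93)
print(byte, hex(byte), byte <= 126)  # 163 0xa3 False
```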



                          • #14
                            Yes, you're right, it is -93.
                            So what should I do?
                            Any ideas would be very welcome, because right now
                            I think I am stuck.



                            • #15
                              Looking at the timestamps in your log, it appears that the error is thrown after ~18 h of run time. That is a pain.
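For reference, the elapsed time from the first timestamp in the log to the last one before the failure (the [FAILED] line itself has no timestamp) can be checked in Python:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the TopHat 2.0.9 log above


def elapsed_hours(start, end):
    """Hours between two log timestamps given as 'YYYY-MM-DD HH:MM:SS'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


print(round(elapsed_hours("2013-07-20 04:11:39", "2013-07-20 23:01:03"), 1))  # 18.8
```

So the failure happened roughly 18.8 h into the run, or later, since the last timestamp only marks the start of the right_kept_reads_seg1 mapping step.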

                              Can you try the script posted by Simon Andrews in post #8 of this thread to check your FASTQ files for errors: http://seqanswers.com/forums/showthread.php?t=7784
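If that script is not at hand, here is a minimal Python sketch of the same kind of check. It is not Simon Andrews' script; it assumes uncompressed 4-line FASTQ records (no wrapped sequences) read in binary mode, and it flags bad headers, sequence/quality length mismatches, quality bytes outside 33-126, and a truncated final record (the function name is hypothetical):

```python
def check_fastq(lines, max_errors=10):
    """Return a list of (record_number, message) problems found in FASTQ lines.

    Expects an iterable of bytes lines (e.g. a file opened in "rb" mode).
    Stops scanning once at least max_errors problems have been collected.
    """
    errors = []
    record = []
    n = 0
    for line in lines:
        record.append(line.rstrip(b"\r\n"))
        if len(record) < 4:
            continue
        n += 1
        header, seq, plus, qual = record
        record = []
        if not header.startswith(b"@"):
            errors.append((n, "header does not start with '@'"))
        if not plus.startswith(b"+"):
            errors.append((n, "third line does not start with '+'"))
        if len(seq) != len(qual):
            errors.append((n, "sequence and quality lengths differ"))
        bad = [b for b in qual if b < 33 or b > 126]
        if bad:
            errors.append((n, "quality byte(s) outside 33-126: %s" % bad[:5]))
        if len(errors) >= max_errors:
            return errors
    if record:
        # Leftover lines mean the file ended in the middle of a record.
        errors.append((n + 1, "truncated record at end of file"))
    return errors
```

For example: with open("reads_2.fq", "rb") as fh: print(check_fastq(fh)) (file name hypothetical; for .gz files, open with gzip.open(path, "rb") instead).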

                              Did you compile bowtie on this VM? Are you using 32-bit Ubuntu or 64-bit?
