  • problem with tophat

    Can anyone here help me with a TopHat error?
    When I map single-end Illumina reads (FASTQ format) to hg18 as follows:
    tophat --solexa1.3-quals /usr/local/bowtie/indexes/hg18 s_1_1.fastq

    I get the following error:
    [Thu Jan 14 09:28:00 2010] Beginning TopHat run (v1.0.10)
    -----------------------------------------------
    [Thu Jan 14 09:28:00 2010] Preparing output location ./tophat_out/
    [Thu Jan 14 09:28:00 2010] Checking for Bowtie index files
    [Thu Jan 14 09:28:00 2010] Checking for reference FASTA file
    [Thu Jan 14 09:28:00 2010] Checking for Bowtie
    Bowtie version: 0.10.1.0
    [Thu Jan 14 09:28:00 2010] Checking reads
    seed length: 76bp
    format: fastq
    quality scale: --solexa1.3-quals
    Splitting reads into 3 segments
    [Thu Jan 14 09:52:04 2010] Mapping reads against hg18 with Bowtie
    [FAILED]
    Error: could not execute Bowtie
    Traceback (most recent call last):
      File "/usr/local/tophat-1.0.10/bin/tophat", line 1490, in ?
        sys.exit(main())
      File "/usr/local/tophat-1.0.10/bin/tophat", line 1462, in main
        user_supplied_juncs)
      File "/usr/local/tophat-1.0.10/bin/tophat", line 1241, in spliced_alignment
        seg)
      File "/usr/local/tophat-1.0.10/bin/tophat", line 752, in bowtie
        exit(1)
    TypeError: 'str' object is not callable

    What could be wrong?
    Can anyone help me? Thanks in advance!
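The last line of that traceback is probably a side effect rather than the root cause. The `in ?` frame label suggests the server runs Python 2.4 or older (Python 2.5 and later print `in <module>`), and before Python 2.5 the built-in name `exit` was a plain string meant for the interactive prompt, not a callable. If that is the case, TopHat's own `exit(1)` call at line 752 fails with exactly this TypeError, and the real problem is the "Error: could not execute Bowtie" message printed just before it. The Python 3 sketch below only mimics that old behavior by shadowing `exit` with a string; the function names are hypothetical and this is not TopHat's code.

```python
import sys

# Assumption for illustration: before Python 2.5, site.py bound the built-in
# name exit to a plain string (a hint such as "Use Ctrl-D (i.e. EOF) to exit."),
# so calling exit(1) raised TypeError. We mimic that by shadowing the name.
LEGACY_EXIT = "Use Ctrl-D (i.e. EOF) to exit."


def fail_like_python24(exit=LEGACY_EXIT):
    # Here exit is a str, so this line raises TypeError instead of exiting.
    exit(1)


def fail_portably():
    # sys.exit() is an ordinary function in every Python version,
    # so the intended exit status survives.
    sys.exit(1)


if __name__ == "__main__":
    try:
        fail_like_python24()
    except TypeError as err:
        print("old-style exit:", err)
    try:
        fail_portably()
    except SystemExit as stop:
        print("sys.exit status:", stop.code)
```

If this reading is right, running TopHat under a newer Python would likely let it stop with its intended error message, but you would still need to find out why Bowtie could not be executed.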

  • #2
    Do you have bowtie in your PATH?

    • #3
      By the way, newer versions of both Bowtie and TopHat are available for download, and the authors have fixed a few bugs. This is probably not relevant to your error, but it is worth having the latest versions.

      • #4
        Yes, I have Bowtie in my PATH.

        I have run the test data and it works.

        s_1_1.fastq is ~3 GB. I converted and joined it from 120 separate qseq.txt files using the Perl script provided in the thread "Conversion from ‘qseq.txt’ to ‘fastq’ format".

        As a quick test, I converted and joined only 10 qseq.txt files and ran TopHat on the result, and that also worked.

        But when I converted and joined all 120 files, I got the error above.

        Any suggestions?
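The Perl script from the referenced thread isn't reproduced here, so the following is only a minimal Python 3 sketch of the convert-and-join step, assuming the usual 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter). It also applies the '.' to 'N' replacement described later in this thread, copies the Phred+64 qualities unchanged (which is what --solexa1.3-quals expects), and does not filter on the last column. The read-name format and the script name are hypothetical.

```python
import sys


def qseq_line_to_fastq(line):
    """Convert one qseq.txt line to a four-line FASTQ record (a sketch).

    Assumes 11 tab-separated columns: machine, run, lane, tile, x, y,
    index, read number, sequence, quality, filter flag. The read-name
    layout is a hypothetical choice, not the one used by the Perl script.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _filter = fields
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    seq = seq.replace(".", "N")  # qseq marks no-calls with '.'
    # Qualities are copied unchanged (Phred+64).
    return f"@{name}\n{seq}\n+\n{qual}\n"


def convert(qseq_paths, out_handle):
    """Append every record from the given qseq files to one FASTQ stream."""
    written = 0
    for path in qseq_paths:
        with open(path) as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    out_handle.write(qseq_line_to_fastq(line))
                    written += 1
    return written


if __name__ == "__main__":
    # Hypothetical usage: python qseq2fastq.py s_1_*_qseq.txt > s_1_1.fastq
    count = convert(sys.argv[1:], sys.stdout)
    print(f"wrote {count} reads", file=sys.stderr)
```

Raising an error on a line with the wrong number of columns, rather than silently skipping it, makes a damaged input file show up at conversion time instead of later inside TopHat.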

        • #5
          Hmm, I've never tried TopHat with such large FASTQ files; the largest I've tried was 1.5 GB. Maybe you should get in touch with Cole Trapnell, who wrote most of TopHat, and see if there's a reason it's choking on large input files. (Cole was very helpful via e-mail with some annotation problems I had in early versions of TopHat.)

          • #6
            Thanks! I will try.

            Just one question about the hg18 reference.

            I noticed that hg18.3.ebwt is only 4 KB, whereas the other .ebwt files are 300-800 MB.

            I downloaded the 2.7 GB UCSC hg18 download and unzipped it in Windows.

            • #7
              My hg18.3.ebwt is also 4 KB. I think the index is OK.

              Can you run Bowtie by typing "bowtie" at the command line?
              Xi Wang

              • #8
                Yes, I can confirm that your .3.ebwt file is OK. I have a bunch of Bowtie indexes for mouse (self-built from Ensembl databases), and the .3 file is always only a few KB.

                • #9
                  It looks like either the index or the FASTQ file has a problem.

                  Is there any way to check the hg18 index and the FASTQ file?

                  I converted my FASTQ file from qseq.txt by first replacing every '.' with 'N' and then running the Perl script mentioned above.

                  Do I need to filter out low-quality/ambiguous sequences before I feed them to TopHat?
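For the index, bowtie-inspect (suggested in the next reply) is the tool to use. For the FASTQ file, a quick structural scan can rule out one possible cause: since 10 joined files worked and 120 did not, a single malformed or truncated record somewhere in the big file is worth excluding (an assumption, not a diagnosis). The Python 3 sketch below reads the file four lines at a time and reports records whose header does not start with '@', whose third line does not start with '+', whose sequence and quality lengths differ, or that are cut off at the end of the file. It is not a full FASTQ validator; for example, it does not check the quality character range.

```python
import sys


def check_fastq(path, max_problems=10):
    """Scan a FASTQ file four lines per record and report structural problems.

    Returns (records_scanned, problems), where problems is a list of
    (record_number, message) tuples. Scanning stops once at least
    max_problems reports are collected, because one broken record
    shifts every record after it.
    """
    problems = []
    records = 0
    with open(path) as handle:
        while len(problems) < max_problems:
            header = handle.readline()
            if not header:
                break  # clean end of file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual_line = handle.readline()
            records += 1
            if not qual_line:
                problems.append((records, "file ends inside a record"))
                break
            qual = qual_line.rstrip("\r\n")
            if not header.startswith("@"):
                problems.append((records, "header does not start with '@'"))
            if not plus.startswith("+"):
                problems.append((records, "third line does not start with '+'"))
            if len(seq) != len(qual):
                problems.append((records, "sequence/quality length mismatch"))
    return records, problems


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        scanned, found = check_fastq(fastq_path)
        print(f"{fastq_path}: {scanned} records scanned, {len(found)} problem(s)")
        for number, message in found:
            print(f"  record {number}: {message}")
```

Because one misaligned record throws off every later record, the first problem reported is usually the one to look at.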

                  • #10
                    You can use "bowtie-inspect" to check the index file. Low-quality sequences are fine for TopHat.
                    Xi Wang

                    • #11
                      Hi Xi Wang,

                      Thanks a lot for your help.

                      If you are also doing human mRNA sequencing, do you know how long it takes TopHat to finish analyzing one sample?
                      What's the minimum hardware setup for reasonable speed?

                      Currently I am running it on a Red Hat Linux server, and it is painfully slow. A run on only 1/6 of the total data for one sample has been going since midday on Friday and still hadn't finished by the end of the weekend. And I am aiming to analyze 20-40 samples in the near future.

                      Do you think I could open several connections to the Linux server and run TopHat in separate windows simultaneously?

                      • #12
                        Hi,

                        I am also doing human mRNA mapping. It takes about 4-5 hours to map ~20 million reads to the human reference genome (hg18). Some parameters affect mapping efficiency, such as read length (our reads are 50 nt), the number of mismatches allowed, and the number of multi-aligned loci allowed.
                        How many reads do you have for one sample? I can't understand why it took so long to process a sample.

                        Sure, you can run TopHat in separate windows simultaneously.
                        Xi Wang

                        • #13
                          I forgot to say that TopHat uses ~5 GB of memory when mapping to the human genome. More memory will speed up the mapping.
                          Xi Wang
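Xi Wang's two replies above say that parallel TopHat runs are fine and that each run against hg18 needs about 5 GB of memory. Two practical points follow, sketched below in Python 3 under stated assumptions. First, give each run its own output directory: the log in the first post shows TopHat writing to ./tophat_out/ by default, so runs started from the same directory would share it. Second, cap the number of simultaneous runs by available RAM. The sketch assumes your TopHat version accepts `-o` for the output directory (check `tophat --help`); the sample names, paths, and the `tophat_<sample>` directory naming are hypothetical.

```python
import math
import subprocess


def tophat_command(index, reads, out_dir):
    # --solexa1.3-quals and the index path come from the thread; -o (assumed
    # available in your version) gives each sample its own output directory.
    return ["tophat", "-o", out_dir, "--solexa1.3-quals", index, reads]


def max_parallel_runs(total_ram_gb, per_run_gb=5.0):
    # ~5 GB per run is the figure quoted in this thread for hg18; it is a
    # rough assumption, so leave headroom for the OS and other users.
    # Always allows at least one run.
    return max(1, math.floor(total_ram_gb / per_run_gb))


def run_batch(samples, index, total_ram_gb, dry_run=True):
    """Run TopHat on several samples, at most `limit` at a time (a sketch).

    samples maps a (hypothetical) sample name to its FASTQ path.
    With dry_run=True nothing is executed; the commands are only returned.
    """
    limit = max_parallel_runs(total_ram_gb)
    commands = [tophat_command(index, reads, f"tophat_{name}")
                for name, reads in samples.items()]
    if dry_run:
        return commands, limit
    running = []
    for cmd in commands:
        if len(running) >= limit:
            # Simple throttle: wait for the oldest run before starting another.
            running.pop(0).wait()
        running.append(subprocess.Popen(cmd))
    for proc in running:
        proc.wait()
    return commands, limit
```

Waiting on the oldest run is not optimal scheduling, but it keeps the sketch short and never exceeds the memory-based limit.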

                          • #14
                            Hi,

                            Thanks a lot for the information.

                            All I know is that my FASTQ file for one sample is around 3 GB after converting and joining all 120 qseq.txt files; I'm not sure how to find out the total number of reads. How do you find out?

                            The read length is 76 bp. I am running TopHat with the default configuration, with no arguments except --solexa1.3-quals. I guess you set the number of mismatches and the number of multi-aligned loci with command-line arguments. If so, what values do you use?

                            P.S. I am running TopHat on the Linux server through my university connection. Is that supposed to be faster than running it on my local computer? How many processors does your computer have? Is a normal PC enough?
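To count reads: in the usual FASTQ layout every read takes exactly four lines, so the read count is the line count divided by 4 (on Linux, `wc -l s_1_1.fastq` gives the line count). The Python 3 sketch below does that, then scales Xi Wang's single data point (~20 million 50-nt reads in 4-5 hours) linearly to a given read count. Linear scaling is an assumption good for a rough order of magnitude only: 76-bp reads, different parameters, and different hardware will all change the real runtime.

```python
import sys


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record (no wrapping)."""
    with open(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        raise ValueError(f"{lines} lines is not a multiple of 4")
    return lines // 4


def estimate_hours(n_reads, ref_reads=20_000_000, ref_hours=(4.0, 5.0)):
    # Linear scaling from the single data point quoted in this thread;
    # treat the result as an order of magnitude, not a prediction.
    low, high = ref_hours
    scale = n_reads / ref_reads
    return low * scale, high * scale


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        reads = count_fastq_reads(fastq_path)
        low, high = estimate_hours(reads)
        print(f"{fastq_path}: {reads} reads, roughly {low:.0f}-{high:.0f} h "
              "by linear scaling")
```

A line count that is not a multiple of 4 is itself a sign that the joined file has a broken record.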

                            • #15
                              Another question:

                              Is there any need to run Bowtie on its own, given that TopHat calls Bowtie anyway?
