SEQanswers

Old 01-13-2010, 03:26 PM   #1
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Question problem with tophat

Can anyone here help me with this TopHat error?
When I map single-end solid sequences (fastq format) to hg18 as follows:
tophat --solexa1.3-quals /usr/local/bowtie/indexes/hg18 s_1_1.fastq

I get this error:
[Thu Jan 14 09:28:00 2010] Beginning TopHat run (v1.0.10)
-----------------------------------------------
[Thu Jan 14 09:28:00 2010] Preparing output location ./tophat_out/
[Thu Jan 14 09:28:00 2010] Checking for Bowtie index files
[Thu Jan 14 09:28:00 2010] Checking for reference FASTA file
[Thu Jan 14 09:28:00 2010] Checking for Bowtie
Bowtie version: 0.10.1.0
[Thu Jan 14 09:28:00 2010] Checking reads
seed length: 76bp
format: fastq
quality scale: --solexa1.3-quals
Splitting reads into 3 segments
[Thu Jan 14 09:52:04 2010] Mapping reads against hg18 with Bowtie
[FAILED]
Error: could not execute Bowtie
Traceback (most recent call last):
  File "/usr/local/tophat-1.0.10/bin/tophat", line 1490, in ?
    sys.exit(main())
  File "/usr/local/tophat-1.0.10/bin/tophat", line 1462, in main
    user_supplied_juncs)
  File "/usr/local/tophat-1.0.10/bin/tophat", line 1241, in spliced_alignment
    seg)
  File "/usr/local/tophat-1.0.10/bin/tophat", line 752, in bowtie
    exit(1)
TypeError: 'str' object is not callable

What could be wrong?
Can anyone help me? Thanks in advance.
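
A note on the traceback above: the `in ?` frame label suggests the script ran under Python 2.4 or older, where the built-in `exit` was a plain help string rather than something callable. If that is the case, the `TypeError` is probably a secondary failure raised while TopHat was trying to quit after Bowtie had already failed, and not the root cause. A minimal sketch of that behaviour in current Python, where `legacy_exit` is a hypothetical stand-in for the old built-in:

```python
import sys

# Assumption: stand-in for the old built-in, which was just a help string.
legacy_exit = "Use Ctrl-D (i.e. EOF) to exit."


def quit_buggy():
    """Mimic calling the string-valued `exit` after a failed subprocess."""
    legacy_exit(1)  # raises TypeError: 'str' object is not callable


def quit_fixed():
    """The portable way to stop with a non-zero status."""
    sys.exit(1)  # raises SystemExit carrying the exit code 1


def demo():
    """Run both variants and record what each one raises."""
    results = {}
    try:
        quit_buggy()
    except TypeError as err:
        results["buggy"] = str(err)
    try:
        quit_fixed()
    except SystemExit as err:
        results["fixed"] = err.code
    return results
```

If this reading is right, the line worth investigating is `Error: could not execute Bowtie`, not the `TypeError` that follows it.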
Old 01-13-2010, 04:44 PM   #2
sjm
Member
 
Location: St Louis, MO

Join Date: Nov 2009
Posts: 27
Default

Do you have bowtie in your PATH?
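
One quick way to answer this is to ask the system where it would find the executables. A minimal sketch, assuming a Python 3 interpreter is available on the same machine; `bowtie` and `tophat` are the command names used in this thread, and the helper names are made up for illustration:

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def missing_tools(names=("bowtie", "tophat")):
    """List the tools from `names` that cannot be found on PATH."""
    return [name for name in names if find_tool(name) is None]


if __name__ == "__main__":
    for tool in ("bowtie", "tophat"):
        print(tool, "->", find_tool(tool) or "NOT FOUND on PATH")
```

Note that this only reports what the current shell environment sees; a job started from a different login or scheduler may have a different PATH.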
Old 01-13-2010, 04:45 PM   #3
sjm
Member
 
Location: St Louis, MO

Join Date: Nov 2009
Posts: 27
Default

By the way, there are newer versions of both bowtie and tophat available for download and the authors have squashed a few bugs. Probably not relevant to your error, but worth having the latest.
Old 01-13-2010, 05:10 PM   #4
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Yes, I have bowtie in my path.

I have run the test data and it works.

The s_1_1.fastq file is ~3 GB, converted and joined from 120 separate qseq.txt files using the Perl script provided in the thread 'Conversion from ‘qseq.txt’ to ‘fastq’ format'.

I did a quick test by converting and joining only 10 qseq.txt files and running the result through TopHat, and that also worked.

But when I converted and joined all 120 files, I got the error above.

Any suggestions?
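
For reference, the qseq-to-FASTQ conversion described above can be sketched as follows. This is not the Perl script from the other thread, only a minimal illustration that assumes the usual 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) and leaves the quality string untouched, which would match the `--solexa1.3-quals` setting used here. The function names and the read-name layout are choices made for this example.

```python
def qseq_line_to_fastq(line):
    """Convert one tab-separated qseq line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 qseq fields, got %d" % len(fields))
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _flag = fields
    name = "%s_%s:%s:%s:%s:%s#%s/%s" % (
        machine, run, lane, tile, x, y, index, read_no)
    seq = seq.replace(".", "N")  # qseq marks uncalled bases with '.'
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)


def convert_qseq_files(qseq_paths, fastq_path):
    """Write the records of every qseq file into one FASTQ file.

    Returns the number of reads written, which is handy for a sanity
    check against the number of lines in the inputs.
    """
    n_reads = 0
    with open(fastq_path, "w") as out:
        for path in qseq_paths:
            with open(path) as handle:
                for line in handle:
                    if line.strip():  # skip blank lines between files
                        out.write(qseq_line_to_fastq(line))
                        n_reads += 1
    return n_reads
```

Because the converter raises on any line that does not have 11 fields, a truncated or corrupted qseq file among the 120 would show up during conversion instead of later in the mapping step.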
Old 01-13-2010, 06:44 PM   #5
sjm
Member
 
Location: St Louis, MO

Join Date: Nov 2009
Posts: 27
Default

Hmm, I've never tried tophat with such large fastq files. The largest I've tried has been 1.5G. Maybe you should get in touch with Cole Trapnell, the guy who largely wrote Tophat, and see if there's a reason why it's choking on large input files. (Cole was very helpful via e-mail with some annotation problems I had in early versions of Tophat.)
Old 01-13-2010, 09:32 PM   #6
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Thanks! I will try.

Just one question about reference hg18.

I noticed that hg18.3.ebwt is only 4 KB, whereas the other .ebwt files are 300-800 MB.

I downloaded the 2.7 GB UCSC hg18 and unzipped it on Windows.
Old 01-13-2010, 11:43 PM   #7
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

My hg18.3.ebwt is also 4 KB. I think the index is OK.

Can you execute Bowtie by typing "bowtie" on the command line?
__________________
Xi Wang
Old 01-14-2010, 05:45 AM   #8
sjm
Member
 
Location: St Louis, MO

Join Date: Nov 2009
Posts: 27
Default

Yes, I can confirm that your .3.ebwt file is OK. I have a bunch of bowtie indexes for mouse (self-built from Ensembl databases) and the .3 file is always a few kb only.
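
A size listing like the one being compared here can be produced with a few lines of Python. This is only a sketch: it assumes a Bowtie 1 index made of the six `.ebwt` files listed in the code, and it can detect missing or empty files but not a corrupted one. The function names are invented for the example.

```python
import os

# The six files that make up a Bowtie 1 index for a given base name.
EBWT_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                 ".rev.1.ebwt", ".rev.2.ebwt")


def index_file_sizes(index_base):
    """Map each expected index file to its size in bytes (None if missing)."""
    sizes = {}
    for suffix in EBWT_SUFFIXES:
        path = index_base + suffix
        sizes[path] = os.path.getsize(path) if os.path.isfile(path) else None
    return sizes


def missing_or_empty(index_base):
    """Return the expected index files that are absent or zero bytes long."""
    return sorted(path for path, size in index_file_sizes(index_base).items()
                  if not size)
```

For the index in this thread the base name would be `/usr/local/bowtie/indexes/hg18`; a small `.3.ebwt` is expected, as confirmed above.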
Old 01-14-2010, 05:51 PM   #9
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

It looks like either the index or the fastq file has a problem.

Is there any way to check the hg18 index and the FASTQ file?

My FASTQ file was converted from qseq.txt by first replacing every '.' with 'N' and then using the Perl script mentioned above.

Do I need to filter out bad-quality or ambiguous sequences before I feed the file to TopHat?
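
On the FASTQ side, a basic structural check is easy to script. The sketch below assumes plain four-line records (no wrapped sequence lines) and only verifies the record layout: an `@` header, a `+` separator, and sequence and quality strings of equal, non-zero length. It does not judge base quality. `check_fastq` is a name made up for this example.

```python
def check_fastq(path, max_problems=10):
    """Scan a four-line-per-record FASTQ file.

    Returns (n_reads, problems), where problems is a list of short
    messages describing malformed records. Scanning stops early once
    max_problems have been collected.
    """
    problems = []
    n_reads = 0
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                break
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            n_reads += 1
            where = "record %d" % n_reads
            if not header.startswith("@"):
                problems.append(where + ": header does not start with '@'")
            elif not plus.startswith("+"):
                problems.append(where + ": third line does not start with '+'")
            elif not seq:
                problems.append(where + ": empty sequence")
            elif len(seq) != len(qual):
                problems.append(where + ": sequence and quality lengths differ")
            if len(problems) >= max_problems:
                break
    return n_reads, problems
```

A file joined from many pieces is most likely to break at the joins, for example where one piece lacks a final newline, and that kind of damage shows up as a malformed record in this scan.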
Old 01-14-2010, 08:20 PM   #10
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

You can use "bowtie-inspect" to check the index file. Low-quality sequences are fine for TopHat.
__________________
Xi Wang
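
The `bowtie-inspect` suggestion above could be wrapped like this. It is a sketch under two assumptions: that the installed `bowtie-inspect` accepts the `--names` option (which prints only the reference sequence names), and that a Python 3 interpreter is available. The helper names are hypothetical.

```python
import shutil
import subprocess


def inspect_names_command(index_base):
    """Build the command line; --names is assumed to exist in this version."""
    return ["bowtie-inspect", "--names", index_base]


def list_index_sequences(index_base):
    """Return the reference names stored in a Bowtie index.

    Raises RuntimeError if bowtie-inspect is not on PATH and
    subprocess.CalledProcessError if the tool exits with an error,
    which itself hints at an unreadable or incomplete index.
    """
    if shutil.which("bowtie-inspect") is None:
        raise RuntimeError("bowtie-inspect not found on PATH")
    result = subprocess.run(inspect_names_command(index_base),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.splitlines()
```

For hg18 one would expect the list to contain the chromosome names (chr1, chr2, and so on); an error or a short list would point at the index and not at the reads.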
Old 01-17-2010, 02:30 PM   #11
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Hi Xi Wang,

Thanks a lot for your help.

If you are also doing human mRNA sequencing, do you know how long it takes TopHat to finish analyzing one sample?
What is the minimum hardware setup for reasonable speed?

Currently I am running on a Red Hat Linux server and the speed is painfully slow. With only 1/6 of the total data for one sample, the run started around midday on Friday and still had not finished over the weekend. And I am aiming to analyze 20-40 samples in the near future.

Do you think I could open a few connections to the Linux server and run TopHat in separate windows simultaneously?
Old 01-17-2010, 09:52 PM   #12
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Hi,

I am also doing human mRNA mapping. It takes about 4-5 hours to map ~20 million reads to the human reference genome (hg18). Some parameters affect mapping speed, such as read length (our reads are 50 nt), the number of mismatches, and the number of multi-aligned loci allowed.
How many reads do you have for one sample? I can't understand why it took so long to process a sample.

Sure, you can run TopHat in separate windows simultaneously.
__________________
Xi Wang
Old 01-17-2010, 11:03 PM   #13
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

I forgot to say that TopHat will use ~5 GB of memory for mapping to the human genome. More memory will speed up the mapping.
__________________
Xi Wang
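
Running several TopHat jobs side by side, as discussed above, can also be scripted instead of using separate terminal windows. Two points matter in practice: each job should get its own output directory (the log at the top of the thread shows the default is `./tophat_out/`, which simultaneous runs would share), and the number of parallel jobs should fit the roughly 5 GB per job mentioned above. The sketch assumes the installed TopHat accepts `-o` (output directory) and `-p` (threads), which is worth confirming with `tophat --help`; the sample names and helper functions are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tophat_command(index_base, fastq_path, out_dir, threads=1):
    """Build one TopHat command line with its own output directory."""
    return ["tophat", "--solexa1.3-quals",
            "-p", str(threads),   # assumed option: threads per job
            "-o", out_dir,        # assumed option: per-sample output directory
            index_base, fastq_path]


def run_samples(index_base, samples, max_parallel=2, threads=1):
    """Run TopHat for each sample, at most max_parallel jobs at a time.

    samples maps a sample name to its FASTQ path. Returns a dict of
    sample name to process exit code (0 means success).
    """
    def run_one(item):
        name, fastq_path = item
        cmd = tophat_command(index_base, fastq_path,
                             "tophat_out_" + name, threads)
        return name, subprocess.call(cmd)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, samples.items()))
```

With 20-40 samples, keeping `max_parallel` times the per-job memory below the server's RAM is probably the main constraint.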
Old 01-18-2010, 02:41 PM   #14
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Hi,

Thanks a lot for your information.

I only know that my FASTQ file for one sample is around 3 GB after converting and joining all 120 qseq.txt files. I am not sure how to find out how many reads there are in total. How do you know?

The read length is 76 bp. I am running TopHat with the default configuration, without any argument except --solexa1.3-quals. I guess you are setting the number of mismatches and the number of multi-aligned loci with arguments. If that's the case, what values do you use?

PS. I am running TopHat through a university connection to the Linux server. Is that supposed to be faster than running on my local computer? How many processors does your computer have? Is a normal PC enough?
Old 01-18-2010, 04:24 PM   #15
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Another question:

Is there any need to run Bowtie alone as TopHat will call Bowtie anyway?
Old 01-18-2010, 04:36 PM   #16
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Another check:

My hg18.fa, constructed from the UCSC hg18 from the TopHat website, is 3131776827 bytes. Is yours the same size?
Old 01-18-2010, 10:24 PM   #17
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by jiwu2573
I only know that my FASTQ file for one sample is around 3 GB after converting and joining all 120 qseq.txt files. I am not sure how to find out how many reads there are in total. How do you know?
I guess there are more than 10 million reads. You can count the number of lines in the FASTQ file; the number of reads equals 1/4 of the number of lines.

Quote:
The read length is 76 bp. I am running TopHat with the default configuration, without any argument except --solexa1.3-quals. I guess you are setting the number of mismatches and the number of multi-aligned loci with arguments. If that's the case, what values do you use?
I usually discard all the multi-reads by specifying "-g 1". I think it is normal for your job to take several hours to finish.

Quote:
PS. I am running TopHat through a university connection to the Linux server. Is that supposed to be faster than running on my local computer? How many processors does your computer have? Is a normal PC enough?
I think so. A compute server will be more efficient than a PC.
__________________
Xi Wang
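
The line-count rule given above (reads = lines / 4) can be sketched as follows. It assumes plain four-line FASTQ records without wrapped sequence lines; `count_fastq_reads` is a name made up for this example.

```python
def count_fastq_reads(path):
    """Count reads in a four-line-per-record FASTQ file.

    Raises ValueError if the line count is not a multiple of four,
    which usually means the file is truncated or was joined badly.
    """
    with open(path, "rb") as handle:  # binary mode: just count newlines fast
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("%d lines is not a multiple of 4" % n_lines)
    return n_lines // 4
```

The multiple-of-four check doubles as a cheap integrity test for a file assembled from 120 pieces.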
Old 01-18-2010, 10:25 PM   #18
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by jiwu2573
Another question:

Is there any need to run Bowtie alone as TopHat will call Bowtie anyway?
You can just run Tophat.
__________________
Xi Wang
Old 01-18-2010, 10:28 PM   #19
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by jiwu2573
Another check:

My hg18.fa, constructed from the UCSC hg18 from the TopHat website, is 3131776827 bytes. Is yours the same size?
It should be OK. Mine is 3169831337 bytes. How about anybody else?
__________________
Xi Wang
Old 01-19-2010, 09:17 PM   #20
jiwu2573
Member
 
Location: Australia

Join Date: Jun 2009
Posts: 34
Default

Hi Xi,

Thanks a lot for your help.

Last night I managed to run one sample through TopHat successfully (it took 17 hours).

I tried to visualize the output, coverage.wig and junctions.bed, in the UCSC Genome Browser.

When I load coverage.wig, it shows: Error File 'coverage.wig' - Error line 3771557 of custom track: chromEnd less than 1 (0)

When I load junctions.bed, it only shows chromosome 20:
Name       Description       Type  Doc  Items  Pos
junctions  TopHat junctions  bed        80207  chr20:

By the way, my junctions file is 6430 KB and my coverage file is around 173 MB. Are these sizes normal?

How do you do your visualization?

Do you use Cufflinks to quantify the expression after TopHat?
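
One way to look into the `chromEnd less than 1 (0)` message above is to scan the coverage file for intervals the browser would reject. The sketch assumes that `coverage.wig` is laid out bedGraph-style, with four whitespace-separated columns (chromosome, start, end, value) after an optional `track` line; that layout is an assumption about this TopHat version, and `find_bad_intervals` is a hypothetical helper.

```python
def find_bad_intervals(path, max_hits=20):
    """Return (line_number, line) pairs with a suspicious interval.

    A data line is flagged when it does not have four columns, when
    start or end is not an integer, or when end < 1 or end <= start.
    """
    hits = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            stripped = line.strip()
            if not stripped or stripped.startswith(("track", "browser", "#")):
                continue  # header or blank line, not an interval
            fields = stripped.split()
            bad = len(fields) != 4
            if not bad:
                try:
                    start, end = int(fields[1]), int(fields[2])
                    bad = end < 1 or end <= start
                except ValueError:
                    bad = True
            if bad:
                hits.append((line_no, stripped))
                if len(hits) >= max_hits:
                    break
    return hits
```

The error message names line 3771557, so that line and its neighbours are the first place to look; if only a handful of lines are flagged, dropping them before upload may be enough to get the track to load.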