**sjm** · 01-13-2010, 04:44 PM

Do you have bowtie in your PATH?

**sjm** · 01-13-2010, 04:45 PM

By the way, there are newer versions of both bowtie and tophat available for download and the authors have squashed a few bugs. Probably not relevant to your error, but worth having the latest.

**jiwu2573** · 01-13-2010, 05:10 PM

Yes, I have bowtie in my path.

I have run the test data and it works.

The s_1_1.fastq is ~3 GB, converted and joined from 120 separate qseq.txt files using the perl script provided in the thread 'Conversion from ‘qseq.txt’ to ‘fastq’ format'.

I did a quick test by converting and joining only 10 qseq.txt files and running them through tophat, and that also worked.

But when I converted and joined all the 120 files, it shows the error above.

Any suggestions?

**sjm** · 01-13-2010, 06:44 PM

Hmm, I've never tried tophat with such large fastq files. The largest I've tried has been 1.5G. Maybe you should get in touch with Cole Trapnell, the guy who largely wrote Tophat, and see if there's a reason why it's choking on large input files. (Cole was very helpful via e-mail with some annotation problems I had in early versions of Tophat.)

**jiwu2573** · 01-13-2010, 09:32 PM

Thanks! I will try.

Just one question about reference hg18.

I noticed that hg18.3.ebwt is only 4 kb, whereas the other ebwt files are 300-800 MB.

I downloaded the 2.7 GB UCSC hg18 and unzipped it in Windows.

**Xi Wang** · 01-13-2010, 11:43 PM

My hg18.3.ebwt is also 4 kb. I think the index is OK.

Can you execute BOWTIE by typing "bowtie" in the command line?
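
For anyone following along, a quick way to see which executables the shell would actually find is to look them up on PATH. This is only a minimal sketch; the helper name `find_tool` is made up for illustration, and the list of tools checked is an assumption about what a typical bowtie install provides.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


# Tool names are assumptions about a typical bowtie/tophat installation.
for tool in ("bowtie", "bowtie-build", "bowtie-inspect", "tophat"):
    path = find_tool(tool)
    print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If bowtie shows up as not found here, tophat will most likely not be able to call it either.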

**sjm** · 01-14-2010, 05:45 AM

Yes, I can confirm that your .3.ebwt file is OK. I have a bunch of bowtie indexes for mouse (self-built from Ensembl databases) and the .3 file is always a few kb only.

**jiwu2573** · 01-14-2010, 05:51 PM

It looks like either the index or the fastq file has a problem.

Any way to check the hg18 index file and the fastq file?

My fastq file was converted from qseq.txt by first replacing every '.' with 'N', then using the perl script mentioned above.

Do I need to filter out the bad-quality/ambiguous sequences before I feed it to tophat?
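
The conversion described above (replace '.' with 'N', then write FASTQ records) can be sketched roughly as follows. This is not the perl script from the other thread. It assumes the usual 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), and the header format and the function name `qseq_line_to_fastq` are illustrative choices only.

```python
def qseq_line_to_fastq(line):
    """Convert one qseq.txt line into a 4-line FASTQ record (as a string).

    Assumes 11 tab-separated fields in the common qseq layout; the quality
    string is passed through unchanged (no re-encoding is attempted).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _passed = fields
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    seq = seq.replace(".", "N")  # qseq marks uncalled bases with '.'
    header = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"@{header}\n{seq}\n+\n{qual}\n"


def convert_qseq_files(qseq_paths, fastq_path):
    """Join several qseq.txt files into a single FASTQ file."""
    with open(fastq_path, "w") as out:
        for path in qseq_paths:
            with open(path) as handle:
                for line in handle:
                    if line.strip():  # skip blank lines
                        out.write(qseq_line_to_fastq(line))
```

Because every line is checked for field count and matching lengths, a malformed line in one of the 120 files would raise an error at conversion time instead of surfacing later inside tophat.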

**Xi Wang** · 01-14-2010, 08:20 PM

You can use "bowtie-inspect" to check the index file. Bad-quality sequences are OK for tophat.

**jiwu2573** · 01-17-2010, 02:30 PM

Hi Xi Wang,

Thanks a lot for your help.

If you are also doing human mRNA sequencing, do you know how long it takes TopHat to finish analyzing 1 sample?
What's the minimum hardware setup for reasonable speed?

Currently I am running on a RedHat Linux server and the speed is painfully slow. For only 1/6 of the total data for 1 sample, it has been running since midday Friday and still hadn't finished over the weekend. And I am aiming to analyze 20-40 samples in the near future.

Do you think I could open a few connections to the Linux server and run TopHat in separate windows simultaneously?

**Xi Wang** · 01-17-2010, 09:52 PM

Hi,

I am also doing human mRNA mapping. It takes about 4-5 hours to map ~20 million reads to the human reference genome (hg18). Some parameters affect the mapping speed, such as read length (our reads are 50 nt), number of mismatches, and number of multi-aligned loci allowed.
How many reads do you have for one sample? I can't understand why it is taking so long to process a sample.

Sure, you can run Tophat in separate windows simultaneously.

**Xi Wang** · 01-17-2010, 11:03 PM

I forgot to mention that Tophat will use ~5 GB of memory for mapping to the human genome. More memory will speed up the mapping.

**jiwu2573** · 01-18-2010, 02:41 PM

Hi,

Thanks a lot for your information.

I only know that my fastq file for 1 sample is around 3 GB after converting and joining all the 120 qseq.txt files. I'm not sure how to find out how many reads there are in total. How do you know?

The read length is 76 bp. I am running tophat with the default configuration, without any arguments except --solexa1.3-quals. I guess you are setting the number of mismatches and the number of multi-aligned loci with arguments. If that's the case, what values do you use?

PS. I am running TopHat through a university connection to the Linux server. Is it supposed to be faster than running on my local computer? How many processors do you have in your computer? Is a normal PC enough?
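
On the question of counting reads: assuming the FASTQ file uses strict 4-line records (header, sequence, '+', quality, with no wrapped sequences), the read count is simply the line count divided by 4. A minimal sketch follows; the function name `count_fastq_reads` is hypothetical.

```python
def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    n_lines = 0
    with open(path, "rb") as handle:  # binary mode: no decoding overhead
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        # A remainder suggests a truncated or malformed file.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Example usage (file name taken from the thread):
# print(count_fastq_reads("s_1_1.fastq"))
```

The multiple-of-4 check doubles as a cheap sanity test on a joined file: if concatenating the 120 converted pieces left a partial record, it shows up here.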

**jiwu2573** · 01-18-2010, 04:24 PM

Another question:

Is there any need to run Bowtie alone as TopHat will call Bowtie anyway?

problem with tophat