**Brian Bushnell** · 11-01-2015, 11:10 AM

I don't think you are missing anything obvious... sounds like a major bug, in which only one thread is writing data. I suggest you roll back to an earlier version that you believe was working correctly.

**rwan** · 11-01-2015, 05:44 PM

Hi Brian,

Previously, I would have expected a larger file with more threads, because of some kind of duplication in the results. What I got was the opposite. What you said makes sense, and it had never occurred to me at all.

I guess I didn't expect there to be such a bug and presumed its use of threads was working correctly.

Actually, I found this out by accident and don't know how far back I have to go to get a working TopHat. But it sounds like if I use a single thread, it'll run longer, but the results might be more correct.

Thanks a lot for the advice!

Ray

**blancha** · 11-01-2015, 05:49 PM

Hate to point out the obvious, but you should start by trying the latest version of TopHat, version 2.1.0.

Another possibility is that you are running TopHat on a computing cluster and are requesting fewer processors than the number of threads. On my own computing cluster, I get all sorts of strange results if I request only one processor from the scheduler and then try to run 2 or more threads.

Those are the two potential solutions that I would investigate: upgrading to the latest version of TopHat, which is my standard advice for any bug with any program, and verifying the number of processors available for multi-threading.

P.S. I run TopHat just about every day, and the only serious problem I have ever experienced was when I asked the scheduler for fewer processors than the number of threads passed as an argument to TopHat.
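
The processor check described here can be sketched as a small guard at the top of a job script. This is only an illustration, not something taken from any poster's setup: `threads_fit` and `THREADS` are hypothetical names, and the default relies on GNU coreutils `nproc`. Note that `nproc` reports the processors visible to the current process, which matches the scheduler's allocation only if the scheduler actually restricts the job to those processors.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the requested thread count does not
# exceed the number of processors granted to this job.
# Usage: threads_fit THREADS [GRANTED]
# GRANTED defaults to the count reported by nproc (GNU coreutils).
threads_fit() {
    requested="$1"
    granted="${2:-$(nproc)}"
    [ "$requested" -le "$granted" ]
}

THREADS=8   # assumed value that would be passed to TopHat's -p option

if threads_fit "$THREADS"; then
    echo "OK: $THREADS threads fit in the available processors"
else
    echo "WARNING: $THREADS threads requested but fewer processors are available" >&2
fi
```

Passing the granted count explicitly (for example, from whatever variable the scheduler exports) avoids depending on `nproc` when the scheduler does not pin jobs to processors.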

**blancha** · 11-01-2015, 05:57 PM

Another environment variable that you may want to investigate is OMP_NUM_THREADS.

It really depends on what kind of system you are running TopHat on, so it's hard to do more troubleshooting without more information about the system.

It could help to set OMP_NUM_THREADS to the number of threads TopHat is asked to run with, if that many processors are available on the system.

**rwan** · 11-01-2015, 06:13 PM

Hi,

> Originally posted by blancha:
>
> Hate to point out the obvious, but you should start by trying the latest version of TopHat, version 2.1.0.

True, but 2.0.13 was the latest version for Ubuntu 15.04, so I'm sure I wasn't the only one still running it. I just moved to 15.10, which does provide 2.1.0, and I am looking into it.

> Originally posted by blancha:
>
> Another possibility is that you are running TopHat on a computing cluster and are requesting fewer processors than the number of threads. On my own computing cluster, I get all sorts of strange results if I request only one processor from the scheduler and then try to run 2 or more threads.

Actually, I'm running on a single computer but it is running a scheduler. However, even in such a scenario, there shouldn't be any "strange results", should there? I mean, logically there shouldn't be. In your example, the scheduler or OS should put a stop to it but if it does not, then TopHat shouldn't give strange results, should it?

Anyway, in my case, I'm passing the same value for -p to both TopHat and the scheduler. But that ended up being error-prone, so I made the scheduler processor number very large. I can't imagine *that* being a problem... if you suspect it could be, I can remove the scheduler from the equation and run via the command line. Hmmmmm, might be worth trying.

> Originally posted by blancha:
>
> Those are the two potential solutions that I would investigate: upgrading to the latest version of TopHat, which is my standard advice for any bug with any program, and verifying the number of processors available for multi-threading.
>
> P.S. I run TopHat just about every day, and the only serious problem I have ever experienced was when I asked the scheduler for fewer processors than the number of threads passed as an argument to TopHat.

So, may I ask if you've run TopHat before on the same input but with various values of -p? I've tried several values from 2 to 16 (the limit of the computer I'm using) and see a gradual decrease in output size and reads mapped (as shown in IGV).

I never did this before and just thought of doing it on a whim. So, I'm a bit surprised by the results.
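
One way to make this comparison less dependent on eyeballing file sizes is to tabulate a count per -p value and test whether they all match. A rough sketch, where `counts_agree` is a hypothetical helper, and the commented-out loop assumes one TopHat output directory per thread count (the `tophat_p$p` names are made up) plus an installed `samtools`:

```shell
#!/bin/sh
# Hypothetical helper: reads "threads<TAB>count" lines on stdin and
# succeeds only if every count equals the first one.
counts_agree() {
    awk 'NR == 1 { first = $2 }
         $2 != first { differ = 1 }
         END { exit (differ ? 1 : 0) }'
}

# With real runs, the table could be built like this (directory names are
# assumptions; -F 4 skips unmapped records, so this counts alignment
# records rather than distinct reads):
#
#   for p in 1 2 4 8 16; do
#       printf '%s\t%s\n' "$p" "$(samtools view -c -F 4 "tophat_p$p/accepted_hits.bam")"
#   done > counts.tsv
#   counts_agree < counts.tsv && echo "all runs agree" || echo "runs differ"

# Toy input with made-up numbers, only to show the calling convention.
printf '2\t1000\n4\t1000\n' | counts_agree && echo "toy example: counts agree"
```

Matching counts do not prove the alignments are identical, but a mismatch is enough to show that the thread count is changing the output.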

I'll give TopHat 2.1.0 a try and post what I find. But even if this is a problem with an older version, that's still a serious problem, isn't it? What I mean is, I have had projects using the older version of TopHat and never checked the effect of -p...

Thanks a lot for your comments! It certainly gives me some things to try...

Ray

**rwan** · 11-01-2015, 06:20 PM

> Originally posted by blancha:
>
> Another environment variable that you may want to investigate is OMP_NUM_THREADS.
>
> It really depends on what kind of system you are running TopHat on, so it's hard to do more troubleshooting without more information about the system.
>
> It could help to set OMP_NUM_THREADS to the number of threads TopHat is asked to run with, if that many processors are available on the system.

Thanks for this as well! I've run OMP-based programs before but didn't know I had to set this environment variable. I thought the program could determine it by itself.

I will give it a try -- thank you!

Ray

**Brian Bushnell** · 11-01-2015, 07:23 PM

> Originally posted by rwan:
>
> I'll give TopHat 2.1.0 a try and post what I find. But even if this is a problem with an older version, that's still a serious problem, isn't it? What I mean is, I have had projects using the older version of TopHat and never checked the effect of -p...

Unfortunately, it sounds like you should go back and re-evaluate all the data you processed with that version of TopHat, to be safe.

As for the number of processors and number of threads... the number of processors should be completely transparent, and a deterministic program should give identical results for a large number of threads whether there is 1 processor or many processors.
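
The expectation of identical results can be tested more strictly than by file size. The sketch below is an assumption-laden illustration: `record_digest` is a hypothetical helper that sorts records before checksumming, because a multi-threaded run could in principle write the same records in a different order. The commented usage assumes `samtools` is installed and uses made-up output directory names.

```shell
#!/bin/sh
# Hypothetical helper: order-independent checksum of the text records on stdin.
record_digest() {
    LC_ALL=C sort | cksum | cut -d ' ' -f 1
}

# Usage with real BAM files (paths are assumptions):
#   a=$(samtools view tophat_p1/accepted_hits.bam | record_digest)
#   b=$(samtools view tophat_p8/accepted_hits.bam | record_digest)
#   [ "$a" = "$b" ] && echo "identical records" || echo "records differ"

# Toy demonstration: the same lines in a different order give the same digest.
first=$(printf 'read2\nread1\n' | record_digest)
second=$(printf 'read1\nread2\n' | record_digest)
[ "$first" = "$second" ] && echo "same digest"
```

`cksum` is used only because it is available everywhere; a stronger hash could be substituted in the same place.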

And by the way, I'd like to toss in a recommendation that you try BBMap for RNA-seq, as long as you're (possibly) going back and reprocessing a lot of data.

**blancha** · 11-01-2015, 07:23 PM

> So, may I ask if you've run TopHat before on the same input but with various values of -p? I've tried several values from 2 to 16 (the limit of the computer I'm using) and see a gradual decrease in output size and reads mapped (as shown in IGV).

Yes, the results were absolutely identical.
Only the runtime was shorter, obviously.

> Actually, I'm running on a single computer but it is running a scheduler. However, even in such a scenario, there shouldn't be any "strange results", should there? I mean, logically there shouldn't be. In your example, the scheduler or OS should put a stop to it but if it does not, then TopHat shouldn't give strange results, should it?

I seem to remember that TopHat would run to completion and give bewildering results, without any error messages. After some unfortunate experiences, I was always very careful to request a number of processors equal to or greater than the number of threads TopHat would run on. It's a dangerous "bug", since there are no error messages in the log.

So, you might want to check that you are requesting from the scheduler a number of processors equal to or greater than the number of threads on which TopHat will run.

You can also add the following command in your submission script to the scheduler, before running TopHat.

export OMP_NUM_THREADS=#threads_requested_for_TopHat

Either due to updates to the scheduler or to TopHat, I no longer need to export this variable in my job submission scripts.
Until two years ago, users on my computing cluster had to export this variable when submitting multi-threaded TopHat jobs, or they would get incorrect results.
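
Combining the export with the advice to match the thread count to the scheduler request, a job script fragment might look like the sketch below. It is only an illustration: the index and FASTQ names are placeholders, `run` is a hypothetical dry-run wrapper, and nothing in this thread establishes whether current TopHat versions actually read OMP_NUM_THREADS.

```shell
#!/bin/sh
THREADS=4                           # assumed thread count; match the scheduler request
export OMP_NUM_THREADS="$THREADS"   # same value that is passed to -p below

# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Placeholder index and read files; -p sets TopHat's thread count and
# -o its output directory.
run tophat -p "$THREADS" -o "tophat_p$THREADS" genome_index reads_1.fastq reads_2.fastq
```

Keeping the thread count in one variable means the value exported and the value given to -p cannot drift apart when the script is edited.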

I should mention too that TopHat is really just calling Bowtie1 or 2 to do the actual alignment, so you might want to verify that you also have the latest version of Bowtie1 or 2.

**rwan** · 11-01-2015, 07:33 PM

> Originally posted by blancha:
>
> Yes, the results were absolutely identical.
> Only the runtime was shorter, obviously.

Ok! That was what I was expecting so I'll try to figure out what's going on.

> Originally posted by blancha:
>
> I seem to remember that TopHat would run to completion and give bewildering results, without any error messages. After some unfortunate experiences, I was always very careful to request a number of processors equal to or greater than the number of threads TopHat would run on. It's a dangerous "bug", since there are no error messages in the log.
>
> So, you might want to check that you are requesting from the scheduler a number of processors equal to or greater than the number of threads on which TopHat will run.
>
> You can also add the following command in your submission script to the scheduler, before running TopHat.
>
> export OMP_NUM_THREADS=#threads_requested_for_TopHat
>
> Either due to updates to the scheduler or to TopHat, I no longer need to export this variable in my job submission scripts.
> Until two years ago, users on my computing cluster had to export this variable when submitting multi-threaded TopHat jobs, or they would get incorrect results.
>
> I should mention too that TopHat is really just calling Bowtie1 or 2 to do the actual alignment, so you might want to verify that you also have the latest version of Bowtie1 or 2.

I'm actually using the packages included with Ubuntu. I know that means the software could be 6 months (or so) out of date compared to downloading the latest version, but it's easier to maintain, since upgrading the OS also upgrades the software. (And I hope there are other users as lazy as me who will end up using the same set of program versions.)

I will give what you suggest a try. Fortunately, I'm on a single-user system (i.e., it's just me) so I can bypass the scheduler if there's a possibility that the scheduler is the cause of the problems.

Thank you!

Ray

**rwan** · 11-01-2015, 07:38 PM

> Originally posted by Brian Bushnell:
>
> Unfortunately, it sounds like you should go back and re-evaluate all the data you processed with that version of TopHat, to be safe.
>
> As for the number of processors and number of threads... the number of processors should be completely transparent, and a deterministic program should give identical results for a large number of threads whether there is 1 processor or many processors.
>
> And by the way, I'd like to toss in a recommendation that you try BBMap for RNA-seq, as long as you're (possibly) going back and reprocessing a lot of data.

Yes, that is what I was expecting though I wouldn't be surprised if a careless mistake on my part was the cause of what I'm seeing. The old data was passed on to someone else -- I'll have to let them know.

Thanks for the suggestion about BBMap! I wasn't aware of it.

I did run STAR, and its output file was larger than the one from TopHat with 2 threads. I am currently running it with 1 thread, and if the file size ends up being similar, then your original suspicion was correct. STAR, regardless of the number of threads, seems to give similar outputs (though I have only checked file sizes and have not loaded the BAM files into IGV yet).

Ray

Slightly different TopHat output from # of threads
