SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Blast threads always drop to 1 (http://seqanswers.com/forums/showthread.php?t=5752)

k-gun12 06-29-2010 09:25 AM

Blast threads always drop to 1
 
I've seen this issue with every version of blast+ that I've used recently. When I run jobs on a multi-core machine, I specify -num_threads XX to speed things up. Invariably, no matter how many threads I specify, after a short time it seems as though only 1 thread is active on the machine. I've compiled the blast binaries myself using both gcc and the intel icc compilers. When I start the job, top shows blastn/p/x using, say, 800% of processor if I specify 8 threads. After a few minutes this drops to 100%. The job completes in less time than a single-threaded job, but not by much. Is this normal behavior?

Thanks!

nickloman 06-29-2010 10:46 AM

One thought that sprang to mind is whether BLAST is I/O limited rather than CPU limited. I guess this might happen if you had a very large database and slow disks. vmstat/iostat might help determine this.
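A rough way to check this on Linux, as a sketch only: take two samples of the aggregate "cpu" line from /proc/stat while the job is running and see what share of the elapsed CPU time was spent waiting on I/O. The function name iowait_pct is made up for illustration, and the field layout assumed is the Linux one (user, nice, system, idle, iowait, irq, softirq, steal). `vmstat 5` reports a similar figure in its `wa` column.

```shell
# iowait_pct LINE1 LINE2: given two samples of the aggregate "cpu" line from
# /proc/stat (Linux only), print the percentage of CPU time between the two
# samples that was spent in iowait. Field 6 of the line is the iowait counter.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest counters are skipped
            # because the kernel already includes them in user/nice.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
        }
        NR == 1 { t0 = total; w0 = $6 }
        NR == 2 && total > t0 { printf "%.1f\n", 100 * ($6 - w0) / (total - t0) }'
}

# Usage while a BLAST job is running (5-second window):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

A high percentage while CPU usage sits near 100% of one core would point towards the disks; a value near zero suggests the job really is CPU-bound on a single thread.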

k-gun12 06-29-2010 10:49 AM

Here's the official response from NCBI. Only some of the code is multithreaded.


"BLAST search has three distinctive stages: word matching with database scan, ungapped alignment, gapped alignment with traceback.

As I understand it, only the word match stage is multi-threaded. So what you described makes sense and is correct."

nickloman 06-29-2010 10:58 AM

That does indeed make sense. Sounds like it might be a bottleneck if your search returns a lot of matches that need aligning.
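The modest speedup reported in the first post is what Amdahl's law predicts when only one stage is multi-threaded: if a fraction p of the single-threaded runtime is spread over n threads, the best-case speedup is 1 / ((1 - p) + p / n). Here is a minimal sketch of that arithmetic. The function name and the value p = 0.3 are illustrative assumptions, not measurements of BLAST.

```shell
# amdahl_speedup P N: best-case speedup when a fraction P (0..1) of the
# single-threaded runtime is spread evenly over N threads (Amdahl's law).
amdahl_speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Hypothetical case: 30% of the runtime in the threaded stage, 8 threads.
amdahl_speedup 0.3 8
```

With those assumed numbers the result is about 1.36, so a job that takes 10 hours on one thread would still take over 7 hours on 8. The more hits that need ungapped and gapped alignment, the smaller p gets and the less the extra threads help.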

sjmillerAZ 07-02-2010 09:50 AM

blast+ vs older blastall
 
I've seen the same behavior for blast+ vs the older blastall. In section 4.5 of the NCBI user manual for blast+ they show a performance improvement over blastall for queries of length 10Kb - 10Mb, but for shorter queries my experience is that blast+ is much, much slower. When I run a blastx with 50 DNA queries of average length 1135 against a protein database of 475000 sequences (161M total letters) using 8 cpus, the blastall 2.2.18 code finishes the run in just under 5 hours with cpu usage 764%. The same blastx with blast+ version 2.2.23 is still running after 16 hours and has only finished 14000 queries, and the cpu usage shows 243% for 8 processors. Needless to say, I won't be encouraging anyone with shorter queries to use blast+ until this problem has been fixed.

aleferna 07-23-2010 06:20 AM

Well, if all else fails, here's a simple little script to parallelize any application. I use it whenever I have to use blat or maq, which don't have multithread support. It counts how many processes are running, and if there are fewer than the number of threads (24 in the script below) it starts a new one. All you need to do is cut your reads into many pieces.


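# Launch one job per FASTA chunk, keeping at most 24 running at a time.
for x in *.fa
do

# "[b]last" keeps grep from matching its own entry in the ps output.
while [ $(ps -Af | grep "[b]last" | wc -l) -ge 24 ]
do
sleep 5
done

blast $x &...
sleep 1

done

# Wait for the remaining background jobs before the script exits.
wait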
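For the "cut your reads into many pieces" step, something like the following could be used. It is only a sketch: split_fasta and the chunk naming scheme are made up for illustration, and it assumes the input is plain FASTA that starts with a ">" header line.

```shell
# split_fasta INPUT N PREFIX: write INPUT out as PREFIX000.fa, PREFIX001.fa, ...
# with at most N sequences per file. A new file is started at every Nth
# header line; sequence lines follow their header into the same file.
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s%03d.fa", prefix, count / n)
            }
            count++
        }
        { print > out }' "$1"
}

# Usage (file names are placeholders):
#   split_fasta reads.fa 1000 chunk_
```

The resulting chunk_*.fa files can then be picked up by a `for x in *.fa` loop like the one above, and the per-chunk outputs concatenated once all jobs have finished.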

