Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Blast threads always drop to 1

    I've seen this issue with every version of blast+ that I've used recently. When I run jobs on a multi-core machine, I specify -num_threads XX to speed things up. Invariably, no matter how many threads I specify, after a short time it seems as though only 1 thread is active on the machine. I've compiled the blast binaries myself using both gcc and the intel icc compilers. When I start the job, top shows blastn/p/x using, say, 800% of processor if I specify 8 threads. After a few minutes this drops to 100%. The job completes in less time than a single-threaded job, but not by much. Is this normal behavior?

    Thanks!

  • #2
    One thought that sprung to mind is whether BLAST is I/O limited rather than CPU limited. I guess this might happen if you had a very large database and slow disks. vmstat/iostat might help determine this.

    Comment


    • #3
      Here's the official response from NCBI. Only some of the code is multithreaded..


      "BLAST search has three distinctive stages: word matching with database scan, ungapped alignment, gapped alignment with traceback.

      As I understand it Only the word match stage is multi-threaded. So what you described make sense and it correct."

      Comment


      • #4
        That does indeed make sense. Sounds like it might be a bottleneck if your search returns a lot of matches that need aligning.

        Comment


        • #5
          blast+ vs older blastall

          I've seen the same behavior for blast+ vs the older blastall. In section 4.5 of the NCBI user manual for blast+ they show a performance improvement over blastall for queries of length 10Kb - 10Mb, but for shorter queries my experience is that blast+ is much, much slower. When I run a blastx with 50 DNA queries of average length 1135 against a protein database of 475000 sequences (161M total letters) using 8 cpus, the blastall 2.2.18 code finishes the run in just under 5 hours with cpu usage 764%. The same blastx with blast+ version 2.2.23 is still running after 16 hours and has only finished 14000 queries, and the cpu usage shows 243% for 8 processors. Needless to say, I won't be encouraging anyone with shorter queries to use blast+ until this problem has been fixed.

          Comment


          • #6
            Well if all fails here's a simple little script to Multithread any application, I use it whenever I have to use blat or maq which don't have multithread support. It counts how many processes have been started and if there are less than the number of threads (24 in the script below) it starts a new one. All you need to do is cut your reads into many pieces.


            for x in *.fa
            do

            while [ $(ps -Af | grep "blast" | wc -l) -gt 24 ]
            do
            sleep 5
            done

            blast $x &...
            sleep 1

            done

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Recent Advances in Sequencing Analysis Tools
              by seqadmin


              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
              05-06-2024, 07:48 AM
            • seqadmin
              Essential Discoveries and Tools in Epitranscriptomics
              by seqadmin




              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
              04-22-2024, 07:01 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 05-07-2024, 06:57 AM
            0 responses
            12 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-06-2024, 07:17 AM
            0 responses
            16 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-02-2024, 08:06 AM
            0 responses
            21 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-30-2024, 12:17 PM
            0 responses
            24 views
            0 likes
            Last Post seqadmin  
            Working...
            X