**maubp** · 01-11-2015, 06:29 AM

If you had an input query FASTA file of (for example) 1000 query sequences, then I would split this into several separate FASTA files (e.g. ten files of 100 sequences each), and submit them to the cluster as ten jobs, and then combine the BLAST output.

Each BLAST job could be set to ask for a single machine with 8 threads. Or, there is flexibility here - while BLAST does get faster when given more threads, this is not perfect - so it might be faster overall to use four threads for each BLAST job (meaning on your cluster, there could be two BLAST jobs running at the same time - fine if you have enough RAM).
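The splitting step could be sketched in shell as follows. This is a minimal sketch rather than a tested pipeline: the function name `split_fasta`, the output prefix, and the zero-based chunk numbering are assumptions chosen for illustration, and it assumes a well-formed FASTA file in which every record starts with a `>` header line.

```bash
#!/bin/bash
# Hypothetical helper: split a FASTA file into chunks of at most
# per_chunk records each, written as <prefix>_0.fa, <prefix>_1.fa, ...
split_fasta() {
    local input=$1 per_chunk=$2 prefix=$3
    awk -v n="$per_chunk" -v prefix="$prefix" '
        /^>/ {
            # Start a new output file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%d.fa", prefix, count / n)
            }
            count++
        }
        # Copy header and sequence lines to the current chunk.
        out != "" { print > out }
    ' "$input"
}
```

With a 1000-sequence input, `split_fasta queries.fa 100 chunk` would write `chunk_0.fa` through `chunk_9.fa` (here `queries.fa` and `chunk` are placeholder names).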

**TauOvermind** · 01-11-2015, 06:53 AM

maubp, thank you for your suggestion, I will try it tomorrow.

Just wanted to clarify, when you mentioned two BLAST jobs running at the same time, did you mean that they would be running on the same node simultaneously? So, if I have 6 nodes with 8 threads on each and I submit 12 blast jobs with 4 threads for each, there would be 12 independent blast instances (jobs) running in parallel, assuming that the memory is not a problem?

**maubp** · 01-11-2015, 07:00 AM

Yes - assuming your limit is really six machines at once. I'm not familiar enough with Torque/PBS to really guess, but it could be you are limited to six active jobs at once?

**TauOvermind** · 01-11-2015, 07:19 AM

Thank you for the clarification. You are probably right about the job limit, but I am not sure about that. I was told that I can use at most 48 threads per job, so everything else is just my guess, and I might be completely wrong.

**GenoMax** · 01-11-2015, 10:35 AM

With 24GB RAM per node you should not start a lot of threads. Size of the database you are going to search against is going to determine the outcome here. Blastx searches are compute intensive as is.

I would recommend that you run one or two exploratory jobs (start with 4 and 8 threads, and keep the threads of a job on the same node) and allocate the maximum RAM you are allowed to use (with 24G of physical RAM you can probably use 20-22G at most for the job, provided nothing else is running on the node). Then check the log to see how much RAM the job actually used. Depending on the results, you can decide how many threads to use per node.
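One way to read the memory actually used by an exploratory job is to parse the `resources_used.mem` line that Torque's `qstat -f` reports for a job. The helper below is a hedged sketch: the function name `used_mem_mb` is made up for illustration, and it assumes the value is reported in kilobytes with a `kb` suffix, which may differ on other PBS variants.

```bash
#!/bin/bash
# Hypothetical helper: read `qstat -f` output on stdin and print the
# reported memory use of the job in whole megabytes.
# Assumes a line of the form "resources_used.mem = <number>kb".
used_mem_mb() {
    awk -F' = ' '
        $1 ~ /resources_used\.mem$/ {
            kb = $2
            sub(/kb$/, "", kb)
            printf "%d\n", kb / 1024
        }
    '
}
```

It would be called as `qstat -f "$job_id" | used_mem_mb`, where `job_id` is a placeholder for the ID printed by `qsub`. Whether a finished job is still visible to `qstat` depends on how the server is configured, so it may be necessary to query while the job is running.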

**TauOvermind** · 01-11-2015, 04:57 PM

Thank you for your suggestions as well, GenoMax. I made a script (virulence.sh) for Torque, which would submit 6 blastx jobs, one per node with 4 threads each (24 threads in total):

Code:

#!/bin/bash


#Setting Torque parameters
#PBS -N vir_blast
#PBS -j oe
#PBS -m abe
#PBS -M [email protected]
#PBS -q main_queue
#PBS -l mem=22000mb
#PBS -l nodes=1:ppn=4
#PBS -t 0-5

#Loading modules
module add shared
module add torque
module add blastx


#Executing commands

cd $PBS_O_WORKDIR

#Each blastx instance consists of one main thread and 'k' working threads, whose number is specified by the '-num_threads' parameter
#Thus, to use 4 CPU threads per node '-num_threads' should be set to 3 (1 main and 3 worker blastx threads will be created)

#BLAST+ takes '-evalue' (not the legacy '-e') and '-outfmt' for the output format
blastx -db virDB -query ./meta_chunk_${PBS_ARRAYID}.fa -evalue 1e-5 -num_threads 3 -outfmt 6 -out ./results_chunk_${PBS_ARRAYID}.fm6

The script will be executed with 'qsub virulence.sh' command.
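Once all array tasks have finished, the per-chunk outputs named in the script could be merged along these lines. This is a sketch under assumptions: the function name `combine_results` is hypothetical, it expects to run in the directory that holds the `results_chunk_<i>.fm6` files, and it relies on tabular output (format 6) having no header lines, so plain concatenation is enough.

```bash
#!/bin/bash
# Hypothetical helper: concatenate results_chunk_<i>.fm6 for i = 0..last
# into one file, refusing to proceed if any chunk output is missing.
combine_results() {
    local last=$1 combined=$2 i=0
    # First pass: make sure every expected chunk output exists.
    while [ "$i" -le "$last" ]; do
        if [ ! -e "results_chunk_${i}.fm6" ]; then
            echo "missing results_chunk_${i}.fm6" >&2
            return 1
        fi
        i=$((i + 1))
    done
    # Second pass: append the chunks in numerical order.
    : > "$combined"
    i=0
    while [ "$i" -le "$last" ]; do
        cat "results_chunk_${i}.fm6" >> "$combined"
        i=$((i + 1))
    done
}
```

For the `-t 0-5` array above, the call would be `combine_results 5 results_all.fm6`, with `results_all.fm6` as a placeholder output name.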

Could someone take a look at the script and tell me if it looks fine?

I am still trying to comprehend how queueing actually works. Let's assume I have an idle cluster with 6 nodes, 8 cores on each, so 48 cores in total. If I request 12 independent jobs with 4 cores per job (#PBS -l nodes=1:ppn=4, #PBS -t 0-11), what would happen? Will all my jobs run simultaneously on the cluster, with 2 jobs running on each node, or will only 6 jobs be started with 6 others waiting in the queue?

**GenoMax** · 01-11-2015, 05:23 PM

Originally posted by TauOvermind:

I am still trying to comprehend how queueing actually works. Let's assume I have an idle cluster with 6 nodes, 8 cores on each, so 48 cores in total. If I request 12 independent jobs with 4 cores per job (#PBS -l nodes=1:ppn=4, #PBS -t 0-11), what would happen? Will all my jobs run simultaneously on the cluster, with 2 jobs running on each node, or will only 6 jobs be started with 6 others waiting in the queue?

If you only consider job slots, then technically your 12 independent jobs will start at the same time if all 48 cores are idle. But in that case we are ignoring the other requests you are making. For example, if you request 22G of memory for each job, then only one of the jobs can run on a node at a given time, considering your nodes have 24G of RAM. So a job scheduler takes into account what you request in terms of resources, matches it with what you have access to or are allowed to use based on the local "fair share" policy, and ultimately considers the current load on the cluster (how busy the nodes are, whether all job slots are full, etc.).

Here is an example page of how PBS is used: http://arc.research.umich.edu/flux-a...ux/pbs-basics/
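The interaction between core and memory requests can be worked through with a little arithmetic. The sketch below is a simplification, not a model of any real scheduler: it ignores fair-share policy and current load, the function name `jobs_per_node` is hypothetical, and the 10G figure is an illustrative value that does not come from the thread.

```bash
#!/bin/bash
# Hypothetical helper: how many identical jobs fit on one node when both
# cores and memory are limiting. All arguments are whole numbers;
# memory values are in GB.
jobs_per_node() {
    local cores=$1 ppn=$2 node_mem=$3 job_mem=$4
    local by_cores=$((cores / ppn))
    local by_mem=$((node_mem / job_mem))
    # The tighter of the two limits wins.
    if [ "$by_mem" -lt "$by_cores" ]; then
        echo "$by_mem"
    else
        echo "$by_cores"
    fi
}

nodes=6
# 8 cores and 24G per node, 4 cores per job, as in the thread.
echo "mem=22G: $((nodes * $(jobs_per_node 8 4 24 22))) jobs at once"
echo "mem=10G: $((nodes * $(jobs_per_node 8 4 24 10))) jobs at once"
```

Under these assumptions, a 22G request limits the cluster to 6 concurrent jobs (the other 6 of 12 would wait), while a smaller request such as 10G would let all 12 run, provided BLAST really fits in that much memory.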

**GenoMax** · 01-11-2015, 05:27 PM

BTW: I don't know why you are referring to a main thread and working threads in your script. Here is an example of a PBS script that starts with "n" CPUs and an equal number of threads: http://swes.cals.arizona.edu/maier_l...ome/docs/blast

**TauOvermind** · 01-11-2015, 05:45 PM

Thank you very much for both your explanations and the links you provided, GenoMax. I have spent a lot of time googling for a good example of a BLAST+ script for a cluster with Torque/PBS today, but haven't seen the second page. I read about main and working threads of BLAST+ here, but now I am confused:

https://wiki.hpcc.msu.edu/display/Bioinfo/BLAST+with+Multiple+Processors

I will try to run a test job tomorrow.

**GenoMax** · 01-11-2015, 05:54 PM

I don't use PBS but it does appear that the information at the link you provided indicates that a core is needed for the "main" job. Try with n = CPU = threads first and see what happens.
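To try the "n = CPU = threads" setting without hard-coding the number, the thread count could be derived from what the scheduler granted. This is a sketch under assumptions: `threads_for_job` is a hypothetical name, `PBS_NUM_PPN` is assumed to be exported by the installed Torque version, and the fallback assumes `$PBS_NODEFILE` lists the node once per allocated core for a single-node request.

```bash
#!/bin/bash
# Hypothetical helper: choose a thread count that matches the cores
# granted to the job.
threads_for_job() {
    if [ -n "${PBS_NUM_PPN:-}" ]; then
        echo "$PBS_NUM_PPN"
    elif [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # With nodes=1:ppn=N the node file lists the node N times.
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        # Not running under PBS: fall back to a single thread.
        echo 1
    fi
}
```

Inside the job script, the blastx call would then use `-num_threads "$(threads_for_job)"` in place of a fixed value.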

Running Blast+ on multiple nodes on a cluster -- what is the best way to that?
