Old 05-20-2015, 08:08 AM   #1
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160
Default installing SMRT Portal on a cluster

Me again,

So now I've installed SMRT Portal on a better cluster, hoping to reduce computing time. But during the installation process

Code:
bash smrtanalysis_2.3.0.140936.run -p smrtanalysis-patch_2.3.0.140936.p4.run --rootdir $SMRT_ROOT
it only detected two processors (??) on the machine, although there are many more. I continued to the end and noticed that it only uses one processor to run any analysis. Following the documentation, I modified the file
smrtanalysis/current/analysis/etc/smrtpipe.rc
and set a bigger number in
Code:
# number of processors to use for parallel operations
#
NPROC = 1
I set it to NPROC = 16/32 and restarted everything, but during the analysis it still only uses one processor. Any advice would be great. Thanks
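(For reference, a quick way to see how many CPUs the machine the installer ran on actually reports, using plain Linux commands rather than anything SMRT-specific; if the installer was run on a small head node rather than a compute node, a low count would be expected:)

Code:
nproc                              # logical CPUs visible on this host
grep -c ^processor /proc/cpuinfo   # same count, straight from the kernel
lscpu                              # sockets, cores per socket, threads per core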
Old 05-20-2015, 08:22 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

This is still a cluster that uses SGE?
Old 05-20-2015, 11:45 AM   #3
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160
Default

Quote:
Originally Posted by GenoMax
This is still a cluster that uses SGE?
Yes, this cluster uses SGE.
Old 05-20-2015, 11:51 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

If you have access to the PacBio customer portal, create a ticket there. You will get quicker service.

Support for job schedulers other than SGE has been dismal (e.g. LSF). If that has changed, we will hear from one of the PacBio folks.
Old 05-20-2015, 01:08 PM   #5
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68
Default

Quote:
Originally Posted by cascoamarillo
Me again,

So now I've installed SMRT Portal on a better cluster, hoping to reduce computing time. But during the installation process

Code:
bash smrtanalysis_2.3.0.140936.run -p smrtanalysis-patch_2.3.0.140936.p4.run --rootdir $SMRT_ROOT
it only detected two processors (??) on the machine, although there are many more. I continued to the end and noticed that it only uses one processor to run any analysis. Following the documentation, I modified the file
smrtanalysis/current/analysis/etc/smrtpipe.rc
and set a bigger number in
Code:
# number of processors to use for parallel operations
#
NPROC = 1
I set it to NPROC = 16/32 and restarted everything, but during the analysis it still only uses one processor. Any advice would be great. Thanks
Modifying smrtpipe.rc is all that needs to be done to increase the number of processors used per task. The fact that the new value isn't working points to a larger problem (cluster config or user error) that tech support can help you solve, as GenoMax pointed out.

A few tips:
Is it really set to 'NPROC=16/32'? That won't work; the value needs to be an 'int'.

Also, is CLUSTER_MANAGER=SGE set in the same file?

What does the hardware config on the head node look like? What OS, and how many cores? How many nodes are in the cluster, and how many cores are available per node?
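A quick way to check those from the shell (this assumes smrtpipe.rc is at the path mentioned earlier in the thread and that the SGE client tools are on your PATH):

Code:
# Confirm the values smrtpipe will actually read (adjust $SMRT_ROOT to your install root)
grep -E "^(NPROC|MAX_THREADS|CLUSTER_MANAGER)" $SMRT_ROOT/current/analysis/etc/smrtpipe.rc

# SGE's own view of the cluster: one line per node with its core count and memory
qhost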
Old 05-20-2015, 01:42 PM   #6
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160
Default

Thanks for the answers!
Quote:
Originally Posted by gconcepcion
Modifying smrtpipe.rc is all that needs to be done to increase the number of processors used per task. The fact that the new value isn't working points to a larger problem (cluster config or user error) that tech support can help you solve, as GenoMax pointed out.

A few tips:
Is it really set to 'NPROC=16/32'? That won't work; the value needs to be an 'int'.
I set it to 32 first and then to 16; sorry for the confusion.

Quote:
Originally Posted by gconcepcion
Also, is CLUSTER_MANAGER=SGE set in the same file?
Yes

Quote:
Originally Posted by gconcepcion
What does the hardware config on the head node look like? What OS, and how many cores? How many nodes are in the cluster, and how many cores are available per node?
Hope this is what you are asking for:
CentOS 6.5 1 @ 32*1024
Key: (number of nodes @) processors * GB memory
(processor count includes hyperthreads)
All systems have 64bit processors.

I'm pasting the entire smrtpipe.rc file:

Code:
#
#   Configuration file for smrtpipe
#

#
# uncomment to default smrtpipe to --debug mode
#
#DEBUG = True

#
# Set EXIT_ON_FAILURE to True if you want a DAG based SMRTpipe job to exit 
# quickly after a failed task has been detected. Default behavior is to complete
# as many tasks as possible.
#
EXIT_ON_FAILURE = True

#
# Specifies the maximum number of concurrent threads SMRTpipe will use. 
# Each concurrent child job uses one thread.
#
MAX_THREADS = 8

#
# Specifies the maximum number of concurrent slots SMRTpipe will use.
#
MAX_SLOTS = 256
MAX_CHUNKS = 64

#
# root path for finding usmp-data, usmp-data2 imports (internal to PacBio)
#
DATA_ROOT = /mnt
#
# tmp directory
#
TMP = /users/me/smrtanalysis/tmpdir
#
# number of processors to use for parallel operations
#
NPROC = 16
#
# path for running IronPython
#
IPY = ipy -X:ColorfulConsole

# thresholds which define different read scopes
# In the format scopeName:upperLimit, ...
# where upper limit is expressed in megabases of post-filtered
# sequence
# These scopes are used to classify the scope of the requested
# analysis---tested in order, first one wins

READ_SCOPES = small:3.6, large:100, veryLarge:10000, huge:1000000
DENOVO_READ_SCOPES = small: 0.15, large:15, huge:1700

# thresholds which define different reference or genome scopes
# reference scopes are defined vs. total length of the reference in kbp
# (same logic as read scopes)
#
REFERENCE_SCOPES = small:10, large:1000, veryLarge:100000, huge:10000000
DENOVO_GENOME_SCOPES = small:100, large:1000, huge:40000

#
# extension to look for when finding input hdf5 files
# (if not fully specified)
#
INPUT_HDF5_EXT = bas.h5

#
# Distributed Computing Section
#

#
# Number of cluster jobs to submit when the user specifies --distribute
#
NJOBS = 64

#
# Maximum number of 'chunks' to distribute when using S_* module workflows
#
NCHUNKS = 64

#
# Path to a shared writable directory visible to all nodes (used for distributed analysis)
#
SHARED_DIR = /users/me/smrtanalysis/install/smrtanalysis_2.3.0.140936/common/userdata/shared_dir
#
# Specify the cluster management tool used.
# Supported: SGE
#
CLUSTER_MANAGER = SGE

#
# How the CeleraAssembler spec file's distribute params should be set up when --distribute is used
#
ca_distribute.pb2ca = useGrid:0, scriptOnGrid:0, frgCorrOnGrid:0, ovlCorrOnGrid:0
ca_distribute.hgap  = useGrid:0, scriptOnGrid:0, frgCorrOnGrid:0, ovlCorrOnGrid:0

#
# Cloud Settings
#
CLOUD = False
#CLOUD_BUCKET = @CLOUD_BUCKET

#CLOUD_ACCESS_KEY = @CLOUD_ACCESS_KEY
#CLOUD_SECRET_KEY = @CLOUD_SECRET_KEY

CLOUD_CHUNKS_PER_PROC = 2
CLOUD_LARGE_FILE_CUTOFF = 10MB


#
# Other defaults (usually don't need to get changed, see pbpy.smrtpipe.SmrtPipeConfig)
#
# Set this to enforce deterministic output
#RANDOM_SEED=@RANDOM_SEED
#
# Set this to for creating the vis.jnlp which points to a SMRT View server 
#VIS_HOST=@VIS_HOST
#
# (internal use only)
#VIS_NAMESPACE=@VIS_NAMESPACE
#
# (internal use only)
#SETTINGS_PATH=@SETTINGS_PATH
#
Old 05-21-2015, 12:29 PM   #7
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322
Default

Are you starting the analysis from SMRT Portal or from the command line? When you say only one processor is being used, are jobs not being submitted to the queue at all, or are submitted jobs only using/requesting one core? Which process exactly is only using one core?
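One way to narrow that down from the head node, using standard SGE commands (replace <jobid> with an actual job ID from the queue):

Code:
# Are SMRT Portal jobs reaching the queue at all, and under which user?
qstat -u "*"

# For a queued or running job: how many slots / which parallel environment
# did it request? Look for a "parallel environment" line in the output.
qstat -j <jobid>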
Old 05-26-2015, 07:02 AM   #8
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160
Default

Quote:
Originally Posted by rhall
Are you starting the analysis from SMRT Portal or from the command line? When you say only one processor is being used, are jobs not being submitted to the queue at all, or are submitted jobs only using/requesting one core? Which process exactly is only using one core?
I'm using SMRT Portal.
Here's an explanation of how this cluster is configured, copied and pasted from the sysadmin:
The basic configuration is a Beowulf cluster consisting of a head node and a single execution node. The scheduler is an SGE derivative. The head node has 2 GB of memory and 2 CPUs. The execution node has 1 terabyte of memory and 16 cores. The scheduler has 32 slots allocatable per job on the execution node.

And this is what was found while a job was running:
The job that is currently running is consuming 16 slots, so theoretically the space is allocated. However, at the moment it only appears to be using a single processor.

Thanks
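(For reference, one way to check whether the processes behind that 16-slot job are actually multi-threaded, run on the execution node; the user name below is just a placeholder for whichever account the SMRT Analysis job runs under:)

Code:
# NLWP = number of threads per process; a value of 1 means that process
# really is single-threaded at that moment
ps -u <smrt_user> -o pid,nlwp,pcpu,etime,comm --sort=-pcpu | head -n 15

# Or watch individual threads live
top -H -u <smrt_user>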
Old 05-26-2015, 09:58 AM   #9
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322
Default

Not all tasks are multi-threaded; some allocate more slots than threads in order to balance memory usage. For example, some filtering tasks will allocate a high number of slots but only use one processor (thread). High-computation tasks like blasr (the aligner) should run multi-threaded and consume the 16 cores.
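A rough way to confirm this on the execution node while a job is in an alignment step (this assumes the running task is blasr; if pgrep finds nothing, the job is not in a blasr step at that moment):

Code:
# List blasr's threads and their CPU usage
ps -C blasr -L -o pid,lwp,pcpu,comm

# Or watch them live (only works while blasr processes exist)
top -H -p "$(pgrep -d, blasr)"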