  • installing SMRT Portal on a cluster

    Me again,

    So now I've installed SMRT Portal on a better cluster, hoping to reduce computing time. But during the installation process

    Code:
    bash smrtanalysis_2.3.0.140936.run -p smrtanalysis-patch_2.3.0.140936.p4.run --rootdir $SMRT_ROOT
    it only detects two processors (??) on the machine, even though there are many more. I continued to the end and noticed that it only uses one processor to run any analysis. Following the documentation, I modified the file
    smrtanalysis/current/analysis/etc/smrtpipe.rc
    and set a bigger number in
    Code:
    # number of processors to use for parallel operations
    #
    NPROC = 1
    I set it to NPROC = 16/32 and restarted everything, but the analysis is still only using one processor. Please, any advice would be great. Thanks
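    For what it's worth, a quick sanity check (a minimal sketch; it only assumes standard Linux tools) is to compare what the node running the installer actually reports against what you expect:

    Code:
    # Logical CPU count as the OS reports it on this node
    nproc
    grep -c ^processor /proc/cpuinfo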

  • #2
    This is still a cluster that uses SGE?



    • #3
      Originally posted by GenoMax View Post
      This is still a cluster that uses SGE?
      Yes, this cluster uses SGE.



      • #4
        If you have access to the PacBio customer portal, create a ticket there. You will get quicker service.

        Support for other job schedulers (e.g. LSF) has been dismal. If that has changed, then we will hear from one of the PacBio folks.



        • #5
          Originally posted by cascoamarillo View Post
          Me again,

          So now I've installed SMRT Portal on a better cluster, hoping to reduce computing time. But during the installation process

          Code:
          bash smrtanalysis_2.3.0.140936.run -p smrtanalysis-patch_2.3.0.140936.p4.run --rootdir $SMRT_ROOT
          it only detects two processors (??) on the machine, even though there are many more. I continued to the end and noticed that it only uses one processor to run any analysis. Following the documentation, I modified the file
          smrtanalysis/current/analysis/etc/smrtpipe.rc
          and set a bigger number in
          Code:
          # number of processors to use for parallel operations
          #
          NPROC = 1
          I set it to NPROC = 16/32 and restarted everything, but the analysis is still only using one processor. Please, any advice would be great. Thanks
          Modifying smrtpipe.rc is all that needs to be done to increase the number of processors used per task. The fact that the new value isn't working points to a larger problem (cluster config or user error) that tech support can help you solve, as GenoMax pointed out.

          A few tips:
          Is it really set to 'NPROC=16/32'? That won't work; the value needs to be an 'int'.

          Also, is CLUSTER_MANAGER=SGE set in the same file?

          What does the hardware config on the head node look like? What OS, and how many cores? How many nodes are in the cluster, and how many cores are available per node?
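          For example, something like this would pull together the relevant settings and what SGE sees (a rough sketch; the config path assumes the standard 2.3.0 install layout under $SMRT_ROOT, and lscpu/qhost are assumed to be available):

          Code:
          # Values smrtpipe will actually read
          grep -E '^(NPROC|MAX_THREADS|CLUSTER_MANAGER|NJOBS)' \
              $SMRT_ROOT/current/analysis/etc/smrtpipe.rc

          # Hardware and OS on the head node
          lscpu
          cat /etc/redhat-release

          # Hosts, cores, and memory as SGE sees them
          qhost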



          • #6
            Thanks for the answers!
            Originally posted by gconcepcion View Post
            Modifying smrtpipe.rc is all that needs to be done to increase the number of processors used per task. The fact that the new value isn't working points to a larger problem (cluster config or user error) that tech support can help you solve, as GenoMax pointed out.

            A few tips:
            Is it really set to 'NPROC=16/32'? That won't work; the value needs to be an 'int'.
            I set it first to 32 and then to 16, sorry for the confusion.

            Originally posted by gconcepcion View Post
            Also, is CLUSTER_MANAGER=SGE set in the same file?
            Yes

            Originally posted by gconcepcion View Post
            What does the hardware config on the head node look like? What OS, and how many cores? How many nodes are in the cluster, and how many cores are available per node?
            Hope this is what you are asking for:
            CentOS 6.5 1 @ 32*1024
            Key: (number of nodes @) processors * GB memory
            (processor count includes hyperthreads)
            All systems have 64-bit processors.

            I'm pasting the entire smrtpipe.rc file:

            Code:
            #
            #   Configuration file for smrtpipe
            #
            
            #
            # uncomment to default smrtpipe to --debug mode
            #
            #DEBUG = True
            
            #
            # Set EXIT_ON_FAILURE to True if you want a DAG based SMRTpipe job to exit 
            # quickly after a failed task has been detected. Default behavior is to complete
            # as many tasks as possible.
            #
            EXIT_ON_FAILURE = True
            
            #
            # Specifies the maximum number of concurrent threads SMRTpipe will use. 
            # Each concurrent child job uses one thread.
            #
            MAX_THREADS = 8
            
            #
            # Specifies the maximum number of concurrent slots SMRTpipe will use.
            #
            MAX_SLOTS = 256
            MAX_CHUNKS = 64
            
            #
            # root path for finding usmp-data, usmp-data2 imports (internal to PacBio)
            #
            DATA_ROOT = /mnt
            #
            # tmp directory
            #
            TMP = /users/me/smrtanalysis/tmpdir
            #
            # number of processors to use for parallel operations
            #
            NPROC = 16
            #
            # path for running IronPython
            #
            IPY = ipy -X:ColorfulConsole
            
            # thresholds which define different read scopes
            # In the format scopeName:upperLimit, ...
            # where upper limit is expressed in megabases of post-filtered
            # sequence
            # These scopes are used to classify the scope of the requested
            # analysis---tested in order, first one wins
            
            READ_SCOPES = small:3.6, large:100, veryLarge:10000, huge:1000000
            DENOVO_READ_SCOPES = small: 0.15, large:15, huge:1700
            
            # thresholds which define different reference or genome scopes
            # reference scopes are defined vs. total length of the reference in kbp
            # (same logic as read scopes)
            #
            REFERENCE_SCOPES = small:10, large:1000, veryLarge:100000, huge:10000000
            DENOVO_GENOME_SCOPES = small:100, large:1000, huge:40000
            
            #
            # extension to look for when finding input hdf5 files
            # (if not fully specified)
            #
            INPUT_HDF5_EXT = bas.h5
            
            #
            # Distributed Computing Section
            #
            
            #
            # Number of cluster jobs to submit when the user specifies --distribute
            #
            NJOBS = 64
            
            #
            # Maximum number of 'chunks' to distribute when using S_* module workflows
            #
            NCHUNKS = 64
            
            #
            # Path to a shared writable directory visible to all nodes (used for distributed analysis)
            #
            SHARED_DIR = /users/me/smrtanalysis/install/smrtanalysis_2.3.0.140936/common/userdata/shared_dir
            #
            # Specify the cluster management tool used.
            # Supported: SGE
            #
            CLUSTER_MANAGER = SGE
            
            #
            # How the CeleraAssembler spec file's distribute params should be set up when --distribute is used
            #
            ca_distribute.pb2ca = useGrid:0, scriptOnGrid:0, frgCorrOnGrid:0, ovlCorrOnGrid:0
            ca_distribute.hgap  = useGrid:0, scriptOnGrid:0, frgCorrOnGrid:0, ovlCorrOnGrid:0
            
            #
            # Cloud Settings
            #
            CLOUD = False
            #CLOUD_BUCKET = @CLOUD_BUCKET
            
            #CLOUD_ACCESS_KEY = @CLOUD_ACCESS_KEY
            #CLOUD_SECRET_KEY = @CLOUD_SECRET_KEY
            
            CLOUD_CHUNKS_PER_PROC = 2
            CLOUD_LARGE_FILE_CUTOFF = 10MB
            
            
            #
            # Other defaults (usually don't need to get changed, see pbpy.smrtpipe.SmrtPipeConfig)
            #
            # Set this to enforce deterministic output
            #RANDOM_SEED=@RANDOM_SEED
            #
            # Set this to for creating the vis.jnlp which points to a SMRT View server 
            #VIS_HOST=@VIS_HOST
            #
            # (internal use only)
            #VIS_NAMESPACE=@VIS_NAMESPACE
            #
            # (internal use only)
            #SETTINGS_PATH=@SETTINGS_PATH
            #



            • #7
              Are you starting the analysis from smrtportal, or from the command line? When you say only one processor is being used, are jobs not being submitted to the queue, or are submitted jobs only using / requesting one core? Which process exactly is only using one core?
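              Something along these lines would help answer those questions (a rough sketch; it assumes the SGE client tools are on your path and that you can ssh to the execution node, whose hostname is just a placeholder here):

              Code:
              # Are SMRTpipe tasks reaching the queue at all?
              qstat -u $USER      # your pending/running jobs
              qstat -f            # slot usage per execution host

              # On the execution node: which process is busy, and on how many cores?
              ssh exec-node 'top -b -n 1 | head -40'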



              • #8
                Originally posted by rhall View Post
                Are you starting the analysis from smrtportal, or from the command line? When you say only one processor is being used, are jobs not being submitted to the queue, or are submitted jobs only using / requesting one core? Which process exactly is only using one core?
                I'm using SMRT Portal.
                Here's an explanation of how this cluster is configured, copied and pasted from the sys admin:
                The basic configuration is a Beowulf cluster consisting of a head node and a single execution node. The scheduler is an SGE derivative. The head node has 2 GB of memory and 2 CPUs. The execution node has 1 terabyte of memory and 16 cores. The scheduler has 32 slots allocatable per job on the execution node.

                And this is what we see while a job is running:
                The job that is currently running is consuming 16 slots, so theoretically the space is allocated. However, at the moment it only appears to be using a single processor.

                Thanks



                • #9
                  Not all tasks are multi-threaded. Some allocate more slots than threads in order to balance memory usage; i.e. some filtering tasks will allocate a high number of slots but only use one processor (thread). High-computation tasks like blasr (the aligner) should run multi-threaded and consume the 16 cores.
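                  If you want to confirm that while an alignment task is running, something like this on the execution node would show it (a rough sketch; it assumes the process is named blasr and that you can run commands on that node):

                  Code:
                  # One line per thread of any running blasr process
                  ps -eLf | grep [b]lasr | wc -l

                  # Per-thread CPU usage snapshot
                  top -H -b -n 1 | grep blasr | head -20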

