  • Titanium software (MPI mode)

    Is anyone running the new Titanium software on an MPI cluster? I can get the software to run the verification dataset in 'multi' mode (i.e., using all of the CPUs on a single box), but when I try MPI mode, even on a single box, the program SIGTERMs the child processes after a while and then stops.

    Advice or comments appreciated.

  • #2
    We have run into this problem as well. Basically, the processes get SIGTERM'd because the Linux kernel kills them off once all of the system's memory has been consumed... We also had aspirations of using Sun Grid Engine with a properly configured MPI parallel environment, but I don't think we can get there until we can get direct MPI usage working reliably...
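
    If it helps anyone confirm the same cause: the kernel logs its OOM kills, so something like the following (plain Linux commands, nothing 454-specific; the log path assumes a RHEL/CentOS-style syslog setup) shows whether the OOM killer was involved:

    # look for evidence of the OOM killer around the time the children were SIGTERM'd
    dmesg | grep -i "out of memory"
    grep -i oom /var/log/messages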

    Comment


    • #3
      Don't think this will help with your problems, but I thought I would post some notes from my experience getting MPI working on a single 8-core Red Hat system.

      1. I have had persistent problems getting the pipeline to work with more than 6 cores. Running with >6 cores leads to a hard lockup, but I have not had time to track this down. I have completed several runs with 6 cores; each takes ~12 hours to process.

      2. I needed to add the following to .bashrc or similar:
      ulimit -l unlimited
      export RIGDIR="/opt/454"
      export LD_LIBRARY_PATH=/usr/lib64/openmpi/1.2.5-gcc/lib/
      export PATH=$PATH:/opt/454:/usr/lib/openmpi/1.2.5-gcc/bin
      export GS_LAUNCH_MODE="MPI"
      export GS_MPIARGS=" --n 6 "
      export GS_XML_PORT=4540
      export GS_CACHEDIR=/cache

      Note: you may need to make some changes to /etc/security/limits.conf for the ulimit to work. You will know that this is a problem if runAnalysisPipe complains that it can only allocate 32k.
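
      Something like the following in limits.conf should do it (a sketch; scope it to a specific user or group rather than '*' if your site prefers):

      # /etc/security/limits.conf -- allow unlimited locked memory so 'ulimit -l unlimited' succeeds
      *    soft    memlock    unlimited
      *    hard    memlock    unlimited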

      The command to run the analysis would then be: runAnalysisPipe R_2008_11...

      This will run with 6 processes (based on GS_MPIARGS).

      Documentation on this was sparse, so I'm not sure it is canonically correct, but it seems to work.

      Any other info on MPI for titanium?

      Comment


      • #4
        Titanium off rig analysis

        I've been having 'fun' trying to get the titanium off-rig analysis to work properly on a small linux cluster running Sun Grid Engine. We've had limited success.

        I would be deeply grateful for anyone else's thoughts on this.

        Here are some notes, in case they might help anyone else:

        * The cluster consists of nine Linux servers running Centos 5.
        * Each machine has 8 cores of x86_64, and 8GB of RAM.
        * All nodes are connected via gigabit ethernet to a 90TB NFS share.
        * The cluster is in moderate use for BLAST and other standard bioinformatic processing, and has never seen lockups or crashes before.

        Environment variables that seem important to runAnalysisPipe are:

        * export GS_MPIARGS="--n $NSLOTS --machinefile $TMPDIR/machines"
        * export GS_LAUNCH_MODE=MPI
        * export PATH=${PATH}:/opt/454/bin

        I'm very curious to try this "GS_CACHEDIR", but I don't know what it does.

        Note that the lines above are from my SGE job submission script. $NSLOTS and $TMPDIR/machines are created by a wrapper script and get set up based on how I submit the job. $NSLOTS is the number of parallel threads to start; the machines file is a list of hostnames to start them on.

        I found the "--progress" and "--verbose" flags to be quite useful in figuring out if processing is making progress or not.
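
        In case it is useful, here is a stripped-down sketch of what my submission script amounts to. The parallel environment name and the run directory path are placeholders for my local setup, not anything Roche documents:

        #!/bin/bash
        #$ -N titanium_offrig
        #$ -cwd
        #$ -pe mpi 8                 # a PE whose start script writes $TMPDIR/machines; name is site-specific
        export PATH=${PATH}:/opt/454/bin
        export GS_LAUNCH_MODE=MPI
        export GS_MPIARGS="--n $NSLOTS --machinefile $TMPDIR/machines"
        runAnalysisPipe --progress --verbose /path/to/run_directory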

        We also encountered the hard-lockup behavior. I still have no idea of the *cause* of these lockups, but we've managed to work around them. Here are my observations:

        * openmpi jobs run on a single machine never finish, no matter how many threads I give them (1, 2, 4, 8, 16). I wave my hands in the direction of "16GB of RAM required".

        * If I start 8 threads, four on each of two machines, those jobs run in a few hours.

        * If I start more than 4 threads on any one machine, I have high odds of locking up (requiring a hard power cycle) at least one of the machines involved in that run.

        * If I run threads from more than one job at a time on a particular machine, odds are high that I will lock up that machine.

        * I can run 4 BLAST jobs and 4 threads of gsRunProcessor without too much contention on the same 8 core machine.

        * gsRunProcessor leaves zombie processes all over the place when one of the compute nodes locks up during a run. I encounter fewer lockups if I clean those up prior to starting a run (a cleanup sketch is just below). This is made simple by the observation that I can't run two jobs on the same node anyway.
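
        The cleanup itself is nothing clever; something along these lines before each submission is what I mean (the node list file is a placeholder, and pkill only helps with leftover orphaned processes, not true zombies):

        # kill any leftover gsRunProcessor processes from a previous, crashed run
        for host in $(cat nodes.txt); do
            ssh "$host" pkill -9 gsRunProcessor
        done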

        There is some correlation of the node lock-ups with heavy loads on the NFS file server - but I have yet to encounter any smoking gun with this.

        Anyone else?

        Comment


        • #5
          Originally posted by cdwan View Post
          I've been having 'fun' trying to get the titanium off-rig analysis to work properly on a small linux cluster running Sun Grid Engine. We've had limited success. [...]
          Hello,

          First of all, I want to thank you for sharing your experience. I've been searching the web for information on how to setup the GS FLX Titanium software with Sun Grid Engine and your post is the first concrete reference that I've found.

          So, basically, I don't have experience with the software or Sun Grid Engine, and I'm trying to set up an "off rig" cluster with SGE (a tough challenge). The (would-be) cluster specs are:

          * 1 head node with 8 cores, x86_64, 16 GB of RAM;
          * 3 nodes with 4 cores, x86_64, 8 GB of RAM;
          * CentOS 5.3;
          * ~ 4 TB of storage.

          The GS FLX Titanium software is already installed on all nodes, with OpenMPI support.

          It would be really great if you could share any information about how to set up Sun Grid Engine with this software: a tutorial, howtos, wikis, or even any *good* documentation about setting up SGE, its architecture, etc.

          Regarding the NFS server lockup: you could try sending the system and kernel logs to a remote syslog and see if the (high) load triggers some sort of kernel panic, just a thought...
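
          On CentOS 5 that would be something along these lines (a sketch; "loghost" is a placeholder for whatever machine collects the logs, and its syslogd has to be started with -r to accept remote messages):

          # on each compute node, in /etc/syslog.conf
          kern.*          @loghost
          *.info          @loghost
          # then restart the logger: service syslog restart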

          If I can put the cluster together, I'll be happy to share our experiences.

          Thank you!

          Best regards,
          Joao

          Comment


          • #6
            Does the Newbler application 'runAssembly' work in an SGE environment? The Celera Assembler at least has some docs on this (although they look complex). I don't even know how to begin to submit my runAssembly to 'the cluster'.

            AFAICT, we have several 16 GB, 8-core boxes. I am trying to assemble 2 full runs of GS FLX Titanium (~1 Bn bases at ~400 bp per run).

            The progress of the assembly seems to get slower and slower ... (or perhaps I'm getting increasingly impatient). I did get CA to run on this data, giving me an assembly in about 1 day (on one box).
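
            In case it helps frame the question: since runAssembly (unlike the basecalling pipeline) is not MPI-aware as far as I know, I imagine submitting it would just mean reserving a whole node, roughly as sketched below. The parallel environment name and paths are guesses on my part, and the -cpu flag should be checked against runAssembly's own usage output:

            #!/bin/bash
            #$ -N newbler_asm
            #$ -cwd
            #$ -pe smp 8                        # reserve all 8 cores on one box; PE name is site-specific
            runAssembly -cpu 8 -o assembly_out sff/*.sff    # output dir and SFF file paths are placeholders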

            Thanks for any hints,

            Dan.
            Last edited by dan; 06-22-2009, 05:32 AM. Reason: Made it clear that runAssembly is distributed by Roche as a part of Newbler
            Homepage: Dan Bolser
            MetaBase the database of biological databases.

            Comment


            • #7
              hi everybody.

              I am having a hard time keeping up with analysing the data those lab people keep generating... so I just need a quick answer to my question before I start losing time again figuring it out myself...

              I configured the sequencer so that it automatically transfers everything after the image-processing step to our monster server for basecalling.

              This is the first shotgun experiment ever and also the first Titanium run, so I am a bit confused: I know Titanium has software updates, but I don't know if I have already installed them on the server.

              So my questions: gsRunProcessor 2.0.00.22 (Build 184) -> is that OK software for Titanium? Second question: is anything different for shotgun basecalling compared to amplicon basecalling?

              I have the .cwf files here, and I was planning on running 'runAnalysisPipe'. Will that work all right?

              Everybody here keeps telling me I have to look out for the software versions when using Titanium, but I guess there is not much I can do wrong when all I need to do is basecalling, right? I have my own tools to process the FASTA/FASTQ files, so I think I should not worry, right?

              Greetings from Belgium

              Comment


              • #8
                Originally posted by joa_ds View Post
                So my questions: gsRunProcessor 2.0.00.22 (Build 184) -> is that OK software for Titanium?
                It certainly is not the latest. I am running 2.0.01.12 for my Titanium runs. Is there any reason why you can't upgrade to the latest version in order to make sure everything is compatible?

                Second question, is anything different for shotgun basecalling compared to amplicon basecalling?
                I am not sure but I believe not.

                Comment


                • #9
                  Version sounds fine. Don't know about other questions.

                  Oh... I just saw westerman's reply. I'd run the latest version... where is that downloadable from?
                  Homepage: Dan Bolser
                  MetaBase the database of biological databases.

                  Comment


                  • #10
                    Hi, I just received an email with the new version 1.12. I installed it, started my basecalling again, and it appears to be working from the beginning now... I am hopeful.

                    Comment
