  • Originally posted by talioto View Post
    I compiled Ray with openmpi 1.4.2, gcc version 4.1.2 20080704 (Red Hat 4.1.2-44), x86_64 architecture, and ran it with "mpirun -mca btl ^sm". The data is 3 simulated Illumina libraries comprising 52x coverage of a 225MB chromosome: 40x 500bp PE 95nt reads (inward facing), 8x 5kb mate-paired 36nt reads (outward facing), 4x 10kb mate-paired 36nt reads (outward facing).

    Using 128 cores (16 8-core nodes), it runs fine up until the "Extending seeds" step. After a while the printing of the dots seems to slow down to a glacial pace. I've let it sit for several days with no progress. Do you think this is an Open MPI problem? Any ideas on getting around it?
    The same to u!



    • sheepyuan: what do you mean by "The same to u!"?

      If you are referring to talioto's post, I don't think it is a good idea to disable shared memory,
      as it is the fastest way to pass messages between processes on the same machine.

      Also, Open-MPI 1.4.2 is very old. The current stable release of Open-MPI is 1.6.1,
      and a lot of improvements have been added to Open-MPI since 1.4.2.

      And gcc 4.1.2 is very old too, although I don't think that will change much.
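
      In practice that means simply dropping the "-mca btl ^sm" part so Open-MPI keeps using shared memory within each node. A minimal sketch, assuming Ray's usual paired-library syntax (the process count, k-mer size and file names are placeholders, not the original command):

      [CODE]
      # Hedged sketch: launch Ray without excluding the shared-memory (sm) BTL.
      # Process count, k-mer size and library files are placeholders.
      mpirun -np 128 Ray -k 31 \
          -p lib500_1.fastq lib500_2.fastq \
          -p lib5k_1.fastq  lib5k_2.fastq \
          -p lib10k_1.fastq lib10k_2.fastq \
          -o RayOutput
      [/CODE]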

      Originally posted by sheepyuan View Post
      The same to u!
      Last edited by seb567; 09-25-2012, 03:48 AM.



      • I'm attempting to run Ray on our local cluster and after building I get an error:

        Ray: error while loading shared libraries: libmpi_cxx.so.0: cannot open shared object file: No such file or directory

        Thoughts?



        • You need to install the openmpi package. For example, if you are using Fedora do a 'yum install openmpi openmpi-devel'. If the packages are already installed, make sure that they are in your path (you can add them to your .bash_profile). If you are trying to run Ray from a remote 'screen' job, make sure you source your .bash_profile too.
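
          For instance, on a Fedora/Red Hat box the Open-MPI binaries and libraries usually sit under /usr/lib64/openmpi (a minimal sketch; the exact path is an assumption, so check where your distribution installs them):

          [CODE]
          # Hedged sketch for ~/.bash_profile: make the Open-MPI wrappers and runtime visible.
          # /usr/lib64/openmpi/... is the usual Red Hat/Fedora location; adjust for your system.
          export PATH=/usr/lib64/openmpi/bin:$PATH
          export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
          [/CODE]

          The LD_LIBRARY_PATH line is what lets the dynamic linker find libmpi_cxx.so.0 at run time.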



          • Problem solved- I had forgotten to set my mpi version on the cluster using mpi-selector.
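
            For anyone hitting the same thing, a hedged sketch of that mpi-selector step (the installation name is only an example; list first to see what your cluster actually provides, then log in again or source your profile):

            [CODE]
            # Hedged sketch: choose an MPI stack with mpi-selector (the name below is an example).
            mpi-selector --list                                # show available MPI installations
            mpi-selector --user --set openmpi-1.4-gcc-x86_64   # pick one of the listed names
            mpi-selector --query                               # confirm the selection
            [/CODE]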



            • Guys,

              I examined this thread from the very beginning but could not find an answer to my problem. Sorry for the silly question. I tried to install Ray 2.0.0 and failed on two machines, one running SciLinux 5.5 and another RHEL 5.5, which are essentially the same. Here is the output:

              [CODE]
              [yaximik@SciLinux55 Ray-v2.0.0]$ make PREFIX=ray-build
              make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
              mpic++ -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -D RAYPLATFORM_VERSION=\"1.0.3\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
              make[1]: mpic++: Command not found
              make[1]: *** [memory/ReusableMemoryStore.o] Error 127
              make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
              make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
              mpic++ -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
              make[1]: mpic++: Command not found
              make[1]: *** [application_core/ray_main.o] Error 127
              make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
              mpic++ code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
              make: mpic++: Command not found
              make: *** [Ray] Error 127
              [yaximik@SciLinux55 Ray-v2.0.0]$
              [/CODE]


              Here is the output from RHEL 5.5:

              [CODE]
              [yaximik@G5NNJN1 Ray-v2.0.0]$ make PREFIX=ray-build
              make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
              mpicxx -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -D RAYPLATFORM_VERSION=\"1.0.3\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
              make[1]: mpicxx: Command not found
              make[1]: *** [memory/ReusableMemoryStore.o] Error 127
              make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/RayPlatform'
              make[1]: Entering directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
              mpicxx -Wall -ansi -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.0.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
              make[1]: mpicxx: Command not found
              make[1]: *** [application_core/ray_main.o] Error 127
              make[1]: Leaving directory `/home/yaximik/Bioinformatics/Ray-v2.0.0/code'
              mpicxx code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
              make: mpicxx: Command not found
              make: *** [Ray] Error 127
              [yaximik@G5NNJN1 Ray-v2.0.0]$
              [/CODE]


              Essentially the same. I have

              openmpiwrappers-openmpi-1-4.el5.x86_64
              openmpi-1.4-4.el5.x86_64
              openmpi-devel-1.4-4.el5.x86_64
              openmpi-libs-1.4-4.el5.x86_64

              installed. Both machines are 64-bit; one has 2 processors and 8 GB RAM, the other has 16 processors and 96 GB RAM. Please help, as I'd like to try Ray 2.0.0 on my project.



              • If you look at both outputs:

                make[1]: mpicxx: Command not found

                Make sure to add the directory containing the Open-MPI executables (mpicc, mpicxx, mpic++, etc.) to your PATH. That should fix the problem.
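
                As a hedged sketch, you can ask the RPM database where the wrappers ended up and prepend that directory before re-running make (the export path below is an example from an EL5 layout, not necessarily yours):

                [CODE]
                # Hedged sketch: locate the mpicxx/mpic++ wrappers installed by the openmpi RPMs.
                rpm -ql openmpi openmpi-devel | grep 'bin/mpic'
                # Put the directory printed above on PATH (the path here is only an example).
                export PATH=/usr/lib64/openmpi/1.4-gcc/bin:$PATH
                make PREFIX=ray-build
                [/CODE]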



                • Ray runs well when I use a single node, but when using more than one I get an MPI exit code like this:

                  Ray:25109 terminated with signal 11 at PC=5718e0 SP=7fff9eb8a838. Backtrace:
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZNK14ReadAnnotation7getRankEv+0x0)[0x5718e0]
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN40Adapter_RAY_MPI_TAG_REQUEST_VERTEX_READS4$
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN18MessageTagExecutor11callHandlerEiP7Messag$
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN11ComputeCore3runEv+0x3cc)[0x5985ec]
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN7Machine5startEv+0x1d8d)[0x46906d]
                  /home/bstamps/Ray/Ray-v2.0.0/Ray(main+0x73)[0x464d73]
                  /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3fbc934cdd]
                  /home/bstamps/Ray/Ray-v2.0.0/Ray[0x464c39]
                  --------------------------------------------------------------------------
                  mpirun has exited due to process rank 4 with PID 25094 on
                  node c310 exiting without calling "finalize". This may
                  have caused other processes in the application to be
                  terminated by signals sent by mpirun (as reported here).
                  --------------------------------------------------------------------------

                  Thoughts?



                  • Originally posted by bstamps View Post
                    Ray runs well when I use a single node, but when using more than one I get an MPI exit code like this:
                    ...
                    --------------------------------------------------------------------------
                    mpirun has exited due to process rank 4 with PID 25094 on
                    node c310 exiting without calling "finalize". This may
                    have caused other processes in the application to be
                    terminated by signals sent by mpirun (as reported here).
                    --------------------------------------------------------------------------
                    I am not a big Ray user, but I will sometimes get the above problem and then, when I do a re-run, the problem goes away. I think it has to do with my cluster's setup. I suggest trying a small run with one job per node, just to make sure that everything works.

                    Not much help, I know, but the general idea is that the problem may be with your hardware setup and not with Ray.



                    • It appears setting my ptile below the maximum per node (16) has solved the problem...I'll have to go bug my computing center as to why 15 is kosher and 16 causes MPI to die. Either way I'm very happy with Ray's performance- being able to span my job across 4500 cores has sped assembly up quite a bit...
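
                      (Assuming an LSF-style scheduler here, since "ptile" is LSF's span[] keyword for slots per node; the queue name, core total and input files below are placeholders, not the actual submission.)

                      [CODE]
                      # Hedged sketch, assuming LSF: request 15 slots per node instead of the full 16.
                      bsub -q normal -n 240 -R "span[ptile=15]" \
                           mpirun Ray -k 31 -p reads_1.fastq reads_2.fastq -o RayOutput
                      [/CODE]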



                      • I spoke a little too soon- Ray appears to be throwing segmentation faults randomly through the assembly process on random nodes. Adding in "route-messages" seems to have helped, but my jobs still fail every so often. The computing center seem to think it's an issue with Ray, but I'm curious as to what the community thinks.
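
                        (For reference, a hedged sketch of a launch with routing turned on; everything except the -route-messages flag mentioned above is a placeholder.)

                        [CODE]
                        # Hedged sketch: enable Ray's message routing; all other arguments are placeholders.
                        mpiexec -n 256 Ray -route-messages \
                            -p reads_1.fastq reads_2.fastq -o RayOutput -k 31
                        [/CODE]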



                        • Originally posted by bstamps View Post
                          Ray runs well when I use a single node, but when using more than one I get an MPI exit code like this:

                          Ray:25109 terminated with signal 11 at PC=5718e0 SP=7fff9eb8a838. Backtrace:
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZNK14ReadAnnotation7getRankEv+0x0)[0x5718e0]
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN40Adapter_RAY_MPI_TAG_REQUEST_VERTEX_READS4$
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN18MessageTagExecutor11callHandlerEiP7Messag$
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN11ComputeCore3runEv+0x3cc)[0x5985ec]
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(_ZN7Machine5startEv+0x1d8d)[0x46906d]
                          /home/bstamps/Ray/Ray-v2.0.0/Ray(main+0x73)[0x464d73]
                          /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3fbc934cdd]
                          /home/bstamps/Ray/Ray-v2.0.0/Ray[0x464c39]
                          --------------------------------------------------------------------------
                          mpirun has exited due to process rank 4 with PID 25094 on
                          node c310 exiting without calling "finalize". This may
                          have caused other processes in the application to be
                          terminated by signals sent by mpirun (as reported here).
                          --------------------------------------------------------------------------

                          Thoughts?
                          Hi,

                          Ray v2.1.0 was released today. There are a lot of bug fixes, including fixes for two bugs that could lead to segmentation faults.



                          • Originally posted by westerman View Post
                            I am not a big Ray user, but I will sometimes get the above problem and then, when I do a re-run, the problem goes away. I think it has to do with my cluster's setup. I suggest trying a small run with one job per node, just to make sure that everything works.

                            Not much help, I know, but the general idea is that the problem may be with your hardware setup and not with Ray.
                            It sounds like a race condition. The bug may be in Ray, who knows.



                            • Originally posted by bstamps View Post
                              It appears setting my ptile below the maximum per node (16) has solved the problem...I'll have to go bug my computing center as to why 15 is kosher and 16 causes MPI to die. Either way I'm very happy with Ray's performance- being able to span my job across 4500 cores has sped assembly up quite a bit...
                              What is "ptile"? Are you using a fancy architecture (a Cray XE6 or Blue Gene/Q, for instance)?

                              Originally posted by bstamps View Post
                              across 4500 cores
                              I guess you are playing with fancy hardware, right?



                              • Originally posted by bstamps View Post
                                I spoke a little too soon- Ray appears to be throwing segmentation faults randomly through the assembly process on random nodes. Adding in "route-messages" seems to have helped, but my jobs still fail every so often. The computing center seem to think it's an issue with Ray, but I'm curious as to what the community thinks.
                                It could well be a bug in Ray; all software has bugs. Can you try the new Ray v2.1.0 to see whether the numerous bug fixes alleviate your problem?

                                Can you send an email to the mailing list with your hardware details and your Ray command?

                                Pure MPI applications may not be the answer for very large clusters; hybrid programming models are likely better.

                                We have work in progress on a new hybrid programming model. At the moment, Ray only uses MPI (as of v2.1.0), so when you run on 8 nodes * 24 cores/node = 192 cores, Ray is launched as 192 processes, with 24 processes per node.

                                We have devised a new programming model called "mini-ranks". If you Google "mini-ranks", you will mostly find hits about Lego blocks, because "mini-ranks" in parallel programming is new; I believe we coined the term ourselves!

                                Our implementation of the mini-ranks model uses 1 MPI process per node, with 23 POSIX threads running mini-ranks plus one additional communication thread per node. The mini-ranks run inside the POSIX threads, and the MPI rank itself does not do much.

                                Ray is already ported to that model (mini-ranks implemented with MPI+POSIX threads) in the git source tree.

                                Instead of launching like this:

                                [CODE]
                                mpiexec -n 192 Ray ...
                                [/CODE]

                                you launch it like this:

                                [CODE]
                                mpiexec -n 8 -bynode Ray -mini-ranks-per-rank 23 ...
                                [/CODE]

                                Note that our "mini-ranks" implementation needs 1 thread for communication for each node.

                                Although this is experimental, you may be interested to test that on your hardware.


                                The branch is called minirank-model, should you want to check it out.


                                Sébastien Boisvert
                                Ray maintainer
