  • Release of Ray v2.1.0 (mostly bug fixes)

    Hello,

    Ray v2.0.0 was released on 2012-06-22. It is time to release Ray v2.1.0 !

    It is available directly at



    Documentation was added for the metagenomics solutions called 'Ray Méta',
    'Ray Communities', and 'Ray Ontologies' that are implemented in Ray plugins.

    Changes in bioinformatics algorithm implementations:

    Changes include a new data reliability option, options to control the maximum (or
    minimum) accepted k-mer coverage, a fix for a race condition in the plugin that colors
    the graph, new options for the storage engine, faster network tests, fixes for input files
    compressed with bzip2, the ability to disable scaffolding, various portability fixes, patches
    for twin k-mers (efficient storage), and faster building of the distributed graph.
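
    The bzip2 fix concerns .bz2 files that contain several concatenated compressed streams (see the SequencesLoader entry in the full list). Here is a minimal Python sketch of that reading loop. It uses the standard bz2 module, not the C library calls that Ray itself uses, and the name read_all_streams is made up for illustration:

```python
import bz2


def read_all_streams(data: bytes) -> bytes:
    """Decompress every bzip2 stream concatenated in `data`."""
    output = []
    while data:
        # One decompressor object handles exactly one compressed stream.
        decompressor = bz2.BZ2Decompressor()
        output.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes found after the end of this stream belong to the next one.
        data = decompressor.unused_data
    return b"".join(output)


# Two independently compressed streams glued together,
# as `cat a.bz2 b.bz2 > both.bz2` would produce.
blob = bz2.compress(b"@read1\nACGT\n") + bz2.compress(b"@read2\nTTGA\n")
print(read_all_streams(blob))
```

    A reader that stops at the first end-of-stream marker would only see read1 here.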

    Changes in the runtime engine:

    The distributed storage backend was optimized. Other changes: hardware acceleration with pop count
    when available, a new registration system for plugins, bug fixes in the hash table, a default
    communication model that is now MPI_Iprobe / MPI_ANY_SOURCE, new routines for dirty buffer
    management, and a polytope communication graph.
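
    The routing graphs mentioned in this release (hypercube, polytope) give each rank only a few direct neighbours and relay messages over several hops. Below is a rough sketch of the binary hypercube case. It assumes the number of ranks is a power of two and that a route fixes one differing bit at a time, starting from the lowest bit. hypercube_route is a hypothetical name, not a RayPlatform function:

```python
def hypercube_route(source, destination):
    """Return the list of ranks visited from source to destination.

    Two ranks are neighbours when their numbers differ in exactly one bit.
    """
    route = [source]
    current = source
    bit = 0
    while current != destination:
        mask = 1 << bit
        if (current ^ destination) & mask:
            # Flip this bit to move one hop closer to the destination.
            current ^= mask
            route.append(current)
        bit += 1
    return route


print(hypercube_route(0, 5))
print(len(hypercube_route(0, 63)) - 1, "hops at most with 64 ranks")
```

    With 64 ranks, each rank has 6 neighbours instead of 63 peers, at the cost of up to 6 hops per message.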

    Full list:

    ---
    Changes between Ray v2.0.0 and Ray v2.1.0:

    100 files changed, 4294 insertions(+), 2398 deletions(-)

    Pier-Luc Plante (3):
    Scaffolder is not required when using unpaired reads.
    Patch Koala: Added an option (-use-maximum-seed-coverage) so that highly-covered seeds can be ignored.
    Corrected the test that determines the quality control results. There were too many false negatives. The returned value is more reliable now.

    Sébastien Boisvert (142):
    The copyright was updated to add 2012.
    When there are 508 reads and 32 MPI ranks, the number of reads per rank is 508/32= 15. Therefore, assuming a perfect division read number 495 would be on MPI rank 33 (495/15 = 33). This makes Ray crash. This change set corrects this.
    A list of releases was added.
    The codename of the next release will be "Ancient Granularity of Epochs".
    An assertion was added for the performance scaled messaging related bug.
    Two assertions were added to detect possible message corruption.
    The help page was updated to add the data reliability option. Signed-off-by: Sébastien Boisvert <[email protected]>
    The peak finder was modified to pass new tests.
    I edited the guide to submit changes.
    The manual now includes the new option for overly-covered seeds.
    An error was fixed in the file that says how to submit changes.
    The return statement was misplaced in a recent patch.
    I added the names 'Ray Méta', 'Ray Communities', and 'Ray Ontologies'.
    An assertion was added to make sure that data is not overwritten.
    Searcher: added verbose statements
    Searcher: fixed a race condition
    Searcher: added a missing value.
    SeedExtender: moved system calls inside this plugin
    SeedExtender: modified the code for hot skipping
    SeedExtender: implemented hot skipping
    Parameters: 4 options were added to change distributed storage behavior.
    Documentation: Ray can be run with a single configuration file containing options.
    The default load factor threshold was changed to 0.75.
    The methods setKey() and getKey() were added to KmerCandidate and Vertex classes for compatibility with MyHashTable.
    If the hash table is verbose, ask it to display its status.
    NetworkTest: added the option -skip-network-test to skip the network test.
    Added a new option to enable genome neighbourhood calculation. The option is -find-neighbourhoods
    I added some code to detect windows 32 bits and windows 64 bits.
    More parameters for compilation can be provided with EXTRA=...
    Porting Ray to the new RayPlatform: removed macro calls in .h files.
    Porting Ray to the new RayPlatform: removed remaining codes in .h.
    Porting Ray to the new RayPlatform: removed token 'generated_automatically'.
    Porting Ray to the new RayPlatform: added CreatePlugin and BindPlugin instructions.
    Porting Ray to the new RayPlatform: updated the macro names in C++ plugin files.
    Porting Ray to the new RayPlatform: removed adapter from plugin class definitions.
    Porting Ray to the new RayPlatform: remove calls to setObject.
    Porting Ray to the new RayPlatform: Ray compiles with the simplified RayPlatform adapters now.
    I removed handlers from the cmake file.
    Updating the manual.
    SeedExtender: changed the verbosity period.
    Removed some output from the computation of seeds.
    The manual was updated to include pointers to documentation.
    If you run Ray with a configuration file (mpiexec -n 4 Ray Ray.conf) you can start comments with the '#' symbol like in python.
    Information to compile Ray with gcc was added.
    The default number of buckets is now 1048576. The default number of buckets per group is still 64, so that is only 16384 groups with almost no memory usage because it is sparse.
    This fixes an input/output bug for the Ray configuration file.
    The code that randomizes the arguments was removed because it can lead to bugs. This also simplifies checkpointing.
    The edge purging should be done in a massively parallel way unless the option -write-kmers was provided.
    Merge branch 'master' of https://github.com/plpla/ray into pl
    I added a script to build Ray with link time optimization.
    The EXTRA commands are also given to the linking command.
    I added -fwhole-program for better optimization.
    I added compilation flags for compression.
    I added instructions to build Ray with link time optimization.
    NetworkTest: the number of test messages is now constant regardless of the number of MPI ranks in the communicator.
    application_core: added a call to obtain a string configuration token.
    KmerAcademyBuilder: option -bloom-filter-bits sets the number of bits.
    KmerAcademyBuilder: Bloom filter has 64 M bits by default.
    Merge branch 'master' of github.com:sebhtml/ray
    Merge branch 'master' of github.com:sebhtml/ray
    SequencesLoader: added a 'please wait' before counting entries in a file.
    SequencesLoader: a bz2 file can contain many compressed streams. Each of them needs to be opened, read (until BZ_STREAM_END), and closed.
    application_core: bugs were fixed in the configuration routines.
    GeneOntology: removed the use of argv
    Merge branch 'master' of github.com:sebhtml/ray
    Merge branch 'master' of github.com:sebhtml/ray
    Fixed an integer overflow in the distributed storage engine.
    A path with 0 k-mers has 0 nucleotides, not 0-k+1.
    Merge branch 'master' of github.com:sebhtml/ray
    A new routing graph is available: the hypercube.
    Documentation: documented the hypercube features of Ray.
    core: the default number of buckets is now 268435456 per rank.
    scaffolder: it can be disabled with -disable-scaffolder
    normalized option names with -enable-* and -disable-*
    documentation: moved assembly options up
    core: added documentation for class Parameters.
    SeedingData: -use-minimum-seed-coverage changes the minimum
    documentation: added missing operands in the manual and -help page
    core: Ray -version provides more compile flags like popcnt and sse
    SeedingData: seeds can not contain k-mers with too low coverage
    build: the C++ standard is C++ 1998. gcc -ansi provides that
    Searcher: large integer constants need ULL for portability
    SeedExtender: added additional information for an error
    MessageProcessor: k-mer data messages should never be discarded
    VerticesExtractor: don't flush while waiting for messages
    KmerAcademyBuilder: only send the forward k-mer, not the lower
    VerticesExtractor: improved the code quality for easier reading
    MessageProcessor: don't discard k-mers while receiving messages
    VerticesExtractor: store twin edges in a single source
    EdgePurger: any edge is removed only if an end is not in the graph
    MessageProcessor: removed a call to a private attribute
    Documentation: added a document about profiling Ray
    Documentation: added information about elapsed time
    BuildSystem: added a strip command to reduce the memory footprint
    BuildSystem: replaced -ansi with -std=c++98 for more verbosity
    Documentation: updated the author file
    KmerAcademyBuilder: removed the k-mer academy
    VerticesExtractor: this module extracts vertices to add edges
    Merge branch 'kill-kmer-academy'
    MessageProcessor: new text to show when the Bloom filter is created
    KmerAcademyBuilder: added the number of set bits in the Bloom filter
    MessageProcessor: added a warning when the oracle is half full
    KmerAcademyBuilder: the Bloom filter can have any number of bits
    Merge branch 'bloom-features'
    MessageProcessor: coverage depth starts at 1 with Bloom filters
    MessageProcessor: the threshold is 50.0 (50.0%), not 0.5
    KmerAcademyBuilder: added the number of filtered k-mers
    Merge branch 'bug-hunting'
    application_core: added routing with a convex regular polytope
    NetworkTest: the number of exchange can be changed with -exchanges
    Documentation: added options for a 64-rank polytope
    Documentation: updated the taxonomy documentation
    NetworkTest: added average round trip latency
    scripts: initial version of a script to create NCBI taxonomy
    scripts: download NCBI bacterial genomes too
    Merge branch 'master' of github.com:sebhtml/ray
    Documentation: added documentation for NCBI taxonomy
    Documentation: simplified the usage of the tool to pull NCBI data Signed-off-by: Sébastien Boisvert <[email protected]>
    scripts: the script that pulls NCBI data is almost ready
    scripts: the script that pulls NCBI stuff is ready
    Documentation: added information about XML files
    Partitioner: also create a file FilePartition.txt
    MachineHelper: don't run the AMOS code path if not necessary
    Parameters: throw a warning when distances are invalid
    Merge branch 'for-seb-September-2012'
    Searcher: fixed a race condition where a message was lost
    Calls to deprecated methods were eliminated.
    This is Ray v2.1.0-rc0 "Ancient Granularity of Epochs"
    Searcher: browsing the distributed colored de Bruijn subgraph
    Searcher: find or create a virtual color from physical colors
    Searcher: added physical color in SequenceAbundances.xml
    Searcher: fixed assertion code
    scripts: don't ship the example and only ship the bz2 distribution
    SequencesLoader: fixed the scope of a buffer
    Searcher: removed debug messages from stable release
    Documentation: added more documentation for gene ontology.
    Searcher: fixed buffer overflow
    Searcher: fixed compilation warnings
    Searcher: GraphBrowsing.xml needs -one-color-per-file
    This is the branch for Ray v2.1.0-rc1
    Related git repositories were added in the README.
    Ray v2.1.0
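
    The 508-read entry in the list above is easy to reproduce. The sketch below shows the failing arithmetic and one possible guard, which clamps the result to the last rank. The guard is an assumption for illustration, since the change log does not say how Ray corrects it, and both function names are invented:

```python
def owner_naive(read_index, total_reads, ranks):
    """Owner rank computed as if the reads divided evenly among the ranks."""
    reads_per_rank = total_reads // ranks
    return read_index // reads_per_rank


def owner_clamped(read_index, total_reads, ranks):
    """Same computation, but leftover reads go to the last valid rank."""
    reads_per_rank = max(1, total_reads // ranks)
    return min(read_index // reads_per_rank, ranks - 1)


# 508 reads on 32 ranks: 508 // 32 = 15 reads per rank, with 28 reads left over.
print(owner_naive(495, 508, 32))    # 33, but valid ranks are 0..31
print(owner_clamped(495, 508, 32))  # 31
```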

    ---
    Changes between RayPlatform v1.0.3 and RayPlatform v1.1.0:

    52 files changed, 3215 insertions(+), 1244 deletions(-)

    Sébastien Boisvert (58):
    A release list was added.
    Message checksums are calculated by default for any non-empty message by RayPlatform.
    The option -verify-message-integrity must be provided to enable message integrity verification in RayPlatform. By default, the checksum is calculated by the software.
    An integer comparison was fixed.
    I implemented a system of annotation for buffers. With this, RayPlatform knows which buffer is dirty (possibly available, but maybe not) and which buffer is available.
    I fixed a typographical error in the documentation.
    I added a comment for dirty buffers. Because MPI_Request objects are usually "completed" before the message is actually on the destination, I don't think the RayPlatform virtual machine is going to run out of non-dirty buffer.
    The latency on an IBM iDataPlex (guillimin at McGill) for a Ray job of 36 cores was reduced from 23 to 17 microseconds (back and forth).
    I cleaned the persistent communication code.
    Merge branch 'master' of github.com:sebhtml/RayPlatform
    The three communication models were documented in the source code. The three models are:
    The constructor of the hash table now takes the number of buckets, the number of buckets per group, and load factor threshold as well as the verbosity.
    structures: increased portability of the hash table code.
    The class for hash table groups was moved to its own file.
    This fixes a bug introduced while working on the portability.
    The table prints its status after completion of the resizing, when in verbose mode.
    I added David Weese of Free University of Berlin in the code as he reviewed the hash table code.
    structure: using compiler builtins for some processing in the hash table.
    The specific code was moved inside one portable method.
    I added some comments in the ring allocator.
    Status is not printed if verbosity is not enabled.
    The registration system for plugins was changed. Now it uses function pointers instead of virtual methods, which can be slow as they can not be inlined.
    I added MessageWarden in the README.
    I added some documentation for handlers.
    Some more documentation was added.
    This fixes a bug in the insert() operation of the hash table during incremental resizing.
    h1 must return something between 0 and M-1 whereas h2 must return something odd between 1 and M-1. This was fixed in the code.
    The hash table also prints memory allocation information when printing its status.
    communication: switched the model to MPI_ANY_SOURCE.
    Added routines to clean dirty buffers when they are all dirty.
    A new routing graph is available: it is the hypercube.
    The hypercube prints its status before the end.
    routing: added status code for hypercube.
    communication: improved the last step in routing.
    routing: started to implement a round-robin policy for hypercube routing.
    routing: the round-robin hypercube is available in the code.
    routing: the hypercube can be modified to be a pseudo-hypercube
    communication: increased the number of buffers for messaging
    communication: removed a useless line in the code
    Updated the code name for the upcoming release.
    communication: registration of dirty buffers is more efficient.
    communication: errors related to dirty buffers are more verbose
    cryptography: now using __SSE4_2__ provided by gcc -march=native
    Documentation: updated the author file
    structures/MyHashTable: added missing headers
    communication: show a warning when at least 64 buffers are dirty
    routing: added routing with a convex regular polytope
    MessageRouter: store the routing information in the buffer
    routing: don't write routes for the polytope surface (called hypercube)
    core: fixed a buffer allocation bug in the core
    communication: the real-time sweeper is better configured
    the upper bound for the number of sent messages is not m_size
    This is RayPlatform (the engine) v1.1.0-rc0 "Chariot of Complexity"
    ComputeCore: routed messages must be purged
    communication: introducing the CONFIG_COMM_IRECV_TESTANY model
    communication: non-blocking communication is bad on Blue Gene /Q
    This is the branch development version for RayPlatform v1.1.0-rc1
    RayPlatform v1.1.0
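
    The h1/h2 entry in the list above describes the two hash functions of a double-hashing table. The sketch below shows why those ranges matter. It assumes M is a power of two, which is an assumption, although the bucket counts quoted in the Ray change list (1048576 and 268435456) are powers of two. The two hash functions here are arbitrary placeholders, not the ones in MyHashTable:

```python
def h1(key, m):
    """First hash: the starting bucket, between 0 and m - 1."""
    return key % m


def h2(key, m):
    """Second hash: the probe step, forced to be odd, between 1 and m - 1."""
    return ((key // m) % m) | 1


def probe_sequence(key, m):
    """Buckets visited by double hashing, in order."""
    start, step = h1(key, m), h2(key, m)
    return [(start + i * step) % m for i in range(m)]


# An odd step shares no factor with a power-of-two m,
# so the probe sequence reaches every bucket exactly once.
sequence = probe_sequence(12345, 16)
print(sequence)
print(sorted(sequence) == list(range(16)))
```

    An even step would revisit the same buckets and could miss a free slot even when the table is not full.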



    • Seb- the update has fixed my segfault issues. Thank you! 28 minutes for 1 2x150 and 1 2x250 MiSeq library to assemble a single fungal genome (160 cores), I am impressed!



      • Originally posted by seb567 View Post
        What is "ptile" ? Are you using a fancy architecture (Cray XE6 or Blue Gene /Q for instance) ?



        I guess you are playing with fancy hardware, right ?
        Just saw this one, apologies for the double reply- we're running on an intel sandy bridge cluster. http://www.oscer.ou.edu/hardsoft_del...dge_boomer.php

        Not huge, but it certainly gets the job done. Ptile is in reference to my LSF batch handling (BSUB). You have to specify number of MPI processes (p) and how many processes per node (ptile). We also have a hybrid MPI/OpenMP(or POSIX) system in place to do hybrid jobs with an MPI ptile of 1, and 16 threads per node.



        • Originally posted by bstamps View Post
          Seb- the update has fixed my segfault issues. Thank you! 28 minutes for 1 2x150 and 1 2x250 MiSeq library to assemble a single fungal genome (160 cores), I am impressed!
          That's one bug less to deal with then !

          In fact, one of the patches included in v2.1.0 guarantees the coherency of DNA strands in the de Bruijn graph. In v2.0.0 and earlier, coherency was not guaranteed and depended on various factors. A few random bugs occurred because of incoherency in the distributed storage engine.

          Sébastien



          • Hello,

            Originally posted by bstamps View Post
            Just saw this one, apologies for the double reply- we're running on an intel sandy bridge cluster. http://www.oscer.ou.edu/hardsoft_del...dge_boomer.php

            Not huge, but it certainly gets the job done. Ptile is in reference to my LSF batch handling (BSUB). You have to specify number of MPI processes (p) and how many processes per node (ptile).
            That's a nice machine.

            Originally posted by bstamps View Post
            We also have a hybrid MPI/OpenMP(or POSIX) system in place to do hybrid jobs with an MPI ptile of 1, and 16 threads per node.
            As you may know, Ray ships with a library called RayPlatform, which abstracts all the parallel stuff from the programmer. In Ray v2.0.0 and v2.1.0, the associated RayPlatform library (versions 1.0.3 and 1.1.0, respectively) only utilizes MPI.

            Pure MPI applications work well on some machines, and not so well on others, usually because the Host Channel Adapter is being used by too many MPI processes on each node. That's where hybrids come in.

            Hybrids are truly the future. I visited Argonne National Laboratory recently and discussed hybrid programming models with Professor Rick Stevens. Rick Stevens, Fangfang Xia, and I devised something called the "mini-ranks" hybrid programming model.

            The next release of Ray (likely something like 2.1.1) will run on RayPlatform 7.0.0, which will include support for our newly introduced "mini-ranks" hybrid programming model.

            So on your hybrid machine, you will be able to run Ray like this, (assuming 8 nodes and 16 hardware threads per node):

            mpiexec -n 8 -bynode \
            Ray -mini-ranks-per-rank 15 \
            -k 31 -o MiniRanksAreCool \
            -p joe1.fastq.bz2 joe2.fastq.bz2 \
            -p thor1.fastq.gz thor2.fastq.gz

            This will launch 1 MPI process per node. Each MPI process will have exactly 15 mini-ranks. Each mini-rank will run in 1 IEEE POSIX thread and an additional thread
            (the origin control thread of the process) will do MPI calls.
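
            The thread budget behind that command line can be written down as a small calculation. This is only a sketch of the arithmetic described above, assuming one MPI process per node and one thread per process reserved for MPI calls. mini_rank_layout is a made-up helper, not part of Ray:

```python
def mini_rank_layout(nodes, hardware_threads_per_node):
    """Thread budget for one MPI process per node with mini-ranks."""
    # One thread per process is kept for the MPI calls.
    mini_ranks_per_rank = hardware_threads_per_node - 1
    return {
        "mpi_ranks": nodes,
        "mini_ranks_per_rank": mini_ranks_per_rank,
        "total_mini_ranks": nodes * mini_ranks_per_rank,
        "threads_per_node": mini_ranks_per_rank + 1,
    }


# 8 nodes with 16 hardware threads each, as in the example command.
print(mini_rank_layout(8, 16))
```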

            If you feel this is interesting for your laboratory, there is a preliminary implementation of this available for testing.

            You need to do this to install (copy and paste in a terminal):

            mkdir Ray-mini-ranks-MPI+pthread
            cd Ray-mini-ranks-MPI+pthread

            git clone git://github.com/sebhtml/RayPlatform.git
            cd RayPlatform
            git checkout minirank-model
            cd ..

            git clone git://github.com/sebhtml/ray.git
            cd ray
            git checkout minirank-model
            make

            mpiexec -n 2 ./Ray -mini-ranks-per-rank 2 -o Test -test-network-only &> /dev/null

            Sébastien


            Sent from my IBM Blue Gene/Q



            • Hi Sebastien,
              I am going to try Ray. Make and install succeed when I don't turn on LIBZ or LIBBZ2.

              However, make fails if I turn on "HAVE_LIBZ = y" and/or "HAVE_LIBBZ2 = y".

              The system here has both libraries installed:
              Code:
              $ ll /usr/lib64/libz*
              -rwxr-xr-x 1 root root 108628 Mar 16  2011 /usr/lib64/libz.a*
              lrwxrwxrwx 1 root root     19 Sep 16  2011 /usr/lib64/libz.so -> ../../lib64/libz.so*
              lrwxrwxrwx 1 root root     21 Sep 16  2011 /usr/lib64/libz.so.1 -> ../../lib64/libz.so.1*
              lrwxrwxrwx 1 root root     25 Sep 16  2011 /usr/lib64/libz.so.1.2.3 -> ../../lib64/libz.so.1.2.3*
              $ ll /usr/lib64/libbz2*
              -rwxr-xr-x 1 root root 77606 Sep 20  2010 /usr/lib64/libbz2.a*
              lrwxrwxrwx 1 root root    11 Dec 10  2010 /usr/lib64/libbz2.so -> libbz2.so.1*
              lrwxrwxrwx 1 root root    15 Dec 10  2010 /usr/lib64/libbz2.so.1 -> libbz2.so.1.0.3*
              -rwxr-xr-x 1 root root 67792 Sep 20  2010 /usr/lib64/libbz2.so.1.0.3*
              I am not sure, but I think the problem might be that /usr/lib64 is not in my search path. I have the line below in my ~/.bashrc
              Code:
              export LD_RUN_PATH=$MYSOFT/openmpi-1.6.3/lib:/usr/lib64
              So, the question is: if the problem comes from the library search path, how do I edit the Makefile to get it to work? If the problem is not the library search path, then what is it? Thank you.
              Here are the error messages:
              Code:
              mpicxx  -lz -lbz2  code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
              code/TheRayGenomeAssembler.a(Loader.o): In function `Loader::load(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)':
              Loader.cpp:(.text+0x8a3): undefined reference to `FastqBz2Loader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
              Loader.cpp:(.text+0x8c3): undefined reference to `FastqBz2Loader::getSize()'
              Loader.cpp:(.text+0xb3b): undefined reference to `FastqGzLoader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
              Loader.cpp:(.text+0xb5b): undefined reference to `FastqGzLoader::getSize()'
              Loader.cpp:(.text+0xc5e): undefined reference to `FastqGzLoader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
              Loader.cpp:(.text+0xd0c): undefined reference to `FastqBz2Loader::open(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
              code/TheRayGenomeAssembler.a(Loader.o): In function `Loader::loadSequences()':
              Loader.cpp:(.text+0x1ac): undefined reference to `FastqGzLoader::load(int, ArrayOfReads*, MyAllocator*, int)'
              Loader.cpp:(.text+0x20c): undefined reference to `FastqBz2Loader::load(int, ArrayOfReads*, MyAllocator*, int)'
              collect2: ld returned 1 exit status
              make: *** [Ray] Error 1



              • You don't need to edit the Makefile.

                Are you compiling with this:
                make HAVE_LIBZ=y HAVE_LIBBZ2=y
                ?

                This should work as is if you have openmpi, zlib, and bzip2 (and associated
                -devel packages depending on your system).

                It seems that in your case FastqBz2Loader.o and FastqGzLoader.o are not compiled
                and therefore not linked. FastqBz2Loader.o is compiled and linked only with HAVE_LIBBZ2=y and FastqGzLoader.o is only compiled and linked with HAVE_LIBZ=y.

                I suspect that you edited the Makefile.

                Can you provide your make command with all its output in pastebin [1] ?

                Hopefully, Ray will soon be available as precompiled packages for Debian [2], Fedora [3], and ArchLinux [4]. Packages will probably be distributed in Ubuntu (via Debian) and Red Hat Enterprise Linux (via Fedora).

                Just out of curiosity, what operating system are you running Ray on ?

                ---

                [1] http://pastebin.com/
                [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692238
                [3] https://bugzilla.redhat.com/show_bug.cgi?id=872783
                [4] https://github.com/sebhtml/Ray-on-ArchLinux


                • Yes, I edited the Makefile. The changes were as below:
                  Code:
                  MAXKMERLENGTH = 96
                  HAVE_LIBZ = y
                  HAVE_LIBBZ2 = y
                  My make command is simply
                  Code:
                  $ make PREFIX=bin
                  My system is
                  Code:
                  $ uname -mrs
                  Linux 2.6.18-308.8.2.el5 x86_64
                  $ lsb_release -a
                  LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
                  Distributor ID: RedHatEnterpriseServer
                  Description:    Red Hat Enterprise Linux Server release 5.8 (Tikanga)
                  Release:        5.8
                  Codename:       Tikanga
                  Here is the link for full output http://pastebin.com/Kf35v5SK

                  Thank you for your help.



                  • Oh, I see. With the command below, it works now.
                    Code:
                    $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
                    It seems my question below is more about general Linux/compiler knowledge than about Ray:
                    Q: What is the difference between Method 1 and Method 2? Shouldn't they be the same?
                    Method 1: I edited the Makefile, changed to "HAVE_LIBZ = y", "HAVE_LIBBZ2 = y", then $ make PREFIX=bin
                    Method 2: $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
                    Thanks.
                    Last edited by cwzkevin; 11-04-2012, 04:59 PM.



                    • Hi !

                      Thanks for the logs, that really helps understanding what's going
                      on.

                      You can build Ray with these options without any Makefile modification:

                      $ make clean
                      $ make MAXKMERLENGTH=96 HAVE_LIBZ=y HAVE_LIBBZ2=y PREFIX=bin
                      $ make install

                      $ mpiexec -n 1 bin/Ray -version
                      Ray version 2.1.0
                      License for Ray: GNU General Public License version 3
                      RayPlatform version: 1.1.0
                      License for RayPlatform: GNU Lesser General Public License version 3

                      MAXKMERLENGTH: 96 <=========== Here you go !
                      KMER_U64_ARRAY_SIZE: 3
                      Maximum coverage depth stored by CoverageDepth: 4294967295
                      MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
                      FORCE_PACKING = n
                      ASSERT = n
                      HAVE_LIBZ = y <=========== Here you go !
                      HAVE_LIBBZ2 = y <=========== Here you go !
                      CONFIG_PROFILER_COLLECT = n
                      CONFIG_CLOCK_GETTIME = n
                      __linux__ = y
                      _MSC_VER = n
                      __GNUC__ = y
                      RAY_32_BITS = n
                      RAY_64_BITS = y
                      MPI standard version: MPI 2.1
                      MPI library: Open-MPI 1.5.4
                      Compiler: GNU gcc/g++ 4.7.2 20120921 (Red Hat 4.7.2-2)
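The MAXKMERLENGTH and KMER_U64_ARRAY_SIZE values in the output above fit together if each nucleotide takes 2 bits and k-mers are packed into 64-bit words. That encoding is an assumption here (the output does not state it), and kmer_u64_words is a made-up helper name. A rough sketch of the arithmetic:

```shell
#!/bin/sh
# Number of 64-bit words needed to hold one k-mer of the given maximum length,
# ASSUMING 2 bits per nucleotide (an assumption, not taken from the Ray output).
# kmer_u64_words is a hypothetical helper name used only for this illustration.
kmer_u64_words() {
  # Integer ceiling of (2 * k) / 64.
  echo $(( (2 * $1 + 63) / 64 ))
}

for max_k in 32 64 96 128; do
  echo "MAXKMERLENGTH=$max_k -> $(kmer_u64_words "$max_k") x 64-bit word(s)"
done
```

For MAXKMERLENGTH=96 this gives 3, which agrees with the KMER_U64_ARRAY_SIZE: 3 line in the output.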

                      Originally posted by cwzkevin
                      Oh, I see. With the command below, it works now.
                      Code:
                      $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
                      It seems my question below is more about general Linux/compiler knowledge than about Ray:
                      Q: What is the difference between Method 1 and Method 2? Shouldn't they be the same?
                      Method 1: I edited the Makefile, changing it to "HAVE_LIBZ = y" and "HAVE_LIBBZ2 = y", then ran $ make PREFIX=bin
                      Method 2: $ make PREFIX=bin HAVE_LIBZ=y HAVE_LIBBZ2=y
                      Thanks.

                      Now, why does editing the Makefile fail?

                      There are actually many Makefiles (distributed Makefiles):

                      Ray-v2.1.0/Makefile
                      Ray-v2.1.0/code/Makefile
                      Ray-v2.1.0/code/*/Makefile (23)

                      When you provide the variables on the make command line, they are
                      passed down to the child make processes because make exports them.
                      However, variables assigned within a Makefile are not exported
                      unless they are marked with 'export'.

                      It fails because of this:

                      Ray-v2.1.0/code/plugin_SequencesLoader/Makefile:

                      SequencesLoader-$(HAVE_LIBBZ2) += plugin_SequencesLoader/BzReader.o
                      SequencesLoader-$(HAVE_LIBBZ2) += plugin_SequencesLoader/FastqBz2Loader.o
                      SequencesLoader-$(HAVE_LIBZ) += plugin_SequencesLoader/FastqGzLoader.o

                      These configuration options are used by the Makefiles, but also by the
                      C++ code. For example, when HAVE_LIBZ is set to y, all the Makefiles
                      see that value, and the -D HAVE_LIBZ flag passed to gcc defines
                      HAVE_LIBZ in all the C++ files too.
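The export behaviour described above can be reproduced outside of Ray. The following is a minimal sketch, assuming GNU make is installed; DEMO_FLAG, the object names and all file names are made up for the demo and are not part of the Ray build system. The child Makefile uses the same "name-$(FLAG) +=" pattern as the SequencesLoader lines quoted above.

```shell
#!/bin/sh
# Sketch of GNU make behaviour; DEMO_FLAG and every file name here are hypothetical.
unset DEMO_FLAG MAKEFLAGS MAKELEVEL
demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1

# Child Makefile: gz_loader.o is appended to objs-y only when DEMO_FLAG is y.
# When DEMO_FLAG is n, the object lands in objs-n, which nothing uses.
printf '%s\n' \
  'DEMO_FLAG ?= n' \
  'objs-y := core.o' \
  'objs-$(DEMO_FLAG) += gz_loader.o' \
  'all:' > child.mk
printf '\t@echo "child builds: $(objs-y)"\n' >> child.mk

# Parent Makefile: $2 is the assignment line, then the child make is started.
write_parent() {
  printf '%s\nall:\n\t@$(MAKE) -f child.mk\n' "$2" > "$1"
}
write_parent plain.mk    'DEMO_FLAG = y'
write_parent exported.mk 'export DEMO_FLAG = y'

# Three ways of setting the flag, as seen by the child make process.
plain=$(make --no-print-directory -f plain.mk)
exported=$(make --no-print-directory -f exported.mk)
cmdline=$(make --no-print-directory -f plain.mk DEMO_FLAG=y)

echo "plain assignment in parent: $plain"
echo "exported assignment:        $exported"
echo "variable on command line:   $cmdline"
```

With GNU make, only the plain assignment leaves gz_loader.o out of the child's object list; the exported assignment and the command-line variable both reach the child.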

                      If you really want to edit the Makefile, you have to do it like this:

                      --- Ray-v2.1.0/Makefile 2012-10-30 18:29:34.000000000 -0400
                      +++ Ray-v2.1.0-copy/Makefile 2012-11-04 20:05:54.099217300 -0500
                      @@ -33,13 +33,13 @@
                      # needs libz
                      # set to no if you don't have libz
                      # y/n
                      -HAVE_LIBZ = n
                      +export HAVE_LIBZ = y

                      # support for .bz2 files
                      # needs libbz2
                      # set to no if you don't have libbz2
                      # y/n
                      -HAVE_LIBBZ2 = n
                      +export HAVE_LIBBZ2 = y

                      # use Intel's compiler
                      # the name of the Intel MPI C++ compiler is mpiicpc

                      If you know programming, you can send me a patch that fixes this bug
                      by adding 'export ' in front of the build options in the Makefile.
                      Otherwise, I'll put this in my patchwork queue !

                      If you have other questions regarding the Ray build system,
                      let me know.

                      ***
                      Cheers, Sébastien


                      Originally posted by cwzkevin
                      Yes, I edited the Makefile. The changes I made are below:
                      Code:
                      MAXKMERLENGTH = 96
                      HAVE_LIBZ = y
                      HAVE_LIBBZ2 = y
                      My make command is simply
                      Code:
                      $ make PREFIX=bin
                      My system is
                      Code:
                      $ uname -mrs
                      Linux 2.6.18-308.8.2.el5 x86_64
                      $ lsb_release -a
                      LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
                      Distributor ID: RedHatEnterpriseServer
                      Description:    Red Hat Enterprise Linux Server release 5.8 (Tikanga)
                      Release:        5.8
                      Codename:       Tikanga
                      Here is the link for full output http://pastebin.com/Kf35v5SK

                      Thank you for your help.
                      Last edited by seb567; 11-04-2012, 05:22 PM.



                      • Thank you very much for the details, I understand now. I greatly appreciate it!

                        Sorry, I am not a programmer. (I think a programmer should already know there could be distributed Makefiles instead of the one that I edited. ^_^)

                        Now, it is time to try it out.
                        Thanks!



                        • Hi all,

                          Quick question: I have paired-end data from a MiSeq that I would like to assemble with Ray. The library was made with a Nextera kit and sequenced using the new 2 X 250 reagent kits. The average fragment size of my library was around 500 bp, but some smaller fragments were present. For those fragments, the read pairs will at least partially overlap. Does Ray have a problem when the two reads of a pair overlap? Should I treat the data as non-paired-end?

                          Thanks!
                          Kevin



                          • Hi, I have a question here.
                            Does Ray expect the sequences in the two paired-end files to be in the same order?
                            I ask because the sequence order in my two paired-end fastq files happened to be different last time. They are indeed paired files, but the sequences are in a different order. I ran these paired-end files with Ray and got output1. After I realized the sequence order was not the same, I sorted the fastq files to put them in the same order. I then re-ran Ray and got output2. The two results seem to be different.
                            Thank you.



                            • Originally posted by kmkocot
                              Hi all,

                              Quick question: I have paired-end data from a MiSeq that I would like to assemble with Ray. The library was made with a Nextera kit and sequenced using the new 2 X 250 reagent kits. The average fragment size of my library was around 500 bp, but some smaller fragments were present. For those fragments, the read pairs will at least partially overlap. Does Ray have a problem when the two reads of a pair overlap? Should I treat the data as non-paired-end?

                              Thanks!
                              Kevin
                              Hi,

                              Ray will be fine with those.

                              I suggest you run something like this:

                              mpiexec -n 16 Ray -k 71 -p file_R1.fastq.gz file_R2.fastq.gz -o MiSeq+Ray


                              You can also use Ray Cloud Browser to visualize your assembly in your web browser.

                              Demo: http://genome.ulaval.ca/corbeillab/Ray-Cloud-Browser


                              p.s.: you'll need to compile with this:

                              make MAXKMERLENGTH=96 HAVE_LIBZ=y

                              ---
                              -Sébastien
                              Last edited by seb567; 02-04-2013, 08:26 PM. Reason: added http://; added -k 71



                              • Originally posted by cwzkevin
                                Hi, I have a question here.
                                Does Ray expect the sequences in the two paired-end files to be in the same order?
                                Yes.

                                Originally posted by cwzkevin
                                I ask because the sequence order in my two paired-end fastq files happened to be different last time. They are indeed paired files, but the sequences are in a different order. I ran these paired-end files with Ray and got output1. After I realized the sequence order was not the same, I sorted the fastq files to put them in the same order. I then re-ran Ray and got output2. The two results seem to be different.
                                Thank you.
                                Ray needs both files to list the sequences in the same order.

                                Most sequencing technologies, including the dominant one, produce paired files that way by default.
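If you are not sure whether two paired files are in the same order, a quick check along these lines may help. It is only a sketch: it assumes uncompressed FASTQ files with exactly 4 lines per record, and read names that differ at most by a trailing /1 or /2 or by a comment after a space. read_ids and same_order are made-up helper names, not part of Ray.

```shell
#!/bin/sh
# Print the read name of every FASTQ record (4 lines per record is assumed),
# without the leading '@', any comment after a space, or a trailing /1 or /2.
read_ids() {
  awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Succeed only if both files list the same read names in the same order.
same_order() {
  ids1=$(mktemp) || return 2
  ids2=$(mktemp) || return 2
  read_ids "$1" > "$ids1"
  read_ids "$2" > "$ids2"
  cmp -s "$ids1" "$ids2"
  rc=$?
  rm -f "$ids1" "$ids2"
  return "$rc"
}

# When called with two file names, report the result.
if [ "$#" -eq 2 ]; then
  if same_order "$1" "$2"; then
    echo "same order"
  else
    echo "order differs"
  fi
fi
```

Saved under a name of your choice, it takes the two FASTQ files as arguments. Compressed files would have to be decompressed first, for example with gunzip -c.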

                                Thanks for the feedback !

                                -Sébastien
