
  • #16
    Originally posted by dp05yk View Post
    Hi Sebastien,



    A mod value is the remainder after performing integer division. For instance, 7 % 3 = 1, since 3*2 = 7 - 1. So the way we handle sequence distribution for threading is:

    Loop i = 1 to (num_seqs)

    if (i % num_threads) = thread_id then process
    else skip

    End loop

    Since we take the loop counter modulo the number of threads, the result is guaranteed to cycle through the values 0...num_threads-1 over consecutive sequences. This ensures that the sequences are evenly divided, and it also ensures that no threads compete, since thread i only processes sequences i, i+num_threads, i+(2*num_threads), etc.
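For readers who prefer code, here is a minimal sketch of the modulo-based distribution described above. It is not taken from the pBWA source; the function name is hypothetical and the indices are 0-based for simplicity.

```python
def round_robin_indices(num_seqs, num_threads, thread_id):
    """Return the 0-based sequence indices that one thread would process.

    Hypothetical helper: a sequence belongs to a thread when its index
    modulo the number of threads equals the thread id.
    """
    if not 0 <= thread_id < num_threads:
        raise ValueError("thread_id must be in [0, num_threads)")
    return [i for i in range(num_seqs) if i % num_threads == thread_id]


if __name__ == "__main__":
    # Show which sequences each of 3 threads picks out of 10.
    for tid in range(3):
        print(tid, round_robin_indices(10, 3, tid))
```

Each index lands in exactly one thread's list, so no locking is needed to decide who processes what.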

    Does that make sense? Previously, threads would essentially fight over sequence distribution by locking and "reserving" sequences for processing. This is responsible for the 20% efficiency difference.
    I know what a modulo is.

    In Ray, a de novo assembler, we were using this approach to split sequences across MPI ranks. Now, we just do a simple partition on the sequences.

    Given N sequences (which can be in many files, of course) and M MPI ranks, MPI rank 0 takes sequences 0 to (N/M)-1, MPI rank 1 takes sequences (N/M) to 2*(N/M)-1, and so on. Finally, the MPI rank M-1 (the last one) also takes the remaining N%M sequences.

    The partition-wise approach has the advantage that each MPI rank knows where to start and where to end.
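A small sketch of that block partition, assuming 0-based sequence numbers and a half-open range (first inclusive, last exclusive). The function name is made up for illustration and this is not Ray's actual code.

```python
def block_range(num_seqs, num_ranks, rank):
    """Return (first, last) for one rank, with last exclusive.

    Every rank gets num_seqs // num_ranks sequences; the last rank also
    takes the num_seqs % num_ranks sequences that are left over.
    """
    if not 0 <= rank < num_ranks:
        raise ValueError("rank must be in [0, num_ranks)")
    chunk = num_seqs // num_ranks
    first = rank * chunk
    last = first + chunk
    if rank == num_ranks - 1:
        last = num_seqs  # absorb the remainder
    return first, last


if __name__ == "__main__":
    for r in range(4):
        print(r, block_range(10, 4, r))
```

Because each rank can compute its own range from N, M and its rank alone, no communication is needed to agree on the split.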

    Originally posted by dp05yk View Post

    This function in bwaseqio.c performs the input reads file indexing. In order to distribute reads evenly over processes, each process receives a contiguous block of reads. However, with paired reads, we cannot assume that both reads files will be of exactly the same length (especially when dealing with SOLiD reads), so we need to index the files to find the start and end location of each contiguous block of reads. This is a one-processor job, and processor 0 essentially scans the file, marking the start and end locations of an evenly distributed reads block. Once it finds these positions, it sends them to processor i and this becomes processor i's block of input reads. The reason these are 8-byte numbers is because some input reads files are extremely large and are larger than 2^32 bytes.
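The indexing step can be sketched roughly as follows. This is only an illustration, not the code in bwaseqio.c: it assumes plain 4-line FASTQ records, a known read count, and a hypothetical function name, and it gives the last process any leftover reads.

```python
import io


def index_read_blocks(stream, num_reads, num_procs):
    """Scan a seekable binary FASTQ stream once and return a list of
    (start, end) byte offsets, one contiguous block of reads per process.

    Assumes every record is exactly 4 lines.
    """
    per_proc = num_reads // num_procs
    blocks = []
    start = stream.tell()
    for proc in range(num_procs):
        count = per_proc
        if proc == num_procs - 1:
            count = num_reads - per_proc * (num_procs - 1)  # leftovers
        for _ in range(count * 4):  # 4 lines per read
            stream.readline()
        end = stream.tell()
        blocks.append((start, end))
        start = end
    return blocks


if __name__ == "__main__":
    fastq = b"@r0\nACGT\n+\nIIII\n" * 3
    print(index_read_blocks(io.BytesIO(fastq), 3, 2))
```

Python integers are unbounded, so the offsets here never overflow; in C the same offsets need a 64-bit type once files exceed 2^32 bytes, which is the point made above.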
    Regardless, I think you could enhance your already-enhanced approach using message aggregation.

    Example with 4 MPI ranks and 9 integers to send:

    Without message aggregation

    Rank 0 sends value 0 to Rank 1
    Rank 0 sends value 1 to Rank 2
    Rank 0 sends value 2 to Rank 3
    Rank 0 sends value 3 to Rank 1
    Rank 0 sends value 4 to Rank 2
    Rank 0 sends value 5 to Rank 3
    Rank 0 sends value 6 to Rank 1
    Rank 0 sends value 7 to Rank 2
    Rank 0 sends value 8 to Rank 3

    (9 messages)

    With message aggregation


    Rank 0 sends values 0,3,6 to Rank 1
    Rank 0 sends values 1,4,7 to Rank 2
    Rank 0 sends values 2,5,8 to Rank 3

    (3 messages)

    In this toy example, each aggregated message contains 3 values.

    You can bundle 500 8-byte integers (4000 bytes) in a 4096-byte message, assuming the envelope is at most 96 bytes.

    So, in your case, each aggregated message would contain 500 values, and you would divide the number of messages sent by 500. That is a worthwhile saving, given that passing a message between two MPI ranks that are not on the same computer is costly.
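To make the toy example concrete, here is a sketch that only plans the messages and does not call MPI at all. The function name and the rule "value number k goes to rank 1 + k % (ranks - 1)" are assumptions chosen to reproduce the example with rank 0 as the sole sender.

```python
from collections import defaultdict


def plan_messages(values, num_ranks, max_per_message):
    """Return a list of (destination_rank, [values]) messages.

    Rank 0 is assumed to be the only sender. Values are buffered per
    destination and a message is emitted whenever a buffer is full;
    partially filled buffers are flushed at the end.
    """
    receivers = num_ranks - 1
    buffers = defaultdict(list)
    messages = []
    for k, value in enumerate(values):
        dest = 1 + k % receivers
        buffers[dest].append(value)
        if len(buffers[dest]) == max_per_message:
            messages.append((dest, buffers.pop(dest)))
    for dest in sorted(buffers):  # flush what is left
        messages.append((dest, buffers[dest]))
    return messages


if __name__ == "__main__":
    print(len(plan_messages(range(9), 4, 1)))  # no aggregation
    print(plan_messages(range(9), 4, 3))       # aggregated
```

With max_per_message set to 1 this degenerates to one message per value; raising it to 500 is the case discussed above for 8-byte integers in a 4096-byte message.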

    Sébastien http://Boisvert.info

    Comment


    • #17
      We are seeing more and more that re-alignment is an amazing benefit but terribly slow; it makes the regular bwa alignment seem so fast. I hope there are solutions in the works for the re-alignment step, where one needs to take all reads in a window and cannot arbitrarily split and parallelize.
      --
      bioinfosm

      Comment


      • #18
        How does pBWA work with single-end reads? It's not really clear from the tutorial.
        I aligned single-end reads using 3 processors. Now I have 3 *.sai files: a1-0.sai ... a1-2.sai. To get the sam file I tried:
        Code:
        pBWA samse -f out.sam ~/hg18/hg18 a1-0.sai all.fq 29424134
        pBWA crashes and produces the following:
        Code:
        [bwa_sai2sam_se_core] fail to open file 'a1-0.sai--1.sai'. Abort!
        [bart:29010] *** Process received signal ***
        [bart:29010] Signal: Aborted (6)
        [bart:29010] Signal code:  (-6)
        [bart:29010] [ 0] /lib64/libpthread.so.0 [0x7f8f5393bc00]
        [bart:29010] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x7f8f528984e5]
        [bart:29010] [ 2] /lib64/libc.so.6(abort+0x180) [0x7f8f528999b0]
        [bart:29010] [ 3] pBWA [0x405309]
        [bart:29010] [ 4] pBWA(bwa_sai2sam_se_core+0xca) [0x41597a]
        [bart:29010] [ 5] pBWA(bwa_sai2sam_se+0x14a) [0x415e5a]
        [bart:29010] [ 6] pBWA(main+0xe3) [0x427263]
        [bart:29010] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f8f52884a7d]
        [bart:29010] [ 8] pBWA [0x404f69]
        [bart:29010] *** End of error message ***
        Aborted (core dumped)
        I tried the same command with 'regular' bwa (minus the last argument) and it executed without a problem. What am I missing?

        I am using 0.5.9-r21-MPI.

        Comment


        • #19
          Hi YEG,

          You want to input the .sai prefix (a1)... Then pBWA will align every .sai file that you have with that prefix! Also, you just want to specify "out" as your -f parameter as pBWA will add the rank and .SAMs to the output files.

          Also, you may want to use revision 30, always best to stay current! :-)

          Comment


          • #20
            Originally posted by dp05yk View Post
            Hi YEG,

            You want to input the .sai prefix (a1)... Then pBWA will align every .sai file that you have with that prefix! Also, you just want to specify "out" as your -f parameter as pBWA will add the rank and .SAMs to the output files.

            Also, you may want to use revision 30, always best to stay current! :-)
            I should probably just have shown you:

            ./pBWA samse -f out ~/hg18/hg18 a1 all.fq 29424134


            And that will align all of your .sai files at the same time!

            Cheers,
            Darren

            Comment


            • #21
              Originally posted by dp05yk View Post
              I should probably just have shown you:

              ./pBWA samse -f out ~/hg18/hg18 a1 all.fq 29424134
              This may be a small bug. I had to rename *.sai files for the above command to work. The files need to have an extra '-'. So [prefix]-0.sai needs to be named [prefix]--0.sai and so on for every file made with pBWA align.

              Here's the pBWA align command I used :

              Code:
              mpirun -np 3 -hostfile hostfile pBWA aln -f a1 -t 24 ~/hg18/hg18 all.fq 29424134

              Comment


              • #22
                Originally posted by YEG View Post
                This may be a small bug. I had to rename *.sai files for the above command to work. The files need to have an extra '-'. So [prefix]-0.sai needs to be named [prefix]--0.sai and so on for every file made with pBWA align.

                Here's the pBWA align command I used :

                Code:
                mpirun -np 3 -hostfile hostfile pBWA aln -f a1 -t 24 ~/hg18/hg18 all.fq 29424134
                That's... really strange. I just checked the code (for both revisions 21 and 30) and it seems like it should be functioning properly... both bwase and bwape take the entered prefix and concatenate "-%d.sai", where %d = processor rank.

                Comment


                • #23
                  Actually YEG, I did find a bug. Thanks for pointing this out to me. It was assigning the processor rank AFTER determining the filename. I guess every system behaves differently, so yours was assigning a rank of -1, hence the additional dash.
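The ordering problem can be illustrated with a short sketch. It is not the pBWA source: the function names are hypothetical, and the placeholder value of -1 is an assumption based on the rank mentioned above (in C an unset variable could hold anything).

```python
UNSET_RANK = -1  # assumed placeholder for a rank that was never assigned


def sai_filename(prefix, rank):
    """Build the per-rank file name by appending "-%d.sai" to the prefix."""
    return "%s-%d.sai" % (prefix, rank)


def buggy_name(prefix, real_rank):
    rank = UNSET_RANK
    name = sai_filename(prefix, rank)  # name built before the rank is known
    rank = real_rank                   # too late, name is already fixed
    return name


def fixed_name(prefix, real_rank):
    rank = real_rank                   # assign the rank first
    return sai_filename(prefix, rank)


if __name__ == "__main__":
    print(buggy_name("a1", 0))
    print(fixed_name("a1", 0))
```

Passing a full file name such as a1-0.sai as the prefix to the buggy version yields a1-0.sai--1.sai, the name shown in the error message earlier in the thread.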

                  I hadn't caught this because I did most if not all of my testing with the sampe command as it seemed to be more popular.

                  I'll be uploading the latest revision to the sourceforge page today, thanks for the input!

                  Comment


                  • #24
                    Just to let everyone know, an alternate version of pBWA is now available that cleans up the workflow a bit. The user is no longer required to enter the number of reads in the FASTQ file, and SAM information is output to one file in parallel by all processors. There are also a few minor stability enhancements that should make pBWA compatible with MPICH. Performance appears to be similar to pBWA-r32. Thanks go to Rob Egan for the enhancements.

                    It's available at http://sourceforge.net/projects/pbwa ... thanks!
                    Last edited by dp05yk; 07-05-2011, 06:12 AM.

                    Comment


                    • #25
                      Hi dp05yk,

                      Thanks for releasing pBWA! The discussion here is very helpful for learning how to use it. However, I ran into problems installing pBWA and could not find a README file in the source directory. Could you please help me with the following error message, which I got when trying to compile it? I read the pBWA home page and know about the MPI requirement: "pBWA requires a multi-node (or multi-core) *nix system with a parallel scheduler alongside the OpenMPI C library in order to compile and run." But I am not sure how to set up a multi-node (or multi-core) *nix system with a parallel scheduler and the OpenMPI C library in order to compile it.

                      Thanks a lot!


                      make
                      #################Error################
                      make[1]: Entering directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA'
                      make[1]: Nothing to be done for `lib'.
                      make[1]: Leaving directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA'
                      make[1]: Entering directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA/bwt_gen'
                      mpicc -c -g -Wall -m64 -O2 -DHAVE_PTHREAD -D_LARGEFILE64_SOURCE bwt_gen.c -o bwt_gen.o
                      make[1]: mpicc: Command not found
                      make[1]: *** [bwt_gen.o] Error 127
                      make[1]: Leaving directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA/bwt_gen'
                      make: *** [lib-recur] Error 1
                      ############### Error ###################


                      Comment


                      • #26
                        Hi sheng,

                        These requirements can be broken down as follows. pBWA is a _parallel_ implementation of BWA. This means that unless your computer system has multiple processors, this software will be of no use to you. Essentially, what pBWA does is distribute massive input reads files over multiple processors in order to execute BWA in parallel. If you do not have access to a computer cluster or parallel machine, this is impossible, since you do not have multiple processors to distribute over. If you have a standard home computer with a multi-_core_ processor, just use the multithreading option available in the latest release of BWA.

                        As for the MPICC compiler - if you do in fact have access to a computing cluster, you'll need to ask one of the administrators whether the MPI compiler is installed (MPICH or OpenMPI work, actually). If it is installed, it could have an alias other than "mpicc", in which case you'll have to modify the makefile accordingly.

                        I hope this clears some issues up for you! I have a suspicion you may have been trying to install this on your home or basic lab PC, in which case you will be better off using BWA.

                        Thanks for posting!

                        Comment


                        • #27
                          pBWA installation

                          Hi dp05yk,

                          Thanks a lot for your reply! I am working on a cluster which has multiple nodes and cores, and I am sure we have OpenMPI installed on it. So what information about OpenMPI do I need in order to change the makefile, and which part of the makefile do I need to change? To compile it, do I just type make? Are there any other steps?

                          Cheers,
                          Sheng


                          Comment


                          • #28
                            Hi Sheng,

                            You need to figure out the alias used to call the MPI compiler. On most clusters this will be "mpicc"... you'll have to contact your system administrator to find out what it is, or search online for other common aliases.

                            Then, in both makefiles (one in the root folder and one in the bwt_gen folder), change
                            CC = mpicc
                            to
                            CC = youralias

                            Where youralias = the alias used to call your MPI compiler.

                            Comment


                            • #29
                              pBWA and fastq.gz

                              I notice that when I run pBWA under mpirun with gzipped FASTQ files, it returns a SAM file containing only the header. If I run without mpirun it works just fine.

                              BTW I am using v2.

                              Thanks,

                              Ilya
                              Last edited by ichorny; 09-14-2011, 01:52 PM.

                              Comment


                              • #30
                                That's interesting... as the website for pBWA notes, gzipped FASTQ files are not supported, since we require random file access to split up the input files.

                                Comment
