
**seb567** · 04-29-2011, 01:03 PM

Originally posted by dp05yk

Hi Sebastien,

A mod value is the remainder after performing integer division. For instance, 7 % 3 = 1, since 3*2 = 7 - 1. So the way we handle sequence distribution for threading is:

Loop i = 1 to (num_seqs)

if (i % num_threads) = thread_id then process
else skip

End loop

Since we are taking the loop counter modulo the number of threads, we are guaranteed to cycle through the values 0 ... num_threads-1 for consecutive sequences. This ensures that the sequences are evenly divided, and also that no threads compete, since thread k only processes the sequences whose index is congruent to k modulo num_threads (k, k+num_threads, k+2*num_threads, and so on).

Does that make sense? Previously, threads would essentially fight over sequence distribution by locking and "reserving" sequences for processing. This is responsible for the 20% efficiency difference.
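A minimal Python sketch of the modulo (round-robin) assignment in the pseudocode above; `round_robin` is a hypothetical helper name, and the 1-based loop counter follows the pseudocode. Every sequence ends up with exactly one thread, and no thread needs a lock to decide what it owns.

```python
from typing import List

def round_robin(num_seqs: int, num_threads: int, thread_id: int) -> List[int]:
    """Sequence indices (1-based, as in the pseudocode) handled by thread_id."""
    return [i for i in range(1, num_seqs + 1) if i % num_threads == thread_id]

# 10 sequences over 3 threads: each thread derives its own share independently
for t in range(3):
    print(t, round_robin(10, 3, t))
```

Note that each thread still walks the whole loop and skips most iterations; it only avoids contention, not the scan itself.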

I know what a modulo is.

In Ray, a de novo assembler, we were using this approach to split sequences across MPI ranks. Now, we just do a simple partition on the sequences.

Given N sequences (which can, of course, be spread across many files) and M MPI ranks, MPI rank 0 takes sequences 0 to (N/M)-1, MPI rank 1 takes sequences N/M to 2*(N/M)-1, and so on, where N/M is integer division. Finally, the last MPI rank, M-1, also takes the remaining N%M sequences.

The partition-wise approach has the advantage that each MPI rank knows where to start and where to end.
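The block partition described above can be sketched in a few lines of Python (`block_partition` is a hypothetical name). Each rank needs only N, M and its own rank number to compute its half-open range, so no communication is required to agree on the split.

```python
from typing import List, Tuple

def block_partition(n: int, m: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) sequence ranges for m ranks.
    Every rank gets n // m sequences; the last rank also takes the n % m leftovers."""
    size = n // m
    ranges = []
    for rank in range(m):
        start = rank * size
        end = (rank + 1) * size if rank < m - 1 else n
        ranges.append((start, end))
    return ranges

print(block_partition(10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

With this scheme the last rank can receive up to M-1 extra sequences; handing the leftovers out one per rank would balance slightly better, at the cost of a marginally more complex formula.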

Originally posted by dp05yk

This function in bwaseqio.c performs the input reads file indexing. In order to distribute reads evenly over processes, each process receives a contiguous block of reads. However, with paired reads, we cannot assume that both reads files will be exactly the same length (especially with SOLiD reads), so we need to index the files to find the start and end location of each contiguous block of reads. This is a one-processor job: processor 0 scans the file, marking the start and end locations of each evenly sized block of reads. Once it finds these positions, it sends them to processor i, and this becomes processor i's block of input reads. These are 8-byte numbers because some input reads files are larger than 2^32 bytes.
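The indexing pass described here might look roughly like the Python sketch below. It is not the bwaseqio.c code: `index_fastq_blocks` is a hypothetical name, it assumes plain (uncompressed) FASTQ with exactly 4 lines per record and a known read count (like the read count older pBWA revisions required on the command line), and it returns the offsets rather than sending them to each processor with MPI.

```python
import io
from typing import BinaryIO, List, Tuple

def index_fastq_blocks(f: BinaryIO, num_reads: int, nprocs: int) -> List[Tuple[int, int]]:
    """One pass over a FASTQ stream, returning (start, end) byte offsets of
    nprocs contiguous blocks of reads; the last block takes any remainder.
    Assumes 4-line records and num_reads >= nprocs."""
    per_block = num_reads // nprocs
    # Read numbers at which blocks 1..nprocs-1 begin (block 0 starts at byte 0)
    boundaries = {p * per_block for p in range(1, nprocs)}
    starts = [0]
    offset = 0   # byte position of the next record
    read_no = 0  # index of the next record
    while True:
        if read_no in boundaries:
            starts.append(offset)
        record = [f.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        offset += sum(len(line) for line in record)
        read_no += 1
    ends = starts[1:] + [offset]
    return list(zip(starts, ends))

# Five 16-byte records split over 2 processes
data = b"@r1\nACGT\n+\nIIII\n" * 5
print(index_fastq_blocks(io.BytesIO(data), 5, 2))  # [(0, 32), (32, 80)]
```

For paired reads the same pass would run over each file separately, since the byte offsets differ between the two files. Python integers do not overflow; a C version would need a 64-bit offset type, which is exactly why the offsets are 8-byte numbers.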

Regardless, I think you could enhance your already-enhanced approach using message aggregation.

Example with 4 MPI ranks and 9 integers to send:

Without message aggregation

Rank 0 sends value 0 to Rank 1
Rank 0 sends value 1 to Rank 2
Rank 0 sends value 2 to Rank 3
Rank 0 sends value 3 to Rank 1
Rank 0 sends value 4 to Rank 2
Rank 0 sends value 5 to Rank 3
Rank 0 sends value 6 to Rank 1
Rank 0 sends value 7 to Rank 2
Rank 0 sends value 8 to Rank 3

(9 messages)

With message aggregation

Rank 0 sends values 0,3,6 to Rank 1
Rank 0 sends values 1,4,7 to Rank 2
Rank 0 sends values 2,5,8 to Rank 3

(3 messages)
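The aggregated version of the toy example can be sketched by buffering values per destination and sending one buffer per rank. `aggregate` is a hypothetical helper, and the destination rule (value v goes to rank 1 + v % 3) is read off the example; in a real MPI program each buffer would then go out in a single send call such as `MPI_Send`.

```python
from collections import defaultdict
from typing import Dict, List

def aggregate(values: List[int], num_ranks: int) -> Dict[int, List[int]]:
    """Group values by destination rank so each destination receives one message.
    Destination rule from the toy example: v -> rank 1 + v % (num_ranks - 1)."""
    buffers: Dict[int, List[int]] = defaultdict(list)
    for v in values:
        buffers[1 + v % (num_ranks - 1)].append(v)
    return dict(buffers)

msgs = aggregate(list(range(9)), 4)
print(msgs)       # {1: [0, 3, 6], 2: [1, 4, 7], 3: [2, 5, 8]}
print(len(msgs))  # 3 messages instead of 9
```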

In this toy example, each aggregated message contains 3 values.

You can bundle 500 8-byte integers (4000 bytes) into a 4096-byte message, assuming the envelope is at most 96 bytes.

So, in your case, each aggregated message would contain 500 values, and the number of messages you send would drop by a factor of 500. That matters because transmitting a message between two MPI ranks that are not on the same computer is costly.
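The arithmetic behind the 500-value figure, plus a simple fixed-size batching loop, sketched in Python. The 4096-byte message and 96-byte envelope are the assumptions stated in the post; the function names are hypothetical.

```python
from typing import Iterator, List

def values_per_message(message_bytes: int = 4096, envelope_bytes: int = 96,
                       value_bytes: int = 8) -> int:
    """How many fixed-size values fit in one message after the envelope."""
    return (message_bytes - envelope_bytes) // value_bytes

def messages_needed(num_values: int, per_message: int) -> int:
    """Ceiling division: the last message may be partially filled."""
    return -(-num_values // per_message)

def batches(values: List[int], per_message: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most per_message values."""
    for i in range(0, len(values), per_message):
        yield values[i:i + per_message]

cap = values_per_message()
print(cap)                               # 500
print(messages_needed(1_000_000, cap))  # 2000
```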

Sébastien http://Boisvert.info

**bioinfosm** · 05-01-2011, 02:00 PM

We are seeing more and more that re-alignment is a big benefit but terribly slow; it makes the regular BWA alignment seem very fast. I hope there are solutions in the works for the re-alignment step, where one needs to take all reads in a window and cannot arbitrarily split and parallelize.

**YEG** · 05-05-2011, 04:34 PM

How does pBWA work with single-end reads? It's not really clear from the tutorial.
I aligned single-end reads using 3 processors. Now I have 3 *.sai files: a1-0.sai ... a1-2.sai. To get the sam file I tried:

Code:

pBWA samse -f out.sam ~/hg18/hg18 a1-0.sai all.fq 29424134

pBWA crashes and produces the following:

Code:

[bwa_sai2sam_se_core] fail to open file 'a1-0.sai--1.sai'. Abort!
[bart:29010] *** Process received signal ***
[bart:29010] Signal: Aborted (6)
[bart:29010] Signal code:  (-6)
[bart:29010] [ 0] /lib64/libpthread.so.0 [0x7f8f5393bc00]
[bart:29010] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x7f8f528984e5]
[bart:29010] [ 2] /lib64/libc.so.6(abort+0x180) [0x7f8f528999b0]
[bart:29010] [ 3] pBWA [0x405309]
[bart:29010] [ 4] pBWA(bwa_sai2sam_se_core+0xca) [0x41597a]
[bart:29010] [ 5] pBWA(bwa_sai2sam_se+0x14a) [0x415e5a]
[bart:29010] [ 6] pBWA(main+0xe3) [0x427263]
[bart:29010] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f8f52884a7d]
[bart:29010] [ 8] pBWA [0x404f69]
[bart:29010] *** End of error message ***
Aborted (core dumped)

I tried the same command with 'regular' bwa (minus the last argument) and it executed without a problem. What am I missing?

I am using 0.5.9-r21-MPI.

**dp05yk** · 05-05-2011, 06:27 PM

Hi YEG,

You want to input the .sai prefix (a1)... Then pBWA will align every .sai file that you have with that prefix! Also, you just want to specify "out" as your -f parameter, as pBWA will append the rank and the .sam extension to the output file names.

Also, you may want to use revision 30, always best to stay current! :-)

**dp05yk** · 05-06-2011, 03:45 AM

I should have just shown you:

./pBWA samse -f out ~/hg18/hg18 a1 all.fq 29424134

And that will align all of your .sai files at the same time!

Cheers,
Darren

**YEG** · 05-06-2011, 04:08 AM

Originally posted by dp05yk

I should probably just have showed you:

./pBWA samse -f out ~/hg18/hg18 a1 all.fq 29424134

This may be a small bug. I had to rename *.sai files for the above command to work. The files need to have an extra '-'. So [prefix]-0.sai needs to be named [prefix]--0.sai and so on for every file made with pBWA align.

Here's the pBWA align command I used:

Code:

mpirun -np 3 -hostfile hostfile pBWA aln -f a1 -t 24 ~/hg18/hg18 all.fq 29424134

**dp05yk** · 05-06-2011, 04:43 AM

That's... really strange. I just checked the code (for both revisions 21 and 30) and it seems like it should be functioning properly... both bwase and bwape take the entered prefix and concatenate "-%d.sai", where %d = processor rank.

**dp05yk** · 05-06-2011, 05:06 AM

Actually YEG, I did find a bug. Thanks for pointing this out to me. It was assigning the processor rank AFTER determining the filename. I guess every system behaves differently, so yours was assigning a rank of -1, hence the additional dash.

I hadn't caught this because I did most if not all of my testing with the sampe command as it seemed to be more popular.

I'll be uploading the latest revision to the sourceforge page today, thanks for the input!
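The bug described here can be reproduced with a small Python sketch of the naming pattern from the thread (prefix + "-%d.sai", with %d = processor rank). It is not pBWA's code: `sai_filename` is hypothetical, and the placeholder rank of -1 is taken from the behaviour YEG saw. In the C code, the fix amounts to calling `MPI_Comm_rank` before the file name is built.

```python
def sai_filename(prefix: str, rank: int) -> str:
    # Pattern described in the thread: prefix + "-%d.sai", %d = processor rank
    return "%s-%d.sai" % (prefix, rank)

# Buggy order: name built while the rank still holds a placeholder value
rank = -1
print(sai_filename("a1", rank))        # a1--1.sai (the extra dash YEG saw)
print(sai_filename("a1-0.sai", rank))  # a1-0.sai--1.sai (the name in the crash log)

# Fixed order: rank known first, then the name
rank = 0
print(sai_filename("a1", rank))        # a1-0.sai
```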

**dp05yk** · 07-05-2011, 05:56 AM

Just to let everyone know, an alternate version of pBWA is now available that cleans up the workflow a bit. The user is no longer required to enter the number of reads in the FASTQ file, and SAM information is output to one file in parallel by all processors. There are also a few minor stability enhancements that should make pBWA compatible with MPICH. Performance appears to be similar to pBWA-r32. Thanks go to Rob Egan for the enhancements.

It's available at http://sourceforge.net/projects/pbwa ... thanks!
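The announcement does not say how all processors write one SAM file, so the following is only a sketch of one common approach, not necessarily pBWA's. Each rank measures the size of its own SAM output, an exclusive prefix sum over ranks gives each rank's starting byte offset (after the header), and every rank writes its chunk at that offset. In MPI the prefix sum is typically `MPI_Exscan` with `MPI_SUM` and the write `MPI_File_write_at`; the hypothetical `write_offsets` below only computes the offsets.

```python
from typing import List

def write_offsets(bytes_per_rank: List[int], header_bytes: int = 0) -> List[int]:
    """Exclusive prefix sum: rank r starts after the header and after
    everything ranks 0..r-1 write."""
    offsets, pos = [], header_bytes
    for n in bytes_per_rank:
        offsets.append(pos)
        pos += n
    return offsets

# Three ranks producing 100, 250 and 50 bytes after a 40-byte header
print(write_offsets([100, 250, 50], header_bytes=40))  # [40, 140, 390]
```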

**sheng** · 08-22-2011, 07:57 PM

Hi dp05yk,

Thanks for releasing pBWA! The discussion here is very helpful for using it. However, I ran into problems installing pBWA and could not find a README file in the source directory. Could you please help me with the following error message I got when trying to compile it? I read the pBWA home page and know about the MPI requirement: "pBWA requires a multi-node (or multi-core) *nix system with a parallel scheduler alongside the OpenMPI C library in order to compile and run." But I am not sure how to set up the multi-node (or multi-core) *nix system with a parallel scheduler and the OpenMPI C library in order to compile it.

Thanks a lot!

make
#################Error################
make[1]: Entering directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA'
make[1]: Nothing to be done for `lib'.
make[1]: Leaving directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA'
make[1]: Entering directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA/bwt_gen'
mpicc -c -g -Wall -m64 -O2 -DHAVE_PTHREAD -D_LARGEFILE64_SOURCE bwt_gen.c -o bwt_gen.o
make[1]: mpicc: Command not found
make[1]: *** [bwt_gen.o] Error 127
make[1]: Leaving directory `/panda_scratch_homes001/shl2018/software/alignment/pBWA/bwt_gen'
make: *** [lib-recur] Error 1
############### Error ###################

**dp05yk** · 08-23-2011, 03:22 AM

Hi sheng,

These requirements can be broken down as follows. pBWA is a _parallel_ implementation of BWA. This means that unless your computer system has multiple processors, this software will be of no use to you. Essentially, pBWA distributes massive input reads files over multiple processors in order to execute BWA in parallel. If you do not have access to a computer cluster or parallel machine, this is impossible, since you do not have multiple processors to distribute over.

If you have a standard home computer with a multi-_core_ processor, just use the multithreading option available in the latest release of BWA.

As for the MPI compiler: if you do in fact have access to a computing cluster, you'll need to ask one of the administrators whether the MPI compiler is installed (MPICH and OpenMPI both work). If it is installed, it could have an alias other than "mpicc", in which case you'll have to modify the makefile accordingly.

I hope this clears some issues up for you! I have a suspicion you may have been trying to install this on your home or basic lab PC, in which case you will be better off using BWA.

Thanks for posting!

**sheng** · 08-23-2011, 05:31 AM

pBWA installation

Hi dp05yk,

Thanks a lot for your reply! I am working on a cluster that has multiple nodes and cores, and I am sure we have OpenMPI installed on it. So what information about OpenMPI do I need in order to change the makefile, and which part of the makefile do I need to change? To compile it, do I just type make, or are there other steps?

Cheers,
Sheng

**dp05yk** · 08-23-2011, 05:36 AM

Hi Sheng,

You need to figure out the alias used to call the MPI compiler. On most clusters this will be "mpicc"... you'll have to contact your system administrator to find out what it is, or search online for other common aliases.

Then, in both makefiles (one in the root folder and one in the bwt_gen folder), change
CC = mpicc
to
CC = youralias

Where youralias = the alias used to call your MPI compiler.
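If you prefer to script the edit, here is a hypothetical Python helper that rewrites the value of a line of the form `CC = ...`. It assumes the assignment uses plain `=` (not `:=` or `?=`) at the start of a line; the makefile file names are also assumptions.

```python
import re
from typing import Iterable

def set_makefile_cc(text: str, compiler: str) -> str:
    """Replace the value of any line-initial 'CC = ...' assignment."""
    return re.sub(r"^(CC[ \t]*=[ \t]*).*$",
                  lambda m: m.group(1) + compiler,
                  text, flags=re.MULTILINE)

def patch_makefiles(paths: Iterable[str], compiler: str) -> None:
    """Rewrite CC in each makefile in place."""
    for path in paths:
        with open(path) as fh:
            text = fh.read()
        with open(path, "w") as fh:
            fh.write(set_makefile_cc(text, compiler))
```

From the pBWA source root, something like `patch_makefiles(["Makefile", "bwt_gen/Makefile"], "youralias")` would cover both makefiles, assuming they carry those names.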

**ichorny** · 09-14-2011, 01:40 PM

pBWA and fastq.gz

I notice that when I run pBWA under mpirun with gzipped FASTQ files, it returns a SAM file containing only the header. If I run it without mpirun, it works just fine.

BTW I am using v2.

Thanks,

Ilya

**dp05yk** · 09-14-2011, 02:21 PM

That's interesting... as the pBWA website notes, gzipped FASTQ files are not supported, since we require random access into the input files in order to split them up.

