
  • #46
    Dear all,

    Just passing by and saw this thread. It seems lots of people would like BWA to run on multiple cores. Given that INDEX is a one-off task for a given batch of files and ALN supports -t, only SAMPE and SAMSE are left single-threaded.

    I have a version of SAMPE that can run multithreaded. Although it was done on Windows under the CRT (C Runtime), the basic idea can easily be transferred back to the Linux code base. It only needs knowledge of memory-mapped files and some threading concepts; it's just some plumbing wrapped around a few core function calls. SAMSE can be done the same way (I didn't do that because I only have PE data on hand).

    My project site is http://bow.codeplex.com/, where you can find some performance data on the release page. Source code is on GitHub: https://github.com/xied75 (or rather, this branch: https://github.com/xied75/bwa/tree/mt-sampe).

    I recently tested this MT-SAMPE on a Windows Azure (cloud) large instance with 4 cores and 8 GB of memory; it runs without any problem with -t 4.

    Best,

    dong
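A minimal sketch of the kind of plumbing described above, not the actual MT-SAMPE code: one simple way to share a batch of reads among threads is to give each worker a contiguous range of read indices. `chunk_ranges` is a hypothetical helper, and the batch size 262144 is borrowed from a pBWA log later in this thread purely as an example figure.

```shell
# Hypothetical helper: split `total` reads into `workers` contiguous ranges.
# Prints one line per worker: "<worker> <start> <end>" (0-based, end exclusive).
chunk_ranges() {
    total=$1
    workers=$2
    base=$(( total / workers ))    # reads every worker gets
    extra=$(( total % workers ))   # leftover reads, one each to the first workers
    start=0
    i=0
    while [ "$i" -lt "$workers" ]; do
        size=$base
        if [ "$i" -lt "$extra" ]; then
            size=$(( base + 1 ))
        fi
        echo "$i $start $(( start + size ))"
        start=$(( start + size ))
        i=$(( i + 1 ))
    done
}

# Example: one batch of 262144 reads shared among 4 workers (as with -t 4).
chunk_ranges 262144 4
```

Each worker would then run the core pairing calls on its own range; how results are merged back in order is left out of this sketch.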



    • #47
      Hello,

      I'm trying the following job script to run pBWA over MPI on my cluster, which has 2 nodes with 8 CPUs and 32 GB of RAM each.

      It spawns 16 pBWA processes across the 2 compute nodes, which seems OK, but judging by the execution log, pBWA is running the same alignment process 16 times.

      Is my job script wrong?
      I would appreciate some support.

      I am running this on SGE with Open MPI.

      Code:
      #!/bin/bash
      ### shell
      #$ -S /bin/bash
      ### env path
      #$ -V
      ### name
      #$ -N aln_left
      ### current work directory
      #$ -cwd
      ### merge outputs
      #$ -j y
      ### PE
      #$ -pe mpi 16
      ### select all.q
      #$ -q all.q
      
      
      mpirun pBWA aln -f aln_left /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa /data_in/rawdata/HapMap_1.fastq > /data_out_2/tmp/mpi/HapMap_1.cloud.left.sai
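One way to check the symptom described above is to count how many distinct ranks report in the log. This is a sketch under assumptions: it relies on log lines carrying a `Proc N:` prefix, as in the sampe output shown later in this thread, and `distinct_ranks` is a hypothetical helper name. If 16 processes were started but only `Proc 0` ever appears, each process may believe it is the only rank; one common cause of that is launching with an `mpirun` from a different MPI installation than the one the binary was built against.

```shell
# Hypothetical helper: count the distinct "Proc N:" ranks seen in a log file.
# Assumes pBWA prefixes its log lines with "Proc <rank>:".
distinct_ranks() {
    grep -o 'Proc [0-9][0-9]*:' "$1" | sort -u | wc -l | tr -d ' '
}
```

For example, `distinct_ranks aln_left.o12345` (a hypothetical SGE output file name) would be expected to print 16 for a healthy 16-slot job.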



      • #48
        I'm stuck at the sampe step. pBWA's built-in usage message for sampe is the following:

        Code:
        Usage:   pBWA sampe -f <output.sam> [options] <prefix> <SAI_FILE_PREFIX1> <SAI_FILE_PREFIX2> <in1.fq> <in2.fq>
        
        Options: -a INT   maximum insert size [500]
                 -o INT   maximum occurrences for one end [100000]
                 -n INT   maximum hits to output for paired reads [3]
                 -N INT   maximum hits to output for discordant pairs [10]
                 -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
                 -f FILE  sam file name/prefix to output results to
                 -M       merge all sam file prefixes into one file
                 -r STR   read group header line such as `@RG\tID:foo\tSM:bar' [null]
                 -P       preload index into memory (for base-space reads only)
                 -s       disable Smith-Waterman for the unmapped mate
                 -A       disable insert size estimate (force -s)
        
        Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
               2. For reads shorter than 30bp, applying a smaller -o is recommended to
                  to get a sensible speed at the cost of pairing accuracy.
               3. For the SAI prefixes, do NOT include the _1 and _2 generated by aln
                  as sampe will auto-detect these.
        The pBWA website, however, shows the following command as an example:

        Code:
        ./pBWA sampe -f SamPrefix /path/to/Index.fa SaiPrefix SaiPrefix[2] /path/to/Read_1.fq /path/to/Read_2.fq
        I don't understand why the reference .fa file is an argument in the website example but not in the built-in usage message, or what the first <prefix> argument means.

        I'm trying the following command and getting an error. I've already tried rebuilding the index with bwa index, and the error persists:

        Code:
        pBWA sampe -f output.sam /share/references/genomes/human/hg19/bwa_ref/hg19.fa aln_left aln_right /home/gmarco/input/data/rawdata/HapMap_1.fastq /home/gmarco/input/data/rawdata/HapMap_2.fastq
        
        Proc 0: Found second SAI file - aln_right-1-00000.sai
        Proc 0: [bwa_seq_open] seeked to 0 in /home/gmarco/input/data/rawdata/HapMap_1.fastq
        Proc 0: [bwa_seq_open] seeked to 0 in /home/gmarco/input/data/rawdata/HapMap_2.fastq
        Proc 0: [bwa_sai2sam_pe_core] 262144 reads
        Proc 0: [bwa_sai2sam_pe_core] convert to sequence coordinate...
        Broadcasting BWT (this may take a while)... done!
        [bwt_restore_sa] SA-BWT inconsistency: seq_len is not the same. Abort!
        [sg13:30874] *** Process received signal ***
        [sg13:30874] Signal: Aborted (6)
        [sg13:30874] Signal code:  (-6)
        [sg13:30874] [ 0] /lib64/libpthread.so.0 [0x347b60eb10]
        [sg13:30874] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3cf8c30265]
        [sg13:30874] [ 2] /lib64/libc.so.6(abort+0x110) [0x3cf8c31d10]
        [sg13:30874] [ 3] pBWA [0x404c52]
        [sg13:30874] [ 4] pBWA(bwt_restore_sa+0xce) [0x4081de]
        [sg13:30874] [ 5] pBWA(bwa_cal_pac_pos_pe+0x1b35) [0x41ad05]
        [sg13:30874] [ 6] pBWA(bwa_sai2sam_pe_core+0x3e3) [0x41b203]
        [sg13:30874] [ 7] pBWA(bwa_sai2sam_pe+0x450) [0x41bef0]
        [sg13:30874] [ 8] pBWA(main+0x96) [0x428206]
        [sg13:30874] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3cf8c1d994]
        [sg13:30874] [10] pBWA [0x404b79]
        [sg13:30874] *** End of error message ***
        Aborted
        Thanks.
        Last edited by gmarco; 11-22-2012, 01:06 AM.
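On the `<prefix>` question, a hedged reading: in upstream BWA, the first positional argument of `sampe` is the index prefix, which is conventionally the path of the FASTA file that was indexed, so the usage message and the website example probably describe the same argument under two names. Whether pBWA follows upstream here is an assumption. The sketch below only assembles the command line in the order given by the usage text; `sampe_cmd` is a hypothetical helper, not part of pBWA.

```shell
# Hypothetical helper: print a pBWA sampe command with the positional
# arguments in the order given by the usage message:
#   -f <output prefix> <index prefix> <SAI prefix 1> <SAI prefix 2> <in1.fq> <in2.fq>
sampe_cmd() {
    out_prefix=$1     # value for -f
    index_prefix=$2   # <prefix>: assumed to be the index prefix (often the .fa path)
    sai1=$3           # SAI prefix for the first mate
    sai2=$4           # SAI prefix for the second mate
    fq1=$5
    fq2=$6
    echo "pBWA sampe -f $out_prefix $index_prefix $sai1 $sai2 $fq1 $fq2"
}

sampe_cmd output.sam hg19.fa aln_left aln_right HapMap_1.fastq HapMap_2.fastq
```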



        • #49
          Dear all,
          We are currently trying to run pBWA on our cluster as well.

          Unfortunately, we are having problems generating a proper index. I suppose the pBWA index differs from the bwa one: there is an additional file, test_ref.fa.rbwt, which we cannot produce using the regular bwa index.

          Proc 0: [bwa_seq_open] seeked to 0 in reads_1.fq
          Proc 0: [bwa_seq_open] seeked to 0 in reads_2.fq
          Broadcasting BWT (this may take a while)... done!
          Broadcasting BWT (this may take a while)... [bwt_restore_bwt] fail to open file 'test_ref.fa.rbwt'. Abort!

          How do we prepare a valid pBWA index from a FASTA file?
          Tomasz Stokowy
          www.sequencing.io.gliwice.pl
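A quick way to see which index files are absent for a given prefix is sketched below. The extension list is an assumption based on the files that BWA 0.5.x-era indexes are understood to contain (including the reversed `.rbwt`, `.rpac` and `.rsa` files); adjust it to whatever your pBWA build actually opens. `missing_index_files` is a hypothetical helper name.

```shell
# Hypothetical helper: print every expected index file that does not exist
# for the given prefix. The extension list is an assumption (BWA 0.5.x era).
missing_index_files() {
    prefix=$1
    for ext in amb ann bwt pac sa rbwt rpac rsa; do
        [ -e "$prefix.$ext" ] || echo "$prefix.$ext"
    done
}
```

Running `missing_index_files test_ref.fa` next to the index prints one line per missing file and nothing if all eight are present.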



          • #50
            Originally posted by stoker
            Dear all,
            We are currently trying to run pBWA on our cluster as well.

            Unfortunately, we are having problems generating a proper index. I suppose the pBWA index differs from the bwa one: there is an additional file, test_ref.fa.rbwt, which we cannot produce using the regular bwa index.

            Proc 0: [bwa_seq_open] seeked to 0 in reads_1.fq
            Proc 0: [bwa_seq_open] seeked to 0 in reads_2.fq
            Broadcasting BWT (this may take a while)... done!
            Broadcasting BWT (this may take a while)... [bwt_restore_bwt] fail to open file 'test_ref.fa.rbwt'. Abort!

            How do we prepare a valid pBWA index from a FASTA file?

            Are you generating the index with the latest BWA version, 0.6.2?

            I think I had the same problem. If your pBWA version is 0.5.9, you should also use an index generated by BWA 0.5.9. It won't work otherwise.

            Regards,
            G.
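The version rule above can be turned into a small guard before launching a job. This is a sketch under assumptions: how you obtain the two version strings is left to you, and the helper strips a trailing revision suffix (something like `-r126`, assumed here) before comparing. `same_release` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if two version strings name the same
# release, ignoring anything after the first "-" (e.g. a revision suffix).
same_release() {
    a=${1%%-*}
    b=${2%%-*}
    [ "$a" = "$b" ]
}

# Example: pBWA based on 0.5.9 versus an index built by BWA 0.6.2.
if same_release "0.5.9" "0.6.2"; then
    echo "index and aligner releases match"
else
    echo "release mismatch: rebuild the index with the matching BWA version"
fi
```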



            • #51
              Is there an updated version of pBWA?



              • #52
                Does this work for BWA-MEM?

                Yes, an updated version of this software would be highly desirable, since BWA itself has changed to incorporate the MEM algorithm and a better indexing algorithm.



                • #53
                  I guess the only one who knows if an updated version will be released is dp05yk.
