  • Why no multithreading for BWA sampe/samse?

    BWA in aln mode (generating the indices of reads on chromosomes) has multithreading as an option, but doesn't seem to have it for generating the actual alignments with samse/sampe.



    Is it generally I/O-bound (waiting on disk) when doing the SAM generation? Or is this just something on the to-do list?

  • #2
    To implement multithreading, we need a lock-free hash table; otherwise the hash table will be locked frequently, and I guess a lot of CPU time will be spent on that locking. More importantly, samse is much faster than aln; sampe is also faster, especially for >50bp reads. Multithreading them will not improve the wall-clock time greatly. Aln is the speed bottleneck, so it gets multithreaded.
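To put rough numbers on the argument above, here is a minimal Python sketch of how much wall-clock time multithreading each stage could save. The 90%/10% split between aln and samse is an assumed figure for illustration, not a measurement from this thread, and the function name is made up. The sketch also ignores locking overhead, which would only make the second case look worse.

```python
def pipeline_speedup(aln_fraction: float, sam_fraction: float,
                     threads: int, parallel_sam: bool = False) -> float:
    """Speedup of aln + samse/sampe relative to running both single-threaded.

    aln_fraction and sam_fraction are each stage's share of the
    single-threaded run time (hypothetical inputs).
    """
    aln_time = aln_fraction / threads  # aln is already multithreaded
    # The SAM stage only shrinks if it is multithreaded as well.
    sam_time = sam_fraction / threads if parallel_sam else sam_fraction
    return (aln_fraction + sam_fraction) / (aln_time + sam_time)


# Assumed split: aln takes 90% of the time, samse 10%, on 4 threads.
aln_only = pipeline_speedup(0.9, 0.1, threads=4)
both = pipeline_speedup(0.9, 0.1, threads=4, parallel_sam=True)
print(f"aln threaded only: {aln_only:.2f}x, both stages threaded: {both:.2f}x")
```

Under these assumed fractions, threading aln alone gives roughly a 3.1x speedup on four threads, and threading samse as well raises that to at most 4x. The smaller the SAM stage's share, the less there is to gain from threading it.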



    • #3
      THANKS! That clears things up.



      • #4
        I've been spending today doing performance testing on Illumina reads - 36 bp per read.

        I am seeing the following performance:

        aln: 2900 reads per second per CPU core
        sampe: 3300 reads per second

        So with four cores, aln is 3 times faster than sampe. Are you seeing different performance?

        With these numbers, the performance is limited by sampe and implementing multithreading will be a big win.
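The comparison in this post can be reproduced with a few lines of Python. This is only a sketch that takes the two reported rates at face value (whether the sampe figure counts reads or pairs is disputed in the replies); the 10 million read count and the function name are assumptions chosen for illustration.

```python
ALN_READS_PER_SEC_PER_CORE = 2900  # rate reported in this post
SAMPE_READS_PER_SEC = 3300         # rate reported in this post, single-threaded


def stage_seconds(n_reads: int, aln_cores: int) -> tuple[float, float]:
    """Return (aln_seconds, sampe_seconds) for n_reads at the rates above."""
    aln = n_reads / (ALN_READS_PER_SEC_PER_CORE * aln_cores)
    sampe = n_reads / SAMPE_READS_PER_SEC
    return aln, sampe


# Hypothetical data set of 10 million reads, with aln running on 4 cores.
aln_s, sampe_s = stage_seconds(10_000_000, aln_cores=4)
ratio = sampe_s / aln_s
print(f"aln: {aln_s:.0f} s, sampe: {sampe_s:.0f} s, sampe/aln: {ratio:.2f}")
```

With these inputs the ratio works out to about 3.5, in line with the "3 times faster" estimate above. If the sampe rate is actually in read pairs, the effective sampe rate doubles and the ratio halves.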



        • #5
          I guess the sampe figure is 3300 read pairs per second. That makes it about twice as fast as aln in terms of reads per CPU core. In addition, you will find sampe is even faster for 70bp reads, which are becoming available to many labs. A 36bp read has many candidate locations, and bwa will consider all of them in pairing; a 70bp read has far fewer occurrences. That is also why bwa does not work well for 25bp SOLiD reads: sampe will be much slower.

          I have known about this issue from the very beginning, but implementing a thread-safe/lock-free hash table is not that easy. Thanks anyway.

          EDIT: In case someone is curious what this hash table is for: the bottleneck in pairing is converting suffix array coordinates to chromosomal coordinates, especially for a highly repetitive read. Bwa uses a hash table to cache large SA intervals, so that a large interval that has already been converted to chromosome positions will not be converted again. This hash table is global, which makes multithreading more difficult.
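The caching idea described in the EDIT can be sketched roughly as follows. This is not BWA's code (BWA is written in C, and its real conversion is more involved); the class name, the `min_size` threshold, and the toy converter are all assumptions made for illustration. The single lock around the table is the simplest way to make a shared cache thread-safe, and it is also where the contention mentioned earlier in the thread would come from.

```python
import threading
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) range of suffix array indices


class IntervalCache:
    """Shared cache mapping large SA intervals to chromosomal positions."""

    def __init__(self, convert: Callable[[int], int], min_size: int = 2):
        self._convert = convert    # hypothetical SA index -> position function
        self._min_size = min_size  # intervals smaller than this are not cached
        self._table: Dict[Interval, List[int]] = {}
        self._lock = threading.Lock()

    def positions(self, interval: Interval) -> List[int]:
        start, end = interval
        if end - start + 1 < self._min_size:
            # Small intervals are cheap, so convert them without caching.
            return [self._convert(i) for i in range(start, end + 1)]
        # Every thread serializes here; this is the locking cost at issue.
        with self._lock:
            cached = self._table.get(interval)
            if cached is None:
                cached = [self._convert(i) for i in range(start, end + 1)]
                self._table[interval] = cached
            return cached


# Toy stand-in for a suffix array: index -> position in the reference.
toy_sa = [7, 3, 11, 0, 5, 9, 1]
calls: List[int] = []  # records every conversion actually performed


def toy_convert(i: int) -> int:
    calls.append(i)
    return toy_sa[i]


cache = IntervalCache(toy_convert, min_size=3)
first = cache.positions((1, 4))   # converts four indices and caches the result
second = cache.positions((1, 4))  # served from the cache, no new conversions
print(first, len(calls))
```

Because the conversion runs while the lock is held, two threads asking for different large intervals still wait on each other. Avoiding that without duplicating work is what makes a lock-free or finer-grained design harder to write.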
          Last edited by lh3; 02-19-2010, 08:23 PM.



          • #6
            The sampe figure is per read, not per pair. Are you seeing different numbers in your experience?

            Also, because sampe requires 3.5GB of RAM, it's not possible to run more than one on an 8GB machine where other things are going on.

            I do understand that there are challenges in implementation and that read lengths are probably going to continue increasing.
