
  • Why no multithreading for BWA sampe/samse?

    BWA in aln mode (generating the indices of reads on chromosomes) offers multithreading as an option, but there doesn't seem to be an equivalent for sampe/samse, which generate the actual alignments.



    Is it generally I/O-bound (waiting on disk) when doing the SAM generation? Or is this just something on the to-do list?

  • #2
    To implement multithreading, we need a lock-free hash table; otherwise the hash table will be locked frequently, and I guess a lot of CPU time will be spent on that locking. More importantly, samse is much faster than aln; sampe is also faster, especially for >50bp reads. Multithreading them will not help the wall-clock speed greatly. Aln is the speed bottleneck, so it gets multithreaded.
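To make the locking concern concrete, here is a minimal Python sketch of a shared cache guarded by a single lock. It is not BWA's code (BWA is written in C); `LockedCache` and `run_workers` are hypothetical names chosen for illustration, and the squaring function is just a stand-in for real work. The point it shows is that every lookup, hit or miss, has to take the same lock.

```python
import threading


class LockedCache:
    """A shared dict guarded by one coarse lock (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}
        self.lock_acquisitions = 0  # how many times any thread took the lock

    def get_or_compute(self, key, compute):
        # Every lookup, hit or miss, goes through the same lock.
        with self._lock:
            self.lock_acquisitions += 1
            if key not in self._table:
                self._table[key] = compute(key)
            return self._table[key]

    def __len__(self):
        return len(self._table)


def run_workers(cache, n_threads, keys):
    """Have n_threads threads look up the same keys in the shared cache."""
    def work():
        for k in keys:
            cache.get_or_compute(k, lambda x: x * x)  # stand-in for real work

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


cache = LockedCache()
run_workers(cache, n_threads=4, keys=list(range(10)) * 100)
print(cache.lock_acquisitions, len(cache))
```

With four threads doing 1000 lookups each, the lock is taken 4000 times even though only 10 distinct entries are ever stored, which is the kind of overhead a lock-free table would be meant to avoid.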



    • #3
      THANKS! That clears things up.



      • #4
        I've been spending today doing performance testing on Illumina reads - 36 bp per read.

        I am seeing the following performance:

        aln: 2900 reads per second per CPU core
        sampe: 3300 reads per second

        So with four cores, aln is 3 times faster than sampe. Are you seeing different performance?

        With these numbers, the performance is limited by sampe and implementing multithreading will be a big win.
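The comparison above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes aln scales linearly across cores and that the sampe figure is per read, and the one-million-read workload is a made-up example size, not a number from the thread.

```python
def combined_rate(reads_per_sec_per_core, cores):
    """Aggregate throughput, assuming perfect scaling across cores."""
    return reads_per_sec_per_core * cores


def stage_seconds(n_reads, reads_per_sec):
    """Wall-clock time for one stage at a given throughput."""
    return n_reads / reads_per_sec


ALN_PER_CORE = 2900   # reads per second per core, from the post
SAMPE_RATE = 3300     # reads per second, single-threaded, from the post
CORES = 4
N_READS = 1_000_000   # hypothetical workload size

aln_rate = combined_rate(ALN_PER_CORE, CORES)
ratio = aln_rate / SAMPE_RATE
aln_s = stage_seconds(N_READS, aln_rate)
sampe_s = stage_seconds(N_READS, SAMPE_RATE)

print(f"aln on {CORES} cores: {aln_rate} reads/s ({ratio:.1f}x sampe)")
print(f"{N_READS} reads: aln {aln_s:.0f} s, sampe {sampe_s:.0f} s")
```

Under these assumptions aln on four cores runs at about 3.5 times the sampe rate, consistent with the "3 times" estimate. If the sampe figure were per read pair instead, `SAMPE_RATE` would be 6600 reads per second and the gap would halve.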



        • #5
          I guess sampe is 3300 read pairs per second. That makes it twice as fast as aln in terms of reads per CPU core. In addition, you will find sampe is even faster for 70bp reads, which are becoming available to many labs. A 36bp read has many candidate locations, and bwa will consider all of them in pairing. A 70bp read has far fewer occurrences. That is also why bwa does not work well for 25bp SOLiD reads; sampe will be much slower.

          I have known about this issue from the very beginning, but implementing a thread-safe/lock-free hash table is not that easy. Thanks anyway.

          EDIT: In case someone is curious what this hash table is for: the bottleneck in pairing is converting suffix array coordinates to chromosomal coordinates, especially for a highly repetitive read. Bwa uses a hash table to cache large SA intervals, so that a large interval that has already been converted to chromosome positions will not be converted again. This hash table is global, which makes multithreading harder.
          Last edited by lh3; 02-19-2010, 08:23 PM.
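The caching idea described in the EDIT can be sketched in Python on a toy suffix array. This is not BWA's implementation: the naive suffix array, the `IntervalCache` class, the `min_size` threshold and the `conversions` counter are all assumptions made for illustration. In the toy version a "conversion" is just an array lookup; in a real index it is the expensive step.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Half-open SA interval [lo, hi) of suffixes that start with pattern."""
    prefixes = [text[i:i + len(pattern)] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


class IntervalCache:
    """Cache converted positions for large SA intervals (illustrative)."""

    def __init__(self, sa, min_size=2):
        self.sa = sa
        self.min_size = min_size  # assumed threshold for "large" intervals
        self.cache = {}
        self.conversions = 0      # counts simulated SA-to-position conversions

    def positions(self, lo, hi):
        key = (lo, hi)
        if key in self.cache:
            return self.cache[key]  # already converted, no extra work
        pos = []
        for k in range(lo, hi):
            self.conversions += 1   # stand-in for the costly conversion
            pos.append(self.sa[k])
        pos.sort()
        if hi - lo >= self.min_size:
            self.cache[key] = pos   # only large intervals are cached
        return pos


text = "ACGTACGTACGA"
sa = build_suffix_array(text)
lo, hi = sa_interval(text, sa, "ACG")
cache = IntervalCache(sa)
first = cache.positions(lo, hi)
second = cache.positions(lo, hi)  # served from the cache
print(first, cache.conversions)
```

Because the cache is one shared dictionary keyed by SA interval, several threads pairing reads at once would all read and write the same table, which is where the locking problem comes from.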



          • #6
            The sampe figure is per read, not per pair. Are you seeing different numbers in your experience?

            Also, because sampe requires 3.5GB of RAM, it's not possible to run more than one on an 8GB machine where other things are going on.

            I do understand that there are challenges in implementation and that read lengths are probably going to continue increasing.
