  • Speed up sequence alignments using your video card!

    Schatz et al. have just published a paper in BMC Bioinformatics describing a novel sequence alignment algorithm designed and optimized to run on commonly available graphics processors.

    Why use a graphics processor? My understanding is that processing data for 3D renderings involves specialized parallel processors that can do certain types of calculations extremely fast. This means that if one has optimized code to take advantage of this architecture, it is possible to perform calculations faster than on a standard CPU of the same price. That is the extent to which I can explain the advantages of a graphics processor, as a biologist.

    They demonstrate up to a 10-fold improvement in alignment speed compared with a standard CPU. The current fastest commercial graphics adapter, the NVIDIA 8800 GTX, ran the program 3.79x faster than a single-core 3 GHz Xeon processor (which costs about the same).

    A link to the paper PDF can be found here: http://www.biomedcentral.com/content...2105-8-474.pdf

    A link to the project's SourceForge homepage can be found here: http://mummergpu.sourceforge.net

  • #2
    I had a conversation with someone about this, back in 2004. I don't think there was any question, then or now, that using a processor that's specially designed for vector processing would be a lot faster than using a general purpose CPU for vector calculations. Even the article points to several instances of earlier use of GPUs for processing.

    Still, the most interesting part of the article to me is that the improvement over a CPU decreases as the sequence length grows. This is probably an artefact of having to cache the suffix tree to the GPU in small chunks: the larger the suffix tree, the more time you spend pre-caching suffix tree elements. (Just a guess; someone tell me if I'm wrong.) That tells me there's *probably* a dramatically better algorithm out there for this application than a suffix tree.

    In the end, I'm just surprised to see that they managed to get a speed up at all. Sequence alignments are a non-vector application, so the use of a vector processor seems non-intuitive. If this were a molecular simulation, on the other hand... but then again, I believe that's been done before, as well.
    The more you know, the more you know you don't know. —Aristotle



    • #3
      Maybe I can answer your questions for you. GPUs aren't exactly vector processors, and have a lot more flexibility than that. Instead, think of them as single-board mini-grids containing many lightweight processors that all run the same program at the same time (SIMD, not a vector architecture). The processors are optimized for the number crunching needed to render 3D graphics, but the programs they run can perform arbitrary computations using regular programming statements like loops and conditionals. This means that if you have a problem that requires the same computation for many different inputs, you can probably use a GPU to speed up your application.
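      To make the "same computation, many inputs" idea concrete, here is a toy Python/NumPy sketch (my own illustration, not related to MUMmerGPU's code). The batch version applies one computation across a whole array of reads at once, which is the style of parallelism the GPU's processors reward:

```python
import numpy as np

# Scalar version: one read at a time, the way a plain CPU loop works.
def gc_content_scalar(read):
    return sum(base in "GC" for base in read) / len(read)

# Data-parallel version: the same computation applied to a whole batch
# of equal-length reads at once, the SIMD style described above.
def gc_content_batch(reads):
    arr = np.array([list(r) for r in reads])      # shape: (num_reads, read_len)
    return np.isin(arr, list("GC")).mean(axis=1)  # GC fraction per read

reads = ["ACGT", "GGCC", "ATAT"]
print([gc_content_scalar(r) for r in reads])  # [0.5, 1.0, 0.0]
print(gc_content_batch(reads).tolist())       # [0.5, 1.0, 0.0]
```

      NumPy runs this on the CPU, of course; the point is only the shape of the computation, one program applied element-wise to many independent inputs.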

      Current GPUs cost only ~$500, but have up to 256 processors! As such, they are becoming really attractive platforms for high-throughput computation in many different fields (including molecular dynamics, meteorology, finance, cryptography, ...). Some applications that perform a lot of number crunching have achieved a 100x speedup over the CPU. In contrast, MUMmerGPU performs very little number crunching, but is very data intensive. As such, the processors on the GPU can't run at full speed, and have to wait for data to move around on the board. Even so, MUMmerGPU gets a ~10x speedup on the 8800 GTX with 128 processors for short reads. Over the last couple of months we reworked how the data is organized, and we have managed to double that speed. Check the MUMmerGPU SourceForge page for a new release soon.

      As for apfejes' comment about decreasing performance with longer reads, this is an artifact of how we organize the suffix tree on the board. The GPU has a very small cache, so we lay the tree out in a very specific way to get as much use out of the cache as possible (see the paper for all the gory details). It wasn't until recently that we fully understood the problem, but the way we place the tree on the board is sub-optimal for longer reads. Again, we are actively working on this, and the next release should have much more consistent performance.
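      For readers who want a feel for the kind of query being accelerated: the exact-match lookup that MUMmerGPU answers with its cache-packed suffix tree can be sketched on the CPU with a suffix array instead. This is a toy illustration only, not MUMmerGPU's actual data structure or layout:

```python
import bisect

def build_suffix_array(text):
    # Sort all suffix start positions by the suffix each one begins.
    # (O(n^2 log n) toy construction; real tools use much faster builds.)
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_exact_matches(text, sa, read):
    # Binary-search for the block of suffixes whose prefix equals `read`;
    # only the first len(read) characters of each suffix matter.
    prefixes = [text[i:i + len(read)] for i in sa]
    lo = bisect.bisect_left(prefixes, read)
    hi = bisect.bisect_right(prefixes, read)
    return sorted(sa[lo:hi])  # start positions of every exact match

ref = "GATTACAGATTACA"
sa = build_suffix_array(ref)
print(find_exact_matches(ref, sa, "ATTA"))  # [1, 8]
```

      Each query here costs O(m log n) string comparisons; the GPU version wins by running thousands of such queries, one per read, in parallel.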

      If you have any more questions, feel free to post here or email me directly.

      Thanks for your interest,

      Michael Schatz



      • #4
        Thanks for the reply - that was really helpful. I look forward to reading about the future releases!

        Anthony



        • #5
          Michael,

          Thanks for the explanation. What are the minimum requirements for the graphics card, and what matters most: memory size, bus speed, or the number of processors? Does it work in SLI with two cards? Also, have you done any speed comparisons against other short-read aligners?



          • #6
            Sequence alignment is vectorizable, and there are various SIMD implementations. There is a brute force sequence aligner in the FASTA package that uses SIMD, for example.

            If you want to align multiple sequences, it's even easier. I've been working on a brute force aligner of short reads to a reference that runs on Cell processors such as the PlayStation 3, available here: http://savannah.nongnu.org/projects/myrialign/

            I am impressed that they've managed to do MUMmer on a GPU, since it uses quite a different algorithm from the usual dynamic programming sequence alignment, AFAIK.
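            For what it's worth, the brute-force approach vectorizes very naturally. Here is a toy NumPy sketch (my own illustration, not myrialign's actual code, which uses Cell SPU intrinsics): slide the read along the reference and count mismatches at every offset in one vectorized sweep.

```python
import numpy as np

def best_ungapped_hits(reference, read, max_mismatches=2):
    # Brute force: test every offset of `read` along `reference`,
    # counting mismatches at all offsets in one vectorized sweep.
    ref = np.frombuffer(reference.encode(), dtype=np.uint8)
    r = np.frombuffer(read.encode(), dtype=np.uint8)
    m = len(r)
    # windows[i] is reference[i:i+m]; sliding_window_view avoids copying.
    windows = np.lib.stride_tricks.sliding_window_view(ref, m)
    mismatches = (windows != r).sum(axis=1)
    # Report (offset, mismatch_count) for every acceptable hit.
    return [(i, int(c)) for i, c in enumerate(mismatches)
            if c <= max_mismatches]

print(best_ungapped_hits("GATTACAGATTACA", "ATTG", max_mismatches=1))
# [(1, 1), (8, 1)]
```

            The inner comparison is exactly the kind of wide, independent element-wise work that SIMD units (SSE, SPU, or GPU) chew through.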



            • #7
              Originally posted by mschatz View Post
              Maybe I can answer your questions for you. [...]

              Is any work still going on in this field, or are the Bowtie-type aligners on CPU superior?



              • #8
                Originally posted by Chipper View Post
                Is any work still going on in this field, or are the Bowtie-type aligners on CPU superior?
                Mike and I submitted a second paper on MUMmerGPU a couple of months back, but it's still under review. The paper contains a new GPGPU algorithm for translating suffix tree node coordinates into reference coordinates. It also contains a very detailed exploration of how seemingly orthogonal design decisions interact because of the peculiarities of the GPU architecture. The new paper is targeted more to the GPGPU community than to bioinformaticians.

                Mike, Ben Langmead, and I have actually spent some time thinking about putting Bowtie on the GPU, but we're worried about the relatively long latency of the GPU's memory bus. The architecture is organized so that sucking down big streams of data (e.g. large textures) is fast, but other than the initial loading of the reads, that's not the access pattern of Burrows-Wheeler search. Bowtie's performance essentially comes down to waiting for small chunks of data to come in from the memory bus (i.e. cache misses). Since recent nVidia GPUs have a global memory latency that is substantially longer than that of your typical x86 cache miss, I worry that you'd wipe out all your gains from massively parallel processing in the longer per-read processing time.

                That said, suffix tree traversal was supposed to be a bad fit for GPGPU for the same reasons, and the MUMmerGPU search kernel was substantially faster on the GPU than on the CPU. I doubt the three of us will get to putting Bowtie on the GPU, but if there's some brave soul out there willing to give it a try... nVidia makes cards now that have big enough memories to store the Bowtie index of the human genome.
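                For anyone curious why Burrows-Wheeler search is latency-bound, here is a toy FM-index backward search in Python (a minimal sketch of the general algorithm, not Bowtie's implementation). Each pattern character needs a rank lookup at a position that depends on the previous step's result, so the accesses are serial and scattered; on a genome-sized index nearly every one is a cache miss.

```python
def bwt(text):
    # Burrows-Wheeler transform via sorted rotations ('$' terminates).
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_str, pattern):
    # C[c]: how many symbols in the text sort before c.
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Rank query: occurrences of c in bwt_str[:i]. Real indexes
        # answer this from sampled tables, one scattered lookup per call.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    # One dependent lookup per pattern character: each step's indices
    # depend on the previous step's result, so the memory accesses
    # cannot be batched or predicted. This is the latency-bound part.
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo  # number of occurrences of pattern in the text

b = bwt("GATTACAGATTACA")
print(backward_search(b, "ATTA"))  # 2
print(backward_search(b, "TAC"))   # 2
```

                The work per character is tiny; almost all the time on a real index goes to waiting for those dependent rank lookups to come back from memory, which is exactly the pattern that maps poorly onto the GPU's long-latency global memory.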



                • #9
                  Thanks, Cole. It would be fun, though, to see if a set-up like http://fastra.ua.ac.be/en/index.html or http://www.asrock.com/news/pop/X58/index.htm could be used for sequence analysis.



                  • #10
                    Unfortunately, CUDA will not work with the Xen kernel, which is used, for instance, by RHEL5.
