#1
--Site Admin--
Location: SF Bay Area, CA, USA | Join Date: Oct 2007 | Posts: 1,358
Schatz et al. have just published a paper in BMC Bioinformatics describing a novel sequence alignment algorithm designed and optimized to run on commonly available graphics processors.

Why use a graphics processor? My understanding is that rendering 3D graphics relies on specialized parallel processors that can do certain types of calculations extremely fast, so code optimized to take advantage of this architecture can perform those calculations faster than a standard CPU of the same price. That is the extent to which I, as a biologist, can explain the advantages of a graphics processor.

They demonstrate up to a 10-fold improvement in alignment speed compared to a standard CPU. The current fastest commercial graphics adapter, the nVidia 8800 GTX, ran the program 3.79x faster than a single-core 3 GHz Xeon processor (which costs the same).

The paper PDF is available here: http://www.biomedcentral.com/content...2105-8-474.pdf

The project's SourceForge homepage: http://mummergpu.sourceforge.net
#2
Senior Member
Location: Oakland, California | Join Date: Feb 2008 | Posts: 236
I had a conversation with someone about this back in 2004. I don't think there was any question, then or now, that using a processor specially designed for vector processing would be a lot faster than a general-purpose CPU for vector calculations. Even the article points to several earlier uses of GPUs for this kind of processing.

Still, the most interesting part of the article to me is that the improvement over a CPU decreases as the sequence length gets longer. This is probably an artefact of having to cache small chunks of their suffix tree to the GPU at a time: the larger the suffix tree, the more time you have to spend pre-caching suffix tree elements. (Just a guess; someone tell me if I'm wrong.) That tells me there is probably a dramatically better algorithm out there for this application than a suffix tree.

In the end, I'm just surprised they managed to get a speedup at all. Sequence alignment is a non-vector application, so the use of a vector processor seems non-intuitive. If this were a molecular simulation, on the other hand... but then again, I believe that's been done before as well.
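To make the guess a bit more concrete, here is roughly what a single exact-match walk down a suffix tree looks like (a completely made-up node layout, not MUMmerGPU's actual on-board representation): almost all of the work is chasing pointers to unpredictable addresses rather than doing arithmetic, which is why the size of the tree and the cost of moving it on and off the card would matter so much.

```cuda
// Illustrative only: a made-up suffix-tree node layout and an exact-match walk.
// (MUMmerGPU's actual on-board representation is different; see their paper.)
struct Node {
    int edge_start;     // where this node's incoming edge label starts in the reference
    int edge_len;       // length of that edge label
    Node* children[4];  // one child per base: A, C, G, T
};

int base_index(char b) {            // map A/C/G/T to 0..3
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Number of query characters matched before the walk falls off the tree.
// Every child lookup and edge-label comparison reads a different,
// hard-to-predict address: lots of memory traffic, very little arithmetic.
int match_length(const Node* root, const char* ref, const char* query, int qlen) {
    const Node* node = root;
    int qi = 0;
    while (qi < qlen) {
        const Node* child = node->children[base_index(query[qi])];
        if (!child) break;
        for (int k = 0; k < child->edge_len && qi < qlen; ++k, ++qi) {
            if (ref[child->edge_start + k] != query[qi]) return qi;
        }
        node = child;
    }
    return qi;
}
```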
__________________
The more you know, the more you know you don't know. —Aristotle
#3
Junior Member
Location: CSHL | Join Date: Apr 2008 | Posts: 3
Maybe I can answer your questions. GPUs aren't exactly vector processors, and they have a lot more flexibility. Instead, think of them as single-board mini-grids containing many lightweight processors that all run the same program at the same time (SIMD rather than a vector architecture). The processors are optimized for the number crunching needed to render 3D graphics, but the programs they run can perform arbitrary computations using regular programming statements like loops and conditionals. This means that if you have a problem that requires the same computation for many different inputs, you can probably use a GPU to speed up your application.
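To make that concrete, here is a toy CUDA kernel (just an illustration, not code from MUMmerGPU; all the names are made up) in which every thread runs the same program on its own read. The loop and the conditional are ordinary C-style code, which is what distinguishes this from a fixed vector operation:

```cuda
#include <cuda_runtime.h>

// Toy example only: each thread counts the G/C bases in one fixed-length read.
// Every thread runs the same code on a different input (the SIMD-style model
// described above); nothing here is taken from MUMmerGPU itself.
__global__ void gc_count(const char* reads, int read_len, int num_reads, int* gc) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per read
    if (i >= num_reads) return;
    const char* read = reads + i * read_len;
    int count = 0;
    for (int j = 0; j < read_len; ++j) {            // an ordinary loop...
        char b = read[j];
        if (b == 'G' || b == 'C') ++count;          // ...and an ordinary conditional
    }
    gc[i] = count;
}

// Host side: copy the reads to the GPU, launch enough blocks to cover them,
// then copy the per-read counts back. For example:
// gc_count<<<(num_reads + 255) / 256, 256>>>(d_reads, read_len, num_reads, d_gc);
```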
Current GPUs cost only ~$500 but have up to 256 processors! As such, they are becoming really attractive platforms for high-throughput computation in many different fields (including molecular dynamics, meteorology, finance, cryptography, ...). Some applications that perform a lot of number crunching have achieved a 100x speedup over the CPU. In contrast, MUMmerGPU performs very little number crunching but is very data intensive, so the processors on the GPU can't run at full speed and have to wait for data to move around on the board. Even so, MUMmerGPU gets ~10x speedup on the 8800 GTX with 128 processors for short reads. Over the last couple of months we have reworked how the data is organized and managed to double that speed; check the MUMmerGPU SourceForge page for a new release soon.

As for apfejes' comment about decreasing performance with longer reads, this is an artifact of how we organize the suffix tree on the board. The GPU has a very small cache, so we place the tree on the board in a very specific way to try to get as much use out of the cache as possible (see the paper for all the gory details). It wasn't until recently that we fully understood the problem, but the way we place the tree on the board is sub-optimal for longer reads. Again, we are actively working on this, and the next release should have much more consistent performance.

If you have any more questions, feel free to post here or email me directly. Thanks for your interest,

Michael Schatz
#4
Senior Member
Location: Oakland, California | Join Date: Feb 2008 | Posts: 236
Thanks for the reply - that was really helpful. I look forward to reading about the future releases!
Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
#5
Senior Member
Location: Sweden | Join Date: Mar 2008 | Posts: 324
Michael,

Thanks for the explanation. What are the minimum requirements for the graphics card, and what matters most: memory size, bus speed, or the number of processors? Does it work in SLI with two cards?

Also, have you done any speed comparisons with other short-read aligners?
#6
Junior Member
Location: Melbourne | Join Date: May 2008 | Posts: 7
Sequence alignment is vectorizable, and there are various SIMD implementations. There is a brute-force sequence aligner in the FASTA package that uses SIMD, for example.

If you want to align multiple sequences, it's even easier. I've been working on a brute-force aligner of short reads to a reference that runs on Cell processors such as the one in the PlayStation 3, available here: http://savannah.nongnu.org/projects/myrialign/

I am impressed that they've managed to do MUMmer on a GPU; it uses quite a different algorithm from the usual dynamic-programming sequence alignment, as far as I know.
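To give a flavour of how naturally the brute-force approach maps onto this kind of hardware, here is a rough CUDA sketch (not myrialign, which targets the Cell, and not anyone's real code; the names are invented): each thread takes one candidate reference offset and counts mismatches against a short read.

```cuda
// Rough sketch only: one thread per candidate reference offset, each counting
// mismatches between one short read and the reference at that offset
// (i.e. Hamming distance for ungapped alignment).
__global__ void brute_force_align(const char* ref, int ref_len,
                                  const char* read, int read_len,
                                  int* mismatches) {
    int offset = blockIdx.x * blockDim.x + threadIdx.x;
    if (offset > ref_len - read_len) return;   // stay inside the reference
    int mm = 0;
    for (int j = 0; j < read_len; ++j) {
        if (ref[offset + j] != read[j]) ++mm;
    }
    mismatches[offset] = mm;                   // the host then keeps offsets with few mismatches
}

// Launch with one thread per possible offset, e.g.:
// int n = ref_len - read_len + 1;
// brute_force_align<<<(n + 255) / 256, 256>>>(d_ref, ref_len, d_read, read_len, d_mm);
```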
#7
Senior Member
Location: Sweden | Join Date: Mar 2008 | Posts: 324
Is any work still going on in this field, or are the Bowtie-type aligners on the CPU superior?
#8
Senior Member
Location: Boston, MA | Join Date: Nov 2008 | Posts: 212
Mike, Ben Langmead, and I have actually spent some time thinking about putting Bowtie on the GPU, but we're worried about the relatively long latency of the GPU's memory bus. The architecture is organized so that sucking down big streams of data (e.g. large textures) is fast, but other than the initial loading of the reads, that's not the access pattern of Burrows-Wheeler search. Bowtie's performance essentially comes down to waiting for small chunks of data to come in from the memory bus (i.e. cache misses). Since recent nVidia GPUs have a global memory latency that is substantially longer than that of your typical x86 cache miss, I worry that you'd wipe out all your gains from massively parallel processing in the longer per-read processing time.

That said, suffix tree traversal was supposed to be a bad fit for GPGPU for the same reasons, and the MUMmerGPU search kernel was substantially faster on the GPU than on the CPU.

I doubt the three of us will get around to putting Bowtie on the GPU, but if there's some brave soul out there willing to give it a try... nVidia now makes cards with memories big enough to store the Bowtie index of the human genome.
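To show why the access pattern is the problem, here is a bare-bones sketch of Burrows-Wheeler (FM-index) backward search. It is not Bowtie's code, and all the names are made up; the toy occ() below does a linear scan where a real index reads a sampled checkpoint plus a slice of the BWT. The dependence structure is the point: each step's lookups land at addresses determined by the previous step, so they serialize into a chain of small, effectively random reads.

```cuda
// Bare-bones FM-index backward search, for illustration only (not Bowtie's code).
struct ToyFMIndex {
    const int* bwt;   // Burrows-Wheeler transform of the text, as 0..3 symbols
    long long n;      // length of the BWT
    long long C[4];   // C[c] = count of symbols in the text that sort before c

    // Occurrences of symbol c among the first i characters of the BWT.
    // A real index replaces this scan with one small read at an address that
    // depends on i, which is exactly where the cache misses come from.
    long long occ(int c, long long i) const {
        long long count = 0;
        for (long long k = 0; k < i; ++k) count += (bwt[k] == c);
        return count;
    }
};

// Size of the suffix-array interval matching `pattern` (0 if there is no match).
long long count_matches(const ToyFMIndex& idx, const int* pattern, int m) {
    long long lo = 0, hi = idx.n;              // current interval [lo, hi)
    for (int i = m - 1; i >= 0 && lo < hi; --i) {
        int c = pattern[i];
        // Both lookups depend on the result of the previous iteration, so the
        // memory accesses are serialized and land at unpredictable addresses.
        lo = idx.C[c] + idx.occ(c, lo);
        hi = idx.C[c] + idx.occ(c, hi);
    }
    return hi > lo ? hi - lo : 0;
}
```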
#9
Senior Member
Location: Sweden | Join Date: Mar 2008 | Posts: 324
Thanks, Cole. It would still be fun to see whether a set-up like http://fastra.ua.ac.be/en/index.html or http://www.asrock.com/news/pop/X58/index.htm could be used for sequence analysis.
#10
Junior Member
Location: Berlin, Germany | Join Date: Dec 2009 | Posts: 7
Unfortunately, CUDA will not work with a Xen kernel, which is what RHEL5, for instance, uses.