Fast and accurate long read alignment with Burrows-Wheeler transform.
Bioinformatics. 2010 Jan 15. [Epub ahead of print]
Li H, Durbin R.
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK.
MOTIVATION: Many programs for aligning short sequencing reads to a reference genome have been developed in the last two years. Most of them are very efficient for short reads but inefficient or not applicable for reads longer than 200bp because the algorithms are heavily and specifically tuned for short queries with a low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remains the only choice. Nonetheless, these methods are substantially slower than short-read aligners in terms of aligned bases per unit time. RESULTS: We designed and implemented a new algorithm, BWA Smith-Waterman alignment (BWA-SW), to align long sequences of up to 1 megabase against a large sequence database (e.g. the human genome) with a few gigabytes of memory. The algorithm is as accurate as SSAHA2, more accurate than BLAT, and is several to tens of times faster than both. AVAILABILITY: http://bio-bwa.sourceforge.net CONTACT: [email protected].
PMID: 20080505 [PubMed - as supplied by publisher]