#1
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 200
Hello all,
If you work with large genomes and large sets of short reads, please take a look at Bowtie (http://bowtie-bio.sf.net), a new open source short read aligner written by Cole Trapnell and me at the University of Maryland. Bowtie is ultrafast and memory-efficient: it aligns short reads to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: about 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP, but at much greater speeds.
As a denizen of these forums, you probably appreciate that there are now many, many short read aligners to choose from. Our goal with Bowtie was to exploit an algorithmic "sweet spot" to bring ultrafast read alignment to typical desktop computers. These days, a typical desktop has 2 or 4 gigabytes of RAM and multiple (2 or 4) processor cores.
I recently used Bowtie on my own 4-core, 2 GB desktop to align 14.3x coverage worth of Illumina/Solexa reads from the 1000 Genomes Project to the human genome in a single overnight run (14 hours). This is significantly faster than both Eland and ZOOM, and makes it much easier and faster to extract biological evidence from these huge datasets.
Here is a brief feature list, but if you are interested then please check our site regularly because Bowtie is actively being developed and maintained:
thread, Bowtie does not yet support paired-end alignment or indels. Both features are very much on our to-do list, though, so please keep an eye out for new versions over the coming months.
Thanks very much!
Ben Langmead
#2
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
Nice work Ben. Happy to have you here! Any plans for colorspace?
#3
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 200
Hi ECO. We've talked through how we would add colorspace support, and it's conceptually pretty simple. It is work, though! Right now, we consider indel and paired-end support the two biggest missing pieces.
Is ABI support valuable to you? We're always interested to hear what features people want.
Thanks, Ben
#4
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
Good to hear it's on the feature list somewhere!
It's definitely in my interest to have fast cutting edge tools that support colorspace. I'm drooling at 35x faster than Maq.
#5
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
What license is it released under?
#6
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 200
It's released under the Artistic License, which is free and lacks a reciprocity clause (the thing that scares some people about the GPL).
#7
Junior Member
Location: Dundee Join Date: Jul 2008
Posts: 4
OK, so I have downloaded ZOOM this week having seen the paper in Bioinformatics and found that for my purposes it is much faster than vmatch.
I rewrite my scripts and start data processing and then come across your announcement above. There are some programs which claim a massive speedup that is only detectable by using sophisticated benchmarks, or carefully designed datasets. So I used the first chunk of my analysis to benchmark, as that would be realistic for my purposes.
I'm looking for matches where the oligo can have up to 2 mismatches and may match up to 4 times per chromosome. I'm not using quality scores as I have already prefiltered the data by quality, so I have mixed-length input data. The benchmark test is 20K sequences vs human chr1, all performed on the same hardware, which is (I think) a quad-core, 8 GB RAM machine reading and writing to a Fibre Channel connected disk array.
vmatch - 240 mins or thereabouts
ZOOM - 23 mins
Bowtie - 20 seconds
I'll be sending you the medical bill for my bruised jaw. No longer can I stall my collaborators by telling them that the analysis is still running and they should leave me to my coffee.
..d
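To put those three timings on a common scale, here is a quick sketch of the arithmetic. The numbers are the ones quoted in the post (with "240 mins or thereabouts" taken as exactly 240); the function name is made up for illustration, and this reflects a single informal run rather than a rigorous benchmark.

```python
# Wall-clock times from the post, converted to seconds.
timings_s = {
    "vmatch": 240 * 60,  # "240 mins or thereabouts", treated as exact
    "ZOOM": 23 * 60,
    "Bowtie": 20,
}


def speedups(timings, baseline="Bowtie"):
    """Return how many times longer each tool took than the baseline."""
    base = timings[baseline]
    return {tool: seconds / base for tool, seconds in timings.items()}


for tool, factor in speedups(timings_s).items():
    print(f"{tool}: {factor:.0f}x the Bowtie run time")
```

On these figures Bowtie comes out roughly 69 times faster than ZOOM and roughly 720 times faster than vmatch for this particular 20K-read, chr1-only test.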
#8
Junior Member
Location: Dundee Join Date: Jul 2008
Posts: 4
#9
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
Dmamartin,
Could you perhaps report the results of your mapping benchmark with novoalign (www.novocraft.com)? It would be interesting to see how it performs on your reads in terms of speed and other metrics, e.g. specificity/sensitivity.
Bowtie is really good. I tried it out and it gets the job done in an incredibly short time, so that's a huge benefit. Building an index of the human genome with bowtie-index took almost 4 hours (2.4 GHz Xeon, 32 GB RAM), but that's only a one-off cost and I can see how the BW method shows superiority in alignment seeding. We could probably adapt it in later versions if there is a major differential in short read alignment performance.
#10
Junior Member
Location: Dundee Join Date: Jul 2008
Posts: 4
We'll see what we can do. Having no need in the immediate future to rerun the analysis, it may take a short while to get around to it, but we will definitely add it to the benchmark test one of my colleagues will be doing (in a more elegant and rigorous manner than my quick and dirty run).
..d
#11
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
Ok, 9 million reads in less than 2 minutes???
And this with reads of different lengths, which I think no other program allows. I did not believe it at first, but the alignments seem to be valid. Amazing stuff.
#12
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
Novoalign does variable length reads for both single and paired-end runs.
#13
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
Added Bowtie.
#14
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
#15
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
Like Eland, Bowtie is exceptional because it is fast and has many of the desirable features we want out of a short read aligner. The Burrows-Wheeler index is one of the most efficient methods for rapid K-mer searching. In the future I think we will see more of these efficient techniques being used for solving the problem of high-throughput mapping.
I feel as though the standard should be that we align them faster than we can sequence them.
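For anyone curious why a Burrows-Wheeler index makes k-mer lookup so quick, here is a toy sketch of exact-match "backward search". It is not Bowtie's code: the function names are invented for illustration, the suffix sort is naive, the tables are uncompressed, and it handles exact matches only (no mismatches or quality values). It does show that each query costs one pass over the pattern, independent of the reference length.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy Burrows-Wheeler index for `text` using a naive suffix sort."""
    text += "$"  # sentinel; sorts before A, C, G and T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    # first[c] = number of characters in text that sort strictly before c
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find(pattern, index):
    """Return sorted start positions of exact matches of `pattern`."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACA")
print(find("ATTA", index))
```

The index is built once per reference and reused for every read, which matches the "once off" index build mentioned earlier in the thread.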
#16
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
Hi Chipper,
With novoalign try setting option -t60. This will limit to 2 mismatches at high quality base positions or maybe a 1 base insert/delete. It should run a bit faster.
If you want to try novoalign with no indel capability, set -o200 or something like that. It'll make a gap open so expensive that novoalign will do an ungapped alignment. It should improve performance further.
The -t option of Novoalign is a bit like the -e option of Bowtie. Novoalign will limit penalty (quality) to 30 for all bases, so even a base that has a Phred quality of 50 will only get penalised 30 points for a mismatch - this allows for SNP rates. Memory will still be higher than Bowtie.
Cheers, Colin
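The interaction between the threshold of 60 and the per-base cap of 30 can be sketched as below. This is a simplified model built only from the description in this post, not Novoalign's actual scoring function: it ignores gaps entirely, and the function and parameter names are hypothetical.

```python
def mismatch_penalty(read, ref, quals, cap=30):
    """Sum Phred qualities at mismatched positions, each capped at `cap`."""
    return sum(min(q, cap) for r, g, q in zip(read, ref, quals) if r != g)


def passes(read, ref, quals, threshold=60, cap=30):
    """Accept an ungapped alignment whose total penalty is within `threshold`."""
    return mismatch_penalty(read, ref, quals, cap) <= threshold


ref = "ACGTACGT"
high = [50] * 8  # every base at Phred 50, so each mismatch costs the cap of 30

print(passes("ACGAACGA", ref, high))  # two high-quality mismatches
print(passes("TCGAACGA", ref, high))  # three high-quality mismatches
```

Under this model two high-quality mismatches cost exactly 60 and are accepted, a third pushes the total to 90 and is rejected, and mismatches at low-quality positions cost less, so more of them fit under the same threshold.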
#17
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
Could Bowtie be altered to have an iterative trimming function, like SOAP has? I just did a quick comparison, and while SOAP and Bowtie aligned about the same number of reads with no trimming (and Bowtie was much faster), I find that iteratively trimming the last bases with SOAP, 8 at a time, gives a huge boost to the number of reads that align, up to 30%.
#18
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
It has an option already to trim before alignment (-3 / -5) so why not try with that. It would help though if the unaligned reads were saved separately.
#19
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
But I don't want to trim accurate bases needlessly. The virtue of iterative trimming is that it only trims as many as it needs.
I could run the program a bunch of times with different trimming, and recombine the data after, but that's a pain, and might not be as efficient as having the program trim each read as it is handling it.
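The per-read behaviour being asked for could look roughly like the sketch below. It is not SOAP's or Bowtie's implementation: `align_with_trimming`, `exact_aligner` and the `min_len` cutoff are assumptions made for illustration, and the stand-in aligner only does exact substring lookup. The step of 8 bases is the one mentioned earlier in the thread.

```python
def align_with_trimming(read, align, step=8, min_len=20):
    """Retry `align` on progressively shorter 3'-trimmed copies of `read`.

    `align` is any callable that returns a hit, or None when nothing aligns.
    Returns (hit, bases_trimmed), or (None, None) if no trimmed copy aligned.
    """
    trimmed = 0
    while len(read) - trimmed >= min_len:
        hit = align(read[:len(read) - trimmed])
        if hit is not None:
            return hit, trimmed  # stop at the first (longest) copy that aligns
        trimmed += step
    return None, None


def exact_aligner(reference):
    """Stand-in aligner: leftmost exact match position, or None."""
    def align(seq):
        pos = reference.find(seq)
        return pos if pos >= 0 else None
    return align


reference = "ACGTTGCAAGGCTTACGATCGGATCCATAGGCTAACGTTA"
align = exact_aligner(reference)
noisy_read = reference[4:30] + "NNNNNN"  # 32 bases, last 6 unreadable
print(align_with_trimming(noisy_read, align))
```

Because the loop stops at the first success, a clean read is never trimmed at all, which is the "only trims as many as it needs" property; the cost is up to one extra alignment attempt per trimming step for reads that fail.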
#20
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 200
Hi swbarnes2 - I'm going to add your suggestion to the SourceForge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style alignment (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).
Out of curiosity, do you use SOAP's mode for aligning with indels?