SEQanswers

Old 10-22-2008, 05:48 PM   #1
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Bowtie, an ultrafast, memory-efficient, open source short read aligner

Hello all,

If you work with large genomes and large sets of short reads, please
take a look at Bowtie (http://bowtie-bio.sf.net), a new open source
short read aligner written by myself and Cole Trapnell at the
University of Maryland. Bowtie is an ultrafast, memory-efficient short
read aligner. It aligns short reads to the human genome at a rate of 25
million reads per hour on a typical workstation with 2 gigabytes of
memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep
its memory footprint small: about 1.3 GB for the human genome. It
supports alignment policies equivalent to Maq and SOAP, but at much
greater speeds.
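
For readers who have not met a Burrows-Wheeler index before, here is a toy sketch of the idea in Python. It is not Bowtie's code: the function names `bwt_index` and `count_exact` are made up for this example, the index is built by naive suffix sorting, and ranks are recomputed by scanning, whereas a real index precomputes them to stay small and fast. It only shows how exact matches can be counted by walking the pattern backwards over the transformed text.

```python
def bwt_index(text):
    """Build a toy Burrows-Wheeler index of `text` (naive suffix sorting)."""
    text += "$"  # sentinel; sorts before A, C, G and T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character that precedes each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first[c] = number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    return bwt, first


def count_exact(bwt, first, pattern):
    """Count exact occurrences of `pattern` by backward search."""
    lo, hi = 0, len(bwt)  # half-open range of matching sorted suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        # bwt[:k].count(c) is the rank of c up to row k; a real index
        # stores sampled ranks instead of rescanning.
        lo = first[c] + bwt[:lo].count(c)
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


bwt, first = bwt_index("GATTACAGATTACA")
print(count_exact(bwt, first, "ATTA"))
```

The work per query grows with the length of the read rather than the length of the genome, which is the property that makes this kind of index attractive for short reads.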

As a denizen of these forums, you probably appreciate that there are
now many, many short read aligners to choose from. Our goal with
Bowtie was to exploit an algorithmic "sweet spot" to bring ultrafast
read alignment to typical desktop computers. These days, a typical
desktop has 2 or 4 gigabytes of RAM and multiple (2 or 4) processor
cores. I recently used Bowtie on my own 4-core, 2 GB desktop to align
14.3x coverage worth of Illumina/Solexa reads from the 1000-Genomes
project to the human genome in a single overnight (14 hours). This is
significantly faster than both Eland and ZOOM, and makes it much easier
and faster to extract biological evidence from these huge datasets.

Here is a brief feature list, but if you are interested then please
check our site regularly because Bowtie is actively being developed and
maintained:
  • Extremely fast!
  • Specify any number of parallel search threads with -p (uses pthreads) to exploit multiple processor cores
  • Small index: for human, memory footprint is ~1.3GB (with -z option), size on disk is ~2.2GB
  • Pre-built indexes available from website: http://bowtie-bio.sf.net
    • Human, chimp, dog, mouse, rat, chicken, A. thaliana, fruit fly, etc.
  • Input formats: FASTA, FASTQ, FASTQ w/ Solexa quals, raw, command-line
  • Includes tool to convert Bowtie output to a Maq .map file so that you can use Bowtie's output with, e.g., 'maq assemble' and 'maq cns2cnp'
  • Use -n option to activate a Maq-like policy
    • N (set with -n) mismatches allowed in first L (set with -l) bases
    • Sum of quality values at mismatched positions may not exceed E (set with -e)
  • Use -v option to activate a SOAP-like policy
    • V (set with -v) mismatches allowed in the whole alignment
    • Quality values are ignored
  • Flexible reporting:
    • Use -k to report K alignments
    • Use -a to report all alignments
    • Use --best to guarantee that the alignment(s) reported are "best" in terms of # of mismatches
    • These come at a cost to speed! See manual for details.
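
To make the two policies above concrete, here is a small Python sketch of the acceptance tests as described in the list. It is an illustration only, not Bowtie's implementation: the function names are invented, the read is compared to a reference window of the same length, and the default values for `n`, `l`, `e` and `v` are placeholders for the example rather than values stated in this post.

```python
def mismatch_positions(read, ref):
    """Return the 0-based read positions where read and reference differ."""
    return [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]


def maq_like_ok(read, quals, ref, n=2, l=28, e=70):
    """Maq-like policy (-n): at most n mismatches in the first l bases,
    and the summed quality at all mismatched positions is at most e."""
    mismatches = mismatch_positions(read, ref)
    seed_mismatches = sum(1 for i in mismatches if i < l)
    quality_sum = sum(quals[i] for i in mismatches)
    return seed_mismatches <= n and quality_sum <= e


def soap_like_ok(read, ref, v=2):
    """SOAP-like policy (-v): at most v mismatches overall; qualities ignored."""
    return len(mismatch_positions(read, ref)) <= v


read = "ACGTTCGTAG"
ref = "ACGTACGTAC"            # differs from the read at positions 4 and 9
quals = [40] * 8 + [10, 10]   # hypothetical Phred-style qualities
print(maq_like_ok(read, quals, ref, n=1, l=5, e=60))
print(soap_like_ok(read, ref, v=1))
```

Note how a low-quality mismatch near the 3' end costs little under the Maq-like test but counts fully under the SOAP-like one.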

As mentioned in the "Software packages for next gen sequence analysis"
thread, Bowtie does not yet support paired-end alignment or indels.
Both features are very much on our to-do list, though, so please keep
an eye out for new versions over the coming months.

Thanks very much!
Ben Langmead
Old 10-22-2008, 06:19 PM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,297
Default

Nice work Ben. Happy to have you here! Any plans for colorspace?
Old 10-22-2008, 06:27 PM   #3
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Hi ECO. We've talked through how we would add colorspace support, and it's conceptually pretty simple. It is work, though! Right now, we consider indel and paired-end support the two biggest missing pieces.

Is ABI support valuable to you? We're always interested to hear what features people want.

Thanks,
Ben
Old 10-22-2008, 07:28 PM   #4
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,297
Default

Good to hear it's on the feature list somewhere!

It's definitely in my interest to have fast cutting edge tools that support colorspace. I'm drooling at 35x faster than maq.
Old 10-23-2008, 12:30 AM   #5
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50
Default

What license is it released under?
Old 10-23-2008, 07:05 AM   #6
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

It's released under the Artistic License, which is free and lacks a reciprocity clause (the thing that scares some people about the GPL).
Old 10-24-2008, 12:04 AM   #7
dmamartin
Junior Member
 
Location: Dundee

Join Date: Jul 2008
Posts: 4
Default

OK, so I have downloaded ZOOM this week having seen the paper in Bioinformatics and found that for my purposes it is much faster than vmatch.
I rewrite my scripts and start data processing and then come across your announcement above.

There are some programs which claim a massive speedup that is only detectable by using sophisticated benchmarks, or carefully designed datasets. So I used the first chunk of my analysis to benchmark as that would be realistic for my purposes.

I'm looking for matches where the oligo can have up to 2 mismatches and may match up to 4 times per chromosome. I'm not using quality scores as I have already prefiltered the data by quality so have mixed length input data.

20K sequences vs human chr1 is the benchmark test. All performed on the same hardware which is (I think) a quad core 8GB RAM machine reading and writing to a fibrechannel connected disk array.

vmatch - 240 mins or thereabouts.
ZOOM - 23 mins
Bowtie - 20 seconds.
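
Putting the three timings above on a common scale (a quick sketch; the numbers are the ones from this post, with "240 mins or thereabouts" taken as 240 minutes, and the helper name is made up):

```python
# Wall-clock times from the benchmark above, converted to seconds.
timings_seconds = {
    "vmatch": 240 * 60,
    "ZOOM": 23 * 60,
    "Bowtie": 20,
}


def speedup_over(tool, baseline="Bowtie", timings=timings_seconds):
    """How many times longer `tool` took than `baseline`."""
    return timings[tool] / timings[baseline]


for tool in ("vmatch", "ZOOM"):
    print(f"{tool}: {speedup_over(tool):.0f}x the Bowtie run time")
```

On these numbers vmatch took about 720 times as long as Bowtie and ZOOM about 69 times as long, for this one dataset on this one machine.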

I'll be sending you the medical bill for my bruised jaw. No longer can I stall my collaborators by telling them that the analysis is still running and they should leave me to my coffee..

..d
Old 10-24-2008, 12:05 AM   #8
dmamartin
Junior Member
 
Location: Dundee

Join Date: Jul 2008
Posts: 4
Default

Quote:
Originally Posted by dmamartin View Post
Bowtie - 20 seconds.
That is with --best -k 100, not the most speedy of searches.
Old 10-25-2008, 05:00 PM   #9
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 247
Default

Dmamartin,

Could you perhaps report the results of your mapping benchmark with novoalign (www.novocraft.com)? It will be interesting to see how it performs on your reads in terms of speed and any other metrics e.g. specificity/sensitivity.

Bowtie is really good. I tried it out and it gets the job done in an incredibly short time, so that's a huge benefit. Building an index of the human genome with bowtie-index took almost 4 hours (2.4 GHz Xeon, 32 GB RAM), but that's only a one-off cost and I can see how the BW method shows superiority in alignment seeding.
We could probably adapt it in later versions if there is a major differential on short read alignment performance.
Old 10-26-2008, 01:00 AM   #10
dmamartin
Junior Member
 
Location: Dundee

Join Date: Jul 2008
Posts: 4
Default

We'll see what we can do. Having no need in the immediate future to rerun the analysis, it may take a short while to get around to it, but we will definitely add it to the benchmark test one of my colleagues will be doing (in a more elegant and rigorous manner than my quick and dirty run).

..d
Old 11-03-2008, 11:48 AM   #11
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 289
Smile

Ok, 9 million reads in less than 2 minutes???

And this with reads of different lengths, which I think no other program allows. Did not believe it at first, but the alignments seem to be valid. Amazing stuff.
Old 11-03-2008, 12:51 PM   #12
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 247
Default

Quote:
Originally Posted by Chipper View Post
Ok, 9 million reads in less than 2 minutes???

And this with reads of different lengths, which I think no other program allows. Did not believe it at first, but the alignments seem to be valid. Amazing stuff.
Novoalign does variable length reads for both single and paired-end runs.
Old 11-03-2008, 01:08 PM   #13
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,297
Default

Added Bowtie.
Old 11-03-2008, 01:33 PM   #14
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 289
Default

Quote:
Originally Posted by zee View Post
Novoalign does variable length reads for both single and paired-end runs.
Thanks, now I know better. I tried it now and it seems to work well, just not as fast.
Old 11-03-2008, 01:52 PM   #15
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 247
Default

Like Eland, Bowtie is exceptional because it is fast and has many of the desirable features we want out of a short read aligner. The Burrows-Wheeler index is one of the most efficient methods for rapid K-mer searching. In the future I think we will see more of these efficient techniques being used for solving the problem of high-throughput mapping.

I feel as though the standard should be that we align them faster than we can sequence them
Old 11-10-2008, 11:55 PM   #16
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi Chipper,

With novoalign try setting option -t60. This will limit to 2 mismatches at high quality base positions or maybe a 1 base insert/delete. It should run a bit faster.

If you want to try novoalign with no indel capability, set -o200 or something like that. It'll make a gap open so expensive that novoalign will do an ungapped alignment. It should improve performance further.

The -t option of Novoalign is a bit like the -e option of Bowtie. Novoalign will limit the penalty (quality) to 30 for all bases, so even a base that has a Phred quality of 50 will only get penalised 30 points for a mismatch - this allows for SNP rates.
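
As a rough illustration of that capping arithmetic, going only by the description in this post (this is not Novoalign's code; the function names are invented, gaps are ignored, and treating the threshold as inclusive is an assumption):

```python
def capped_mismatch_score(mismatch_quals, cap=30):
    """Sum mismatch penalties, charging each mismatch at most `cap` points."""
    return sum(min(q, cap) for q in mismatch_quals)


def passes_threshold(mismatch_quals, threshold=60, cap=30):
    """True if the capped score is within the threshold (assumed inclusive)."""
    return capped_mismatch_score(mismatch_quals, cap) <= threshold


# Phred qualities at the mismatched positions of three hypothetical reads.
print(capped_mismatch_score([50]))     # one mismatch at Q50 is charged 30
print(passes_threshold([50, 40]))      # two high-quality mismatches: 30 + 30
print(passes_threshold([50, 40, 35]))  # three high-quality mismatches: 90
```

With a threshold of 60 and a cap of 30, two mismatches at high-quality positions fit and a third does not, which matches the "2 mismatches" behaviour described for -t60.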

Memory will still be higher than Bowtie.

Cheers, Colin
Old 11-11-2008, 02:11 PM   #17
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 901
Default

Could Bowtie be altered to have an iterative trimming function, like SOAP has? I just did a quick comparison: with no trimming, SOAP and Bowtie aligned about the same number of reads (and Bowtie was much faster), but iteratively trimming the last bases with SOAP, 8 at a time, gives a huge boost to the number of reads that align, up to 30%.
Old 11-12-2008, 03:35 AM   #18
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 289
Default

It has an option already to trim before alignment (-3 / -5) so why not try with that. It would help though if the unaligned reads were saved separately.
Old 11-12-2008, 10:07 AM   #19
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 901
Default

But I don't want to trim accurate bases needlessly. The virtue of iterative trimming is that it only trims as many as it needs.

I could run the program a bunch of times with different trimming, and recombine the data after, but that's a pain, and might not be as efficient as having the program trim each read as it is handling it.
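
For anyone following along, the per-read logic I mean looks roughly like this in Python. It is only a sketch: `align` stands in for whatever aligner is being called (a hypothetical function that returns a hit or None), and the `min_length` cutoff is an assumption, not something SOAP or Bowtie defines.

```python
def align_with_iterative_trimming(read, align, step=8, min_length=20):
    """Try the full read first; on failure trim `step` bases from the 3' end
    and retry, stopping once the read would drop below `min_length`.

    Returns (hit, bases_trimmed), or None if no length aligned."""
    trimmed = 0
    while len(read) - trimmed >= min_length:
        candidate = read[:len(read) - trimmed]
        hit = align(candidate)
        if hit is not None:
            return hit, trimmed
        trimmed += step
    return None


# Toy stand-in for an aligner: exact substring search in a tiny "genome".
genome = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACAGGCT"


def toy_align(seq):
    position = genome.find(seq)
    return position if position >= 0 else None


read = genome[:24] + "N" * 8  # good 5' end, unreadable last 8 bases
print(align_with_iterative_trimming(read, toy_align))
```

Each read is trimmed only as far as it needs to be, which is the part that is awkward to reproduce by rerunning the whole dataset at several fixed trim lengths.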
Old 11-13-2008, 04:37 AM   #20
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 199
Default

Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).

Out of curiosity, do you use SOAP's mode for aligning with indels?