Has anyone compared the speed of BWA and Bowtie 2? How about the accuracy for both point mutation and indels?
-
-
Updated to bowtie2-beta3 and added timing. If you wonder why the sensitivity in the plot differs from that in the bowtie2 poster, there are two reasons: 1) bwa-short is indeed not very sensitive on real single-end data without trimming; bwa-sw is much better; 2) the poster counts all alignments, but I am counting "unique" alignments only. Bowtie2 can map many reads, but it has difficulty distinguishing good hits from bad ones and thus gives many good hits a low mapping quality. Beta3 is much better than beta2 on this point, but still not perfect.
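To make the difference between "all alignments" and "unique" alignments concrete, here is a minimal sketch that counts both from SAM text. The function name `count_alignments` and the default threshold `min_mapq=1` are assumptions for illustration, not what either benchmark actually used; only the SAM field layout (FLAG in column 2, MAPQ in column 5) and the flag bits come from the SAM format itself.

```python
def count_alignments(sam_lines, min_mapq=1):
    """Return (total_reads, aligned, confident) for an iterable of SAM lines.

    'confident' stands in for "unique" here: aligned records whose MAPQ
    is at least min_mapq (the threshold is an assumption).
    """
    total = aligned = confident = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line, not a read
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & 0x900:                  # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if flag & 0x4:                    # read is unmapped
            continue
        aligned += 1
        if mapq >= min_mapq:
            confident += 1
    return total, aligned, confident


# Tiny made-up example: one confident hit, one MAPQ=0 hit, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t0\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
total, aligned, confident = count_alignments(example)
print(f"aligned: {aligned}/{total}, MAPQ>=1: {confident}/{total}")
```

A mapper that reports many MAPQ=0 hits will look good on the first ratio and worse on the second, which is the gap described above.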
Basically, bowtie2 chooses a nice balance point where it is the fastest without much loss of accuracy in comparison to the others, but for variant calling on Illumina data, novoalign/smalt/bwa/gsnap may still be the mappers of choice. Things may change in the future, of course. Bowtie2 is still in beta, while bwa and bwa-sw are mature (i.e. not many improvements can be made).
Last edited by lh3; 11-05-2011, 08:14 AM.
Comment
-
In fact, we have done extensive comparisons of Bowtie2 versus both BWA and BWA-SW. Across multiple parameter settings for both tools, we found that Bowtie2 is (a) faster and (b) more sensitive than both programs. We tested it on 2,000,000 human reads, paired and unpaired, from an Illumina HiSeq instrument. I would note that the test by user lh3 (Heng Li, the author of BWA) used only simulated reads, and only 200,000 of them. Our tests were larger and more realistic.
We have detailed figures that Ben Langmead just presented at the Genome Informatics conference. I can't post the figures here, which contain dozens of experiments, but I will just post a few points showing performance using the default settings of Bowtie2 and BWA (and SOAP2):
Aligner   Options             Running time   % reads aligned   Mem (GB)
Bowtie2   --sensitive         11m:17s        96.94%            2.3
BWA       -k 2 -l 32 -o 1     30m:52s        91.80%            2.4
SOAP2     -l 256 -v 5 -g 0    5m:08s         84.43%            5.3
As you can see, Bowtie2 aligned about 5 percentage points more of the reads than BWA and ran nearly 3 times faster.
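For anyone who wants to check the ratios against the table, a small sketch follows. The numbers are copied from the table; the helper `to_seconds` and the `runs` dictionary are just illustrative names.

```python
def to_seconds(mmss):
    """Convert a running time written like '11m:17s' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


# (running time, % reads aligned), taken from the table in this post.
runs = {
    "Bowtie2": ("11m:17s", 96.94),
    "BWA": ("30m:52s", 91.80),
    "SOAP2": ("5m:08s", 84.43),
}

bt2_time = to_seconds(runs["Bowtie2"][0])
bt2_pct = runs["Bowtie2"][1]
for name in ("BWA", "SOAP2"):
    seconds = to_seconds(runs[name][0])
    pct = runs[name][1]
    print(f"{name}: {seconds / bt2_time:.2f}x Bowtie2's running time, "
          f"{bt2_pct - pct:.2f} points fewer reads aligned")
```

BWA's running time works out to roughly 2.7 times Bowtie2's, and the alignment-rate gap is a little over 5 percentage points.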
We also compared Bowtie2 to BWA-SW on Ion Torrent and 454 reads, which contain many indels. Bowtie2 was superior to BWA-SW on both speed and sensitivity for a wide range of parameter settings of both programs.
We also compared the accuracy of both BWA and Bowtie on human reads in a simulation using 3 million paired and unpaired 75 bp Illumina reads, simulated so we knew the "truth". Note that this is 30 times more data than lh3's simulated results on his website. Our findings were that Bowtie2 aligned approximately 3% more reads correctly from unpaired reads, and approximately 1% more reads correctly from paired reads. This test used default parameters of both programs.
Thus in our tests, Bowtie2 is faster, more sensitive, and more accurate than BWA across a wide range of parameter settings.
Last edited by salzberg; 11-05-2011, 08:33 AM.
Comment
-
@salzberg
You still avoid talking about "unique" alignments. With a seeding strategy like bowtie2's, it is trivial to find a hit. But as I said, a key flaw in bowtie2 as well as bowtie1 is that it is sometimes unable to distinguish unique hits from repetitive hits and thus gives low mapping quality to unique hits. It is more sensitive at finding a hit, but not at finding a unique hit. Also, for 100bp single-end reads, the bowtie2 equivalent is really bwa-sw, not bwa-short; for paired-end reads, BWA-short will gain a lot of sensitivity and be much more accurate. Users like 1000g/sanger/broad also enable trimming on real data, though this seems unfair to bowtie2, and bowtie2 should still outperform in terms of overall sensitivity.
I believe I am usually fair in all benchmarks even involving my own programs. In my benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not trying to make bowtie2 worse.
Perhaps the different result on simulated data is only because the simulation is different. I would love to see a ROC curve, which in my view is the most informative plot for revealing the overall accuracy (sensitivity vs. specificity) of a mapper. In your post, you were only talking about sensitivity, not specificity.
Last edited by lh3; 11-05-2011, 09:08 AM.
Comment
-
Originally posted by lh3:
@salzberg
You still avoid talking about "unique" alignments.
I am not sure that there is any special application that requires a very sensitive aligner, with lots of false positives.
Comment
-
@lh3:
>>I believe I am usually fair in all benchmarks even involving my own programs. In my
>>benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not
>>trying to make bowtie2 worse.
I understand that you believe you were being fair. But a single test using 100,000 error-free reads is rather unrealistic. Our tests on real data showed very different results from yours. Our tests on simulated data (not error-free, though) also showed very different results, so I'm not sure how you measured false positives. Given that there are billions of real reads now available, I think there's no reason not to do tests on real data as well.
The notion of "correct" mapping for multi-reads is a subtle one that many users don't care about: i.e., finding just the right mapping for a read that maps to 10, 100, or 1000 places doesn't really matter for most applications, even if it is possible to find such a mapping. My guess is that other than repetitive reads, all the aligners generally get the mappings right - and then the issue is whether they can find a mapping if the reads have errors and polymorphisms, which is what users do care about.
Comment
-
Originally posted by salzberg:
@lh3:
My guess is that other than repetitive reads, all the aligners generally get the mappings right
Comment
-
I never do simulation with error-free reads. The reads in my simulation contain variants, which is equivalent to a 1% SNP+INDEL error rate. 100k reads are enough for investigating specificity around 0.01% - we still have 100 wrong mappings, so the variance is pretty small. Also, I have run simulations for tens of millions of reads. The relative performance of novoalign, bwa and bwa-sw always stays the same. I also wanted to use real data, but it is hard to evaluate specificity on real data because there is no ground truth. One viable measurement is described in the bwa-sw paper, but it is quite complicated to apply in practice to multiple mappers.
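The "variance is pretty small" argument can be sketched as follows. This takes the figure of roughly 100 wrong mappings as given and assumes, purely for illustration, that wrong mappings are independent rare events, so the count is approximately Poisson and its relative standard error is about 1/sqrt(count). The function name is hypothetical.

```python
import math


def relative_standard_error(expected_wrong):
    """Relative standard error of an error count under a Poisson assumption.

    For a Poisson count with mean k, the standard deviation is sqrt(k),
    so the relative error is sqrt(k) / k = 1 / sqrt(k).
    """
    return 1.0 / math.sqrt(expected_wrong)


for wrong in (10, 100, 1000):
    rse = relative_standard_error(wrong)
    print(f"{wrong:>5} wrong mappings -> relative standard error {rse:.1%}")
```

With about 100 wrong mappings the estimate of the error rate is good to roughly 10%, which is enough to separate mappers whose error rates differ by a factor of two or more.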
Nearly all aligners use heuristics. Few of them can guarantee to find the best hit even if the top hit is clearly (i.e. in all sensible scoring schemes) better than other hits. Here are several examples. In the following table, each line consists of bowtie2 position, bowtie2 XM:XO:XG, correct position, bwa samse XM:XO:XG and bwa-sw AS:XS (these examples also prove that my simulation is not error-free):
bowtie2 pos    bowtie2 XM:XO:XG   correct pos    bwa samse XM:XO:XG   bwa-sw AS:XS
9:134616048    7:0:0              1:12746267     2:0:0                (bwa-sw wrong)
17:5319362     7:1:1              1:28924148     1:1:1                88:77
X:70135101     7:0:0              1:185975011    2:0:0                76:72
1:153251402    4:1:1              2:116348184    2:1:1                85:77
19:42604275    8:0:0              5:178218515    3:0:0                (bwa-sw wrong)
4:260872       6:1:1              7:129633785    0:1:1                92:76
None of these reads has multiple hits, but you can see that bowtie2 misses the optimal position and chooses a position with more mismatches/gaps. I am not using these examples to argue bwa is more accurate -- I can of course find examples where bowtie2 does a better job than bwa -- what I want to argue is that even for "unique" hits, different mappers give different answers. Finding the "unique" hits is a really hard task. We cannot assume all mappers are created with the same specificity. The ROC curve has shown this already.
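A quick way to read the table is to turn each XM:XO:XG triplet into a rough edit count and compare the two mappers. The sketch below assumes XM is the number of mismatches, XO the number of gap opens and XG the number of gap extensions, and it uses XM + XG as a crude count of differing bases; that scoring is an illustrative choice, not how either mapper ranks hits. The bwa-sw column is left out.

```python
# First four columns of the table above:
# bowtie2 position, bowtie2 XM:XO:XG, correct position, bwa samse XM:XO:XG
rows = """\
9:134616048 7:0:0 1:12746267 2:0:0
17:5319362 7:1:1 1:28924148 1:1:1
X:70135101 7:0:0 1:185975011 2:0:0
1:153251402 4:1:1 2:116348184 2:1:1
19:42604275 8:0:0 5:178218515 3:0:0
4:260872 6:1:1 7:129633785 0:1:1
"""


def edit_count(triplet):
    """Crude edit count from 'XM:XO:XG': mismatches plus gap-extension bases.

    XO (gap opens) is parsed but not added, to avoid counting a gap twice.
    """
    xm, _xo, xg = (int(v) for v in triplet.split(":"))
    return xm + xg


extra = []
for line in rows.splitlines():
    bt2_pos, bt2_tags, true_pos, bwa_tags = line.split()
    diff = edit_count(bt2_tags) - edit_count(bwa_tags)
    extra.append(diff)
    print(f"bowtie2 at {bt2_pos} has {diff} more edits than bwa at {true_pos}")
```

In all six rows the bowtie2 placement carries more edits than the correct placement found by bwa samse, which is the point of the examples.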
As to the differences between your evaluation and mine, I think they mainly come from two aspects: 1) for sensitivity, I am only counting hits with mapping quality above a threshold of 0 to 3 (depending on the mapper), but you are counting all hits, including mapQ=0 hits; 2) I am evaluating specificity, while all your measurements are essentially sensitivity. Your conclusion is not inconsistent with mine. We just have different focuses. If I followed your philosophy, I am sure I would come to your conclusion with my 100k SE/PE reads/pairs, but I believe the specificity and sensitivity of hits that clearly have optimal positions are more important to accuracy-critical applications like variant calling and the discovery of structural variations.
EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
Last edited by lh3; 11-06-2011, 08:59 AM.
Comment
-
We are going to try a comparison as well. When we compare mappers on the basis of proper read placement we plot TP/FP and we do this for different MapQs.
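As a rough sketch of that kind of plot: given each alignment's MapQ and whether it landed at the true simulated position, accumulate TP and FP counts while lowering the MapQ cutoff. The function name and the input format (a list of `(mapq, is_correct)` pairs) are assumptions for illustration; how "correct" is decided (for example, within a few bases of the simulated origin) is left to the evaluator.

```python
def tp_fp_by_mapq(alignments):
    """Cumulative TP/FP counts for each MapQ cutoff, highest cutoff first.

    alignments: iterable of (mapq, is_correct) pairs.
    Returns a list of (cutoff, tp, fp), where tp and fp count all
    alignments with MapQ >= cutoff.
    """
    by_mapq = {}
    for mapq, is_correct in alignments:
        tp, fp = by_mapq.get(mapq, (0, 0))
        by_mapq[mapq] = (tp + 1, fp) if is_correct else (tp, fp + 1)

    curve = []
    tp_total = fp_total = 0
    for mapq in sorted(by_mapq, reverse=True):
        tp, fp = by_mapq[mapq]
        tp_total += tp
        fp_total += fp
        curve.append((mapq, tp_total, fp_total))
    return curve


# Made-up toy data: high-MapQ hits are mostly right, MapQ=0 hits mostly wrong.
toy = [(42, True), (42, True), (30, True), (30, False),
       (0, True), (0, False), (0, False)]
for cutoff, tp, fp in tp_fp_by_mapq(toy):
    print(f"MapQ >= {cutoff:>2}: TP={tp} FP={fp}")
```

Plotting TP against FP for each mapper gives a ROC-like curve; a mapper whose curve sits higher at the same FP count is separating good hits from bad ones more effectively.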
I agree with Heng Li that users will be interested in recall rates for point mutations as well as indels of different sizes. So we will explicitly examine this as well. I will let you guys know what we find.
Comment
-
Originally posted by lh3:
EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
Comment
-
So Bowtie is definitely faster, and we are able to reproduce the sensitivity gain; however, if you account for false positives, BWA clearly wins out. We simulated a fly genome (120MB) at 15X coverage with 100bp reads. There was a 0.1% mutation rate, including 10% indels. The indels ranged from 1 to 10 bases.
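To give a sense of the scale of that simulation, here is the back-of-the-envelope arithmetic. It assumes "120MB" means 120 million bases, that the reads are counted singly rather than as pairs, and that "10% indels" means 10% of the mutated sites are indels; the variable names are illustrative only.

```python
genome_size = 120_000_000   # bases (assumed reading of "120MB")
coverage = 15               # 15X
read_length = 100           # bp
mutation_rate = 0.001       # 0.1% of positions
indel_fraction = 0.10       # assumed: 10% of mutations are indels

# coverage = n_reads * read_length / genome_size, solved for n_reads
n_reads = genome_size * coverage // read_length
n_mutations = round(genome_size * mutation_rate)
n_indels = round(n_mutations * indel_fraction)
n_point = n_mutations - n_indels

print(f"reads:           {n_reads:,}")
print(f"mutated sites:   {n_mutations:,}")
print(f"  indels:        {n_indels:,}")
print(f"  point changes: {n_point:,}")
```

Under these assumptions the test set is on the order of 18 million reads and about 120,000 mutated sites, roughly 12,000 of them indels.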
So salzberg, it would be helpful if you could try to reproduce what we have done with your 2 million human reads. Tell me if you find a similar result.
Comment