SEQanswers

11-19-2011, 06:22 AM #41
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

@Thomas I quite like the GSNAP algorithm as well as its implementation, and I have already recommended it to others. It is one of the top NGS mappers nowadays. In my opinion, the only thing that might be improved in GSNAP is a more useful mapping quality. I know GSNAP gives mapQ, but for "unique" hits the vast majority get mapQ 40, and the few hits with higher mapQ are not actually more accurate. Higher mapQ may not help standard SNP calling much, but there are areas where extremely high mapping accuracy is preferred.

I am not sure how to improve mapQ for single-end mapping, but I kinda think you should be able to derive a better mapQ for paired-end mapping. It seems to me that GSNAP visits more suboptimal hits in the PE mode. By seeing more hits and using the pairing information, you can tell that some hits can barely be wrong.
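For readers wondering how visiting more hits can translate into a better mapQ, here is a minimal sketch of one generic approach. It is not GSNAP's or BWA's actual formula. It assumes each candidate hit has a phred-scaled penalty (for example, the summed base qualities at its mismatches), converts the penalties into relative likelihoods, and reports the phred-scaled probability that the best hit is wrong. The function name and the cap of 60 are illustrative choices.

```python
import math


def mapq_from_penalties(penalties, cap=60):
    """Phred-scaled probability that the best candidate hit is wrong.

    Hypothetical helper, not any mapper's real API.
    penalties: phred-scaled penalty per candidate hit (lower is better),
    e.g. the sum of base qualities at mismatching positions.
    """
    if not penalties:
        raise ValueError("need at least one candidate hit")
    best = min(penalties)
    # Likelihood of each hit relative to the best one (the best gets 1.0).
    rel = sorted(10 ** (-(p - best) / 10) for p in penalties)
    others = sum(rel[:-1])  # likelihood mass on every hit except one best hit
    if others == 0:
        return cap  # no competing hit was seen
    p_wrong = others / (1.0 + others)
    return min(cap, round(-10 * math.log10(p_wrong)))
```

The estimate is only as good as the set of hits the mapper actually visited. With paired-end data, candidate hits whose mate cannot be placed concordantly could be down-weighted or dropped before this step, which is one way the pairing information could sharpen mapQ.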

11-19-2011, 06:39 AM #42
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

BTW, here is an interesting observation on speed. I simulated 100k single-end (SE) reads and 100k pairs of paired-end (PE) reads (200k reads). One would expect a program to take about twice as long in the PE mode simply because there are twice as many reads. This is true for bowtie2/bwa/bwa-sw. However, gsnap is much slower in the PE mode. My guess is that in the PE mode gsnap visits more suboptimal hits to get more reads paired. It is slower, but the accuracy is also higher. On the other hand, both novoalign and smalt are faster in the PE mode, at the cost of a slightly higher false positive rate. My explanation is that, rather than mapping each end separately and then pairing them (as bwa/bwa-sw do), they map the pair as a whole.
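A minimal sketch of the "map each end separately, then pair them" strategy described above for bwa/bwa-sw. Assumptions: forward-reverse (FR) orientation, a fixed insert-size window (real mappers typically estimate it from the data), and a pair score equal to the sum of the two end scores. `Hit`, `insert_size` and `best_pair` are hypothetical names. The test case shows why looking at suboptimal hits helps: the best single-end hit for read 1 has no concordant mate, so the pair is built from its second-best hit.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One candidate alignment for one end (hypothetical record type)."""
    chrom: str
    pos: int       # 0-based leftmost reference position
    end: int       # 0-based exclusive end on the reference
    reverse: bool  # True if aligned to the reverse strand
    score: int     # alignment score, higher is better


def insert_size(a: Hit, b: Hit) -> Optional[int]:
    """Fragment length if a and b form a forward-reverse (FR) pair, else None."""
    if a.chrom != b.chrom or a.reverse == b.reverse:
        return None
    fwd, rev = (b, a) if a.reverse else (a, b)
    if fwd.pos > rev.pos:  # the forward mate must lie upstream
        return None
    return rev.end - fwd.pos


def best_pair(hits1: List[Hit], hits2: List[Hit],
              min_isize: int, max_isize: int) -> Optional[Tuple[int, Hit, Hit]]:
    """Highest-scoring concordant pair among the hits found for each end."""
    best = None
    for h1, h2 in product(hits1, hits2):
        isize = insert_size(h1, h2)
        if isize is None or not min_isize <= isize <= max_isize:
            continue  # discordant orientation or insert size out of range
        total = h1.score + h2.score
        if best is None or total > best[0]:
            best = (total, h1, h2)
    return best
```

Every combination of hits is tried, so the cost grows with the number of suboptimal hits kept per end, which is consistent with the slowdown described for gsnap's PE mode.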

11-24-2011, 06:43 PM #43
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

Heng, I have a quick question. When using simulated data to evaluate mutation calling, would you recalibrate the base qualities of BWA-mapped reads? With real human data you can use dbSNP; however, with simulated data, what would you use?

11-25-2011, 05:24 AM #44
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

BWA does not use base quality during alignment except for trimming. I am not convinced that whether or not base quality is used has a significant effect on downstream data analyses.
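For reference, the trimming rule the BWA manual documents for `bwa aln -q INT` keeps the read prefix of length x that maximises sum_{i=x+1}^{l} (INT - q_i), applied only when the last base has q_l < INT. The sketch below implements that formula as written. The real BWA code may differ in details such as a minimum retained read length, and the function name is hypothetical.

```python
def bwa_style_trim_length(quals, threshold):
    """Number of bases kept after 3'-end quality trimming (hypothetical helper).

    quals: per-base phred qualities, 5' to 3'.
    Keeps x bases, where x maximises sum_{i=x+1}^{l} (threshold - q_i),
    and only trims when the last base is below the threshold.
    """
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l
    best_x, best_sum, running = l, 0, 0
    # Walk from the 3' end; after adding quals[x], `running` is the sum over
    # the bases that would be removed if only the first x bases were kept.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict '>' keeps more bases on ties
            best_sum, best_x = running, x
    return best_x
```

For example, with qualities `[30, 30, 5, 30, 10]` and a threshold of 20, the sketch keeps 2 bases: the single high-quality base at position 4 does not outweigh the two low-quality bases around it.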

11-25-2011, 10:31 AM #45
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

I have been wondering the same thing. So if I were to compare mutation recall from BWA-mapped reads with that from a mapper that does recalibrate base qualities, do you think it would matter whether I first used GATK to recalibrate the reads mapped by BWA? I think you did this previously in a paper with Nils, right? How did you do the comparison?

Sorry for all the questions!
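One possible answer to the simulated-data question, sketched under stated assumptions: with simulated reads the true variant sites are known, so they can play the role dbSNP plays for real data and be masked before mismatches are counted. The sketch estimates an empirical quality for each reported quality value. It is far simpler than GATK's recalibration, which also models covariates such as read group, machine cycle and sequence context; the input format and function name are hypothetical.

```python
import math
from collections import defaultdict


def empirical_qualities(observations, known_variant_sites):
    """Empirical phred quality per reported base quality (hypothetical helper).

    observations: iterable of (chrom, pos, reported_q, is_mismatch) tuples,
        one per aligned base.
    known_variant_sites: set of (chrom, pos) to skip, e.g. the simulated
        true variants, playing the role dbSNP plays for real data.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for chrom, pos, q, mismatch in observations:
        if (chrom, pos) in known_variant_sites:
            continue  # a real variant, not a sequencing error
        counts[q][0] += int(mismatch)
        counts[q][1] += 1
    table = {}
    for q, (mm, n) in counts.items():
        err = (mm + 1) / (n + 2)  # add-one (Laplace) smoothing avoids log10(0)
        table[q] = round(-10 * math.log10(err))
    return table
```

If the empirical table already matches the reported qualities, recalibration has little left to change, which is one way to check how much the step matters for a given mapper.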

12-06-2011, 11:22 AM #46
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

Updated to bowtie2-beta4. On accuracy, bowtie2-beta4 is similar to bwa-sw overall. I have also done the comparison on real data, following the approach I used in the bwa-sw paper. Out of 138k 454 reads with an average read length of 355bp, bwa-sw misses 1094+58 good alignments (~90% of them shorter than 100bp) and gives 31 questionable alignments, while bowtie2-beta4 misses 13+91 good alignments and gives 65 questionable alignments. For practical applications, the accuracy is largely indistinguishable. On speed, Bowtie2 is about 20% faster and uses less memory.
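To put these counts in proportion, a quick calculation, assuming exactly 138,000 reads (the post only says "138k"):

```python
TOTAL_READS = 138_000  # assumption: the post only says "138k"

# Counts quoted in the post: missed good alignments and questionable alignments.
results = {
    "bwa-sw": {"missed": 1094 + 58, "questionable": 31},
    "bowtie2-beta4": {"missed": 13 + 91, "questionable": 65},
}


def rates(counts, total=TOTAL_READS):
    """Convert raw counts into fractions of all reads."""
    return {kind: n / total for kind, n in counts.items()}


for mapper, counts in results.items():
    r = rates(counts)
    print(f"{mapper}: missed {r['missed']:.3%}, "
          f"questionable {r['questionable']:.3%}")
```

This works out to roughly 0.83% versus 0.075% of reads missed, and 0.022% versus 0.047% questionable, so both mappers mishandle well under 1% of the reads.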

In conclusion, bowtie2-beta4 has similar accuracy to bwa-sw for both 100bp simulated data and 350bp real 454 data. It is one of the best mappers (accuracy+speed) for HiSeq and 454 reads. I will start to recommend it to others along with smalt/novoalign/gsnap. One feature I think bowtie2 is missing is proper reporting of chimeric alignments, which is essential for mapping even longer sequences. This should be fairly easy to implement.

Last edited by lh3; 12-06-2011 at 11:24 AM. Reason: typo

12-06-2011, 11:36 AM #47
twu
Developer of GMAP and GSNAP
Location: South San Francisco, CA
Join Date: Oct 2011
Posts: 17

Heng, thanks for your comments about GSNAP. I will think more about how to get more informative mapping quality results, and would welcome any further suggestions you might have. Actually, one of the reasons I haven't done much with the mapping quality calculations is that my colleagues here have used BWA+GATK for SNP calling, and they told me that GSNAP's mapping quality calculations behaved similarly to BWA's. But perhaps they were wrong.

I also noted your timing results where the GSNAP paired-end algorithm is more than 2 times slower than the single-end algorithm. One of the reasons is that for paired-end data, GSNAP looks deeper at suboptimal results on each of the two ends in order to get a concordant result. In some cases, GSNAP may need to do its own version of a Smith-Waterman alignment in the neighborhood of a good alignment for the other end. Instead of using Smith-Waterman, though, GSNAP uses its GMAP algorithm, which is good for finding splicing, because our main application so far has been RNA-Seq, rather than DNA-Seq.
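GSNAP uses its GMAP algorithm for this step rather than Smith-Waterman. For comparison, here is a minimal sketch of the conventional Smith-Waterman "mate rescue" idea: align the unplaced mate against a reference window downstream of the well-aligned end. Assumptions: linear gap penalties, illustrative scores (match 1, mismatch -4, gap -6), a mate sequence already reverse-complemented to the forward strand, and hypothetical function names.

```python
def smith_waterman(query, target, match=1, mismatch=-4, gap=-6):
    """Best local alignment score and its end in target (linear gap penalty).

    Returns (score, end) where end is the 0-based exclusive end of the best
    local alignment in target, or (0, -1) if nothing scores above zero.
    """
    best, best_end = 0, -1
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], j
        prev = cur
    return best, best_end


def rescue_mate(mate_seq, reference, anchor_end, max_isize, min_score):
    """Search the window downstream of an anchored end for its mate."""
    window = reference[anchor_end:anchor_end + max_isize]
    score, end = smith_waterman(mate_seq, window)
    if score < min_score:
        return None  # no convincing local alignment in the window
    return score, anchor_end + end  # 0-based exclusive end on the reference
```

The window is limited by the maximum insert size, so the cost of each rescue is bounded, but it is paid for every end that lacks a concordant mate, which adds to paired-end run time.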

GSNAP is also like BWA in that it does not use base quality scores for alignment. We also do not use base quality scores for trimming, but just pass the information on to the SNP caller.

Last edited by twu; 12-06-2011 at 11:40 AM.

12-06-2011, 11:42 AM #48
zee
NGS specialist
Location: Malaysia
Join Date: Apr 2008
Posts: 249

Hi Heng,

Would you mind sharing the parameters you used for Bowtie2-beta4 on 100bp Illumina reads?

Thanks.

12-07-2011, 01:16 AM #49
cjp
Member
Location: Cambridge, United Kingdom
Join Date: Jun 2011
Posts: 58

I think they are here (and other aligners' parameters too):

http://lh3lh3.users.sourceforge.net/alnROC.shtml

Chris

01-23-2013, 01:09 AM #50
wlangdon
Member
Location: ucl
Join Date: Nov 2012
Posts: 15

I just did a speed vs. accuracy test on Cancer Institute paired-end sequences. A GP tweak to Bowtie2 came out fastest (4 times the speed of BWA), took less than half the memory, and had almost the same accuracy (82.1% vs. 83.1%).
See http://arxiv.org/abs/1301.5187
Bill

01-23-2013, 09:16 AM #51
wlangdon
Member
Location: ucl
Join Date: Nov 2012
Posts: 15

PS: Bowtie2 has a --very-sensitive command-line option, which can increase its accuracy at the expense of run time (in one case, accuracy rose by 0.8% to 83.4%, but run time increased by 53%).
Bill
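A small wrapper for trying this trade-off yourself, assuming bowtie2 is on the PATH and a paired-end run. `-x`, `-1`, `-2`, `-S`, `-p` and `--very-sensitive` are standard bowtie2 options; the helper functions and the index and file names in the usage comment are hypothetical.

```python
import subprocess
import time


def bowtie2_cmd(index, reads1, reads2, out_sam, very_sensitive=False, threads=1):
    """Build a paired-end bowtie2 argv list (hypothetical helper)."""
    cmd = ["bowtie2"]
    if very_sensitive:
        cmd.append("--very-sensitive")  # slower, more thorough preset
    cmd += ["-p", str(threads), "-x", index,
            "-1", reads1, "-2", reads2, "-S", out_sam]
    return cmd


def timed_run(cmd):
    """Run a command (raising on failure) and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Example (hypothetical index and read files):
# args = ("hg19", "reads_1.fq", "reads_2.fq")
# t0 = timed_run(bowtie2_cmd(*args, "default.sam", threads=8))
# t1 = timed_run(bowtie2_cmd(*args, "sensitive.sam", very_sensitive=True, threads=8))
# print(f"--very-sensitive took {t1 / t0 - 1:.0%} longer")
```

Accuracy still has to be measured separately, for example against simulated truth, since the wrapper only reports run time.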

03-28-2013, 01:16 PM #52
ka90
Junior Member
Location: USA
Join Date: Mar 2013
Posts: 1

Sorry for bringing this thread back to life. I was wondering how they compare now that bowtie2 has a stable release and is no longer in beta. Has anyone compared the latest version of bowtie2 with other mappers? If so, could you share your observations?

04-05-2013, 11:40 PM #53
shi
Wei Shi
Location: Australia
Join Date: Feb 2010
Posts: 235

Hi ka90,

We have recently made comparisons for a few aligners. Please see

http://nar.oxfordjournals.org/conten...kt214.abstract

Cheers
Wei

04-13-2013, 06:24 PM #54
oiiio
Senior Member
Location: USA
Join Date: Jan 2011
Posts: 104

Since this excellent thread has already been resurrected recently:

I'd like to show you all a comparison of bowtie2+GATK and other pipelines for variant calling on the NA12878 Illumina exome with 150x coverage. These variant calling reports are generated by the GCAT resource on bioplanet.com:

http://www.bioplanet.com/gcat/report...x/bowtie-atlas

You can use the check box menu at left to choose other pipelines to compare to.

04-13-2013, 06:34 PM #55
zee
NGS specialist
Location: Malaysia
Join Date: Apr 2008
Posts: 249

GCAT is great because it allows you to run and submit your own datasets for public scrutiny. We are going to make good use of it.

04-14-2013, 06:18 AM #56
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

Very impressive website. I do have a question, though: when evaluating alignments, how do you tell whether an alignment is correct? If there is clipping in an alignment, how do you deal with that?

I ask this because bwa-sw looks surprisingly bad. While bwa-sw is less accurate than bwa-mem, it should be similar to bowtie2. A possible cause is that, among the mappers evaluated on your website, bwa-sw reports the most soft clipping. If you do not correct for clipping, bwa-sw will easily be the worst.

Last edited by lh3; 04-14-2013 at 06:22 AM.

04-14-2013, 07:54 AM #57
oiiio
Senior Member
Location: USA
Join Date: Jan 2011
Posts: 104

Thanks Heng -

It does account for soft clipping, and also allows +/- 5bp in deciding whether an alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, in this report for a 100bp paired-end Illumina sample, BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw
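The thread does not show how GCAT implements this check. A minimal sketch of one way to handle both points, assuming the simulator records the true 1-based leftmost position of each read: undo any leading clipping before comparing positions, then allow a +/- 5bp window. Without the first step, a correctly placed read with a 20bp leading soft clip would look 20bp off and be counted as wrong. Function names are hypothetical, and simulators may record reverse-strand positions differently, which this sketch ignores.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_start(pos, cigar):
    """Leftmost reference position including leading soft/hard clips.

    pos: 1-based POS field of a SAM record, i.e. the first aligned base.
    """
    clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in "SH":
            clipped += int(length)
        else:
            break  # stop at the first non-clipping operation
    return pos - clipped


def is_correct(chrom, pos, cigar, true_chrom, true_pos, tolerance=5):
    """Hypothetical check: same chromosome and unclipped start within tolerance."""
    return (chrom == true_chrom
            and abs(unclipped_start(pos, cigar) - true_pos) <= tolerance)
```

Trailing clips do not move the start coordinate, so only the leading clip operations need to be undone for a leftmost-position comparison.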

Also, it would be awesome for building the GCAT community if you could ask these types of questions on the associated forum. The lead developers watch it and continuously add improvements as users make suggestions. The user name "lh3" is also available.

04-14-2013, 08:00 AM #58
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

I would be interested in hearing what suggestions Heng might have for better metrics. GCAT is awesome but we need to think of more ways to evaluate pipelines...

04-14-2013, 08:08 AM #59
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

Here I started a new thread, since we have sort of hijacked this one...

http://seqanswers.com/forums/showthr...688#post101688

04-14-2013, 08:30 AM #60
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

Well, fewer people will see it if I comment there. Perhaps the developers might consider opening a thread here.

I am looking at the ROC-like curves. For all data sets, BWA-SW quickly picks up wrong alignments with high mapQ. But as you have considered clipping, maybe that really is the fault of bwa-sw. I don't know for sure. Anyway, for typical Illumina/454/Ion Torrent reads, bwa-sw is now deprecated in favor of bwa-mem.
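For readers unfamiliar with these ROC-like curves: they are typically built by sorting alignments by mapQ and, for each threshold, counting how many alignments have mapQ at or above it and how many of those are wrong. A minimal sketch, assuming each alignment has already been labelled correct or wrong (for instance with a positional tolerance check); the function name is hypothetical. "Quickly picking up high-mapQ wrong alignments" shows up as the wrong count rising already at the highest thresholds.

```python
from collections import Counter


def roc_like(alignments):
    """Cumulative (mapq, n_mapped, n_wrong) points, highest threshold first.

    alignments: iterable of (mapq, is_correct) pairs.
    n_mapped and n_wrong count all alignments with mapQ >= the threshold.
    """
    total, wrong = Counter(), Counter()
    for mapq, correct in alignments:
        total[mapq] += 1
        if not correct:
            wrong[mapq] += 1
    curve, n, w = [], 0, 0
    for q in sorted(total, reverse=True):
        n += total[q]
        w += wrong[q]
        curve.append((q, n, w))
    return curve
```

Plotting n_wrong against n_mapped for each mapper gives the ROC-like curve; a mapper whose curve climbs early is assigning high mapQ to wrong alignments.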

For exome variant calling, it would be better to give statistics in the target regions only.
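Restricting statistics to the exome targets amounts to filtering calls by the capture BED file before counting. A minimal sketch, assuming a standard BED file (0-based, half-open intervals) and 1-based variant positions as in VCF; the function names are hypothetical.

```python
import bisect
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines into sorted, merged intervals per chromosome.

    Returns {chrom: (starts, ends)} with 0-based, half-open coordinates.
    """
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = []
        for s, e in ivs:
            if out and s <= out[-1][1]:
                out[-1][1] = max(out[-1][1], e)  # overlapping or adjacent
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_targets(targets, chrom, pos1):
    """True if a 1-based position (e.g. VCF POS) falls inside a target."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    p = pos1 - 1  # convert to 0-based
    i = bisect.bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < ends[i]
```

Both the truth set and each pipeline's calls would be filtered the same way, e.g. keeping only calls for which `in_targets(targets, chrom, pos)` is true, before computing sensitivity and false discovery rates.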

All times are GMT -8.