**twu** · 12-06-2011, 12:36 PM

Heng, thanks for your comments about GSNAP. I will think more about how to get more informative mapping quality results, and would welcome any further suggestions you might have. Actually, one of the reasons I haven't done much with the mapping quality calculations is that my colleagues here have used BWA+GATK for SNP calling, and they told me that GSNAP behaved similarly to BWA in its mapping quality calculations. But perhaps they were wrong.

I also noted your timing results where the GSNAP paired-end algorithm is more than 2 times slower than the single-end algorithm. One of the reasons is that for paired-end data, GSNAP looks deeper at suboptimal results on each of the two ends in order to get a concordant result. In some cases, GSNAP may need to do its own version of a Smith-Waterman alignment in the neighborhood of a good alignment for the other end. Instead of using Smith-Waterman, though, GSNAP uses its GMAP algorithm, which is good for finding splicing, because our main application so far has been RNA-Seq, rather than DNA-Seq.

GSNAP is also like BWA in that it does not use base quality scores for alignment. We also do not use base quality scores for trimming, but just pass the information on to the SNP caller.

**zee** · 12-06-2011, 12:42 PM

Hi Heng,

Would you mind sharing the parameters you used for Bowtie2-beta4 on 100bp Illumina reads?

Thanks.

Originally posted by lh3

Updated to bowtie2-beta4. On accuracy, bowtie2-beta4 is similar to bwa-sw overall. I have also done the comparison on real data following the way I used in the bwa-sw paper. Out of 138k 454 reads with average read length 355bp, bwa-sw misses 1094+58 good alignments (~90% shorter than 100bp) and gives 31 questionable alignments, while bowtie2-beta4 misses 13+91 good alignments and gives 65 questionable alignments. The accuracy is largely indistinguishable for practical applications. On speed, Bowtie2 is about 20% faster and uses less memory.

In conclusion, bowtie2-beta4 has similar accuracy to bwa-sw for both 100bp simulated data and 350bp real 454 data. It is one of the best (accuracy+speed) mappers for hiseq and 454 reads. I will start to recommend it to others along with smalt/novoalign/gsnap. I think a missing feature in bowtie2 is to properly report chimeric alignments, which is essential to mapping even longer sequences. This should be fairly easy to implement.

**cjp** · 12-07-2011, 02:16 AM

I think they are here (and other aligners' parameters too):

NGS mapper ROC curves

http://lh3lh3.users.sourceforge.net/alnROC.shtml

Chris

**wlangdon** · 01-23-2013, 02:09 AM

I just did a speed vs. accuracy test on Cancer Institute paired-end sequences. A GP tweak to
Bowtie2 came out fastest (4 times the speed of BWA), took less than half the memory, and
had almost the same accuracy (82.1% vs. 83.1%).
See http://arxiv.org/abs/1301.5187
Bill

**wlangdon** · 01-23-2013, 10:16 AM

ps: Bowtie2 has a --very-sensitive command line option which can increase its
accuracy at the expense of increased run time. (In one case by 0.8% to 83.4% but
run time increased by 53%).
Bill

**ka90** · 03-28-2013, 01:16 PM

Sorry for bringing this thread back to life. I was wondering how they compare now that bowtie2 has a stable release and is no longer in beta. Has anyone been able to compare the latest versions of bowtie2 with other mappers? If so, could you share your observations?

**shi** · 04-05-2013, 11:40 PM

Hi ka90,

We have recently made comparisons for a few aligners. Please see

http://nar.oxfordjournals.org/content/early/2013/04/03/nar.gkt214.abstract

Cheers
Wei

**oiiio** · 04-13-2013, 06:24 PM

Since it appears this excellent thread has already been resurrected recently:

I'd like to show you all a comparison of bowtie2+GATK and other pipelines for variant calling on the NA12878 illumina exome with 150x coverage. These variant calling reports are generated by the GCAT resource on bioplanet.com:

http://www.bioplanet.com/gcat/report...x/bowtie-atlas

You can use the check box menu at left to choose other pipelines to compare to.

**zee** · 04-13-2013, 06:34 PM

GCAT is great because it allows you to run and submit your own datasets for public scrutiny. We are going to make good use of it.

**lh3** · 04-14-2013, 06:18 AM

Very impressive website. I do have a question, though: when evaluating alignments, how do you tell if an alignment is correct? If there is clipping in an alignment, how do you deal with that?

I ask this because bwa-sw is surprisingly bad. While bwa-sw is less accurate than bwa-mem, it should be similar to bowtie2. A possible cause is that among the mappers evaluated on your website, bwa-sw reports the most soft clipping. If you do not correct clipping, bwa-sw will be easily the worst.

**oiiio** · 04-14-2013, 07:54 AM

Thanks Heng -

It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.
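
To make the correctness rule above concrete, here is a minimal sketch of how a clipping-aware check with a +/- 5bp tolerance could look. This is not GCAT's actual code; the function names, the choice to compare leftmost positions only, and the decision to count both soft and hard clips are assumptions made for illustration.

```python
import re

# Matches one CIGAR operation, e.g. "5S" or "95M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_start(pos, cigar):
    """Return the 1-based position where the read would start if its
    leading clipped bases (S or H) had been aligned instead of clipped."""
    clipped = 0
    for length, op in CIGAR_RE.findall(cigar):
        if op in "SH":
            clipped += int(length)
        else:
            break  # first non-clip operation: the leading clip is over
    return pos - clipped


def is_correct(true_chrom, true_pos, chrom, pos, cigar, tolerance=5):
    """Hypothetical correctness rule: same chromosome, and the
    clipping-corrected start is within `tolerance` bp of the true start."""
    if chrom != true_chrom:
        return False
    return abs(unclipped_start(pos, cigar) - true_pos) <= tolerance
```

With this rule, a read simulated at position 100 and reported at position 105 with CIGAR `5S95M` counts as correct, because the 5 soft-clipped bases are added back before comparing.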

**adaptivegenome** · 04-14-2013, 08:00 AM

Originally posted by oiiio

Thanks Heng -

It does account for soft clipping, and also allows +/- 5bp in deciding if the alignment is correct. I also find your observation very surprising, because in most of the reports BWA-SW is more accurate than Bowtie2. For example, this report is for a 100bp paired-end illumina sample: http://www.bioplanet.com/gcat/reports/21/alignment/100bp-pe-small-indel/bwa_sw . I see here that BWA-SW appears more accurate than Bowtie2 as more reads are considered, but you are right that it looks bad at the beginning of the graph.

Also, it would be awesome for building the GCAT community if you could ask these types of questions on the surrounding forum. The lead developers are watching there, and continuously add improvements as users make suggestions. The user name "lh3" is also available.

I would be interested in hearing what suggestions Heng might have for better metrics. GCAT is awesome but we need to think of more ways to evaluate pipelines...

**adaptivegenome** · 04-14-2013, 08:08 AM

I started a new thread here, since we have sort of hijacked this one...

New way to compare mappers and variant callers - SEQanswers

http://seqanswers.com/forums/showthread.php?p=101688#post101688

**lh3** · 04-14-2013, 08:30 AM

Well, fewer people will see it if I comment there. Perhaps the developers might consider opening a thread here.

I am looking at the ROC-like curves. For all data sets, BWA-SW quickly picks up high mapQ wrong alignments. But as you have considered clipping, maybe that is really the fault of bwa-sw. I don't know for sure. Anyway, for typical illumina/454/iontorrent reads, bwa-sw is now deprecated by bwa-mem.
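
For readers unfamiliar with these ROC-like curves, the following is a rough sketch of how one can be tabulated: alignments are accumulated from the highest mapQ threshold downwards, counting how many reads are mapped and how many of those are wrong at each threshold. The function name and the input format (pairs of mapQ and a correctness flag) are assumptions for illustration, not what GCAT or any particular evaluation script does.

```python
from collections import Counter


def roc_like_curve(alignments):
    """alignments: iterable of (mapq, is_correct) pairs.

    Returns a list of (mapq_threshold, n_mapped, n_wrong) tuples, ordered
    from the highest threshold to the lowest. Counts are cumulative: each
    row covers every alignment with mapQ >= mapq_threshold.
    """
    mapped = Counter()
    wrong = Counter()
    for mapq, ok in alignments:
        mapped[mapq] += 1
        if not ok:
            wrong[mapq] += 1

    curve = []
    n_mapped = n_wrong = 0
    for q in sorted(mapped, reverse=True):
        n_mapped += mapped[q]
        n_wrong += wrong[q]
        curve.append((q, n_mapped, n_wrong))
    return curve
```

A mapper that "quickly picks up high mapQ wrong alignments" shows a large `n_wrong` already in the first rows, before many reads have been counted.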

For exome variant calling, it would be better to give statistics in the target regions only.
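
As an illustration of restricting statistics to target regions, here is a small sketch that tests whether a variant position falls inside a set of capture intervals. It assumes BED-style 0-based half-open targets and 1-based variant positions (as in VCF); the function names are hypothetical.

```python
from bisect import bisect_right


def build_index(targets):
    """targets: iterable of (chrom, start, end), 0-based half-open.
    Overlapping or touching intervals are merged per chromosome."""
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_targets(index, chrom, pos):
    """Return True if the 1-based position `pos` lies in a target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Last interval whose start is <= the position, if any.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]
```

Variant calls can then be filtered with `in_targets` before computing any per-pipeline statistics, so that off-target calls do not dominate the comparison.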

**oiiio** · 04-14-2013, 08:43 AM

I started to reply in adaptivegenome's proposed thread here: http://seqanswers.com/forums/showthr...688#post101688
