SEQanswers

11-07-2011, 01:55 PM   #21: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Yes, it was my fault for using the wrong term; sorry for the confusion. To clarify, I mean we want to achieve a low false positive rate (that should be the right term).

Bowtie2 is definitely a substantial improvement over bowtie1 in almost every respect, and I can see encouraging progress on FPR between beta2 and beta3, all in the right direction. If you also focus development on keeping the FPR low, you will probably gain further improvement. That will be good for everyone.
11-07-2011, 03:50 PM   #22: rskr (Senior Member, Santa Fe, NM; joined Oct 2010; 250 posts)

So an algorithm with high sensitivity is likely to have low specificity? I don't think these terms mean much outside of a hospital-style diagnostic test. What we want is accuracy.
11-07-2011, 05:29 PM   #23: adaptivegenome (Super Moderator, US; joined Nov 2009; 437 posts)

This is not at all what Heng, Steve, or I suggested.
11-07-2011, 06:09 PM   #24: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

For a single mapper, it is true that the more it maps, the higher its FPR. But when you compare two mappers, it is possible for one to both map more reads and have a lower FPR; that one is then the better mapper.
11-08-2011, 12:39 AM   #25: jkbonfield (Senior Member, Cambridge, UK; joined Jul 2008; 146 posts)

I don't particularly wish to get drawn into a mapper war, and I'll say up front that I haven't benchmarked these tools myself. However, thinking further downstream, I don't think averaged sensitivity and specificity metrics are sufficient to tell the whole story.

I agree with Heng that the quality of the mapping score is very important for some forms of analysis. Furthermore, I'd go so far as to say the variance of depth is important too. E.g. imagine we have two aligners, one mapping 95% of the data and one mapping 90%. The one mapping 95% maps well to 95% of the genome and atrociously to the other 5%, while the one mapping 90% maps across the entire genome in a relatively uniform manner - I think most would feel happier with the 90% sensitivity aligner.

So say we have 100mers simulated from a genome with X% SNPs. We can algorithmically produce 100x depth by starting a new 100mer at every position in the genome, and then give the reads realistic quality profiles with error rates taken from real data, etc. (So as real as can be, but with a perfectly uniform distribution and known mapping locations.)

Then we can plot the depth distribution. How many sites are there where a particular combination of SNPs or errors has caused a dip in coverage? Given that we're almost always looking at very specific locations, often around discrepancies, this is perhaps a key metric in analysis.
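
As a rough illustration of that simulation, here is a minimal sketch that tiles a reference with one 100-mer starting at every position and writes the reads as FASTQ, encoding the true start position in each read name so mapping accuracy and depth can be checked after alignment. The file name ref.fa, the flat placeholder qualities and the read-name convention are my own assumptions for illustration, not part of any existing tool.

Code:
# Minimal sketch of the "one read per position" simulation described above.
# Assumptions (not from the thread): a single-sequence FASTA called ref.fa,
# a fixed read length of 100, and a flat placeholder quality string instead
# of a realistic error/quality profile.
import sys

READ_LEN = 100

def read_fasta(path):
    name, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
            elif line:
                seq.append(line.upper())
    return name, "".join(seq)

def tile_reads(path, out=sys.stdout):
    name, seq = read_fasta(path)
    qual = "I" * READ_LEN  # flat Q40 placeholder quality
    for start in range(len(seq) - READ_LEN + 1):
        read = seq[start:start + READ_LEN]
        # Encode the true origin in the read name so mapping accuracy and
        # per-base depth can be checked after alignment.
        out.write("@%s_%d\n%s\n+\n%s\n" % (name, start + 1, read, qual))

if __name__ == "__main__":
    tile_reads("ref.fa")

After mapping these reads, per-base depth can be tallied (e.g. with samtools depth) and its distribution plotted to find the coverage dips described above.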
11-08-2011, 05:57 AM   #26: rskr (Senior Member, Santa Fe, NM; joined Oct 2010; 250 posts)

Quote:
Originally Posted by jkbonfield View Post
I think most would feel happier with the 90% sensitivity aligner.
Sensitivity in this context is a liability: a highly sensitive aligner is likely to produce many erroneous alignments and base calls, on the order of thousands or millions, with no subsequent higher-cost procedure to resolve them, and manual curation is prohibitive. Furthermore, the errors are likely to be both precise and biased, so given several identical reads it will make the same mistakes in the same way. Using a sensitive aligner for scaffolding, for example, would be a very large problem.
11-08-2011, 06:33 AM   #27: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Quote:
Originally Posted by jkbonfield View Post
I don't particularly wish to get drawn into a mapper war, and I'll say here that I haven't benchmarked these tools to compare. However thinking more downstream I think averaged sensitivity and specificity metrics aren't sufficient to show the whole story.
Knowing the average FNR/FPR is not enough. This is where the ROC curve shows its power: it gives the full spectrum of accuracy.

Quote:
Originally Posted by jkbonfield View Post
So say we have 100mers of a simulated genome with X% of SNPs. We can algorithmically produce 100x depth by starting a new 100mer on every position in the genome, and then give them appropriate real looking quality profiles with error rates from real data, etc. (So as real as can be, but perfectly uniform distribution with known mapping locations.)

Then we can plot the depth distribution. How many sites are there where a particular combination of SNPs or errors has caused a dip in coverage? Given we're almost always looking for very specific locations, often around discrepancies, this is perhaps a key metric in analysis.
But you are right that even the ROC for mappers is not informative enough for real applications. Gerton shares your view, too. What is more informative is to know how the mapper reacts to variants, especially clustered variants or variants in semi-repetitive regions. The ROC for variants should be more indicative. It is just more difficult to do such an analysis because we have to simulate and map many more reads to get a good picture. Most, if not all, read simulators do not get the SNP distribution right, either.
11-08-2011, 06:47 AM   #28: nickloman (Senior Member, Birmingham, UK; joined Jul 2009; 356 posts)

Quote:
Originally Posted by lh3 View Post
Knowing the average FNR/FPR is not enough. This is where the ROC curve shows its power. It gives the full spectrum of the accuracy.
Heng - I like the look of the ROC curve, but I cannot work out exactly how it is derived from reading your web page. For example, I don't understand why some mappers have many data points but Bowtie, Soap2 and Gsnap have only one. Could you give a brief explanation of how you get from the (single file?) SAM output of a specific aligner to the plot?

Sorry if this is a dumb question!
11-08-2011, 12:53 PM   #29: brentp (Member, Denver, CO; joined Apr 2010; 72 posts)

nickloman, I believe what changes along the curve for the other mappers is the mapping-quality threshold. GSNAP, bowtie and (apparently) soap2 do not calculate mapping quality, so there is nothing to vary to produce a line.
11-08-2011, 02:29 PM   #30: nickloman (Senior Member, Birmingham, UK; joined Jul 2009; 356 posts)

Hi Brent - that would make sense - varying minimum mapping quality thresholds and seeing the result. It would be nice if those values were also plotted on the graph somehow.
11-09-2011, 02:04 AM   #31: cjp (Member, Cambridge, United Kingdom; joined Jun 2011; 58 posts)

@nickloman

The output of the wgsim_eval.pl program looks a bit like the data below - bowtie 1 always gives a mapping quality of 255 (column 1). I'm guessing that bowtie 2 has many FPs at a mapping quality of 1 (column 3 where column 1 == 1), but cumulatively finds more TPs across all mapping qualities (column 2 where column 1 == 1). But I was also wondering about the exact meaning of the output of the wgsim_eval.pl script.

% tail *.roc
==> bowtie2.roc <==
14 172922 11
13 172925 12
12 177943 27
11 177945 28
10 179990 37
9 179995 40
4 180250 40
3 187273 578
2 187324 580
1 199331 5877

==> bowtie.roc <==
255 86206 1740

==> bwa.roc <==
10 192354 72
9 192560 107
8 192595 107
7 192628 110
6 192652 115
5 192669 116
4 192681 117
3 192731 117
2 192741 118
1 192762 119

Chris
11-09-2011, 04:22 AM   #32: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Each line consists of the mapping-quality threshold (column 1), the number of mapped reads with mapQ no less than that threshold (column 2), and the number of those reads that are mismapped (column 3). It does not show reads with mapQ=0. If we include mapQ=0 mappings, the sensitivity of bwa is also good on simulated data, but on single-end real data the low-quality tail of the reads makes bwa much worse. This is what Steven and Ben have observed, and it is also why it is recommended to enable trimming when using bwa.

BWA always gives mapQ 0 to repetitive hits, but other mappers (gsnap, bowtie2 and novoalign) may give mapQ <= 3 to repetitive hits, which is theoretically correct. I may additionally apply a mapQ threshold of 1-4 when plotting.
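
For anyone trying to reproduce such a table, here is a minimal sketch of how a cumulative table with those three columns could be built. The input format (a list of (mapQ, is_correct) pairs) and the demo counts are my own assumptions for illustration; wgsim_eval.pl itself derives correctness from the true position encoded in each simulated read name.

Code:
# Sketch of building a cumulative table like the .roc files above, following
# the column description: for each mapping-quality threshold q, count the
# reads with mapQ >= q and how many of those are mismapped.
from collections import defaultdict

def roc_table(records):
    """records: iterable of (mapq, is_correct) pairs with mapq > 0."""
    mapped = defaultdict(int)
    wrong = defaultdict(int)
    for mapq, ok in records:
        mapped[mapq] += 1
        if not ok:
            wrong[mapq] += 1
    rows, cum_mapped, cum_wrong = [], 0, 0
    for q in sorted(mapped, reverse=True):  # from high mapQ down to 1
        cum_mapped += mapped[q]
        cum_wrong += wrong[q]
        rows.append((q, cum_mapped, cum_wrong))
    return rows

if __name__ == "__main__":
    demo = [(40, True)] * 950 + [(40, False)] * 2 + [(3, True)] * 30 + [(3, False)] * 5
    for q, n_mapped, n_wrong in roc_table(demo):
        print(q, n_mapped, n_wrong)

Each row is then one point on the plot: lowering the mapQ threshold adds both correctly and incorrectly mapped reads.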
11-09-2011, 01:52 PM   #33: nilshomer (Nils Homer, Boston, MA, USA; joined Nov 2008; 1,285 posts)

Quote:
Originally Posted by cjp View Post
@nickloman

The output of the wgsim_eval.pl program looks a bit like the data below - bowtie 1 always gives a mapping score of 255 (column1). I'm guessing that bowtie 2 has many FP's at a mapping score of 1 (column3 if column1 == 1), but cumulatively finds more TP's with all mapping scores (column2 if column1 == 1). But I was also wondering the exact meaning from the output of the wgsim_eval.pl script.

% tail *.roc
==> bowtie2.roc <==
14 172922 11
13 172925 12
12 177943 27
11 177945 28
10 179990 37
9 179995 40
4 180250 40
3 187273 578
2 187324 580
1 199331 5877

==> bowtie.roc <==
255 86206 1740

==> bwa.roc <==
10 192354 72
9 192560 107
8 192595 107
7 192628 110
6 192652 115
5 192669 116
4 192681 117
3 192731 117
2 192741 118
1 192762 119

Chris
I forked Heng's code a while back into the dwgsim project (links below). I also added user documentation:
https://github.com/nh13/DWGSIM
http://sourceforge.net/apps/mediawik...ome_Simulation
11-09-2011, 04:22 PM   #34: sparks (Senior Member, Kuala Lumpur, Malaysia; joined Mar 2008; 126 posts)

Quote:
Originally Posted by rskr View Post
I disagree. If you look at hash-based aligners there are certain patterns of indels, mismatches and errors where they won't find the right result even if it is unique. For example, if the word size is 15 and there are two mismatches 10 bases apart in a 50mer, the hash won't return the region at all. Likewise, for longer reads the number of mismatches is likely to be higher and the suffix-array search will terminate before finding the ideal match.
That's a ridiculous statement! Most hashed aligners using a 15-mer hash would need 3 equally spaced mismatches in a 50mer to miss an alignment. And there are some hash-based aligners that can find a 50-mer alignment with 5 or 6 mismatches even if they are equally spaced. Novoalign can do this.
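
To see the arithmetic behind that claim, here is a toy check of whether a 50 bp read with mismatches at given positions still contains at least one clean 15-mer window for a hash of k-mers to seed from. This is a simplified model for illustration, not any particular aligner's actual seeding strategy.

Code:
# Toy check of the seeding argument above: does a 50 bp read with mismatches
# at the given (0-based) positions still contain at least one clean k-mer
# window that a hash of k-mers could seed from?
def has_clean_seed(read_len, mismatch_pos, k=15):
    mis = set(mismatch_pos)
    return any(all(p not in mis for p in range(s, s + k))
               for s in range(read_len - k + 1))

if __name__ == "__main__":
    print(has_clean_seed(50, [20, 30]))      # two mismatches 10 bases apart -> True
    print(has_clean_seed(50, [12, 25, 38]))  # three roughly evenly spaced -> False

Two mismatches 10 bases apart still leave a clean 15-mer, whereas three evenly spaced mismatches can remove every clean window, which is the point made above.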
11-09-2011, 05:03 PM   #35: Heisman (Senior Member, St. Louis; joined Dec 2010; 535 posts)

Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
11-09-2011, 08:54 PM   #36: zee (NGS specialist, Malaysia; joined Apr 2008; 249 posts)

Quote:
Originally Posted by Heisman View Post
Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
From this comparison, yes, that would be the case, with smalt showing good performance as well. What would be interesting is to repeat this with genericforms' suggestion of a 30x coverage genome.
11-09-2011, 09:14 PM   #37: sparks (Senior Member, Kuala Lumpur, Malaysia; joined Mar 2008; 126 posts)

Quote:
Originally Posted by Heisman View Post
Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
For sure, we believe it's still the best.

Colin

PS. Some mapper comparisons have shown different results, but this can be the result of targeting the simulation at the mapper the developer is promoting. One recent example used high simulated indel rates and then didn't adjust the other mappers' gap-open and gap-extend penalties to suit; their mapper, with default low gap penalties, came out as the clear winner.
It can be a problem to optimise the parameters for every mapper when doing these comparisons, so the tendency is to use defaults, which is probably reasonable.

I think Heng Li is doing an honest and unbiased comparison at mutation rates you'd expect in a resequencing project.
11-18-2011, 12:27 PM   #38: twu (Developer of GMAP and GSNAP, South San Francisco, CA; joined Oct 2011; 17 posts)

Heng, thanks for making the ROC plots available. I think they're pretty interesting.

About GSNAP having only a single point on the plots: GSNAP does in fact calculate mapping quality, but my understanding was that the quality should translate into the probability that the given read alignment is the correct one. So GSNAP does a Bayesian normalization of all of the raw mapping qualities and then reports the normalized mapping quality. This tends to produce a dichotomous result: if one alignment is much better than the others, it gets a very high mapping quality of 40 or so, but if there are two or more roughly similar multimappers, they all get a mapping quality of 3 or so (where 3 is the Phred-scaled value corresponding to a probability of 0.5). Perhaps I have the wrong understanding of mapping quality here (and maybe someone can correct me), but I am told that GATK has a similar expectation.

To get around this, I have added a new field in the SAM output of GSNAP called XQ, which gives the non-normalized mapping quality, although it is still scaled so the best alignment gets a mapping quality of 40. (There are certain reasons why the scaling is important, mostly having to do with GSTRUCT needing to know this information.)

Regarding the comment someone made about multimappers: GSNAP is designed to report all multimappers, but I think it has a different notion than other programs of what a multimapper is. Some programs expect to be given a specific search parameter, like 5 mismatches or fewer, and then to return all mappings that satisfy that parameter. GSNAP, on the other hand, interprets the search parameter as the maximum extent of the search space, so it has a hard limit against looking for 6 mismatches or more. However, if GSNAP finds an exact match, it will also place a soft limit at that point and report only the multimappers that are also exact matches, not the 1-mismatch through 5-mismatch answers. The exception is when GSNAP is given a value for suboptimal mismatches; then it will search that many mismatches past its optimal finding. For example, if GSNAP cannot find an exact match but finds a 1-mismatch alignment, and is given a suboptimal-mismatch value of 1, then it will report all 1-mismatch and 2-mismatch alignments, but will still not go further to report 3-mismatch through 5-mismatch answers.
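
The dichotomy described above follows from the normalization itself. Below is a generic sketch of the idea (not GSNAP's actual code): raw per-alignment log-likelihoods are turned into posterior probabilities, and the reported value is the Phred-scaled MAPQ = -10*log10(P(alignment is wrong)).

Code:
# Generic illustration of normalised mapping qualities: one clearly best hit
# gets a posterior near 1 (MAPQ near the cap), while two equally good hits
# each get a posterior of 0.5 (MAPQ of about 3). Not GSNAP's implementation.
import math

def mapq_from_loglik(logliks, cap=40):
    best = max(logliks)
    posts = [math.exp(x - best) for x in logliks]  # stabilised softmax
    total = sum(posts)
    mapqs = []
    for p in posts:
        p_wrong = max(1.0 - p / total, 1e-4)  # avoid log10(0); caps MAPQ at 40
        mapqs.append(min(cap, int(round(-10 * math.log10(p_wrong)))))
    return mapqs

if __name__ == "__main__":
    print(mapq_from_loglik([-5.0, -25.0]))  # one clearly best hit -> [40, 0]
    print(mapq_from_loglik([-5.0, -5.0]))   # two equal hits -> [3, 3]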

Last edited by twu; 11-18-2011 at 01:41 PM.
11-19-2011, 01:10 AM   #39: kopi-o (Senior Member, Stockholm, Sweden; joined Feb 2008; 319 posts)

Heng, in cases where the distribution of positive vs negative examples is very skewed (such as variant calling), the ROC curve can also be misleading. The ROC curve typically only looks at positive examples (false positive rate vs true positive rate), but one should also look at the corresponding curve for negative examples (TNR vs FNR), or look at precision-recall curves.
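
A small toy calculation makes the point concrete (the counts below are invented purely for illustration): with a heavy class skew the false positive rate can look tiny while precision is still poor, which a precision-recall curve would expose.

Code:
# Toy numbers illustrating the skew problem: the false positive *rate* looks
# excellent while precision is poor. The counts are invented for illustration,
# e.g. 3,000 true variant sites against ~3,000,000 invariant sites.
def rates(tp, fp, fn, tn):
    return {
        "TPR (recall)": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }

if __name__ == "__main__":
    print(rates(tp=2700, fp=3000, fn=300, tn=2997000))
    # FPR = 0.001 would look fine on a ROC curve, yet precision is only ~0.47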
11-19-2011, 06:02 AM   #40: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

@kopi-o As Steve has argued, the experiment I am doing now is not a classical binary classification. There are only "wrongly mapped" reads, but no "false positives" in the strict sense.

I guess what you mean is to look at reads generated from regions with simulated polymorphisms. This is also what Gerton insists on doing, and I agree it is a good thing to do. The current ROC does not tell you whether a mapper misses hits uniformly or consistently misses them in certain regions. Similarly, the ROC does not tell you whether a mapper tends to produce consistent errors or random errors. All of these matter in variant calling.

If you are interested in variant calling, the right evaluation is to plot the ROC for variant calls. This is a standard binary classification and is more telling.