SEQanswers

11-07-2011, 01:55 PM   #21: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Yes, it was my fault for using the wrong term; sorry for the confusion. To clarify, I mean we want to achieve a low false positive rate (that should be the right term).

Bowtie2 is definitely a substantial improvement over bowtie1 in almost every respect, and I can see encouraging progress on FPR between beta2 and beta3, all in the right direction. If you also focus development on keeping the FPR low, you will probably gain further improvement. That will be good for everyone.
11-07-2011, 03:50 PM   #22: rskr (Senior Member, Santa Fe, NM; joined Oct 2010; 250 posts)

So an algorithm with high sensitivity is likely to have low specificity? I don't think these terms mean much outside of a hospital-style diagnostic test. What we want is accuracy.
11-07-2011, 05:29 PM   #23: adaptivegenome (Super Moderator, US; joined Nov 2009; 437 posts)

This is not at all what Heng, Steve, or I suggested.
11-07-2011, 06:09 PM   #24: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

For a single mapper, it is true that the more it maps, the higher its FPR. But when you compare two mappers, it is possible for one to both map more reads and have a lower FPR; that one is then the better mapper.
11-08-2011, 12:39 AM   #25: jkbonfield (Senior Member, Cambridge, UK; joined Jul 2008; 146 posts)

I don't particularly wish to get drawn into a mapper war, and I'll say up front that I haven't benchmarked these tools myself. However, thinking further downstream, I don't think averaged sensitivity and specificity metrics are sufficient to tell the whole story.

I agree with Heng that the quality of the mapping score is very important for some forms of analysis. Furthermore, I'd go so far as to say the variance of depth is important too. E.g. imagine we have two aligners, one mapping 95% of the data and one mapping 90%. The one mapping 95% maps well to 95% of the genome and atrociously to the other 5%, while the one mapping 90% maps across the entire genome in a relatively uniform manner - I think most would feel happier with the 90% sensitivity aligner.

So say we have 100mers simulated from a genome with X% SNPs. We can algorithmically produce 100x depth by starting a new 100mer at every position in the genome, and then give the reads realistic quality profiles with error rates taken from real data, etc. (So as real as can be, but with a perfectly uniform distribution and known mapping locations.)

Then we can plot the depth distribution. How many sites are there where a particular combination of SNPs or errors has caused a dip in coverage? Given that we're almost always looking at very specific locations, often around discrepancies, this is perhaps a key metric in analysis.
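
As a rough illustration of that simulation, here is a minimal sketch that tiles a reference with one 100-mer starting at every position and writes the reads as FASTQ, encoding the true start position in each read name so mapping accuracy and depth can be checked after alignment. The file name ref.fa, the flat placeholder qualities and the read-name convention are my own assumptions for illustration, not part of any existing tool.

Code:
# Minimal sketch of the "one read per position" simulation described above.
# Assumptions (not from the thread): a single-sequence FASTA called ref.fa,
# a fixed read length of 100, and a flat placeholder quality string instead
# of a realistic error/quality profile.
import sys

READ_LEN = 100

def read_fasta(path):
    name, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
            elif line:
                seq.append(line.upper())
    return name, "".join(seq)

def tile_reads(path, out=sys.stdout):
    name, seq = read_fasta(path)
    qual = "I" * READ_LEN  # flat Q40 placeholder quality
    for start in range(len(seq) - READ_LEN + 1):
        read = seq[start:start + READ_LEN]
        # Encode the true origin in the read name so mapping accuracy and
        # per-base depth can be checked after alignment.
        out.write("@%s_%d\n%s\n+\n%s\n" % (name, start + 1, read, qual))

if __name__ == "__main__":
    tile_reads("ref.fa")

After mapping these reads, per-base depth can be tallied (e.g. with samtools depth) and its distribution plotted to find the coverage dips described above.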
11-08-2011, 05:57 AM   #26: rskr (Senior Member, Santa Fe, NM; joined Oct 2010; 250 posts)

Quote:
Originally Posted by jkbonfield View Post
I think most would feel happier with the 90% sensitivity aligner.
Sensitivity in this context is a liability: a highly sensitive aligner is likely to produce many erroneous alignments and base calls, on the order of thousands or millions, with no subsequent higher-cost procedure to resolve them, and manual curation is prohibitive. Furthermore, the errors are likely to be both precise and biased, so given several identical reads it will make the same mistakes in the same way. Using a sensitive aligner for scaffolding, for example, would be a very large problem.
11-08-2011, 06:33 AM   #27: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Quote:
Originally Posted by jkbonfield View Post
I don't particularly wish to get drawn into a mapper war, and I'll say here that I haven't benchmarked these tools to compare. However thinking more downstream I think averaged sensitivity and specificity metrics aren't sufficient to show the whole story.
Knowing the average FNR/FPR is not enough. This is where the ROC curve shows its power: it gives the full spectrum of accuracy.

Quote:
Originally Posted by jkbonfield View Post
So say we have 100mers of a simulated genome with X% of SNPs. We can algorithmically produce 100x depth by starting a new 100mer on every position in the genome, and then give them appropriate real looking quality profiles with error rates from real data, etc. (So as real as can be, but perfectly uniform distribution with known mapping locations.)

Then we can plot the depth distribution. How many sites are there where a particular combination of SNPs or errors has caused a dip in coverage? Given we're almost always looking for very specific locations, often around discrepancies, this is perhaps a key metric in analysis.
But you are right that even the ROC for mappers is not informative enough for real applications. Gerton shares your view, too. What is more informative is to know how the mapper reacts to variants, especially clustered variants or variants in semi-repetitive regions. The ROC for variants should be more indicative. It is just more difficult to do such an analysis because we have to simulate and map many more reads to get a good picture. Most, if not all, read simulators do not get the SNP distribution right, either.
11-08-2011, 06:47 AM   #28: nickloman (Senior Member, Birmingham, UK; joined Jul 2009; 356 posts)

Quote:
Originally Posted by lh3 View Post
Knowing the average FNR/FPR is not enough. This is where the ROC curve shows its power. It gives the full spectrum of the accuracy.
Heng - I like the look of the ROC curve, but I cannot work out exactly how it is derived from reading your web page. For example, I don't understand why some mappers have many data points but Bowtie, Soap2 and Gsnap have only one. Could you give a brief explanation of how you get from the (single file?) SAM output of a specific aligner to the plot?

Sorry if this is a dumb question!
11-08-2011, 12:53 PM   #29: brentp (Member, Denver, CO; joined Apr 2010; 72 posts)

nickloman, I believe what changes along the curve for the other mappers is the mapping-quality threshold. GSNAP, bowtie and (apparently) soap2 do not calculate mapping quality, so there is nothing to vary to produce a line.
11-08-2011, 02:29 PM   #30: nickloman (Senior Member, Birmingham, UK; joined Jul 2009; 356 posts)

Hi Brent - that would make sense - varying minimum mapping quality thresholds and seeing the result. It would be nice if those values were also plotted on the graph somehow.
11-09-2011, 02:04 AM   #31: cjp (Member, Cambridge, United Kingdom; joined Jun 2011; 58 posts)

@nickloman

The output of the wgsim_eval.pl program looks a bit like the data below - bowtie 1 always gives a mapping quality of 255 (column 1). I'm guessing that bowtie 2 has many FPs at a mapping quality of 1 (column 3 where column 1 == 1), but cumulatively finds more TPs across all mapping qualities (column 2 where column 1 == 1). But I was also wondering about the exact meaning of the output of the wgsim_eval.pl script.

% tail *.roc
==> bowtie2.roc <==
14 172922 11
13 172925 12
12 177943 27
11 177945 28
10 179990 37
9 179995 40
4 180250 40
3 187273 578
2 187324 580
1 199331 5877

==> bowtie.roc <==
255 86206 1740

==> bwa.roc <==
10 192354 72
9 192560 107
8 192595 107
7 192628 110
6 192652 115
5 192669 116
4 192681 117
3 192731 117
2 192741 118
1 192762 119

Chris
11-09-2011, 04:22 AM   #32: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

Each line consists of the mapping-quality threshold (column 1), the number of mapped reads with mapQ no less than that threshold (column 2), and the number of those reads that are mismapped (column 3). It does not show reads with mapQ=0. If we include mapQ=0 mappings, the sensitivity of bwa is also good on simulated data, but on single-end real data the low-quality tail of the reads makes bwa much worse. This is what Steven and Ben have observed, and it is also why it is recommended to enable trimming when using bwa.

BWA always gives mapQ 0 to repetitive hits, but other mappers (gsnap, bowtie2 and novoalign) may give mapQ <= 3 to repetitive hits, which is theoretically correct. I may additionally apply a mapQ threshold of 1-4 when plotting.
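
For anyone trying to reproduce such a table, here is a minimal sketch of how a cumulative table with those three columns could be built. The input format (a list of (mapQ, is_correct) pairs) and the demo counts are my own assumptions for illustration; wgsim_eval.pl itself derives correctness from the true position encoded in each simulated read name.

Code:
# Sketch of building a cumulative table like the .roc files above, following
# the column description: for each mapping-quality threshold q, count the
# reads with mapQ >= q and how many of those are mismapped.
from collections import defaultdict

def roc_table(records):
    """records: iterable of (mapq, is_correct) pairs with mapq > 0."""
    mapped = defaultdict(int)
    wrong = defaultdict(int)
    for mapq, ok in records:
        mapped[mapq] += 1
        if not ok:
            wrong[mapq] += 1
    rows, cum_mapped, cum_wrong = [], 0, 0
    for q in sorted(mapped, reverse=True):  # from high mapQ down to 1
        cum_mapped += mapped[q]
        cum_wrong += wrong[q]
        rows.append((q, cum_mapped, cum_wrong))
    return rows

if __name__ == "__main__":
    demo = [(40, True)] * 950 + [(40, False)] * 2 + [(3, True)] * 30 + [(3, False)] * 5
    for q, n_mapped, n_wrong in roc_table(demo):
        print(q, n_mapped, n_wrong)

Each row is then one point on the plot: lowering the mapQ threshold adds both correctly and incorrectly mapped reads.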
11-09-2011, 01:52 PM   #33: nilshomer (Nils Homer, Boston, MA, USA; joined Nov 2008; 1,285 posts)

Quote:
Originally Posted by cjp View Post
@nickloman

The output of the wgsim_eval.pl program looks a bit like the data below - bowtie 1 always gives a mapping score of 255 (column1). I'm guessing that bowtie 2 has many FP's at a mapping score of 1 (column3 if column1 == 1), but cumulatively finds more TP's with all mapping scores (column2 if column1 == 1). But I was also wondering the exact meaning from the output of the wgsim_eval.pl script.

% tail *.roc
==> bowtie2.roc <==
14 172922 11
13 172925 12
12 177943 27
11 177945 28
10 179990 37
9 179995 40
4 180250 40
3 187273 578
2 187324 580
1 199331 5877

==> bowtie.roc <==
255 86206 1740

==> bwa.roc <==
10 192354 72
9 192560 107
8 192595 107
7 192628 110
6 192652 115
5 192669 116
4 192681 117
3 192731 117
2 192741 118
1 192762 119

Chris
I forked Heng's code a while back into the dwgsim project (links below). I also added user documentation:
https://github.com/nh13/DWGSIM
http://sourceforge.net/apps/mediawik...ome_Simulation
11-09-2011, 04:22 PM   #34: sparks (Senior Member, Kuala Lumpur, Malaysia; joined Mar 2008; 126 posts)

Quote:
Originally Posted by rskr View Post
I disagree. If you look at hash-based aligners there are certain patterns of indels, mismatches and errors where they won't find the right result even if it is unique. For example, if the word size is 15 and there are two mismatches 10 bases apart in a 50mer, the hash won't return the region at all. Likewise, for longer reads the number of mismatches is likely to be higher and the suffix-array search will terminate before finding the ideal match.
That's a ridiculous statement! Most hashed aligners using a 15-mer hash would need 3 equally spaced mismatches in a 50mer to miss an alignment. And there are some hash-based aligners that can find a 50-mer alignment with 5 or 6 mismatches even if they are equally spaced. Novoalign can do this.
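
To see the arithmetic behind that claim, here is a toy check of whether a 50 bp read with mismatches at given positions still contains at least one clean 15-mer window for a hash of k-mers to seed from. This is a simplified model for illustration, not any particular aligner's actual seeding strategy.

Code:
# Toy check of the seeding argument above: does a 50 bp read with mismatches
# at the given (0-based) positions still contain at least one clean k-mer
# window that a hash of k-mers could seed from?
def has_clean_seed(read_len, mismatch_pos, k=15):
    mis = set(mismatch_pos)
    return any(all(p not in mis for p in range(s, s + k))
               for s in range(read_len - k + 1))

if __name__ == "__main__":
    print(has_clean_seed(50, [20, 30]))      # two mismatches 10 bases apart -> True
    print(has_clean_seed(50, [12, 25, 38]))  # three roughly evenly spaced -> False

Two mismatches 10 bases apart still leave a clean 15-mer, whereas three evenly spaced mismatches can remove every clean window, which is the point made above.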
11-09-2011, 05:03 PM   #35: Heisman (Senior Member, St. Louis; joined Dec 2010; 535 posts)

Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
11-09-2011, 08:54 PM   #36: zee (NGS specialist, Malaysia; joined Apr 2008; 249 posts)

Quote:
Originally Posted by Heisman View Post
Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
From this comparison, yes, that would be the case, with smalt showing good performance as well. What would be interesting is to repeat this with genericforms' suggestion of a 30x coverage genome.
11-09-2011, 09:14 PM   #37: sparks (Senior Member, Kuala Lumpur, Malaysia; joined Mar 2008; 126 posts)

Quote:
Originally Posted by Heisman View Post
Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
For sure, we believe it's still the best.

Colin

PS. Some mapper comparisons have shown different results, but this can be the result of targeting the simulation at the mapper the developer is promoting. One recent example used high simulated indel rates and then didn't adjust the other mappers' gap-open and gap-extend penalties to suit; their mapper, with default low gap penalties, came out as the clear winner.
It can be a problem to optimise the parameters for every mapper when doing these comparisons, so the tendency is to use defaults, which is probably reasonable.

I think Heng Li is doing an honest and unbiased comparison at mutation rates you'd expect in a resequencing project.
11-18-2011, 12:27 PM   #38: twu (Developer of GMAP and GSNAP, South San Francisco, CA; joined Oct 2011; 17 posts)

Heng, thanks for making the ROC plots available. I think they're pretty interesting.

About GSNAP having only a single point on the plots: GSNAP does in fact calculate mapping quality, but my understanding was that the quality should translate into the probability that the given read alignment is the correct one. So GSNAP does a Bayesian normalization of all of the raw mapping qualities and then reports the normalized mapping quality. This tends to produce a dichotomous result: if one alignment is much better than the others, it gets a very high mapping quality of 40 or so, but if there are two or more roughly similar multimappers, they all get a mapping quality of 3 or so (where 3 is the Phred-scaled value corresponding to a probability of 0.5). Perhaps I have the wrong understanding of mapping quality here (and maybe someone can correct me), but I am told that GATK has a similar expectation.

To get around this, I have added a new field in the SAM output of GSNAP called XQ, which gives the non-normalized mapping quality, although it is still scaled so the best alignment gets a mapping quality of 40. (There are certain reasons why the scaling is important, mostly having to do with GSTRUCT needing to know this information.)

Regarding the comment someone made about multimappers: GSNAP is designed to report all multimappers, but I think it has a different notion than other programs of what a multimapper is. Some programs expect to be given a specific search parameter, like 5 mismatches or fewer, and then to return all mappings that satisfy that parameter. GSNAP, on the other hand, interprets the search parameter as the maximum extent of the search space, so it has a hard limit against looking for 6 mismatches or more. However, if GSNAP finds an exact match, it will also place a soft limit at that point and report only the multimappers that are also exact matches, not the 1-mismatch through 5-mismatch answers. The exception is when GSNAP is given a value for suboptimal mismatches; then it will search that many mismatches past its optimal finding. For example, if GSNAP cannot find an exact match but finds a 1-mismatch alignment, and is given a suboptimal-mismatch value of 1, then it will report all 1-mismatch and 2-mismatch alignments, but will still not go further to report 3-mismatch through 5-mismatch answers.
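
The dichotomy described above follows from the normalization itself. Below is a generic sketch of the idea (not GSNAP's actual code): raw per-alignment log-likelihoods are turned into posterior probabilities, and the reported value is the Phred-scaled MAPQ = -10*log10(P(alignment is wrong)).

Code:
# Generic illustration of normalised mapping qualities: one clearly best hit
# gets a posterior near 1 (MAPQ near the cap), while two equally good hits
# each get a posterior of 0.5 (MAPQ of about 3). Not GSNAP's implementation.
import math

def mapq_from_loglik(logliks, cap=40):
    best = max(logliks)
    posts = [math.exp(x - best) for x in logliks]  # stabilised softmax
    total = sum(posts)
    mapqs = []
    for p in posts:
        p_wrong = max(1.0 - p / total, 1e-4)  # avoid log10(0); caps MAPQ at 40
        mapqs.append(min(cap, int(round(-10 * math.log10(p_wrong)))))
    return mapqs

if __name__ == "__main__":
    print(mapq_from_loglik([-5.0, -25.0]))  # one clearly best hit -> [40, 0]
    print(mapq_from_loglik([-5.0, -5.0]))   # two equal hits -> [3, 3]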

Last edited by twu; 11-18-2011 at 01:41 PM.
11-19-2011, 01:10 AM   #39: kopi-o (Senior Member, Stockholm, Sweden; joined Feb 2008; 319 posts)

Heng, in cases where the distribution of positive vs negative examples is very skewed (such as variant calling), the ROC curve can also be misleading. The ROC curve typically only looks at positive examples (false positive rate vs true positive rate), but one should also look at the corresponding curve for negative examples (TNR vs FNR), or look at precision-recall curves.
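
A small toy calculation makes the point concrete (the counts below are invented purely for illustration): with a heavy class skew the false positive rate can look tiny while precision is still poor, which a precision-recall curve would expose.

Code:
# Toy numbers illustrating the skew problem: the false positive *rate* looks
# excellent while precision is poor. The counts are invented for illustration,
# e.g. 3,000 true variant sites against ~3,000,000 invariant sites.
def rates(tp, fp, fn, tn):
    return {
        "TPR (recall)": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }

if __name__ == "__main__":
    print(rates(tp=2700, fp=3000, fn=300, tn=2997000))
    # FPR = 0.001 would look fine on a ROC curve, yet precision is only ~0.47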
11-19-2011, 06:02 AM   #40: lh3 (Senior Member, Boston; joined Feb 2008; 693 posts)

@kopi-o As Steve has argued, the experiment I am doing now is not a classical binary classification. There are only "wrongly mapped" reads, but no "false positives" in the strict sense.

I guess what you mean is to look at reads generated from regions with simulated polymorphisms. This is also what Gerton insists on doing, and I agree it is a good thing to do. The current ROC does not tell you whether a mapper misses hits uniformly or consistently misses them in certain regions. Similarly, the ROC does not tell you whether a mapper tends to produce consistent errors or random errors. All of these matter in variant calling.

If you are interested in variant calling, the right evaluation is to plot the ROC for variant calls. This is a standard binary classification and is more telling.