Trying to test the aligners

**ondovb** · 05-13-2010, 09:12 AM

Most aligners offer settings that let you trade off sensitivity against speed. To compare them, you'll need to control one variable as much as possible by tuning them so that they either take approximately the same time or align approximately the same number of reads. You should try a few levels of each, since different aligners may do better in different areas.
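One way to make the "same time or same number of reads" comparison concrete is to log every run and, for each aligner, pick the setting closest to a shared target. This is a minimal sketch that assumes you record runs yourself; the `Run` record and the `matched_runs` helper are hypothetical and not part of any aligner.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """One benchmark run of one aligner at one parameter setting (hypothetical record)."""
    aligner: str
    setting: str         # free-form label for the parameter combination tried
    seconds: float       # wall-clock time of the run
    aligned_frac: float  # fraction of input reads that received an alignment


def matched_runs(runs: List[Run], target: float, key: str = "seconds") -> Dict[str, Run]:
    """For each aligner, keep the run whose `key` value is closest to `target`.

    key="seconds" compares aligners at roughly equal runtime;
    key="aligned_frac" compares them at roughly equal numbers of aligned reads.
    """
    best: Dict[str, Run] = {}
    for run in runs:
        current = best.get(run.aligner)
        gap = abs(getattr(run, key) - target)
        if current is None or gap < abs(getattr(current, key) - target):
            best[run.aligner] = run
    return best
```

Calling it with several targets (a few runtimes, a few aligned fractions) gives the "few levels of each" comparison.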

Specificity is also important, since aligning lots of reads quickly doesn't matter if they're not the optimal alignments. This complicates benchmarks even more, since sometimes you can change settings that affect specificity as well. A 3D plot for each aligner showing the relationship between all three variables would be interesting if you have lots of free time.
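To keep sensitivity and specificity apart when benchmarking on simulated reads, one common choice (an assumption here, not something prescribed in this thread) is to report sensitivity as correct alignments over all reads, and to approximate specificity by precision: correct alignments over the reads the aligner actually placed. A minimal sketch, where a hit counts as correct if it lands within an arbitrary `tolerance` of the simulated origin; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Locus = Tuple[str, int]  # (chromosome, leftmost position)


def is_correct(origin: Locus, mapped: Optional[Locus], tolerance: int = 5) -> bool:
    """Correct = same chromosome and within `tolerance` bases of the simulated origin."""
    if mapped is None:
        return False
    return mapped[0] == origin[0] and abs(mapped[1] - origin[1]) <= tolerance


def sensitivity_precision(
    records: Sequence[Tuple[Locus, Optional[Locus]]], tolerance: int = 5
) -> Tuple[float, float]:
    """records holds (true origin, reported locus or None if unaligned) per read."""
    total = len(records)
    mapped = sum(1 for _, m in records if m is not None)
    correct = sum(1 for o, m in records if is_correct(o, m, tolerance))
    sensitivity = correct / total if total else 0.0  # correct / all reads
    precision = correct / mapped if mapped else 0.0  # correct / reads actually placed
    return sensitivity, precision
```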

**Poshi** · 05-13-2010, 09:25 AM

I'm taking the aligners' parameters into account. I'm trying to get the maximum number of correct results, regardless of time spent (up to some limit). But tuning all the aligners is not easy without knowing them in advance; each one seems to have its own tricks.

About the specificity problem... I'm not really sure how to ensure it. How can I know that an alignment is optimal? For perfect reads, without mismatches, I can compare the read's true position with the position reported by the software, but once we start introducing variations, a read can end up matching a different region better than its original one. And deciding which alignment is optimal is not easy. If it were... all the problems of read alignment would be gone.
For now, I'm relying on the assumption that if an aligner returns an alignment, that alignment is correct.
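For simulated reads, the case described here (a mutated read fitting another locus better than its origin) can at least be detected by comparing mismatch counts at the reported locus and at the true one. This sketch is ungapped and forward-strand only, uses 0-based positions, and `classify` with its labels is a hypothetical helper, not a standard tool.

```python
def mismatches(read, ref, start):
    """Ungapped Hamming distance between `read` and ref[start:start+len(read)].
    Bases that would hang off the end of the reference count as mismatches."""
    if start < 0:
        raise ValueError("start must be non-negative (0-based)")
    window = ref[start:start + len(read)]
    diff = sum(1 for a, b in zip(read, window) if a != b)
    return diff + (len(read) - len(window))


def classify(read, genome, true_locus, mapped_locus, tolerance=0):
    """genome maps chromosome name -> sequence; loci are (chrom, 0-based start).
    Returns "unaligned", "correct", "equal_or_better_elsewhere" or "wrong"."""
    if mapped_locus is None:
        return "unaligned"
    t_chrom, t_pos = true_locus
    m_chrom, m_pos = mapped_locus
    if m_chrom == t_chrom and abs(m_pos - t_pos) <= tolerance:
        return "correct"
    at_true = mismatches(read, genome[t_chrom], t_pos)
    at_mapped = mismatches(read, genome[m_chrom], m_pos)
    # Simulated errors can make another locus fit as well as (or better than) the origin.
    if at_mapped <= at_true:
        return "equal_or_better_elsewhere"
    return "wrong"
```

Reads in the "equal_or_better_elsewhere" bucket are arguably not aligner errors, so they can be reported separately instead of being counted as wrong.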

**nilshomer** · 05-13-2010, 10:07 AM

I would read http://lh3lh3.users.sourceforge.net/false-bench.shtml, http://lh3lh3.users.sourceforge.net/bioinfo.shtml, and http://www.nilshomer.com/index.php?title=NGS_Alignment before continuing. The first two are written by Heng Li (MAQ/BWA) and the last by me (BFAST/SRMA). Also, check out the original papers for each aligner, as I am sure they performed alignment comparisons (how did they do it?). For a discussion of possible mapping errors, the supplementary material of the MAQ paper is the best.

**ondovb** · 05-13-2010, 12:19 PM

> Originally posted by Poshi:
>
> I'm trying to get the maximum number of correct results, regardless of time spent (up to some limit).

I think "up to some limit" will be key here. I'm not actually familiar with the settings for BFAST or Bowtie, but the aligners I've used could theoretically reach 100% sensitivity regardless of mismatches... it just might take years to run (and/or use far too much RAM). The developers chose default settings that give reasonable run times and RAM usage, but their definition of "reasonable" may differ from other people's and from yours. You'll have to get to know the settings well enough to choose your own limits.

> Originally posted by Poshi:
>
> In the case of perfect reads, without mismatches, I can compare the position of the read with the position predicted by the software, but once we start to introduce variations we can end up having a read that matches a different area better than the original.

I would say that the "correct" alignment is the optimal match, regardless of where the read actually came from, since that's what you would want to find if you didn't know the read's actual origin. Of course, the definition of "optimal" also varies from one tool to another (it usually involves some kind of score or p-value). Ideally you would do your own assessment that accounts for everything you need (paired-end optimization, quality scores, etc.), possibly based on one of the aligners' methods.
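Following the idea that the "correct" alignment is the best match regardless of origin, a tiny test reference can be searched exhaustively to find the optimal placement and to check whether an aligner's hit achieves it. Here "optimal" is assumed to mean the lowest sum of Phred base qualities at mismatching positions (Phred+33 encoding assumed), with no gaps and forward strand only; real aligners use their own, more elaborate scoring, and all function names below are hypothetical.

```python
def phred_values(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def weighted_mismatch(read, quals, ref, start):
    """Sum of base qualities at mismatching positions of an ungapped placement."""
    window = ref[start:start + len(read)]
    return sum(q for a, b, q in zip(read, window, quals) if a != b)


def best_placements(read, qual, genome):
    """Score every ungapped forward-strand placement; return (best score, loci).
    Exhaustive search, so only practical for small test references."""
    quals = phred_values(qual)
    best_score, best_loci = None, []
    for chrom, seq in genome.items():
        for start in range(len(seq) - len(read) + 1):
            score = weighted_mismatch(read, quals, seq, start)
            if best_score is None or score < best_score:
                best_score, best_loci = score, [(chrom, start)]
            elif score == best_score:
                best_loci.append((chrom, start))
    return best_score, best_loci


def is_optimal(read, qual, genome, mapped_locus):
    """True if the aligner's (chrom, 0-based start) ties the best achievable score."""
    chrom, start = mapped_locus
    if start < 0 or start + len(read) > len(genome[chrom]):
        return False  # placement runs off the reference
    best_score, _ = best_placements(read, qual, genome)
    return weighted_mismatch(read, phred_values(qual), genome[chrom], start) == best_score
```

When several loci tie for the best score, any of them is arguably correct, so a benchmark can only hold an aligner to finding one of the tied placements.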

As you can tell from nilshomer's links, this gets very complicated, but it's necessary to account for these things. You may want to at least restrict your tests to conditions closest to what your real data will look like.

**nilshomer** · 05-13-2010, 02:02 PM

Acknowledging my own self-promotion, I would also read this paper: http://dx.doi.org/10.1093/bib/bbq015

**Poshi** · 05-14-2010, 08:19 AM

Thanks a lot for all the advice and the links provided. I have some things to read :-)
I read the papers for the different aligners, but I was not convinced by the results. Most of them claimed that their aligner was the best, but... they cannot all be "the best" at the same time! That is what pushed me to do my own testing.

I will follow the advice given in those links, although I think not all of it applies or is feasible in my case (for example, the advice to use a long input instead of a short one: since I'm only assessing the quality of the results, not speed or scalability, this doesn't matter here).

When I say "time up to some limit", I mean a reasonable time for the test, say a couple of days. As for memory, I will use all the memory available on our machines. When I run a timing benchmark, the limits will be different.
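A wall-clock budget such as "a couple of days" can be enforced automatically, so that a setting which runs too long is recorded as a timeout instead of blocking the whole benchmark. This is a minimal sketch; `run_with_limit` is hypothetical and `cmd` stands in for whatever command line each aligner needs. Memory limits are not handled here.

```python
import subprocess
import sys
import time


def run_with_limit(cmd, limit_seconds):
    """Run one aligner invocation (cmd: argument list) and return (status, elapsed_seconds)."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=True, timeout=limit_seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        status = "ok"
    except subprocess.TimeoutExpired:
        status = "timeout"  # over budget: the child is killed and the run is recorded as such
    except subprocess.CalledProcessError:
        status = "failed"   # non-zero exit status (e.g. the aligner ran out of memory)
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command: a trivial Python process instead of a real aligner.
    print(run_with_limit([sys.executable, "-c", "pass"], limit_seconds=60))
```

`subprocess.run` only kills the process it started; an aligner that spawns its own worker processes may need extra cleanup after a timeout.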

What seems quite clear is that I should check the quality of the alignments by comparing each read against the reference where it was aligned. But any alignment an aligner reports is supposed to be correct, so... I will have to define what I consider "optimal".
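Checking each read against the reference at its reported position also works for gapped alignments if the SAM CIGAR string is walked operation by operation. This sketch recomputes an edit distance (mismatches plus inserted and deleted bases) that can then be compared against whatever threshold you decide defines an acceptable alignment; the function name, 0-based `start`, and uppercase sequences are assumptions.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def edit_distance_from_cigar(read, ref, start, cigar):
    """Mismatches plus inserted/deleted bases for `read` aligned at 0-based `start`
    on `ref`, following a SAM-style CIGAR. Soft clips are skipped, not penalized;
    N (skipped region), H and P add nothing to the distance."""
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    r, g, dist = 0, start, 0  # r indexes the read, g indexes the reference
    for length, op in ops:
        n = int(length)
        if op in "M=X":
            # Compare bases directly rather than trusting the =/X labels.
            dist += sum(1 for i in range(n) if read[r + i] != ref[g + i])
            r += n
            g += n
        elif op == "I":
            dist += n
            r += n
        elif op == "D":
            dist += n
            g += n
        elif op == "S":
            r += n
        elif op == "N":
            g += n
        # H and P consume neither the read nor the reference.
    return dist
```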

Thanks a lot for your comments. I'll try to improve my tests and maybe I will switch to a real data set to have a different scenario.
