SEQanswers


lh3 10-29-2010 07:51 AM

Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence r
 
Abstract

High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences (“reads”) resulting from a sequencing run is first “mapped” (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher useable sequence yield and improved accuracy compared to that of existing software.

http://www.ncbi.nlm.nih.gov/pubmed/20980556

sparks 11-04-2010 12:28 AM

It looks like an interesting aligner, and it showed very good performance on indels. I think this is largely due to its low gap extension penalty: -40 to open a gap and -3 to extend it by 1 base. Most other aligners use a higher gap extension penalty relative to their gap open and mismatch penalties.

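To make the effect of the penalties concrete, here is a minimal sketch of affine gap scoring using the -40 open and -3 extend values mentioned above. It assumes the convention that a gap of n bases costs one open plus n extensions; the thread does not say which convention Stampy uses, and the -10 comparison value is purely hypothetical, not taken from any aligner.

```python
def gap_score(length, gap_open=-40, gap_extend=-3):
    """Score of a single gap of `length` bases under affine penalties.

    Assumed convention: one open penalty plus one extend penalty per base.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * length


# Hypothetical steeper extension penalty, used only for comparison.
STEEP_EXTEND = -10

for n in (1, 5, 10, 20):
    low = gap_score(n)
    steep = gap_score(n, gap_extend=STEEP_EXTEND)
    print(f"gap of {n:2d} bases: extend -3 gives {low}, extend -10 gives {steep}")
```

With the low extension penalty, the score of a gap is dominated by the open cost, so a 10-base indel is penalised only modestly more than a 1-base indel. That is one plausible reason longer indels remain alignable.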

lh3 11-04-2010 07:46 AM

You are right that we should apply a small gap extension penalty. In addition, I believe Stampy is the major competitor to Novoalign. :)

sparks 11-08-2010 07:51 PM

Agreed. Looking at the 1000 Genomes indel stats, a gap extension penalty of 3 looks about right. Novoalign still showed better sensitivity on SNPs and small indels, and lowering the gap extend penalty should fix the longer indels, but Stampy is getting close.

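One way to read a gap extension penalty against indel length statistics is to treat it as a Phred-scaled probability. The sketch below assumes that interpretation and a simple geometric model of indel length; neither is stated in the thread, and the function names are made up for illustration.

```python
def phred_to_prob(penalty):
    """Convert a Phred-scaled penalty to a probability: p = 10^(-penalty/10)."""
    return 10 ** (-penalty / 10)


def mean_indel_length(extend_penalty):
    """Mean gap length if each further base is added with probability p.

    Assumes a geometric length model: P(L = k) = (1 - p) * p**(k - 1), k >= 1.
    """
    if extend_penalty <= 0:
        raise ValueError("extend penalty must be positive")
    p = phred_to_prob(extend_penalty)
    return 1 / (1 - p)


p_extend = phred_to_prob(3)
print(f"extension probability for penalty 3: {p_extend:.3f}")
print(f"implied mean indel length: {mean_indel_length(3):.2f} bases")
```

Under these assumptions a penalty of 3 corresponds to roughly a one-in-two chance that an indel continues for another base, which can be compared against an observed indel length distribution.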
