Chanced upon this
PALMapper, a fusion of GenomeMapper and QPALMA, is designed to accurately align NGS reads against genomes.
The benefits of PALMapper include:
1. Alignments with mismatches and indels.
2. Accurate spliced alignments using computational splice site predictions, if available.
3. Fast alignments (about 10 million reads/hour).
The source code of PALMapper is available from ftp://ftp.tuebingen.mpg.de/pub/fml/r...ware/palmapper.
Summary:
Genome and transcriptome sequencing are experiencing a challenging renewal with the advent of Next Generation Sequencing (NGS) technologies. Notably, short mRNA sequences produced by RNA-Seq enhance transcriptome analysis and promise great opportunities for the discovery of new genes and the identification of alternative transcripts. One way to analyze this data is to align the reads against a reference genome. However, the sheer amount of NGS data requires highly efficient methods for accurate spliced alignment, a task made harder by the short length and variable quality of the sequence reads.
We propose a combination of the spliced alignment method QPALMA [1] with the short read alignment tool GenomeMapper [2]. The resulting method, called PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions. QPALMA, which relies on a machine learning strategy, is highly sensitive, but its alignment step is slow, which can make it impractical for large genomes or extremely large introns. To speed this up, we combined it with GenomeMapper, which quickly carries out an initial read mapping. That mapping then guides a banded Smith-Waterman-like algorithm that allows for long gaps corresponding to introns. PALMapper considerably reduces time consumption without decreasing accuracy compared to QPALMA. In fact, it runs around 50 times faster and can therefore align around 7 million reads per hour on a single AMD CPU core (a speed similar to that of TopHat [3]). Our study on C. elegans furthermore shows that PALMapper predicts introns with very high sensitivity (72%) and specificity (82%) when using the annotation as ground truth. PALMapper is considerably more accurate than TopHat, which reaches 47% sensitivity and 81% specificity.
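To make the idea of a Smith-Waterman-like alignment with long intron gaps more concrete, here is a minimal sketch. It is not PALMapper's implementation: it is unbanded and unseeded, ignores base qualities, and replaces the learned splice site scores with a fixed penalty plus a requirement for canonical GT/AG dinucleotides. The function name, the scoring parameters and the toy sequences are all assumptions made for illustration.

```python
def spliced_align_score(read, genome, match=1, mismatch=-1, gap=-2,
                        intron_penalty=3, min_intron=4):
    """Best score for aligning the whole read somewhere in the genome,
    allowing introns that start with GT and end with AG (toy scoring)."""
    n, m = len(read), len(genome)
    neg = float("-inf")
    # S[i][j]: best score for read[:i] with the alignment ending just before genome[j]
    S = [[neg] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        S[0][j] = 0  # the read may start at any genome position for free
    for i in range(1, n + 1):
        best_donor = neg  # best S[i][d] over GT donor positions d <= j - min_intron
        for j in range(m + 1):
            d = j - min_intron
            if d >= 0 and genome[d:d + 2] == "GT":
                best_donor = max(best_donor, S[i][d])
            best = S[i - 1][j] + gap  # read base aligned to a gap
            if j > 0:
                sub = match if read[i - 1] == genome[j - 1] else mismatch
                best = max(best,
                           S[i - 1][j - 1] + sub,  # match or mismatch
                           S[i][j - 1] + gap)      # genome base aligned to a gap
            if j >= 2 and genome[j - 2:j] == "AG":
                # close an intron that spans genome[d:j] for the best donor d
                best = max(best, best_donor - intron_penalty)
            S[i][j] = best
    return max(S[n])  # the read may end at any genome position


# Toy example: two exons separated by a 10 bp GT...AG intron.
genome = "CC" + "TTACGGA" + "GTAAATTTAG" + "CCTGAAC" + "TT"
read = "ACGGA" + "CCTGA"  # spans the exon-exon junction
spliced = spliced_align_score(read, genome)
unspliced = spliced_align_score(read, genome, intron_penalty=float("inf"))
```

In this toy case the junction-spanning read only aligns well when the intron jump is permitted. A banded variant would restrict `j` to a window around the seed position reported by the initial read mapping, rather than scanning the whole genome.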