Software/list

From SEQwiki

Below is one of many possible dynamic tables of software data, created from pages in the wiki. To add a package to the list, use the following form:




Name Summary Bio Tags Meth Tags Features Language Licence OS
.NET BIO ".NET Bio is an open source library of common bioinformatics functions, intended to simplify the creation of life science applications. The core library implements a range of file parsers and formatters for common file types, connectors to commonly-used web services such as NCBI BLAST, and standard algorithms for the comparison and assembly of DNA, RNA and protein sequences. Sample tools and code snippets are also included." Sequence analysis Programming Library C# Windows
linux
4peaks Allows viewing sequencing trace files, motif searching, trimming, BLAST and exporting sequences. Sequencing Sequence analysis Freeware Mac OS X
A5 A5 is an integrative pipeline for genome assembly that automates sequence data cleaning, error correction, assembly, and quality control by chaining a number of programs together with additional custom algorithms. De-novo assembly Assembly GPLv3 Linux
Mac OS X
AB Large Indel Tool Identifies deviations in clone insert size that indicate intra-chromosomal structural variations compared to a reference genome. InDel discovery
Sequencing
Mapping Perl GPL Linux 64
AB Small Indel Tool The SOLiD™ Small Indel Tool processes the indel evidence found in the pairing step of the SOLiD™ System Analysis Pipeline Tool (Corona Lite). InDel discovery
Sequencing
Mapping
Alignment
Perl
C++
GPL Linux 64
ABBA Assembly Boosted By Amino acid sequence is a comparative gene assembler, which uses amino acid sequences from predicted proteins to help build a better assembly Genomic Assembly Assembly
Scaffolding
Artistic License Linux
ABMapper Maps RNA-Seq reads to target genome considering possible multiple mapping locations and splice junctions Genomics
Transcriptomics
Mapping
Alignment
C++
Perl
GPLv3 Linux
ABySS ABySS is a de novo sequence assembler designed for short reads and large genomes. De-novo assembly Assembly
De Bruijn graph
MPI
OpenMP
C++ Commercial
Freeware
POSIX
Linux
Mac OS X
Adapter Removal (software) Removes adaptor fragments from raw short read sequence data and outputs data to FASTA format. General bioinformatics (pipeline) Adapter Removal (software) Trimming Java Custom Licence Linux 64
Windows
Mac OS X
ADTEx Aberration Detection in Tumour Exome (ADTEx) is a tool for copy number variation (CNV) detection for whole-exome data from paired tumour/matched normal samples. Copy number estimation
Next Generation Sequencing
Cancer biology
Exome analysis
Hidden Markov Model
Expectation Maximization
Copy number analysis Python
R
GPL v3 GNU/Linux
AGE AGE is a tool that implements an algorithm for optimal alignment of sequences with SVs. Structural variation Alignment
Gap extension
Creative Commons license (Attribution-NonCommercial).
AGILE A hash table based high throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process Mapping C
Agp2amos missing Format conversion Windows
Linux
Alcovna ALgorithms for COmparing and Visualizing Non Assembled data SNP discovery Java
ALEXA-Seq Alternative Expression Analysis by massively parallel RNA sequencing RNA-Seq Quantitation
Alternative Splicing
Perl GPLv3
ALLPATHS De novo assembly of whole-genome shotgun microreads. De-novo assembly Assembly
De Bruijn graph
Alta-Cyclic Alta-Cyclic is an Illumina Genome-Analyzer (Solexa) base caller. Basecaller
AMOS AMOS is a Modular, Open-Source whole genome assembler. Assembly
Assembly validation
Assembly visualization
Format conversion
Integrated Solution
C
Perl
Linux
ANCHOR Post-processing tools for de novo assemblies Assembly
Assembly QC
C++
Python
BCCA (academic use) Linux
Anno-J Annotation Browsing 2.0 Visualization Creative Commons - Attribution-NonCommercial-ShareAlike
ANNOVAR ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data Genomics
Genetics
Annotation
Variant Prioritization
Gene-based annotation
region-based annotation
filter-based annotation
Perl Commercial
Freeware
Linux
Windows
Mac OS X
Arachne ARACHNE is a program for assembling data from whole genome shotgun sequencing experiments. Assembly
OLC
AREM AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization ChIP-Seq Peak calling
Mapping
Python Linux
Arf arf is a genetic analysis program for sequencing data.
Array Suite (Array Studio/Server) Array Studio is a complete analysis and visualization package for NextGen sequencing data, as well as other -OMIC data types. Array Server is a backend enterprise server for storage and analysis of -OMIC and NextGen sequencing data. Genomics
SNP discovery
InDel discovery
Mapping
Expression profiling
Data Visualisation
Variant annotation and analysis
coverage analysis
Mapping
C# Commercial Windows
ArrayExpressHTS R-based pipeline for RNA-Seq data analysis. RNA-Seq
RNA-Seq Quantitation
R
ArrayStar ArrayStar is an easy-to-use gene expression analysis software package that offers powerful visualization and statistical tools to help you analyze your microarray data. Gene Expression Analysis Differentially expressed gene identification
Gene ontology analysis
Sequence variation analysis
Statistics
Commercial Windows
Mac OS X 10.6 with Parallels Desktop
ASC Empirical Bayes method to detect differential expression. RNA-Seq Quantitation Empirical Bayes
ATAC ATAC is a computational process for comparative mapping between two genome assemblies, or between two different genomes. Assembly validation
Alignment
Linux
Atlas Suite Atlas is a suite of variant analysis tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in Whole Exome Capture Sequencing (WECS) data. SNPs may be called using the Atlas-SNP2 application and indels may be called using the Atlas-Indel2 application. SNP discovery
InDel discovery
Variant Calling Ruby
C
BSD POSIX
Atlas-SNP2 Atlas-SNP2 is a SNP discovery tool developed for next generation sequencing platforms SNP discovery Ruby Freeware UNIX
Avadis NGS Strand NGS formerly Avadis NGS is a desktop software platform for alignment, analysis, visualization, and management of data generated by next-generation sequencing (NGS) platforms. It supports workflows for RNA-Seq, DNA-Seq, small RNA-Seq, ChIP-Seq, and Methyl-Seq data analysis. Strand NGS is designed with the biologist in mind. ChIP-Seq
DNA-Seq
RNA-Seq
Small RNA
Methyl-Seq
Pathway analysis
Alignment
Quality Control
Sequence analysis
Visualization
Biological Contextualization
Biological interpretation
Downstream analysis
Rich Visualization
Identify effects of SNPs on transcripts
Identify Structural Variants from Paired Reads (Insertions
Deletions
Translocations
Inversions)
Identify binding site peaks in ChIP-Seq data
Identify motifs around binding sites
Determine gene expression levels and identify differentially expressed genes De-convolve transcript expression levels and identify differential splice variants
Identify Novel Exons
Identify Novel Splice Junctions
Identify Fusion Genes Perform QC on Reads
determine on-and off-target reads
and filter anomalous reads
Determine Enriched GO Terms
Determine Significant Pathways
Java
R
Commercial Windows
Linux
Mac OS X
Baa.pl use transcripts to assess a de novo assembly Genomic Assembly Evaluation
Genomic Assembly Validation
Alignment Analysis Perl GPL any
Bambino Variant detector and graphical alignment viewer for SAM/BAM format data. SNP discovery
Somatic mutations
Java
Bambus Bambus is a general purpose scaffolder Scaffolding
BAMseek BAMseek is a large file viewer for BAM and SAM alignment files. Genomics
Transcriptomics
Alignment viewer Java GPLv3 Cross-Platform
BamTools BamTools provides a fast, flexible C++ API & toolkit for reading, writing, and managing BAM files. Programming Library
Alignment Analysis
C++ MIT Cross-Platform
BamView Interactive Java application for visualising the large amounts of data stored for sequence reads which are aligned against a reference genome sequence Visualization Java GPL Mac OS X
UNIX
Windows
Barcode generator Generator of sequence barcodes suitable for Illumina sequencing. Sample Barcoding Python
Barcrawl Bartab Barcrawl facilitates the design of barcoded primers, for multiplexed high-throughput sequencing. Sample Barcoding GPL
BarraCUDA Barracuda is a high-speed sequence aligner based on BWA and utilizes the latest Nvidia CUDA architecture for accelerating alignments of sequence reads generated by the next-generation sequencers. Sequence analysis Mapping
Alignment
FM-Index
GPU
Gapped and ungapped alignment
paired-end mapping
GPGPU
parallel execution
C
C++
CUDA
GPLv3
MIT
Linux
Batman Bayesian tool for methylation analysis (Batman), for analyzing methylated DNA immunoprecipitation (MeDIP) profiles DNA methylation Java LGPL
BayesCall Bayesian basecaller Sequencing Basecaller C++
Python
GPLv3
BayesPeak A Bayesian hidden Markov model to detect enriched locations in ChIP-seq data. ChIP-Seq Hidden Markov Model
MCMC
Multicore R GPL
BaySeq Identify differentially expressed genes RNA-Seq Quantitation Differentially expressed gene identification R
BBMap BBMap is a fast splice-aware aligner for RNA and DNA. It is faster than almost all short-read aligners, yet retains unrivaled sensitivity and specificity, particularly for reads with many errors and indels. Resequencing
Alignment
Quality Control
RNA-Seq Alignment
Alternative Splicing
Whole Genome Resequencing
SNP discovery
Phylogenetics
Metagenomics
Read Binning
Mapping
RNA-Seq analysis
Alignment
Quality Trimming
Contaminant filtering
Multithreaded. Faster and more accurate than competing aligners. Splice-aware. Java 7 BSD Windows
*NIX
Mac OS X
all supporting JVM
BBSeq Tool for analyzing RNA-Seq data to analyze gene expression RNA-Seq Quantitation R
Bcbio-nextgen Python scripts and modules for automated next gen sequencing analysis. These provide a fully automated pipeline for taking sequencing results from an Illumina sequencer, converting them to standard Fastq format, aligning to a reference genome, doing SNP calling, and producing a summary PDF of results. General bioinformatics (pipeline) QC
Filtering
Trimming
Mapping
Peak calling
Motif detection
Differential expression
Genomic region matching
Alignment
Genotyping
Python MIT platform-independent
BEADS ChIP-Seq data normalization for Illumina ChIP-Seq Normalization
BEAP The Blast Extension and Assembly Program (BEAP) uses a short starting DNA fragment to recursively blast nucleotide databases to obtain all overlapping sequences and construct a "full length" sequence. Mapping
BEDTools BEDTools is an extensive suite of utilities for comparing genomic features in BED format. Genomics Mapping Feature overlaps
UNIX pipes
coverage
split-alignments
BAM support
C++ GPLv2 Linux
Mac OS X
Bedutils NGSUtils is a suite of software tools for working with next-generation sequencing datasets. Starting in 2009, we (Liu Lab @ Indiana University School of Medicine) began working with next-generation sequencing data. We initially did custom coding for each project in a one-off manner. It quickly became apparent that this was an inefficient way to work, so we started assembling smaller utilities that could be adapted into larger, more complicated workflows. We have used them for Illumina, SOLiD and 454 sequencing data. We have used them for DNA and RNA resequencing, ChIP-Seq, CLIP-Seq, and targeted resequencing (Agilent exome capture and PCR targeting). These tools are also used heavily in our in-house DNA and RNA mapping pipelines.

These tools have been of great use within our lab group, and so we are happy to make them available to the greater community.

NGSUtils is made up of 50+ programs, mainly written in Python. These are separated into modules based on the type of file that is to be analyzed. There are four modules:
Belvu An X-windows viewer for multiple sequence alignments Multiple sequence alignment viewer Linux
BFAST Blat-like Fast Accurate Search Tool. Whole Genome Resequencing Mapping
Alignment
Genome Indexing
Colorspace
parallel execution
command line
C GPL Solaris
UNIX
BFCounter BFCounter is a program for counting k-mers in DNA sequence data. K-mer analysis C++ GPL v3
BING biomedical informatics pipeline (BING) for the analysis of NGS data that offers several novel computational approaches to 1. image alignment, 2. signal correlation, compensation, separation, and pixel-based cluster registration, 3. signal measurement and base calling, 4. quality control and accuracy measurement. Basecaller
Sequencing Quality Control
BioJava "BioJava is an open-source project dedicated to providing a Java framework for processing biological data. It provides analytical and statistical routines, parsers for common file formats and allows the manipulation of sequences and 3D structures. The goal of the biojava project is to facilitate rapid application development for bioinformatics. " Genomics Programming Library Java LGPL 2.1
Bionimbus Cloud environment for analysis of microarray and second generation sequencing data. Linux
Amazon EC2
cloud
BioPerl "BioPerl, a community effort to produce Perl code which is useful in biology. " Genomics Programming Library Perl Cross-Platform
BioPHP biology tools for php. Genomics Programming Library PHP GPL 2
Biopieces The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services. Genomics Alignment
Quality Control
Sequence analysis
Visualization
Perl
Python
Ruby
C
GPLv2
Biopython Biopython provides a tool kit for writing bioinformatics and computational molecular biology software in Python. Sequence analysis
Phylogenetics
Population genetics
Protein structures
Sequence parsing
Command line tool wrappers
Programming Library
Various Python Biopython License (MIT/BSD style) Linux
Windows
Mac OS X
BioRuby "BioRuby comes with a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, for the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO." Genomics Programming Library Ruby Cross-Platform
BioSmalltalk BioSmalltalk provides an environment to build bioinformatics scripts and applications using the most powerful object technology as of today, the Smalltalk programming environment Sequence analysis
Phylogenetics
Population genetics
Protein structures
Sequence parsing
Command line tool wrappers
Programming Library
Various Smalltalk Linux
Windows
Mac OS X
BiQ Analyzer BiQ Analyzer is a software tool for easy visualization and quality control of DNA methylation data. With more than 2,000 downloads so far, BiQ Analyzer has become a standard tool for processing DNA methylation data from bisulfite sequencing. Epigenomics
DNA methylation
Java Windows
Linux
Mac OS X
Solaris
BiQ Analyzer HT BiQ Analyzer HT is an enhanced version of BiQ Analyzer that provides extensive support for high-throughput bisulfite sequencing. BiQ Analyzer HT facilitates the processing, quality control and initial analysis of single-basepair resolution DNA methylation data. It was developed for deep bisulfite sequencing of one or more loci using the Roche 454 platform, but it easily extends to other sequencing platforms. BiQ Analyzer HT features a biologist-friendly graphical user interface, a fast alignment algorithm and a variety of ways to visualize DNA methylation data. Epigenomics
DNA methylation
Bisulfite Sequencing
Java Windows
Linux
Mac OS X
Solaris
Bis-SNP BisSNP is a package based on the Genome Analysis Toolkit (GATK) map-reduce framework for genotyping in bisulfite treated massively parallel sequencing (Bisulfite-seq, NOMe-seq and RRBS) on the Illumina platform. It uses Bayesian inference with either manually specified or automatically estimated methylation probabilities of different cytosine contexts (not only CpG, CHH, CHG in Bisulfite-seq, but also GCH etc. in other bisulfite treated sequencing) to determine genotypes and methylation levels simultaneously. SNP discovery
Genotyping
DNA methylation
Bisulfite Sequencing
Bisulfite SNP calling
Methylation Calling
MapReduce
Accurate SNP and methylation calling in Bisulfite-seq/NOMe-seq/RRBS Java
Perl
MIT Linux
Mac OS X
Bismark Bismark is a tool to map bisulfite treated sequencing reads and perform methylation calling in a quick and easy-to-use fashion. Epigenomics
Genomics
DNA methylation
Bisulfite mapping
Mapping
Methylation Calling
fast and convenient Bisulfite-Seq output
very flexible
Perl GPLv3 Linux
Mac OS X
Windows
Bison Bison allows users with access to a computer cluster to rapidly align whole-genome bisulfite sequencing or RRBS reads. It can align both directional and non-directional libraries and uses bowtie2. Epigenomics
Bisulfite Sequencing
DNA methylation
Bisulfite mapping
Mapping
Methylation Calling
BAM support
Bisulfite sequencing
MPI
C Unix-like
Linux
Mac OS X
BLAST ...it's BLAST. Linux
BLAST Ring Image Generator "BRIG is a cross-platform (Windows/Mac/Unix) application that can display circular comparisons between a large number of genomes, with a focus on handling genome assembly data. " Comparative genomics Visualization
Assembly visualization
Cross-Platform
BLAT Fast, accurate spliced alignment of DNA sequences Mapping
Alignment
Mapping C Freeware Linux
Mac OS X
Blixem a graphical blast viewer Sequence analysis
Phylogenetics
Homology
Alignment viewer
Multiple sequence alignment viewer
GPL Linux
BOAT Can accurately and efficiently map sequencing reads back to the reference genome. Mapping GPL
Bort Bort parses Blast output and quantifies hits by contig and read counts. RNA-Seq Quantitation Perl any
BOW BOW - Bioinformatics On Windows is essentially a Windows port of BWA and SAMTOOLS
Bowtie Bowtie is an ultrafast, memory-efficient short read aligner. Mapping
Burrows-Wheeler
FM-Index
Mac OS X
Linux
Windows
BRAT accurate and efficient tool for mapping short reads obtained from the Illumina Genome Analyzer following sodium bisulfite conversion. Both single and paired ends are supported. Epigenomics
DNA methylation
Bisulfite mapping
Mapping
GPLv3
BRCA-diagnostic Computational screening test for BRCA1/2 mutants in human genomic DNA Personal genomics Perl
BreakDancer BreakDancer is an application for detecting structural rearrangements and indels in short read sequencing data Genomics
Structural variation
InDel discovery
Perl
C++
GPLv3
Breakpointer Breakpointer is a fast tool for locating sequence breakpoints from the alignment of single end reads (SE) produced by next generation sequencing (NGS). It adopts a heuristic method in searching for local mapping signatures created by insertion/deletions (indels) or more complex structural variants (SVs). With current NGS single-end sequencing data, the regions output by Breakpointer mainly contain the approximate breakpoints of indels and a limited number of large SVs. Exome and Whole genome variant detection
InDel discovery
Statistical testing C++
Perl
GPL
BreakSeq Database of known human breakpoint junctions and software to search short reads against them. Structural variation Mapping
BreakTrans BreakTrans is a computer program that maps predicted gene fusions to genomic structural rearrangements so as to validate both types of events. Post-analysis
Breakway Breakway is a suite of programs that take aligned genomic data and report structural variation breakpoints. Whole Genome Resequencing
Genomics
Structural variation
InDel discovery
Sequence analysis
Genetic variation annotation
Fast
specific
UNIX pipes
Perl GPL Linux
Mac OS X
Windows
BS Seeker Mapping tool for bisulfite treated reads Epigenomics Bisulfite mapping Python
BS-Seq The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX. Epigenomics Bisulfite mapping
BSMAP short reads mapping software for bisulfite sequencing DNA methylation Mapping
Bisulfite mapping
Bisulfite sequencing C++ GPLv3 Linux 64
BSSim BSSim: Bisulfite sequencing simulator for next-generation sequencing. DNA methylation
Bisulfite Sequencing
Simulation BSSim allows users to mimic various methylation levels. Python GPL v3 UNIX
Linux
Mac OS X
Windows
Btrim Btrim is a fast and lightweight software to trim adapters and low quality regions in reads. Trimming Linux
BWA Fast, accurate, memory-efficient aligner for short and long sequencing reads Mapping
Read alignment
FM-Index Gapped alignment
paired-end mapping
C GPLv3
MIT
UNIX
BWA-SW Fast, accurate, memory-efficient aligner for long sequencing reads Mapping
Read alignment
FM-Index Gapped alignment
Local alignment
C GPLv3
MIT
UNIX
CABOG Celera Assembler is scientific software for DNA research. De-novo assembly Assembly Robust to homopolymer run length Linux
CANGS CANGS is a flexible and user-friendly utility to trim sequences, filter low quality sequences, and produce input files for further downstream analyses for 454 sequences. CANGS can be used to assign the taxonomic grouping based on similarity with sequences from the NCBI database Metagenomics
Phylogenetics
Primer removal
Trimming
Sequencing Quality Control
CARPET A web‐based package for the analysis of ChIP‐chip and expression tiling data ChIP-on-chip Tilling C++
CASHX Parse, map, quantify and manage large quantities of short-read sequence data. Small RNA transcriptome Mapping
CATCH A tool for exploring patterns in ChIP profiling data. ChIP-Seq
ChIP-on-chip
Clustering and alignment parallel execution
graphical browsing of results
Java Open Source
CatchAll Estimate ecological diversity with both parametric and non-parametric estimators. Population genetics
Metagenomics
CEQer CEQer (Comparative Exome Quantification analyzer) is a graphical, event-driven tool for copy number abnormalities/allelic-imbalance coupled analysis of whole-exome sequencing data. By using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. Exome analysis Copy number estimation
Allelic imbalance
Copy number and allelic imbalance analyses from matched exomes. C# GPL v3 Windows
Mac
Linux
CexoR Strand specific peak-pair calling in ChIP-exo data ChIP-exo Peak calling
Peak-pair calling
R/Bioconductor package
can run on major computer platforms
R GPL-2 + file LICENSE Linux
Mac OS X
Windows
CGA Tools Tools for viewing, manipulating and converting data from Complete Genomics Conversion C++ Apache License 2.0 Linux
UNIX
Mac OS X
ChimeraScan Identifies chimaeric transcripts in RNA-Seq data Fusion transcripts
ChIP-Seq (application) The ChIP-Seq web server provides access to a set of useful tools performing common ChIP-Seq data analysis tasks, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. It is an open system designed to allow interoperability with other resources, in particular the motif discovery programs from the Signal Search Analysis (SSA) server. ChIP-Seq Peak calling
Mapping
C
Perl
GPL Linux
Mac OS X
ChIPmeta Combining data from ChIP-seq and ChIP-chip. Transcription Factor Binding Site identification
ChIP-Seq
ChIP-on-chip
Hidden Markov Model
ChIPMunk ChIPMunk is a fast heuristic DNA motif digger based on a greedy approach accompanied by bootstrapping. ChIPMunk identifies the strong motif with the maximum Kullback Discrete Information Content in a given set of DNA sequences. *NEW URL* http://autosome.ru/ChIPMunk ChIP-Seq
Motif analysis
Motif discovery
Motif analysis
Motif discovery
efficient motif discovery for huge datasets up to tens of thousands of sequences; multi-core CPU support; usage of the ChIP-Seq base coverage peak data Java Freeware platform-independent
CHiPSeq From Science Johnson, 2007 ChIP-Seq Peak calling
ChIPseqR ChIP-seq analysis tool ChIP-Seq R
Chipster User-friendly NGS data analysis software with built-in genome browser and workflow functionality. Chipster includes tools for ChIP-seq, RNA-seq, miRNA-seq and MeDIP-seq analysis, and functionality for exome-seq and CGH-seq will soon be added. ChIP-Seq
RNA-Seq
MiRNA-Seq
MeDIP-Seq
QC
Filtering
Trimming
Mapping
Peak calling
Motif detection
Differential expression
Pathway analysis
Methylation analysis
Genomic region matching
Genome browser
Java
R
GPLv3 platform-independent
ChromaSig An unsupervised learning method, which finds, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. ChIP-on-chip Chromatin motif finding Perl
C
ChromHMM ChromHMM is software for learning and characterizing chromatin states. Epigenomics Hidden Markov Model
Segmentation
Java GPL 2
Circos Circos is a tool for visualizing data in a circular format. It was developed for genomic data but can work for many other kinds of data as well. Comparative genomics Visualization Perl Windows
Linux
CisGenome An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis ChIP-Seq
ChIP-on-chip
Motif analysis
Gene annotation retrieval
Gibbs motif sample C
C++
UNIX
Windows
Cistrome Galaxy-based web service for analysis of ChIP data ChIP-on-chip
ChIP-Seq
Python
CLCbio Genomics Workbench De novo and reference assembly, SNP and small indel detection and annotation. Genomics
Whole Genome Resequencing
De-novo assembly
SNP discovery
InDel discovery
ChIP-Seq
RNA-Seq
MiRNA
Transcriptomics
Mapping
Assembly
Alignment
Colorspace
BLAST
Ab-initio gene prediction
Adapter Removal (software)
Annotation
Assembly QC
Basespace
Bisulfite SNP calling
De Bruijn graph
Heatmaps
Advanced and user-friendly analyses of genomic
transcriptomic
and epigenomic NGS data in a graphical user-interface. Wizard driven tools and a freely available developer toolkit
SIMD implementation
multi-threading
hybrid assembly
Integrated solution
Java
C++
Commercial Windows
Mac OS X
Linux
Clean reads clean_reads cleans NGS (Sanger, 454, Illumina and solid) reads. Trimming
Sequencing Quality Control
Python
CleaveLand A pipeline for using degradome data to find cleaved small RNA targets. MiRNA Perl
R
Freeware
CLEVER CLEVER is a tool to discover structural variations such as (larger) insertions and deletions in genomes from paired-end sequencing reads. Genomics
Structural variation
Copy number estimation
Structural variation discovery command line C++
Python
GPLv3 any
ClipCrop a new method and implementation named ClipCrop for detecting SVs with single-base resolution
CloudAligner Hadoop-based short read aligner Mapping
Hadoop
Java GPL cloud
CloudBurst CloudBurst is a parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes. SNP discovery
Genotyping
Personal genomics
Mapping
MapReduce
Hadoop
parallel execution
Hadoop
Academic Cloud Computing Initiative
Java
ClustDB A powerful tool for exact sequence matching Linux
Cluster Flow A command-line pipeline tool which uses common cluster managers to run bioinformatics analysis pipelines. Pipeline Management Perl
though modules can be any language
GNU GPL v3 Linux
CNANorm A normalization method for Copy Number Aberration in cancer samples. Cancer biology
Copy number estimation
Genomics
Mixture model
Peak detection
Normalization
R
Perl
GPLv2 Linux
Mac OS X
Windows
CNAseg We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. Structural variation
CNB MetaGenomics tools A number of tools and meta-tools developed at CNB/CSIC for the analysis of metagenomics data (some rely on QIIME). Metagenomics
Biodiversity
Community analysis
High-throughput sequencing
Community Analysis Bash
Perl
Python
C
EU-GPL Linux
Unix-like
POSIX
CnD Program to detect copy number variation in inbred mouse strains Copy number estimation Hidden Markov Model D GPL
CNVer CNVer is a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where matepairs mapping discordantly to the reference serve to indicate the presence of variation. CNVer combines this information within a unified computational framework called the donor graph, allowing it to better mitigate the sequencing biases that cause uneven local coverage. CNVer can also reconstruct the absolute copy counts of segments of the donor genome, and work with low coverage datasets. Structural variation
Copy number estimation
Perl
C++
CnvHMM WashU copy number variant (CNV) detection algorithm for Illumina/Solexa data. Structural variation Linux
CNVnator CNV discovery and genotyping from read-depth analysis of personal genome sequencing Copy number estimation
Genotyping
CNVseq Copy number estimation Perl
R
CompreheNGSive compreheNGSive is an interactive visualization of the end results of the next-generation sequencing pipeline. Next Generation Sequencing Visualization Python
Qt
LGPL Mac OS X
Linux
CoNAn-SNV CoNAn-SNV is a probabilistic framework for the discovery of single nucleotide variants in WGSS data. This software explicitly integrates information about copy number state of different genomic segments into the inference of single nucleotide variants. SNP discovery C
ConDeTri ConDeTri is a content dependent read trimming software for Illumina/Solexa sequencing data RNA-Seq
DNA-Seq
Genomics
Trimming Perl
ContEst GATK tool to estimate amount of cross-individual contaminating sequence in a dataset Sequencing Quality Control Java BSD
Contra Copy number analysis for exome-sequencing / targeted-resequencing. Two methods of analysis available: Case vs Control, or Case vs Baseline. Function available for creating a baseline from multiple samples. Next Generation Sequencing
Cancer biology
Genomics
Copy number estimation
Copy number analysis
baseline (pseudo-control) creation
Python
R
GPL v3 Linux 64
Linux
Contrail A Hadoop based genome assembler for assembling large genomes in the clouds De-novo assembly Assembly
De Bruijn graph
Hadoop
CopySeq CopySeq analyzes the depth-of-coverage of whole genome resequencing data to predict CNVs and to infer quantitative locus copy-number genotypes. Structural variation
Copy number estimation
Genotyping
Personal genomics
Java
R
Mac OS X
Linux
Coral Corrects sequencing errors in short read data via multiple alignments Error correction C++
CORAL (Contig Ordering Algorithm) An algorithm has been developed to order fingerprinted clones within contigs. Error correction Java
Cortex Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. Cortex allows de novo assembly of variants without having to do a consensus assembly first. Also allows comparison of genomes without using consensus, and alignment of sequence data to a de Bruijn graph Genomics Assembly
Variant Calling
C GPLv3
CPTRA Integrated transcriptome analysis from Sanger, 454, Solexa, SOLiD, etc reads RNA-Seq Alignment
RNA-Seq Quantitation
Python
CRAC CRAC is a mapping software specialized for RNA-Seq data. It detects mutations, indels, splice or fusion junctions in each single read. Mapping
RNA Seq analysis
RNA-Seq Alignment
Alternative Splicing
Fusion genes
Fusion transcripts
SNP discovery
InDel discovery
Mapping
Read mapping
Burrows-Wheeler
FM-Index
C++ CeCILL Linux
Linux 64
Mac OS X
CRISP Identifies rare and common variants in pooled sequencing data SNP discovery Pooled samples Python
Crossbow Crossbow is a cloud-computing software tool that combines the aligner BOWTIE and the SNP caller SOAPsnp. SNP discovery Mapping
MapReduce
Hadoop
CUDA-EC A scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads can be available before DNA fragment assembly. Sequencing Quality Control
GPU
read error correction C
Cufflinks Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. RNA-Seq Alignment
RNA-Seq Quantitation
Alternative Splicing
Transcriptome
RNA-Seq
Transcript assembly
Mapping
Differentially expressed gene identification
Differential expression
Boost
CummeRbund Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations. RNA-Seq Quantitation Visualization
Curtain Curtain is a Java wrapper around next-generation assemblers such as Velvet which allows the incremental introduction of read-pair information into the assembly process. This enables the assembly of larger genomes than would otherwise be possible within existing memory constraints. De-novo assembly Assembly
De Bruijn graph
Apache License 2.0
Cutadapt Removes adapter sequences from high-throughput sequencing data using alignment. Python
C
MIT
DCLIP dCLIP is a Perl program for discovering differential binding regions in two comparative CLIP-Seq (HITS-CLIP, PAR-CLIP or iCLIP) experiments. CLIP-Seq
HITS-CLIP
PAR-CLIP
ICLIP
Alignment Analysis Perl
C
UNIX
Unix-like
DecGPU Parallel and distributed error correction algorithm for high-throughput short reads. De-novo assembly Error correction
GPU
C++ GPLv3 Linux
DeconSeq DeconSeq can be used to automatically detect and efficiently remove any type of sequence contamination from metagenomic datasets, including human or other host sequences. The tool uses a modified version of the BWA-SW aligner and can be applied to longer-read datasets (150+bp read length). DeconSeq is available as both standalone and web-based versions. Metagenomics
Metatranscriptomics
Genomics
Contaminant filtering Perl
C
GPLv3 UNIX
Mac OS X
DeepTools User-friendly tools for the normalization and visualization of deep-sequencing data. Genomics
ChIP-Seq
Normalization
Visualization
Conversion
Data Visualisation
coverage analysis
conversion
normalization
GC plot analysis
Python GPL3 Linux
Mac OS X
DeFuse deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion Fusion genes
RNA-Seq
Fusion transcripts
DEGseq an R package to identify differentially expressed genes or isoforms for RNA-seq data from different samples RNA-Seq Quantitation Differentially expressed gene identification R
DESeq DESeq is an R package to analyse count data from high-throughput sequencing assays such as RNA-Seq and test for differential expression. The latest version is DESeq2 (released April 2013). RNA-Seq Quantitation
ChIP-Seq
Statistical testing
Sequencing Quality Control
R GPLv3 UNIX
Windows
Mac OS X
DIAL A computational pipeline for identifying single-base substitutions between two closely related genomes without the help of a reference genome. SNP discovery
Comparative genomics
C
Python
MIT Linux
DiBayes Bayesian identification of SNPs in color space (SOLiD) data SNP discovery Colorspace GPL
DiffBind Differential binding analysis of ChIP-Seq peak data. Computes differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also enables occupancy (overlap) analysis and provides plotting functions. ChIP-Seq Differential binding sites Multiple replicates information used; automated pipeline; finding hotspots; R Artistic-2.0 Linux
Mac OS X
Windows
Diffreps diffReps was developed to find differential peaks in ChIP-seq data. It scans the whole genome using a sliding window, performs millions of statistical tests and reports the significant hits. diffReps takes into account the biological variation within a group of samples and uses that information to enhance statistical power. Considering biological variation is highly important, especially for in vivo brain tissues. ChIP-Seq ChIP-seq differential analysis Multiple replicates information used; automated pipeline; easy genomic annotation; finding hotspots; Perl GPLv3 Linux
Windows
Mac OS X
Dindel Calls small indels from short-read sequence data InDel discovery Localized reassembly/realignment
DiscoSnp discoSnp: a qualitative de novo SNP caller. Extremely low memory use and time efficient. No reference genome needed. Calls both homozygous and heterozygous SNPs. Population Genomics
Comparative genomics
Barcoding
DNA-Seq
De novo assembly
Genotyping
High-throughput sequencing
De Bruijn graph
Targeted de novo assembly
Read depth analysis
de novo (reference free) SNP calling C++ CeCILL Unix-like
iOS
DNA Baser Tool for manual and automatic sequence assembly, analysis, editing, sample processing, metadata integration, file format conversion and mutation detection. Structural variation
SNP discovery
Assembly
Assembly editing
Sequence analysis
Portable. Does not require installation. Can run from USB stick. Only 3MB. Compiled Commercial
Freeware
Windows
DNA Chromatogram Explorer DNA Chromatogram Explorer is a Windows Explorer clone dedicated to DNA sequence analysis and manipulation. Chromatogram management
Chromatogram viewer
Conversion
Portable. Does not require installation. Can run from USB stick. Only 1MB. Freeware Windows
DNAA DNAA (DNA Analysis) software for analysis of Next-Generation Sequencing data. Structural variation
SNP discovery
DNA methylation
Statistics
Sequencing Quality Control
Simulation
GPL
DNaseR DNase I footprinting analysis of DNase-seq data in R DNase-seq DNase I footprinting
Digital genomic footprinting
R/Bioconductor package
can run on major computer platforms
R GPL-2 + file LICENSE Linux
Mac OS X
Windows
DNAzip A series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. Data compression C++
DrFAST Fast mapper for dibase encoded data.
DSAP Automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology Small RNA transcriptome
MiRNA
browser based
DSGseq This program aims to identify differentially spliced genes from two groups of RNA-seq samples. RNA-Seq
Differential Expression
Alternative Splicing
RNA-Seq analysis
Differential expression
Alternative Splicing
Statistical testing
C
R
Commercial
Freeware
Linux
Windows
Mac OS X
DSRC Compression algorithm for genomic data in FASTQ format Data compression
E-miR Perl tools for processing miRNA sequencing data Small RNA transcriptome
MiRNA
Ea-utils FASTQ processing utilities Trimming
Sequencing Quality Control
C++ MIT
EagleView EagleView is an information-rich genome assembler viewer with data integration capability. Assembly visualization
EagleView genome viewer EagleView is an information-rich genome assembler viewer with data integration capability. Viewer
Easyfig Genome comparison figure generator Comparative genomics Comparative genomics Python GPLv3 Windows
Mac OS X
GNU/Linux
EBCall EBCall is a software package for somatic mutation detection (including InDels). EBCall uses not only paired tumor/normal sequence data of a target sample, but also multiple non-paired normal reference samples for evaluating the distribution of sequencing errors, which leads to accurate mutation detection even at low sequencing depths and low allele frequencies.
ECHO Reference-free short read error correction from diploid genomes, with explicit modeling of heterozygous sites. SNP discovery
InDel discovery
Error correction Python
C++
BSD
EDENA An assembler dedicated to processing the millions of very short reads produced by the Illumina Genome Analyzer. Assembly
EdgeR edgeR is an R/Bioconductor software package for statistical analysis of replicated count data. Methods are designed for assessing differential expression in comparative RNA-Seq experiments, but are generally applicable to count data from other genome-scale platforms (ChIP-Seq, MeDIP-Seq, Tag-Seq, SAGE-Seq etc). RNA-Seq
RNA-Seq Quantitation
ChIP-Seq
Gene Expression Analysis
DNA methylation
Statistical testing R LGPL Windows
Mac OS X
UNIX
ELAND Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine. Alignment Commercial
EMBF Frequency-based, de novo short-read clustering method that organizes erroneous short sequences originating in a single abundant sequence into a tree structure; in this structure, each “child” sequence is considered to be stochastically derived from its more abundant “parent” sequence with one mutation through sequencing errors. Mapping
Epigenome A bioinformatic pipeline that scores epigenetic alterations according to strength and significance and links them to potentially affected genes. Epigenomics Bisulfite mapping R
Python
EpiGRAPH EpiGRAPH enables biologists to analyze genome and epigenome datasets with powerful statistical and machine learning methods. In a typical workflow, the user uploads a set of genomic regions of interest (e.g. experimentally mapped enhancers, hotspots of epigenetic regulation or sites exhibiting disease-specific alterations), and EpiGRAPH searches a large database of (epi-) genomic attributes for significant overlap and correlation with the regions in the input dataset. Furthermore, EpiGRAPH can predict the status of genomic regions that were not included in the input dataset. Epigenomics Statistics
Machine Learning
browser based
ERANGE ERANGE is a Python package for RNA-seq and ChIP-seq analysis. RNA-Seq Alignment
RNA-Seq Quantitation
ChIP-Seq
Allele-specific transcription
RNAseq analysis
Chipseq analysis
Python
ERDS ERDS is free, open-source software designed for detecting copy number variants (CNVs) in human genomes from next-generation sequencing data. It uses paired Hidden Markov models (PHMM) based on the expected distribution of read depth of short reads and the presence of heterozygous sites. ERDS is not suitable for whole-exome data. Copy number estimation Hidden Markov Model
ERGO Genome Analysis and Discovery System ERGO provides a systems-biology informatics toolkit centered on comparative genomics to capture, query and visualize sequenced genomes. Building upon the most comprehensive genomic database available anywhere integrated with the largest collection of microbial metabolic and non-metabolic pathways and using Igenbio's proprietary algorithms, ERGO assigns functions to genes, integrates genes into pathways, and identifies previously unknown or mischaracterized genes, cryptic pathways and gene products. Metabolic reconstruction
Phylogenetics
Comparative genomics
SNP Annotation
SNP discovery
Alignment
Exome analysis
Metagenomics
Pathway analysis
Comparative transcriptomics
Functional Genomics
Gene Expression Analysis
Genome Wide Association Studies
Fusion finding
Fusion genes
Fusion transcripts
Sequence annotation
Sequence functional annotation
Transcription Factor analysis
Commercial Web
ERNE Extended Randomized Numerical alignEr for accurate alignment of NGS reads. It can map bisulfite-treated reads. Genomics
Alignment
Bisulfite Sequencing
Mapping
Bisulfite mapping
Bisulfite sequencing
sequence alignment
C++ GPL v3 Linux
Mac OS X
Windows
Error Correction Evaluation Toolkit Evaluation of error correction results Sequence Quality Control Python
Perl
POSIX
Est2assembly Processes raw sequence data from Sanger or 454 sequencing into a hybrid de-novo assembly, annotates it and produces GMOD compatible output, including a SeqFeature database suitable for GBrowse. RNA-Seq Alignment
Genomics
Assembly
ESTcalc Estimation of project costs for RNA-Seq study. RNA-Seq Cost estimation Perl
EULER EULER-SR is a program for de novo assembly of reads. Contrary to the overlap-layout approach, EULER-SR uses a de Bruijn graph to construct an assembly. The assembly of a genome corresponds to an Eulerian path in the de Bruijn graph. Long (possibly erroneous) reads, and mate-pairs are used to determine parts of the correct Eulerian traversal in the assembly. Assembly
De Bruijn graph
C++
Perl
Linux
ExomeCNV Identifies copy number variation from targeted exome sequencing data Targeted resequencing
Copy number estimation
R
ExomeCopy CNV detection from exome sequencing read depth Exome and Whole genome variant detection
Copy number estimation
Exome analysis
Hidden Markov Model simultaneous normalization and segmentation R GPL 2.0+ Linux
Windows
Mac OS X
ExomePicks ExomePicks is a program that suggests individuals to be sequenced in a large pedigree.
Exonerate Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX. Alignment C GPL Linux
FAAST Flowspace Assisted Alignment Search Tool Mapping Linux
FaBox Tools for splitting, joining and otherwise manipulating FASTA format sequence files. Phylogenetics
Genomic Assembly
FACS Rapid and accurate classification of sequences as belonging or not belonging to a reference sequence. Metagenomics Bloom filters Perl
GPLv2 Linux
FastQ Screen FastQ Screen provides a simple way to screen a library of short reads against a set of reference libraries. Its most common use is as part of a QC pipeline to confirm that a library comes from the expected source, and to help identify any sources of contamination. Genomics
Transcriptomics
Mapping
Sequencing Quality Control
Summarises the mapping of a library against a series of reference sequences Perl GPLv3 Linux
Mac OS X
Windows
FastQC FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. Sequencing Quality Control Java GPLv3 UNIX
Windows
FastQValidator Checks that FastQ files follow the format standard. Quality Control Sequencing Quality Control C++
FDM Detects differential transcription in RNA-Seq data RNA-Seq Quantitation
FeatureCounts featureCounts is a very efficient read quantifier. It can be used to summarize RNA-seq reads and gDNA-seq reads to a variety of genomic features such as genes, exons, promoters, gene bodies and genomic bins. It is included in the Bioconductor Rsubread package and also in the SourceForge Subread package. Next Generation Sequencing Read summarization read summarization R
C
GPLv3 Linux 64
Mac OS X
Mac OS X; x86 64
FHiTINGS "FHiTINGS is designed for use in rapidly identifying, classifying, and parsing internal transcribed spacer (ITS) DNA sequences after a BLASTn search. This software is useful for fungal ecology studies using next generation sequencing (NGS)." Metagenomics
Comparative genomics
Alignment Python Cross-Platform
Figaro Figaro is a software tool for identifying and removing the vector from raw DNA sequence data without prior knowledge of the vector sequence. Sequencing K-mer analysis AMOS C++
Perl
Artistic License UNIX
Filter Produces a filtered version of an sRNA dataset, controlled by several user-defined criteria, including sequence length, abundance, complexity, transfer and ribosomal RNA removal. General bioinformatics (pipeline) Filtering multi-threading Java Custom Licence Linux 64
Windows
Mac OS X
FindPeaks 3.1 Findpeaks was developed to perform analysis of ChIP-Seq experiments. ChIP-Seq Peak calling GPLv3
FindPeaks 4.0 (Vancouver Short Read Package) The Vancouver Short Read Analysis Package (VSRAP) contains the FindPeaks application for ChIP-Seq and RNA-Seq analysis, as well as utilities for SNP finding, working with aligned sequence files and a nascent database for storing SNPs across multiple libraries. Genomics
SNP discovery
Peak calling
Database
Format conversion
Alignment Analysis
command line Java GPL Linux
Windows
Mac OS X
FLASH Identifies paired-end reads which overlap in the middle, converting them to single long reads Assembly
Read pre-processing
combining forward and reverse reads C Open Source Linux 64
Flexbar flexible barcode and adapter processing for next-generation sequencing platforms Next Generation Sequencing
Sequence Quality Control
Genomics
Read pre-processing
Sample Barcoding
Adapter Removal (software)
Trimming
Paired read support
separate barcode reads
multi-threading
C++ GPLv3 Linux
Windows
Mac OS X
Flower Tool for converting SFF files into other formats, including tab-delimited text. Haskell
FlowSim Tool for simulating errors in 454 sequencing data Error correction
Simulation
Haskell
Flux FluxCapacitor is a computer program that predicts splice-form abundances from the reads of an RNA-seq experiment. FluxSimulator generates simulated data for testing RNA-seq pipelines. RNA-Seq Simulation
Forge De novo assembly using a combination of next-generation and Sanger reads Genomics
De-novo assembly
Assembly
FragGeneScan Application for finding (fragmented) genes in short reads Metagenomics C
Perl
GPL
FrameDP Sensitive peptide detection on noisy matured sequences. A self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence qualities. RNA-Seq
FreClu A frequency-based, de novo short-read clustering method that organizes erroneous short sequences originating in a single abundant sequence into a tree structure; in this structure, each “child” sequence is considered to be stochastically derived from its more abundant “parent” sequence with one mutation through sequencing errors. The root node is the most frequently observed sequence that represents all erroneous reads in the entire tree, allowing the alignment of the reliable representative read to the genome without the risk of mapping erroneous reads to false-positive positions. RNA-Seq Alignment Mapping
Freebayes Bayesian genetic variant detector (SNPs, indels, MNPs) Genomics MIT
FREEC A tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. Copy number estimation Linux
Linux 64
Windows
FusionAnalyser FusionAnalyser is a new graphical, event-driven tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. Tested on Illumina. Requires short, paired-end sequences. High-throughput sequencing Gene fusions discovery. Advanced and user-friendly analysis of RNA-seq data for fusion discovery C# GPLv3 Windows
Linux
FusionCatcher FusionCatcher searches for novel/known fusion genes, translocations, and chimeras in RNA-seq data (paired-end reads from Illumina NGS platforms like Solexa and HiSeq). RNA-Seq
Fusion finding
Alignment Python GPL v3 *NIX
FusionHunter Identifies gene fusions in RNA-Seq data RNA-Seq
Fusion transcripts
Perl
C
Linux
Linux 64
FusionMap Detects fusion events in both single- and paired-end datasets from either RNA-Seq or gDNA-Seq studies and characterizes fusion junctions at base-pair resolution. Fusion genes
Fusion transcripts
Split-read C# Commercial
Freeware
Windows
Linux
Linux 64
FusionSeq Identifies fusion transcripts from paired end RNA-Seq data. Fusion transcripts
RNA-Seq
Fusion genes
Alignment Analysis C Creative Commons - Attribution; Non-commercial 2.5 Mac OS X
UNIX
Linux
Fuzzypath Assembler Genomics De Bruijn graph
Assembly
G-Mo.R-Seq G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. RNA-Seq Alignment CeCILL Linux
G-SQZ Huffman coding-based sequencing-reads specific representation scheme that compresses data without altering the relative order. Read storage
Data compression
C++
Galaxy "Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses." Comparative genomics
Functional Genomics
Whole Genome Resequencing
Genomic Assembly
Genomics
Alignment
Assembly
Quality Control
Visualization
Python Cross-Platform
Galign Identifies polymorphisms between sequence reads obtained using Illumina/Solexa technology and a reference genome SNP discovery Mapping GPL
Gambit A cross-platform GUI for sequence visualization and analysis. Visualization GPL 2.0+
Commercial
GAMES GAMES (Genomic Analysis of Mutations Extracted by Sequencing) is a tool for mining and prediction of functional effect of mutation. SNP discovery
SNP Annotation
InDel discovery
Perl Linux
GASSST Fast and accurate aligner for short and long reads Alignment
Mapping
Gapped alignment
short and long reads
C++ CeCILL Linux
GASV Software for classification and comparison of structural variants measured via paired-end sequencing and/or array-CGH. Structural variation GPLv3
GATK The Genome Analysis Toolkit (GATK) is a structured programming framework designed to enable rapid development of efficient and robust analysis tools for next-generation DNA sequencers. The GATK solves the data management challenge by separating data access patterns from analysis algorithms, using the functional programming philosophy of Map/Reduce SNP discovery MapReduce
Programming Library
Localized reassembly/realignment
Java
Python
GBrowse Genome Viewer Visualization Genome Viewer Perl Open Source Linux
Mac OS X
Windows
GeeFu Database tool for genomic assembly and feature data Genomics Assembly Ruby
GEM GEM is a Java software tool for analyzing transcription factor binding ChIP-Seq/ChIP-exo data. It predicts binding events, performs de novo motif discovery and uses the motif to improve the binding event calling. It calls binding events right at (or very close to) the motif positions, deconvolves closely spaced homotypic binding events and accurately discovers binding motifs. ChIP-Seq Peak calling
Motif discovery
Sequence motif discovery
probabilistic mixture model
motif prior
multi-threading
Java Commercial
Freeware
Cross-Platform
GEM library A set of highly optimized tools for indexing/querying huge genomes/files. Provided so far: a very fast exact mapper and an unconstrained split-mapper. Mapping
Programming Library
Colorspace
C
Python
OCaml
GPLv3
GENALICE MAP From FASTQ to VCF in 30 min or less. Ultra-fast Next-Generation Sequencing (NGS) read alignment and variant calling solution. Genomics Mapping and variant calling ultrafast alignment and variant calling Commercial Linux
GENE-Counter GENE-counter is a computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression RNA-Seq Linux
Mac OS X
Genedata Expressionist Genedata Expressionist is the leading software platform for the efficient and quality compliant analysis of next generation sequencing (NGS) and other genomic profiling data, often used in translational research. The highly scalable enterprise system can simultaneously process thousands of experiments in high throughput. By utilizing standardized and end-to-end traceable NGS data analysis workflows, an adaptable data management system with strict user management, audit trailing, and reporting, the platform fully meets the highest regulatory/compliance standards in the pharmaceutical industry.
Genomics
Epigenomics
SNP discovery
InDel discovery
ChIP-Seq
RNA-Seq
Transcriptomics
Bisulfite Sequencing
Next Generation Sequencing
Alignment
Annotation
Bisulfite mapping
ChIP-Seq analysis
Clustering
Copy number estimation
Gene expression analysis
Fusion genes
General bioinformatics
Genome browser
InDel discovery
Mapping
Normalization
Quality Control
RNA-Seq analysis
Sequence analysis
Statistics
Data processing
Data management
Data analysis
Data storage
Downstream analysis
Pipeline Management
Workflows
Reporting
Audit Trail
User Management
Data Visualisation
Java Commercial Windows
Linux
Geneious Search, organize and analyze genomic and protein information of any size via desktop program that provides publication ready images to enhance the impact of your research. Phylogenetics
Sequence analysis
De-novo assembly
Genomics
Population genetics
Metagenomics
Structural variation
RNA-Seq
Epigenomics
Alignment
Assembly
Assembly validation
Annotation
Genome browser
Sample Barcoding
Mapping
Visualization
Motif discovery
Variant Calling
Java Commercial Windows
Mac OS X
Linux
Solaris
GeneProf GeneProf is a web-based, graphical software suite and database resource for high-throughput-sequencing experiments (RNA-seq and ChIP-seq). RNA-Seq
ChIP-Seq
Mapping
Visualization
Peak calling
Differentially expressed gene identification
Quality assessment
User-friendly
wizards
tutorials
examples
very flexible
reproducible
transparent
extensible
API
Java
Javascript
Commercial
Freeware
browser based
GeneTalk GeneTalk is a web-based platform that can filter, reduce and prioritize human sequence variants from NGS data and assist in the time-consuming and costly interpretation of personal variants in a clinical context. It serves as an expert exchange platform for clinicians and scientists who are searching for information about specific sequence variants and connects them to share and exchange expertise on variants that are potentially disease-relevant. Genetic variation annotation
Sequence variation analysis
Variant Calling
Structural variation discovery
Filtering
Annotation
Database
Exome analysis
Sequence analysis
Variant Classification
Viewer
Easy-to-use point-and-click web interface
data visualization
data filtering
Fast
SNP annotation
SNP calling
Variant annotation and analysis
variant counting
Ruby
Javascript
Freemium
Genomatix Mining Station (GMS) The Genomatix Mining Station (GMS) offers mapping of NGS reads onto genomes, transcriptomes and splice-junction libraries. It is a client-server based solution and can be controlled through an intuitive GUI or via the command line. It covers tasks such as genomic positioning, SNP detection, splice analyses and genomic enrichments. RNA-Seq
SNP discovery
ChIP-Seq
Assembly
Mapping
SNP calling
Genomic correlations
Client-server based system allows for command-line and web-based access. Grid engine is used for job scheduling and mapping is run on multiple cores. Can be combined with a Genomatix Genome Analyzer (GGA) for a fully integrated NGS solution. C++
Java
Flash
Commercial Windows
Mac OS X
Linux
Genome Trax Genome Trax™ enables you to identify human genome variations of functional significance by mapping your NGS data to known elements such as disease mutations and regulatory sites. Structural variation
Regulatory genomics
Variant Mapping
SNP calling
InDel discovery
Commercial
GenomeBrowse A free genome browser for exploring sequencing pile-up and coverage data with numerous annotation tracks hosted on the cloud. Sequence analysis
DNA-Seq
Alignment
De novo sequencing
Exome analysis
Exome and whole genome variant detection
Genetics
Whole Genome Resequencing
Next Generation Sequencing
Genomics
Alignment viewer
Assembly visualization
Visualization
Windows
Linux
Mac OS X
Genomedata Genomedata is a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. Storage Signal Python
C
GPL Linux
Mac OS X
GenomeJack GenomeJack is a genome browser specialized in next-generation sequencing data. Its advantages are an intuitive interface and a smooth drag-and-drop response. Genomics
Personal genomics
Visualization Java Freeware Windows
Mac OS X
Linux
GenomeMapper GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads with either ungapped or gapped alignments. It can be used to align against multiple genomes simultaneously or against a single reference. Alignment
Mapping
Genometa Genometa is a Java based local bioinformatics program which allows rapid analysis of metagenomic short read datasets. Millions of short reads can be accurately analysed within minutes and visualised in the browser component. A large database of diverse bacteria and archaea has been constructed as a reference sequence. Metagenomics
Genomics
Mapping
Visualization
mapping
Data Visualisation
Java Linux
Mac OS X
Windows
GenomeTools The GenomeTools genome analysis system is a free collection of bioinformatics tools for genome informatics (version 1.3.6). Genomics Integrated solution C BSD POSIX
Linux
Mac OS X
OpenBSD
Windows (Cygwin)
UNIX
GenomeView GenomeView is a next-generation stand-alone genome browser and editor initiated in the BSB group at VIB and currently developed at Broad Institute. It provides interactive visualization of sequences, annotation, multiple alignments, syntenic mappings, short read alignments and more. Many standard file formats are supported and new functionality can be added using a plugin system. Genomics
Comparative genomics
Comparative transcriptomics
Transcriptomics
Gene annotation retrieval
Quality Control
Sequencing
Sequence analysis
Visualization
Alignment viewer
Multiple sequence alignment viewer
Viewer
Genome browser
Visualization of a multitude of genomics data Java GPL platform-independent
GenomicTools GenomicTools is a flexible computational platform for the analysis and manipulation of high-throughput sequencing data such as RNA-seq and ChIP-seq. A variety of mathematical operations between sets of genomic regions is implemented thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks from preprocessing and quality control to meta-analyses. More specifically, the user can easily create average read profiles across transcriptional start sites or enhancer sites, quickly prototype customized peak discovery methods for ChIP-seq experiments, perform genome-wide statistical tests such as enrichment analyses, design controls via appropriate randomization schemes, among other applications. Genomics
ChIP-Seq
RNA-Seq
Genomic overlaps
Peak detection
Profiles
Heatmaps
create custom pipelines
feature overlaps
identify binding site peaks in ChIP-seq data
create read profiles
create read heatmaps
C
C++
GPL 2
GenoMiner A proprietary NGS analysis solution. Powerful hardware comes with preinstalled software, organized in workflows. Reference assembly
De-novo assembly
ChIP-Seq
RNA-Seq
Assembly
Viewer
Error correction
Mutation detection
Peak detection
Expression profiling
Sequence alignment
GenoMiner provides workflows for reference assembly, de novo assembly, ChIP-Seq, RNA-Seq and more. You upload your files at the beginning and get the results at the end, choosing from various tools for the analysis.
Java Commercial Linux
GenoREAD GenoREAD is a web-based, sequence verification software that can be used to compare Sanger sequencing trace files against a reference sequence. Users can either submit their sequencing results one clone at a time, or they can submit a series of clones (as a project) to run at once. Results can be viewed online or downloaded. Sequencing
Clone verification
Mapping
Assembly
Alignment
PERL; PHP; Javascript Linux 64
GenoViewer A feature rich NGS assembly viewer/browser. Viewer large file loading
multicontig handling
SNP/InDel/Read Error display and search
mutation table generation and export
consensus sequence generation and export
Java Freeware platform-independent
GensearchNGS A user friendly framework for re-sequencing in a diagnostics context: searching for mutations/variants, especially on well known genes. Targeted resequencing Alignment
Alignment viewer
Read Alignment
Variant Prioritization
Mutation detection
Database
Database submission preparation
Plugin framework
Cafe Variome submission
Java Commercial UNIX
Windows
GenVision GenVision is a genomic visualization software package that is fully integrated with Lasergene and is designed to support easy generation of publication quality graphics and maps. Genomics Visualization Commercial Windows
Mac OS X
Geoseq Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. Resequencing Mapping
GigaBayes A short-read SNP and short-INDEL discovery program. Genomics
SNP discovery
SNP calling
GimmeMotifs GimmeMotifs is a de novo motif prediction pipeline, especially suited for ChIP-seq datasets. It incorporates several existing motif prediction algorithms in an ensemble method to predict motifs and clusters these motifs using the WIC similarity scoring metric. Transcription regulation
ChIP-Seq
Epigenomics
Motif analysis Python MIT Linux
Girafe The R/Bioconductor package girafe facilitates the functional exploration of alignments of sequence reads from next-generation sequencing data to a genome. It allows users to investigate the genomic intervals together with the aligned reads and to work with, visualise and export these intervals. Alignment R
Gk arrays Gk-arrays are a data structure to index the k-mers in a collection of reads. Genomics
Transcriptomics
Metagenomics
Assembly
Error correction
Mapping
programming library C++ CeCILL-C license Linux
Linux 64
Mac OS X
any
GMAP GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Alignment
Mapping
C
Bourne shell
UNIX
Gnumap The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Currently, gnumap is designed to be used with the _int.txt data received from the Solexa/Illumina machine. Mapping C++
Goby framework Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. RNA-Seq Programming Library
Data compression
Java GPLv2
Golden Helix Golden Helix is a bioinformatic software provider and analytic service provider. The core of its business is about empowering scientists to discover more, discover it easier, and to come away with valid and reproducible bioinformatics results. The software, SNP & Variation Suite, is a stable platform for clever data manipulations, robust quality assurance, advanced statistical modeling, and compelling visual results in a genome browser environment of DNA Seq, Copy Number variation, SNP Chip, and RNA Seq data. Epigenomics
Genomics
DNA-Seq
SNP discovery
Whole Genome Resequencing Analysis
Copy number estimation
Quality Control
Quality Control
Statistics
Statistical testing
Genome browser
Annotation
Filtering
Collapsing Methods
Variant Classification
Variant Mapping
Windows
Linux
Mac OS X
Goseq An R package to detect Gene Ontology (GO) categories and other categories of genes (such as KEGG pathways) that are over- or under-represented in RNA-seq data. RNA-Seq Quantitation Gene Set Testing R LGPL UNIX
Windows
Gowinda Gowinda: unbiased analysis of gene set enrichment for Genome Wide Association Studies Genomics
Genome Wide Association Studies
Population genetics
Population Genomics
High-throughput sequencing
Gene set enrichment
Gene ontology
Genome wide association studies
Multicore Java Mozilla Public License Mac OS X
Linux
Windows
GPS GPS is a high spatial resolution peak detection algorithm for ChIP-Seq data. Genomics
ChIP-Seq
Transcription Factor Binding Site identification
Regulatory genomics epigenomics
Protein Binding Peak Detection multi-threading Java Commercial
Freeware
Cross-Platform
GPSeq Analyze RNA-seq data to estimate gene and exon expression, identify differentially expressed genes, and differentially spliced exons RNA-Seq Quantitation R
C
GRS Reference-based data compression for storage of resequencing data Data compression sequence compression C
Bourne shell
Commercial
Freeware
Linux
Linux 64
GSNAP GSNAP can align both single-end and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite treated DNA for the study of methylation state. RNA-Seq Alignment
DNA methylation
Mapping
Bisulfite mapping
C
Perl
Hairpin Annotation Generates a secondary structure from an RNA sequence and highlights regions of interest using RNAplot General bioinformatics (pipeline) Java Custom Licence Linux 64
Windows
Mac OS X
Haplowser Haplowser: comparative haplotype browser for personal genome and metagenome Visualization
Haplotype reconstruction
Java GPL
HawkEye An interactive visual analytics tool for genome assemblies. Assembly visualization Assembly visualization C++ Artistic License Linux
Mac OS X
HeliSphere Open-source LINUX software package intended for use in analyzing data produced by the HeliScope Single Molecule Sequencer. Genomics
Whole Genome Resequencing
RNA-Seq
SNP discovery
Mapping Freeware Linux
HI Program for haplotype reconstruction from paired-end reads. Haplotype reconstruction Java
Hicup A mapping pipeline for HiC interaction data. Performs independent mapping on each end of the interaction pair and removes commonly found artefacts. Epigenomics Mapping Perl GPLv3 UNIX
Linux
Mac OS X
HINT HMM-based Identification of TF Footprints Regulatory genomics
Regulatory genomics epigenomics
Transcription Factor Binding Site identification
Digital genomic footprinting Digital Genomic Footprinting Python GNU GPL v3 Unix-like
HiPipe HiPipe aims to make NGS data analysis quick and easy with high-performance pipelines and an intuitive web GUI. Genomics Mapping
Variant detection
Analysis Pipeline
JavaScript; Java
BASH
platform-independent
HiTEC An algorithm which provides a highly accurate, robust, and fully automated method to correct reads produced by high-throughput sequencing methods. Error correction C++ GPLv3 Linux
HMMSplicer Splice junction discovery in RNA-Seq data RNA-Seq Alignment Python
HPeak Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. ChIP-Seq Hidden Markov Model
HTSeq Python framework to process and analyse high-throughput sequencing (HTS) data Programming Library Python GPLv3
Hybrid-SHREC Improves sequence data quality using information from multiple platforms. Error correction Java
IBD2 Our algorithm uses a non-homogeneous hidden Markov model (HMM) that employs local recombination rates to identify chromosomal regions that are identical by descent (IBD=2) in children of consanguineous or non-consanguineous parents solely based on genotype data of siblings derived from high-throughput sequencing platforms. Targeted resequencing R
Java
Ibis Ibis (Improved base identification system), is an accurate, fast and easy-to-use base caller for the Illumina sequencing system, which significantly reduces the error rate and increases the output of usable reads. Ibis is faster and makes fewer assumptions about chemistry and technology Sequencing Basecaller Statistical learning of base calling parameters and calibrated quality scoring Python
C
C++
Non-commercial Linux
Windows (Cygwin)
ICORN Iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Assembly
Sequencing Quality Control
IDBA IDBA (Iterative De Bruijn graph short read Assembler) is a short read assembler based on an iterative De Bruijn graph. It is developed under 64-bit Linux, but should be suitable for all Unix-like systems. De-novo assembly Assembly POSIX
Linux
Linux 64
IGV The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated datasets. It supports a wide variety of data types and formats, including short-read alignments in the SAM/BAM format. Data can be viewed from local files or over the web via HTTP. Genomics Visualization Java LGPL Windows
Mac OS X
Linux
Illuminate Analytics toolkit in Python for Illumina HiSeq and MiSeq metrics Genomics Sequencing Quality Control object-oriented access to results of binary parsing
some command line support
Python MIT Unix-like
Illuminator Software for machines running Windows to identify variants in Illumina short read data. SNP discovery
InDel discovery
IMAGE “Iterative Mapping and Assembly for Gap Elimination”. IMAGE closes gaps in a draft assembly using Illumina paired-end reads. Assembly editing
Inchworm Employs the Kmer graph method to reconstruct (in many cases full-length) transcripts from Illumina RNA-Seq (preferably strand-specific) reads. RNA-Seq
De novo transcriptome assembly
InGAP inGAP is an integrated platform for next-generation sequencing projects, the core function of which is to detect SNPs and indels using a Bayesian algorithm. SNP discovery Mapping
Assembly visualization
Ingenuity Variant Analysis Ingenuity Variant Analysis is a web application that helps researchers studying human disease to identify causal variants from human resequencing data in just minutes. Ingenuity Variant Analysis combines analytical tools and integrated content to help you rapidly identify and prioritize variants by drilling down to a small, targeted subset of compelling variants based both upon published biological evidence and your own knowledge of disease biology. With Variant Analysis, you can interrogate your variants from multiple biological perspectives, explore different biological hypotheses, and identify the most promising variants for follow-up. Genomics
Exome
Whole Genome Resequencing
SNPs
Variant Classification
Integrated Genome Browser Visualization software for next-generation genomics Genomics Visualization Java Open Source platform-independent
IOmics iOmics is a cloud based workflow analysis framework for managing, analyzing and visualizing NGS data. Genomics
Transcriptomics
Epigenomics
RNA-Seq
Exome and Whole genome variant detection
Genome Alignment
Assembly
Ab-initio gene prediction
Genetic variation annotation
Exome analysis
ChIP seq
MiRNA analysis (Ref and Ab-initio)
Commercial cloud
IQSeq Integrated Isoform Quantification Analysis based on A Partial Sampling Framework RNA-Seq Quantitation
Alternative Splicing
C++
ISAAC ISAAC comprises a genome aligner and a variant caller, by Illumina. Runtime Speed C++ Linux 64
Isas Fast aligner for color and base space short read data. Alignment
Colorspace
Linux
IsoEM Expectation maximization algorithm for estimating alternative splicing isoform frequencies Alternative Splicing Expectation Maximization Java
ISSAKE Short Sequence Assembly by K-mer search and 3' read Extension, Immunology version (iSSAKE) Metagenomics Assembly Perl
Python
GPLv2
JBrowse Slick, speedy genome browser with a responsive and dynamic AJAX interface for visualization of genome data. Being developed by the GMOD project as a successor to GBrowse. Visualization Perl
Javascript
Open Source browser based
Jellyfish Fast, memory-efficient k-mer counting algorithm C++ GPLv3 Linux 64
Mac OS X
JointSLM Copy number estimation from read depth information Copy number estimation R
KARMA K-tuple Alignment with Rapid Matching Algorithm Bisulfite Sequencing Mapping
KBASE "KBase provides a computational framework and tools for integrating and analyzing large, diverse datasets generated by the scientific community to advance predictive understanding, manipulation, and design of biological processes in an environmental context. The purpose of KBase is to enable users to integrate a wide spectrum of genomics and systems biology data, models, and information for microbes, microbial communities, and plants. Powerful tools within KBase will allow users to analyze and simulate data to predict biological behavior, generate and test hypotheses, design new biological functions, and propose new experiments. " Comparative genomics Annotation Linux
Kismeth Web-based tool for bisulfite sequencing analysis DNA methylation
Epigenomics
Bisulfite mapping
Kissnp kisSnp compares two sets of NGS raw reads, detecting Single Nucleotide Polymorphism occurring between the two sets. The two sets typically come from the sequencing of two individuals from the same species or from closely related species. Comparative genomics
Comparative transcriptomics
Gene annotation retrieval
SNP discovery
InDel discovery
Micro assembly
De Bruijn graph
SNP calling C CeCILL Linux
KNIME Software for organizing bioinformatic workflows Workflow GPLv3 Windows
Mac OS X
Linux
Knime4Bio custom nodes for the interpretation of Next Generation Sequencing data with KNIME. Genomics
Gene annotation retrieval
Mutations and regulatory sites
KNIME Java GPLv3 any
Krona Krona creates interactive HTML5 charts of hierarchical data (such as taxonomic abundance in a metagenome). Metagenomics Visualization Interactive
Animation
HTML5 canvas graphics
Javascript
Perl
Linux
UNIX
Mac OS X
Lab7 Data workflow management platform to streamline NGS analyses Genomics Workflow
Pipeline Management
Sample Tracking
Protocol Management
Python
Javascript
Commercial Mac OS X
Linux
Lasergene Lasergene is a comprehensive DNA and protein sequence analysis software suite comprised of seven applications which include functions ranging from sequence assembly and SNP detection, to automated virtual cloning and primer design. Alignment
De novo sequencing
De-novo assembly
Genomics
InDel discovery
Integrated solution
Mapping
Phylogenetics
Protein structure analysis
Read alignment
SNP discovery
Sequence analysis
Transcription Factor Binding Site identification
Alignment
Alignment Analysis
Annotation
Assembly
Chromatogram viewer
Colorspace
Sequence analysis
Integrated Solution
Mapping
PCR Primer Design
Paired End
Scaffolding
Commercial Windows
Mac OS X
LAST Short read alignment program incorporating quality scores Genomics
Comparative genomics
Alignment C++ GPLv3
LASTZ A tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically Genomics Mapping
Alignment
Mac OS X
Linux
LobSTR lobSTR is an alignment and genotyping tool for profiling short tandem repeats from next generation sequencing data Sequencing Profiling short tandem repeats from short reads Fast
Scalable
sequence alignment
Gapped alignment
C++
R
Python
Freeware UNIX
LOCAS LOCAS low-coverage short-read assembler Assembly C++ Linux
LookSeq AJAX-based browser for deep sequencing data Assembly visualization
MACS Model-based Analysis of ChIP-Seq data. ChIP-Seq Peak calling Python Artistic License platform-independent
MagicViewer Large-scale short reads and sequencing depth visualization. De novo sequencing
Targeted resequencing
Visualization
Genetic variation annotation
Java platform-independent
MapDamage Identifies and quantifies DNA damage patterns in ancient DNA Ancient DNA
DNA-Seq
Quality Control
Statistical Modelling
Python
R
Linux
Mac OS X
MapNext MapNext provides four main analyses: (i) unspliced alignment and clustering of reads, (ii) spliced alignment of transcriptomic reads, (iii) SNP detection and calculation of SNP frequency from population sequences, and (iv) storage of result data in a database to make it available for more flexible queries and further analyses. SNP discovery
RNA-Seq Alignment
Alignment C++
Perl
Mapsembler Mapsembler is a targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice. Metagenomics
Transcriptomics
DNA-Seq
RNA-Seq Quantitation
Targeted assembly
Assembly
Micro assembly
Mapping
De novo assembly
Identify Novel Exons
Remove contaminants
Detect enzymes in metagenomics NGS reads
C CeCILL Linux
MapSplice We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥75 bp). MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. RNA-Seq Alignment Mapping C++
Python
Linux
MapView Visualization of short-read alignments on a desktop computer Visualization Linux
Windows
MAQ Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Genomics
SNP discovery
Mapping C++
Perl
GPL
MAQGene Complete pipeline for mutant discovery, with web front end SNP discovery Mapping
Integrated Solution
MARGARITA SNP discovery and genotyping from low-coverage sequencing data SNP discovery
Genotyping
Mason A fast, feature-rich and hackable read simulator for the simulation of NGS and Sanger data. Genomics Simulation
Assembly
Mapping
Empirical or simple model for position dependent errors
can write out sample position and extensive information about the sampled infix
haplotype simulation through mutation of reference sequence.
C++ GPLv3 UNIX
Windows
Mauve Mauve Genome Alignment software, for comparing two or more draft or finished genomes Genomics
Transcriptomics
Sequence alignment comparison
Visualization
Assembly quality evaluation
C++
Java
GPL Mac OS X
Windows
Linux
MAXIMUS Hybrid reference and de novo assembly pipeline Genomics Hybrid assembly
MAYDAY Extensible platform for visual data exploration and interactive analysis that provides many methods for dissecting complex transcriptome datasets. RNA-Seq Visualization
Meerkat Meerkat is designed to identify structural variations (SVs) from paired-end high-throughput sequencing data. It predicts SVs from discordant read pairs (pairs that map to the reference genome in an unexpected way). Then it looks for reads that cover the predicted breakpoint junctions (split read support), refines breakpoints by local alignments and predicts the mechanisms by which SVs are formed. Because it remaps unmapped and partially mapped reads, it is more sensitive, especially when the insert size of the sequencing library is small (e.g. a read length of 100 bp and an insert size of 200 bp), since the SV breakpoint has to lie in between the paired-end reads to form a discordant read pair. With discordant read pairs, split read support and some filtering steps, it has a low false positive rate. It can also take into account reads from repetitive regions (non-uniquely mapped), combine discordant read pair clusters to predict complex events, and select the most supported and smallest events.
Structural variation Structural variation discovery
MEGAN Metagenome Analysis Software - MEGAN (“MEtaGenome ANalyzer”) is a new computer program that allows laptop analysis of large metagenomic datasets. In a preprocessing step, the set of DNA reads (or contigs) is compared against databases of known sequences using BLAST or another comparison tool. MEGAN can then be used to compute and interactively explore the taxonomical content of the dataset, employing the NCBI taxonomy to summarize and order the results. Metagenomics metagenomic analysis
functional classification
Commercial
Freeware
Megraft Megraft is a software tool to graft ribosomal small subunit (16S/18S) fragments from metagenomes onto full-length SSU sequences, enabling accurate diversity estimates from fragmentary and non-overlapping sequence data. Metagenomics
Phylogenetics
Sequence analysis
Community analysis
Rarefaction
Hidden Markov Model
Sequence analysis
Perl GPLv3 Linux
UNIX
Mac OS X
Meraculous De novo genome assembler from short reads Assembly De novo assembly
scaffolding
Perl
C
METAGENassist User-friendly, web-based analytical pipeline for comparative metagenomic studies. Input can be derived from either 16S rRNA data or NextGen shotgun sequencing. Metagenomics Visualization
Statistics
Clustering
Machine Learning
Easy-to-use point-and-click web interface; data visualization; publication-quality graphs and charts; wide variety of statistical methods; taxon-to-phenotype mapping; data filtering and normalization; supports many common input formats
MetaSim The software can be used to generate collections of synthetic reads. Metagenomics
Genomics
Simulation
Assembly
Mapping
Java Commercial
Freeware
Metaxa Metaxa uses Hidden Markov Models to identify, extract and classify small-subunit (SSU) rRNA sequences (12S/16S/18S) of bacterial, archaeal, eukaryotic, chloroplast and mitochondrial origin in metagenomes and other large sequence sets Metagenomics
Phylogenetics
Sequence analysis
Community analysis
Hidden Markov Model
Sequence analysis
Perl GPLv3 Linux
UNIX
Mac OS X
MethMarker MethMarker facilitates the design of DNA methylation assays for COBRA, bisulfite SNuPE, bisulfite pyrosequencing, MethyLight and MSP. It also implements a systematic workflow for design, optimization and (computational) validation of DNA methylation biomarkers. This workflow starts from a preselected differentially methylated region (DMR) and results in an optimized DNA methylation assay that is ready to be tested in a large-scale clinical trial. Epigenomics
DNA methylation
Java Windows
Linux
Mac OS X
Solaris
Methpipe The MethPipe software package is a computational pipeline for analyzing bisulfite sequencing data (BS-seq, WGBS and RRBS). MethPipe provides tools for mapping bisulfite sequencing reads and estimating methylation levels at individual cytosine sites. Additionally, MethPipe includes tools for identifying higher-level methylation features, such as hypo-methylated regions (HMR), partially methylated domains (PMD), hyper-methylated regions (HyperMR), and allele-specific methylated regions (AMR). Epigenomics
DNA methylation
Bisulfite Sequencing
Bisulfite mapping C++ GPL (>= 3) Linux
Mac OS X
MethylCoder Pipeline for fast, simple processing of BiSulfite-treated reads into methylation data. Includes scripts for analysis and visualization. In addition to a binary output, the direct output of methylcoder is a text file that indicates per-nucleotide methylation context (CG/CHG/CHH) and methylation levels (both coverage and C-T conversions) Genomics
Sequencing
DNA methylation
Epigenomics
Mapping
Bisulfite mapping
Python
C
BSD Linux
Linux 64
Mac OS X
MetMap Produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. DNA methylation
MeV Visualization of genomic data, Differential Gene Expression based on DEGseq, DESeq and edgeR RNA-Seq Clustering
Visualization
Classification
Differentially expressed gene identification
Artistic License
MG-RAST MG-RAST is a fully-automated service for annotating metagenome samples providing analysis tools for comparison Metagenomics
Phylogenetics
Metabolic reconstruction
Annotation Perl
C
GO
Javascript
Python
UNIX
MicroRazerS MicroRazerS is a tool optimized for mapping short RNAs onto a reference genome. Mapping C++ Linux
Microsoft Biology Foundation C#/.NET library for biological applications. Programming Library C#
MICSA Combines positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs) ChIP-Seq Motif analysis
Minia De novo assembly of human genomes on a desktop computer De novo assembly Assembly Memory efficient and fast C++ CeCILL Linux
Mac OS X
MIP Scaffolder MIP Scaffolder is a program for scaffolding contigs produced by fragment assemblers using mate pair data. Scaffolding C++
Perl
Linux
MIRA MIRA 3 - Whole Genome Shotgun and EST Sequence Assembler De-novo assembly
SNP discovery
RNA-Seq Alignment
Smith-Waterman
Graph reduction
Learning algorithm
Assembly
Mapping
K-mer analysis
C++ GPL Linux
Mac OS X
UNIX
MiRanalyzer Web-server for identifying and analyzing miRNA in next-gen sequencing experiments MiRNA Annotation of micro RNA
differential expression
Java
Perl
browser based
MiRCat Predicts mature miRNAs and their precursors from an sRNA dataset and a genome. General bioinformatics (pipeline) MiRNA Prediction Detection and prediction of known or novel miRNAs
secondary structure generation
Java Custom Licence Linux 64
Windows
Mac OS X
MiRDeep Discovering known and novel miRNAs from deep sequencing data MiRNA Perl
MiRNAkey A software pipeline for the analysis of microRNA Deep Sequencing data MiRNA Java
Perl
Linux
Mac OS X
MiRProf Determines normalised expression levels of sRNAs matching known miRNAs in miRBase. General bioinformatics (pipeline) MiRNA profiling Java Custom Licence Linux 64
Windows
Mac OS X
MiRspring missing Perl
Javascript
GNU
MirTools Web server for microRNA profiling and discovery based on high-throughput sequencing Small RNA transcriptome
MiRNA
Perl
PHP
MISO An alternative to Cufflinks, MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes. RNA-Seq Quantitation
RNA-Seq
Alternative Splicing
Mlgt Processing and analysis of high throughput, long-read (e.g. Roche 454) sequences generated from multiple loci and multiple biological samples. Sequences are assigned to their locus and sample of origin, aligned and trimmed. Where possible, genotypes are called and variants mapped to known alleles. Genotyping
Targeted resequencing
Resequencing
Sequence analysis
Error correction
Filtering
Sample Barcoding
Pooled samples
Read Alignment
Sequence assignment
sequence alignment
alignment error correction
variant counting
genotype calling
allele-matching
R GPL >=2 Windows
UNIX
Mac OS X
MMSEQ Pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. Allele-specific transcription C++ Mac OS X
Linux 64
MochiView Hybrid genome browser and motif visualization/analysis/management desktop software. Genomics
ChIP-Seq
ChIP-on-chip
RNA-Seq
Motif analysis
Genome browser
Motif analysis
Desktop hybrid genome browser and motif visualization/analysis software Java Linux
Mac OS X
Windows
MoDIL Program to detect small indels in next generation sequencing data Genomics
InDel discovery
Python
MOM Short-read mapping Genomics Mapping
MOSAIK Reference guided aligner/assembler. Assembly
Colorspace
C++ Commercial
GPLv2
Windows
Linux
Mac OS X
MPscan MPscan (multi-pattern scan) is a program for mapping short reads (<30bp) exactly on a set of reference sequences (e.g., a genome) without indexing the reference. MPscan performs only exact mapping (no substitutions or indels), is fast (optimal complexity), and easy to use. Genomics
Transcriptomics
Mapping C++ Linux
Mac OS X
MrBayes "MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters." Phylogenetics Statistical Modelling Cross-Platform
MrCaNaVaR mrCaNaVaR is a copy number caller that analyzes the next-generation sequence mapping read depth to discover large segmental duplications and deletions. It also has the capability of predicting absolute copy numbers of genomic intervals. Genomics
Personal genomics
Copy number estimation
Read depth analysis C Commercial
Freeware
POSIX
MrFAST mrFAST is designed to map short reads generated with the Illumina platform to reference genome assemblies in a fast and memory-efficient manner. Genomics Read Alignment
Mapping
C BSD UNIX
MrsFAST mrsFAST is a micro-read substitution-only Fast Alignment Search Tool. mrsFAST is a cache-oblivious short read mapper that optimizes cache usage to get higher performance. Genomics Read Alignment
Mapping
C BSD UNIX
MTR Metagenomics software for clustering at multiple ranks. Metagenomics C++
Matlab
MU2A Genomic variant annotation tool SNP Annotation Java Apache License 2.0 Windows
Linux
Mac OS X
MUMmer MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. In essence, it performs ultra-fast alignment of large-scale DNA and protein sequences. Genomics
Transcriptomics
Alignment
Mapping
Artistic License Linux
MUMmerGPU MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by HTS. Genomics
Transcriptomics
Alignment
GPU
MuMRescueLite Probabilistically reincorporates multi-mapping tags into mapped short read data. Genomics
ChIP-Seq
Mapping Python MIT
MuSICA 2 Assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ~800 human full-length cDNA clones. Clone verification Assembly Perl
Mutascope Mutascope is a software suite designed to analyze data from high throughput sequencing of PCR amplicons, with an emphasis on normal-tumor comparison for the accurate and sensitive identification of low prevalence mutations. Cancer biology Somatic variant calling
Analysis Pipeline
Perl UNIX
MuTect MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes. SNP calling
Myrialign Software to align short reads produced by a short read genome sequencer to a reference genome. Alignments can contain any number of SNPs, insertions and deletions, up to a user specified cutoff. Myrialign can use a Cell Broadband Engine processor to accelerate alignments if available, for example on a PlayStation 3 running GNU/Linux. Myrialign performs brute force alignment using a variant on the "bitap" algorithm that aligns several thousand reads to a reference in parallel. It uses bit-parallelism, multiple processors, and Cell SPUs if available. Unlike other reference genome alignment software, heuristics and hashtable lookups are not used. Myrialign will find alignments with any number of errors up to a user specified cutoff. The emphasis is on doing a 100% accurate search as fast as is possible.
Mapping
Alignment
GPU
Myrna Myrna is a cloud computing tool for calculating differential gene expression in large RNA-seq datasets. RNA-Seq Quantitation
RNA-Seq Alignment
Hadoop
MapReduce
Mzip Reference-based sequence data compression tool Data compression
NarrowPeaks Analysis of variation in ChIP-seq using functional PCA ChIP-Seq Peak calling
Differential Binding
R/Bioconductor package
can run on major computer platforms
R Artistic-2.0 Linux
Mac OS X
Windows
NCBI Genome Workbench "NCBI Genome Workbench is an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix this data with your own private data." Whole Genome Resequencing Analysis
Next Generation Sequencing
Sequence annotation
Sequence analysis
Visualization
Genome browser
Cross-Platform
Nesoni Nesoni is a high-throughput sequencing data analysis toolset. RNA-Seq Alignment
SNP discovery
Phylogenetics
Alignment largely for bacterial genomes Python
Newbler The assembly/mapping program developed by 454 Life Sciences for 454 data De-novo assembly Assembly
Mapping
C++ Unknown Linux 64
Nexalign Nexalign is a program to align millions of short reads from next-generation sequencing data sets to reference genomes Mapping C++
R
GPL UNIX
NextGen Utility Scripts A collection of links to scripts available for working with data generated by new sequencing technologies. A collection of many different scripts
NextGENe de novo and reference assembly of Roche/454, Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly which reduces sequencing errors. Requires Win or MacOS. De novo sequencing
Metagenomics
SNP discovery
InDel discovery
Targeted resequencing
Unique condensation tool
Data Visualisation
very flexible
C++ Commercial Windows
Ngs backbone ngs_backbone is a bioinformatic application created for sequence analysis using NGS (Next Generation Sequencing) and Sanger sequences. It can clean reads, perform de novo assembly or mapping against a reference, and annotate SNPs, SSRs, ORFs, GO terms and sequence descriptions. SNP discovery
Genomics
Mapping
Assembly
AGPL UNIX
NGS-DesignTools Tools to assist in designing deep sequencing experiments for haplotype reconstruction and structural variant breakpoint detection Structural variation
RNA-Seq Quantitation
Haplotype reconstruction
Simulation
Ngs-pipeline Complete solution for human re-sequencing projects Personal genomics
Epigenomics
Structural variation
Mapping
InDel discovery
SNP calling
Sequence annotation
Perl GPLv3 Linux
Ngs.plot ngs.plot is a program that allows you to easily visualize your next-generation sequencing (NGS) samples at functional genomic regions. The signature advantage of ngs.plot is that it collects a large database of functional elements for many genomes. A user can ask for a functionally important region to be displayed in one command. It handles large sequencing data efficiently and has only modest memory requirements. A web-based version (integrated into Galaxy) is also available for the ones who are allergic to terminals. Epigenomics
Transcriptomics
Visualization
Database
Data Visualisation R
Python
GPL (>= 3) All
NGSUtils NGSUtils is a suite of software tools for working with next-generation sequencing datasets Genomics
Transcriptomics
Filtering
QC
Read pre-processing
Variant Calling
Format conversion
Python GPL Linux
Mac OS X
NGSView High-throughput sequencing technologies introduce novel demands on tools available for data analysis. We have developed NGSView, a generally applicable, flexible and extensible next-generation sequence alignment editor. The software allows for visualization and manipulation of millions of sequences simultaneously on a desktop computer, through a graphical interface. NGSView is available under an open source license and can be extended through a well documented API. Genomics Assembly visualization
NOISeq Next Generation Sequencing (NGS) technologies are increasingly being used for gene expression profiling as a replacement for microarrays. The expression level given by these technologies is the number of reads in the library mapping to a given feature (gene, exon, transcript, etc.), i.e., the read counts. Most of the statistical methods for assessment of differential expression using count data rely on parametric assumptions about the distribution of the counts (Poisson, Negative Binomial, …). Moreover, many of them need replicates to work and tend to have problems evaluating differential expression in features with low counts.

NOISeq is a non-parametric approach for the identification of differentially expressed genes from count data. NOISeq empirically models the noise distribution of count changes by contrasting fold-change differences (M) and absolute expression differences (D) for all the features in samples within the same condition. This reference distribution is then used to assess whether the M-D values computed between two conditions for a given gene are likely to be part of the noise or represent true differential expression.

There are two variants of the method: NOISeq-real uses replicates, when available, to compute the noise distribution, and NOISeq-sim simulates them in the absence of replication. It should be noted that the NOISeq-sim simulation procedure resembles technical replication and does not reproduce biological variability, which is necessary for population inferential analysis.
Differential Expression
NovelSeq A computational framework to discover the content and location of long novel sequence insertions using paired-end sequencing data Structural variation
InDel discovery
Mapping
Assembly
Variant Calling
C BSD UNIX
Novocraft Novoalign is a program for mapping short reads from the Illumina/SOLiD sequencing platform(s) to a reference genome. Genomics
Whole Genome Resequencing
RNA-Seq Alignment
ChIP-Seq
MiRNA
Mapping Bisulfite sequencing
Mate-pair/jumping libraries
parallel execution
insertions/deletions
SAM format output
paired-end
colourspace
MPI
C++ Commercial
Freeware
Mac OS X
Linux 64
NPS Identify nucleosome positions given histone-modification ChIP-seq or nucleosome sequencing at the nucleosome level. Epigenomics
ChIP-Seq
Python
NucleR nucleR is an R/Bioconductor package for working with tiling arrays and next generation sequencing. It uses a novel approach in this field which comprises deep profile cleaning using the Fourier Transform and peak scoring for quick and flexible nucleosome calling ChIP-on-chip
ChIP-Seq
Nucleosome Positioning
Epigenomics
Annotation
Peak calling
Multicore
Integrated solution
R LGPL3 Cross-Platform
Oases De novo transcriptome assembler for very short reads De novo transcriptome assembly supports strand specific and paired-end RNA-seq data sets C GPLv3
OLego OLego is a program specifically designed for de novo spliced mapping of mRNA-seq reads. OLego adopts a seeding and extension scheme, and does not rely on a separate external mapper. It achieves high sensitivity of junction detection by using very small seeds (12-14 nt), efficiently mapped using Burrows-Wheeler transform (BWT) and FM-index. This also makes it particularly sensitive for discovering small exons. It is implemented in C++ with full support of multiple threading, to allow fast processing of large-scale data. Genomics
RNA-Seq
RNA-Seq Alignment
Mapping
Alignment
capable of using very small seeds for splice mapping
but still fast and accurate
C++ GPLv3 Linux
Linux 64
Mac OS X
Omixon Variant Toolkit Omixon Target Standard, Target HLA and Target Pro are designed to help clinical, diagnostic and research labs to efficiently get the maximum accuracy and precision from their targeted NGS data. Comparative genomics
Mapping
Sequence analysis
Read alignment
InDel discovery
SNP discovery
Alignment
Assembly
Mapping
Colorspace
Basespace
easy to use parameters
full documentation
also a plugin available in CLCbio and Geneious
Commercial
Freeware
interoperable
Optimus Primer Automated primer design for large-scale resequencing by second generation sequencing Resequencing PCR Primer Design
PacBio conversion tools Tools to convert from PacBio HDF5 format to other commonly used formats & libraries to read HDF5 from Java & R Programming Library
Conversion
Java
R
Python
PaCGeE PaCGeE (Parallel Computational Genomics Engine) is a suite of HPC accelerated sequence data analysis tools for assembly and analysis. The tool set comprises many popular open source and proprietary software packages for high performance, high throughput and high quality data analysis. The PaCGeE family of parallel NGS analysis tools includes Cloud-MAQ, VELVET-P, EULER, ERANGE, BOWTIE, BFAST, MPI-BLAST, ChIP Seq Peak Finder etc. Mapping
Hadoop
Commercial
PALMA We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. RNA-Seq Alignment Alignment
PALMapper Fast and Accurate Spliced Alignments of Sequence Reads. Mapping C++ GPLv3
PanGEA Tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. RNA-Seq
Allele-specific transcription
SNP discovery
Mozilla Public License
PARalyzer Tool to analyze cross-linking and immunoprecipitation data (CLIP) Java Commercial
Freeware
Partek Genomics Suite Easy to use software providing A to Z analysis for all Next Generation Sequencing and Microarray data. Allele-specific transcription
RNA-Seq Quantitation
Epigenomics
Functional Genomics
ChIP-Seq
Alternative Splicing
SNP discovery
Small RNA transcriptome
PASH Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing Epigenomics
DNA methylation
Alignment
Bisulfite mapping
PASS PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other. Alignment C++ Linux
Windows
Patchwork Patchwork is a bioinformatic tool for analyzing and visualizing allele-specific copy numbers and loss-of-heterozygosity in cancer genomes. The data input is in the format of whole-genome sequencing data which enables characterization of genomic alterations ranging in size from point mutations to entire chromosomes.

High quality results are obtained even if samples have low coverage, ~4x, low tumor cell content or are aneuploid.

Patchwork is available in two formats. The first, named simply patchwork, takes BAM files as input whereas patchworkCG takes input from CompleteGenomics files. Detailed guides and information regarding these can be found in their respective tabs.
Structural variation Copy number estimation Allele specific copy numbers. R Linux
Mac OS X
PatMaN Patman searches for short patterns in large DNA databases, allowing for approximate matches. It is optimized for searching for many small patterns at the same time, for example microarray probes. Mapping
PE-Assembler A simple 3' extension approach to assembling paired-end reads and capable of parallelization De-novo assembly Scaffolding C++
PeakAnalyzer PeakAnalyzer is a set of applications for processing ChIP signal peaks. Functional Genomics ChIP-Seq analysis Java
C++
R
PeakRanger A multi-purpose, ultrafast ChIP Seq peak caller ChIP-Seq Peak calling C++ Artistic License Linux
Mac OS X
PeakSeq ChIP-Seq C
Perl
PeakTrace PeakTrace is an alternative basecaller for improving the quality and read length of Sanger DNA sequencing traces. The PeakTrace basecaller works with trace files produced by the ABI 310, 3700, 3100, 3130, 3730, and 3500 DNA sequencers. MegaBACE sequencers are also supported. Sequencing Basecaller DNA basecaller C Commercial Windows
Mac OS X
Linux 64
PECAN Alignment method practical for large genomic sequences. Alignment
PEMer The package is composed of three modules: PEMer workflow, SV-Simulation and BreakDB. PEMer workflow is a sensitive tool for detecting SVs from paired-end sequence reads. SV-Simulation randomly introduces SVs into a given genome and generates simulated paired-end reads from the ‘novel’ genome. Subsequent analysis with PEMer workflow on the simulated reads can help parameterize PEMer workflow. BreakDB is a web accessible database developed to store, annotate and display SV breakpoint events identified by PEMer and from other sources. Structural variation
PERalign A probabilistic framework is described to predict the alignment to the genome of all paired-end read transcript fragments in a paired-end read dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment. RNA-Seq Alignment C++ Linux
PerM PerM (Periodic Seed Mapping) uses periodic spaced seeds to significantly improve mapping efficiency for large reference genomes when compared to state-of-the-art programs. Genomics
SNP discovery
Mapping C++ Apache License 2.0 Linux
Phred The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Basecaller C Solaris
IRIX
AIX
Phred Phrap Consed Cross match The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. Alignment
Assembly
Basecaller
Smith-Waterman
Phymm A classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs Metagenomics Hidden Markov Model
PiCall Identifies short indel polymorphisms in population sequencing data InDel discovery
Population genetics
C
PICS PICS identifies binding event locations by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. ChIP-Seq R
PileLine PileLine is a flexible command-line toolkit for efficient handling, filtering, and comparison of genomic position (GP) files produced by next-generation sequencing experiments. PileLineGUI adds a graphical interface. Viewer Java LGPL
Pindel A pattern growth approach to detect break points of large deletions and medium sized insertions from paired end short reads. InDel discovery
Structural variation
Split-read
Mapping
Localized reassembly/realignment
C++ Linux
Mac OS X
Windows
Pipeline Pilot Analysis and workflow development of Next Generation Sequencing and gene expression. Next Generation Sequencing
Gene expression
Sequence analysis
SNP discovery
General bioinformatics
Mapping
De-novo assembly
Sequence analysis
Variant detection
Gene expression analysis
RNA-Seq analysis
ChIP-Seq analysis
Genomics
Comparative genomics
Whole genome resequencing
Sequence alignment
Integrated solution wrapping custom and third party tools for integration
analysis
and reporting
C++
Java
Perl
R
Pilot Script
Commercial Linux
Windows
PIQA PIQA is a quality analysis pipeline designed to examine genomic reads produced by Next Generation Sequencing technology (Illumina G1 Genome Analyzer). It is a set of libraries for R. Sequencing Quality Control R
PoissonSeq Identify differential expressed genes Differential Expression
PolyBayesShort A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. SNP discovery Linux
Linux 64
PoolHap Computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. A future version will support analysis when haplotypes are unknown. Mapping
Regression.
PoPoolation Toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. Population genetics Pooled samples Perl
R
PoPoolation2 PoPoolation2 compares allele frequencies for SNPs between two or more populations and identifies significant differences. PoPoolation2 requires next generation sequencing data of pooled genomic DNA (Pool-Seq). It may be used for measuring differentiation between populations, for genome wide association studies and for experimental evolution. Population genetics
Genomics
Pooled samples Perl
R
Post Assembly Genome Improvement Toolkit " Tools to generate automatically high quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation. With the advent of next generation sequencing a lot of effort was put into developing software for mapping or aligning short reads and performing genome assembly. For genome assembly the problem of generating a draft assembly (i.e. a set of unordered contigs) has now been very well addressed - but for users who need high quality assemblies for their analyses there are still unresolved issues: this is where PAGIT is used. " De novo assembly De-novo assembly
Quality Control
Assembly quality evaluation
Linux
PRICE PRICE uses paired-read information to iteratively increase the size of existing contigs. Assembly C++
PRINSEQ PRINSEQ is a sequence processing tool that can be used to filter, reformat and trim genomic and metagenomic sequence data. It generates summary statistics of the input in graphical and tabular formats that can be used for quality control steps. PRINSEQ is available as both standalone and web-based versions. Metagenomics
Genomics
Metatranscriptomics
Preprocessing
Filtering
Trimming
Perl GPLv3 UNIX
Mac OS X
Windows
ProbeMatch Matches a large set of oligonucleotide sequences against a genome database using gapped alignments Mapping Linux
Mac OS X
ProbHD We present a new strategy for identifying heterozygous sites in a single individual by using a machine learning approach that generates a heterozygosity score for each chromosomal position. Our approach also facilitates the identification of regions with unequal representation of two alleles and other poorly sequenced regions. The availability of confidence scores allows for a principled combination of sequencing results from multiple samples. Population genetics
SNP discovery
Perl
R
Python
Proxygenes We introduce a clustering method which significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content. Metagenomics Mapping
Annotation
Pybedtools Python extension to BEDTools that allows use of all BEDTools programs directly from Python, as well as feature-by-feature manipulation, automatic handling of temporary files, and more. Genomics Mapping See full description Python GPLv2 Windows (Cygwin)
Linux
Linux 64
Mac OS X
PyroBayes PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. SNP discovery Basecaller
PyroMap PyroMap accurately maps pyrosequencing reads onto reference sequences using a selectively weighted Smith-Waterman (SW^2) algorithm to incorporate quality scores into alignment. Mapping Python
PyroNoise Clustering of pyrosequencing (454) data with noise model (AmpliconNoise) and chimaera removal (Perseus) for sequence diversity analysis. Phylogenetics
Metagenomics
QCALL SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples SNP discovery
Qpalma QPalma is an alignment tool targeted to align spliced reads produced by Next Generation sequencing platforms RNA-Seq Alignment Alignment Python
C++
QSeq QSeq is DNASTAR's Next-Gen application for RNA-Seq, ChIP-Seq, and miRNA alignment and analysis. ChIP-Seq
RNA-Seq
MiRNA
Integrated Solution
Alignment
Visualization
Protein Binding Peak Detection
Commercial Mac OS X 10.6 with Parallels Desktop
Windows
QSRA Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. De-novo assembly Assembly
QuadGT QuadGT is a software package for calling single-nucleotide variants in four sequenced genomes: normal-tumor pairs coupled with parents. Genotypes are inferred using a joint model of parental variant frequencies, de novo germline mutations, and somatic mutations. The model quantifies the descent-by-modification relationships between the unknown genotypes by using a set of parameters in a Bayesian inference setting. SNPs SNP calling
Variant Calling
Java
Quake Program to detect and correct errors in DNA sequencing reads, using a maximum likelihood approach incorporating quality values and nucleotide specific miscall rates. Error correction
QualiMap Qualimap is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data. Sequence Quality Control
Quality Control
Sequencing Quality Control
QUAST QUAST stands for QUality ASsessment Tool. It evaluates the quality of genome assemblies by computing various metrics and producing reports. Quality Control
Genomic Assembly Evaluation
Sequence analysis
Assembly QC
Visualization
Quality Control
Data Visualization
Assembly Quality Evaluation
Detailed Reports
Python
C
Perl
GPLv2 Linux
Mac OS X
QuEST QuEST is a Kernel Density Estimator-based package for analysis of massively parallel sequencing data from chromatin immunoprecipitations (ChIP-Seq or ChIPseq). ChIP-Seq C++ GPLv2
Quip Aggressive compression of FASTQ and SAM/BAM files. Data compression C BSD (3-clause) any
R2R R2R is a simple to use package for very sensitive analysis of short read sequence data obtained by NextGen sequencing techniques. R2R was developed in conjunction with data obtained on the Illumina GA platforms. R2R is written as a simple Perl script and runs equally well under MS Windows, Mac OS and Linux/Unix operating systems. SNP discovery Alignment Perl
R453Plus1Toolbox Facilitates analysis of data from 454 sequencer in R/Bioconductor. R
RACA Reference-Assisted Chromosome Assembly (RACA)
RApiD Tools for processing restriction site associated DNA sequencing. SNP discovery Perl
C++
GPLv3
RAPSearch Fast protein similarity search tool for short reads that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. Metagenomics Alignment C++ GPLv3
RAST "RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating complete or nearly complete bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree." Genomics
Phylogenetics
Annotation
Genomics
Ray De novo genome assembly is now a challenge because of the overwhelming amount of data produced by sequencers. Ray assembles reads obtained with new sequencing technologies (Illumina, 454, SOLiD) using MPI 2.2 -- a message passing interface standard. De-novo assembly Assembly * MPI 2.2 * ISO/IEC C++ 2003 * de Bruijn * paralleled * Illumina data C++ GPL Linux
POSIX
RazerS RazerS allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. The tool can work either lossless or with a user-defined loss rate at higher speeds. Mapping
Read alignment
SWIFT Filter
Myers Bitvector Algorithm
Gapped alignment
paired-end mapping
C++ GPLv3 UNIX
Mac OS X
Windows
RDiff rDiff is an open source tool for accurate detection of differential RNA processing from RNA-Seq data. It implements two statistical tests to detect changes of the RNA processing between two samples. rDiff.parametric is a powerful test, which can be applied for well annotated organisms to detect changes in the relative abundance of isoforms. rDiff.nonparametric is an alternative when the annotation is incomplete or missing. Alternative Splicing
RNA-Seq
Transcriptomics
Alignment
Differential expression
Python
Matlab
Open Source Linux
Mac OS X
RDP Pyrosequencing Pipeline The Ribosomal Database Project's Pyrosequencing Pipeline aims to simplify the processing of large 16S rRNA sequence libraries obtained through pyrosequencing. This site processes and converts the data to formats suitable for common ecological and statistical packages such as SPADE, EstimateS, and R. Metagenomics Alignment
Database submission preparation
Format conversion
browser based
Readaligner A tool for mapping (short) DNA reads into reference sequences. Mapping
ReadDepth Detects copy number aberrations in deep sequencing data Copy number estimation R Apache License 2.0
REAL REad ALigner for Next-Generation sequencing reads Mapping C++ GPLv3 Linux
Reaper Reaper is a program for demultiplexing, trimming and filtering short read sequencing data. Next Generation Sequencing Filtering
Adapter Removal (software)
Trimming
Sample Barcoding
Sequencing Quality Control
QC
Memory efficient and fast. C GPL v3 Linux
UNIX
Mac OS X
Reconciliator The tool for merging assemblies Assembly Perl Linux
RECOUNT Probabilistic tag count error correction for next generation sequencing data (Solexa/Illumina). RNA-Seq Quantitation Expectation Maximization C++ GPL Linux
RefCov WashU Reference Coverage tool for analyzing the depth, breadth, and topology of sequencing coverage Copy number estimation
Repitools Toolbox of procedures to interrogate and visualize epigenomic data. Part of BioConductor ChIP-Seq
ChIP-on-chip
Sequencing Quality Control
Visualization
Methylation Calling
Statistical testing
R LGPL
Reptile A new algorithm for short read error correction that harvests information from k-spectrum and read decomposition Genomics Sequencing Quality Control C++ GPL Boost
ReSeqSim A simulation toolbox that will help us optimize the combination of different technologies to perform comparative genome re-sequencing, especially in reconstructing large structural variants (SVs). Structural variation Mapping
Simulation
RGA Reference-guided assembler SNP discovery Assembly
RiboPicker riboPicker is a publicly available tool that is able to automatically identify and efficiently remove rRNA-like sequences from metatranscriptomic and metagenomic datasets. riboPicker is available as both standalone and web-based versions. Metagenomics
Genomics
Metatranscriptomics
Preprocessing
RRNA filtering
Perl
C
GPLv3 UNIX
Mac OS X
RMAP Maps 20-64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL (published in BMC Bioinformatics). POSIX OS required. DNA methylation Mapping
Bisulfite mapping
GPLv3 Linux
Mac OS X
RNA A randomized Numerical Aligner for Accurate alignment of NGS reads Read alignment Mapping
Hash Table Based
Fast
Accurate
C++ GPL v3 Linux
UNIX
Windows
Mac OS X
RNA-MATE A recursive mapping strategy for high-throughput RNA-sequencing data. RNA-Seq Alignment
RNA-Seq Quantitation
Colorspace
RNASEQR a streamlined and accurate RNA-seq sequence analysis program Alternative Splicing Read mapping
Rnnotator Automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome De novo transcriptome assembly Commercial
Freeware
RobiNA RobiNA is a Java GUI that enables the user to graphically call differentially expressed genes. For read mapping it relies on bowtie, and for the differential expression analysis it builds on an R backbone running DESeq and edgeR. RNA-Seq Differentially expressed gene identification Trimming
differential expression
graphical display
Java
R
GPL Windows
Linux
Mac OS X
Rolexa Allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots. Sequencing Basecaller R
RSAT peak-motifs A workflow combining a series of time- and memory-efficient motif analysis tools to extract motifs from full-size collections of peaks as generated by ChIP-seq, ChIP-chip or other ChIP-X technologies. ChIP-Seq
Regulatory genomics
Epigenomics
Motif discovery
Motif scanning
Motif comparison
Perl
CGI
Python
C
Commercial
Freeware
UNIX
Mac OS X
Linux
RSEM We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNASeq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. RNA-Seq Alignment
RNA-Seq Quantitation
C++
RSEQtools RSEQtools includes a format specification for RNA-Seq data that provides confidentiality-aware data summaries, as well as several tools for performing common analyses: expression measurements (e.g. RPKMs), creation of signal tracks, segmentation, annotation manipulations, etc. RNA-Seq Quantitation C Creative Commons - Attribution; Non-commercial 2.5 Mac OS X
UNIX
Linux
Rsolid Rsolid implements a version of the quantile normalization algorithm that transforms the intensity values before calling colors Colorspace
Basecaller
R
C
Rsubread Rsubread is a Bioconductor R package that provides facilities for performing read alignments using the Subread aligner. It also includes other functionality such as the featureCounts read summarization function. Next-generation sequencing Read mapping
Read summarization
Quality assessment
R
C
GPL v3 Linux 64
Mac OS X; x86 64
Mac OS X
RTG Investigator Comprehensive analysis pipelines powered with unique mapping speed and sensitivity deliver deep genomic analysis in variant detection and metagenomic applications with Illumina, Ion Torrent, Complete Genomics and Roche 454 data sets. Exome and whole genome variant detection
Metagenomics
SNP discovery
InDel discovery
Mapping
Alignment
Translated nucleotide search
K-mer analysis
Species frequency estimation
Contaminant filtering
Read depth analysis
Commercial
Freeware
Linux
Mac OS X
Windows
RUbioSeq RUbioSeq has been developed to facilitate the primary and secondary analysis of resequencing projects by providing an integrated software suite of parallelized pipelines to detect exome variants (SNVs and CNVs) and to perform Bisulfite-seq analyses automatically. RUbioSeq's variant analysis results have already been validated and published. AVAILABILITY: http://rubioseq.sourceforge.net/ Exome analysis
Copy number estimation
Bisulfite Sequencing
Somatic variant calling Perl UNIX
S-MART S-MART manages your RNA-Seq and ChIP-Seq data. RNA-Seq
ChIP-Seq
Python
Java
Linux
Mac OS X
Windows
SAMMate GUI for processing SAM/BAM and BED files. The software allows users to accurately estimate gene expression scores using short reads originating from both exons and exon-exon junctions, to generate wiggle files for visualization in UCSC genome browser, and to generate an alignment statistics report. RNA-Seq Quantitation Sequence analysis Java GPLv3 Windows
Mac OS X
Samscope Samscope is a lightweight SAM/BAM file viewer that makes visually exploring next generation sequencing data as intuitive as Google Maps. Samscope uses multiple layers to simultaneously (or sequentially) view SAM/BAM related features like coverage or allele frequency, or ChIP-SEQ features like polarity from as many files as you like. The paging-friendly binary file layout makes it feasible to browse data sets far larger than the system's available RAM. ChIP-Seq
RNA-Seq
Genomics
Visualization
Read mapping
SAMtools
C++ AGPL POSIX
Linux
SAMStat SAMStat is an efficient C program for displaying statistics of large sequence files. Sequencing Quality Control C GPLv3 UNIX
SAMtools Various utilities for processing alignments in the SAM format, including variant calling and alignment viewing. SNP discovery Simulation
Programming Library
Assembly visualization
Integrated solution
API
C MIT
Savant Genome Browser Savant is a genome browser which combines visualization of HTS and other genome-based data with powerful analytic tools. Genomics Visualization
Viewer
Alignment viewer
Plugin framework
Bookmarking
Table View
fast
memory efficient
Java Apache License 2.0 Windows
Linux
all supporting JVM
Mac OS X
Scaffolder Edit your genome sequence using a simple human readable syntax. Manage contig positions and add inserts all in a plain text file. Scaffolding Ruby MIT Linux
Mac OS X
SCALCE SCALCE (skeɪlz) is a fast FASTQ compression utility that uses locally consistent parsing for a better compression rate. It achieves around 2X more compression than gzip alone. Genomics Data compression FASTQ file compression C Linux 64
Linux
SCARF Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. Assembly GPLv3
Scripture Tool for assembling a transcriptome from paired-end Illumina RNA-Seq data RNA-Seq Alignment
SEAL Read mapper and duplicate remover. Mapping
Hadoop
Python
C++
Java
GPLv3
SEECER Error correction for RNA-Seq data RNA Seq analysis supports multicore processors C
SEED Tool to cluster sequence reads prior to assembly or other operations. Metagenomics Clustering C++ Mac OS X
Linux
Windows
Segemehl Maps short reads to a known genome with tolerance for mismatches and indels, using suffix arrays for high-accuracy matching Genomics Mapping fast
precise
low cost for high-error matching
C
C++
Segtor A software tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations with respect to known genes. SNP Annotation Annotation SNP annotation Perl
C
Non-commercial Linux
Mac OS X
Seq2HLA seq2HLA is a computational tool to determine Human Leukocyte Antigen (HLA) directly from existing and future short RNA-Seq reads. It takes standard RNA-Seq sequence reads in fastq format as input, uses a bowtie index comprising known HLA alleles and outputs the most likely HLA class I and class II types, a p-value for each call, and the expression of each class. Transcriptomics Mapping
Read alignment
HLA typing
Python
R
Unix
Mac OS X
SeqAn C++ template library with many sequence analysis algorithms and data structures. Sequence analysis
Genomics
Phylogenetics
Programming Library C++ BSD (3-clause) UNIX
Mac OS X
Windows
SeqBuster SeqBuster is a web-based bioinformatics tool offering custom analysis of deep sequencing data at different levels, with special emphasis on the analysis of miRNA variants (isomiRs) and the discovery of new small RNAs. Small RNA transcriptome
MiRNA
Mapping
Annotation
Annotation and detection of miRNAs and other small RNAs Java
R
Commercial
Freeware
Mac OS X
Linux
SeqCons SeqCons is an open source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly. Assembly Linux
Windows
SeqEM Genotype-calling algorithm that estimates parameters underlying the posterior probabilities in an adaptive way rather than arbitrarily specifying them a priori. The algorithm applies the well-known EM algorithm to an appropriate likelihood for a sample of unrelated individuals with next-generation sequence data, leveraging information from the sample to estimate genotype probabilities and the nucleotide-read error rate. SNP discovery Expectation Maximization
SeqGSEA Gene Set Enrichment Analysis (GSEA) of RNA-Seq Data: integrating differential expression and splicing Biomedical Sciences
Genomics
RNA-Seq
Statistics
Functional analysis
Gene set enrichment analysis
integrative analysis R GPL (>= 3) any
SeqMan NGen Sequence assembly software using traditional, next-gen, and third-gen technologies. Subsequent analysis of the assembly, including SNP discovery, coverage evaluation and consensus annotation, is provided through full integration with Lasergene. Genomics
De-novo assembly
De novo transcriptome assembly
Whole Genome Resequencing
SNP discovery
InDel discovery
ChIP-Seq
RNA-Seq Alignment
Mapping
Assembly
Alignment
Paired End
Commercial Windows
Mac OS X
Linux
SeqMap SeqMap is a tool for mapping large numbers of short sequences to the genome. Mapping command line
parallel execution
Mac OS X
Windows
Linux
SeqMINER seqMINER is an integrated portable ChIP-seq data interpretation platform with optimized performances for efficient handling of multiple genomewide datasets. seqMINER allows comparison and integration of multiple ChIP-seq datasets and extraction of qualitative as well as quantitative information. seqMINER can handle the biological complexity of most experimental situations and proposes supervised methods to the user in data categorization according to the analysed features. In addition, through multiple graphical representations, seqMINER allows visualisation and modelling of general as well as specific patterns in a given dataset. Moreover, seqMINER proposes a module to quantitatively analyse correlations and differences between datasets. ChIP-Seq Java GPLv3 platform-independent
SeqMonk A tool to visualise and analyse high throughput mapped sequence data Genomics
Epigenomics
Visualization
Assembly visualization
Statistical testing
Alignment viewer
Genome Viewer
Data Visualisation
Data Quantitation filtering and analysis
Java GPLv3 Windows
Mac OS X
Linux
SeqPrep Strips adapters and optionally merges overlapping paired-end (or paired-end contamination in mate-pair libraries) Illumina-style reads. Genomics
De-novo assembly
Merges overlapping paired-end reads
strips adapters off of reads.
C MIT POSIX
SeqSaw A package for mapping of spliced reads and unbiased detection of novel splice junctions from RNA-seq data. RNA-Seq
Alternative Splicing
Mapping
Alignment
Short Spliced Sequence Mapping Tool C++ GPL Linux
SeqSeg An algorithm to identify chromosomal breakpoints using massively parallel sequence data Copy number estimation Matlab
SeqSite SeqSite is an efficient and easy-to-use software tool implementing a novel method for identifying and pinpointing transcription factor binding sites. It first detects transcription factor binding regions by clustering tags and statistical hypothesis testing, and locates every binding site in detected binding regions by modeling the tag profiles. It can pinpoint closely spaced adjacent binding sites from ChIP-seq data. This software is coded in C/C++, and supports major computer platforms. ChIP-Seq Peak calling stand-alone software tool
can run on major computer platforms
C
C++
GPL Linux
Mac OS X
Windows
SeqSolve Simple analysis of Next Generation Sequencing data. RNA-Seq
ChIP-Seq
Transcriptomics
Small RNA transcriptome
Alternative Splicing
Novel gene discovery
Differentially expressed gene identification
Quality assessment
User-friendly
Scientifically relevant
Reliable
Scalable
Commercial Windows
Linux
SeqTrim A pipeline for preprocessing sequences. Trimming
Sequedex Sequedex classifies short reads for phylogeny and function at high speed Metagenomics
Phylogenetics
Genomics
Sequence analysis
Sequence annotation
Fast
protein fragments identified
Java
Python
Commercial
Freeware
Linux 64
Mac OS X
SequenceVariantAnalyzer DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Compared with previous discovery strategies, a whole-genome sequencing study is no longer constrained by differing patterns of linkage disequilibrium and thus, in theory, is more likely to directly identify the genetic variants contributing to biological traits or medical outcomes.

The rapidly evolving high-throughput DNA sequencing technologies now allow the fast generation of large amounts of sequence data for such whole-genome sequencing studies at a reasonable cost. SequenceVariantAnalyzer, or SVA, is a software tool that we have been developing to analyze the genetic variants identified from such studies.

URL: http://www.svaproject.org/
Personal genomics
Genomics
Sequence analysis
Annotation
Genetic variation annotation
Genome browser
Variant annotation and analysis Java Linux 64
Sequencher Desktop alignment software, now with plugins to MAQ and GSNAP for NGS sequence data De-novo assembly
SNP discovery
Assembly
Alignment
Bisulfite sequencing
consensus sequence generation and export
SNP/InDel/Read Error display and search
Commercial Windows
Mac OS X
SeqWare SeqWare provides tools designed to support massively parallel sequencing technologies. LIMS
Workflow
Java GPLv3 Linux
SeqWords SeqWords is a featherweight object for the calculation of n-mer word occurrences in a single sequence. K-mer analysis Part of BioPerl Perl Perl artistic licence
SESAME Genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. Genotyping
Targeted resequencing
GPLv3 Windows
Linux
SEWAL Processing of deep sequencing data from in vitro selection experiments In vitro selection
Sff2fastq The program 'sff2fastq' extracts read information from an SFF file, produced by the 454 genome sequencer, and outputs the sequences and quality scores in FASTQ format. Conversion Linux
SGA SGA is a de novo assembler designed to assemble large genomes from high coverage short read data. Assembly C++ GPLv3 Linux
SHARCGS SHARCGS is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy. De-novo assembly Assembly Perl Linux
SHE-RA The SHE-RA software turns error-prone short reads into Sanger-quality composite reads. Error correction Open Source
Sherman bisulfite-treated Read FastQ Simulator Genomics
Bisulfite Sequencing
DNA methylation
Simulation Perl GPLv3 Linux
Mac OS X
ShoRAH Inference of a population from a set of short reads. The package contains programs that support mapping of reads to a reference genome, correcting sequencing errors by locally clustering reads in small windows of the alignment, reconstructing a minimal set of global haplotypes that explain the reads, and estimating the frequencies of the inferred haplotypes. Metagenomics Haplotype reconstruction
Mapping
GPLv3 Linux
Mac OS X
Shore Analysis suite for short read data. Structural variation
SNP discovery
Mapping Linux
Mac OS X
POSIX
SHOREmap Extension of the short read analysis pipeline SHORE. SHOREmap supports genome-wide genotyping and candidate-gene sequencing in a single step through analysis of deep sequencing data from a large pool of recombinants. Perl
R
GPLv3
ShortFuse Method for using paired-end reads to find fusion transcripts without requiring unique mappings or additional single read sequencing Fusion transcripts C++
Python
ShortRead ShortRead is an R/BioConductor package for input, quality assessment, manipulation, and output of high-throughput sequencing data. R
SHORTY SHORTY is targeted at de novo assembly of microreads with mate-pair information and sequencing errors. SHORTY takes some novel approaches to the short read assembly problem. De-novo assembly Assembly C++
Perl
SHRAP A sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. De-novo assembly Assembly
SHREC A new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure Sequencing Quality Control
Error correction
Java
SHRiMP Assembles to a reference sequence. Developed with Applied Biosystems' colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. Works with data in letterspace (Roche, Illumina), colourspace (AB) and Helicos space. Mapping
Colorspace
colourspace
Sibelia Sibelia is a comparative genomics tool. It assists biologists in analysing the genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt to different environments. Sibelia is also helpful for evolutionary and genome rearrangement studies of multiple strains of microorganisms. Genomics Variant Calling C++ GPL
SICER A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. ChIP-Seq
Epigenomics
Filtering Python
SiLoCo Compares sRNA expression levels in multiple samples by grouping sRNAs into loci based on genomic location General bioinformatics (pipeline) Expression profiling Java Custom Licence Linux 64
Windows
Mac OS X
Sim4cc Cross-species spliced alignment of ESTs to genomes RNA-Seq Alignment
Comparative genomics
Mapping C++ GPLv2 UNIX
Linux
SimNext Sequencing read simulator Simulation Perl
SimSeq Illumina paired-end and mate-pair short read simulator. Used to sample reads from the simulated genome for the first Assemblathon. Genomics Simulation Position and underlying base specific error model. Simulates chimeric mate-pair reads with paired-end contamination and duplicates. Java
C
MIT POSIX
Sissrs Produces a list of peak maxima from aligned positions. ChIP-Seq Peak calling Perl Linux
UNIX
Skewer Skewer implements a novel dynamic programming algorithm dedicated to the task of adapter trimming, and is specially designed for processing Illumina paired-end sequences. Small RNA Sequencing
RNA-Seq
Whole Genome Resequencing
De novo Sequencing
Preprocessing
Adapter Removal (software)
Trimming
multi-threading C++ Linux 64
Slider A new alignment approach that reduces the alignment problem space by utilizing each read base's probabilities given in the Illumina prb files. SNP discovery Mapping Java
SlideSort SlideSort finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Clustering C++ Linux
Windows
SLOPE Detects structural variants from targeted short DNA reads Structural variation
Targeted resequencing
C++
SmashCommunity SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. Metagenomics
SNIP-Seq Tool for discovering SNPs in population sequencing data SNP discovery Python
Sniper SNP discovery utilizing multi-mapping reads SNP discovery C
Python
UNIX
Mac OS X
SNP-o-matic SNP-o-matic is a fast, memory-efficient and stringent read mapping tool offering a variety of analytical output functions, with an emphasis on genotyping. SNP discovery Mapping C++
C
SNPSeeker Identification of SNPs in pooled genomic samples SNP discovery C
SNVer Variant calling in pooled or individual sequence data. SNP discovery Java Windows
Linux
Mac OS X
SNVMix Detects single nucleotide variants from next generation sequencing data. SNP discovery C MIT
SOAP SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. SOAP2 is an updated program based on Burrows-Wheeler Transform. SNP discovery Alignment
Mapping
Burrows-Wheeler
C++ UNIX
SOAPdenovo SOAPdenovo, a short read de novo assembly tool, is a package for assembling short oligonucleotides into contigs and scaffolds. De-novo assembly Assembly Has a modular structure and comes with a read corrector
an assembly module
a scaffolder and a gap filler
GPLv3 Linux
Mac OS X
SOAPfusion SOAPfusion is a novel tool for fusion discovery with paired-end RNA-Seq reads. The tool follows a strategy of “finding fusions directly and verifying them”, in contrast to other existing tools, which work by “finding the candidate regions and searching for the fusions afterwards”. This makes the fusion discovery process more effective and sensitive, and performance under low sequencing coverage is reported to be far better than that of other tools. http://soap.genomics.org.cn/SOAPfusion.html Transcriptome
RNA-Seq
Gene fusions discovery. finding fusions directly and verifying them Perl
C++
Linux 64
SOAPsnp SOAPsnp is an accurate consensus sequence builder based on soap1 and SOAPaligner/soap2's alignment output. It calculates a quality score for each consensus base, which can be used by any later process to call SNPs. SNP discovery C
C++
SOCS SOLiD reference based, un-gapped alignment with bisulfite capability RNA-Seq Alignment
DNA methylation
SNP discovery
Mapping
Bisulfite mapping
coverage
colourspace
Bisulfite sequencing
C++ GPLv3 POSIX
Solas Given gene annotation the major questions addressed by the package are: prediction of alternative exons in a single condition / cell sample, prediction of differential alternative exons between two conditions / cell samples, quantification of alternative splice forms in a single condition / cell sample RNA-Seq Quantitation
Alternative Splicing
R
Sole-Search Determines statistically significant peaks from ChIP experiments ChIP-Seq Java
SolexaQA User-friendly software package designed to generate detailed statistics and at-a-glance graphics of sequence data quality both quickly and in an automated fashion. This package contains associated software to trim sequences dynamically using the quality scores of bases within individual reads. Sequencing Quality Control
Trimming
Quality Processing Algorithm
Runtime Speed
Perl
R
GPLv3 Mac OS X
UNIX
SolexaTools SolexaTools is a project to create a tool set to work with a Solexa genome sequencer. It includes multiple components including a LIMS system, pipeline and other tools to support end-users and researchers setting up a Solexa environment. LIMS Java
SOLID software tools SOLID software tools hosted by Applied Biosystems Commercial
SomaticCall Finds single-base differences (substitutions) between sequence data from tumor and matched normal samples. Somatic mutations
SOPRA Tool designed to exploit the mate pair/paired-end information for assembly of short reads Assembly
Scaffolding
GPL
Spiral Genetics Spiral Genetics provides a novel aligner/variant caller, Anchored Assembly, which can detect large structural variations using short read NGS data with unmatched precision. Alignment
DNA-Seq
Exome and Whole genome variant detection
De novo Assembly
Genomic Assembly
Mapping
Quality Control
Read alignment
Reference assembly
Resequencing
SNP discovery
Sequence analysis
Whole Genome Resequencing
Alignment
De novo Assembly
Data compression
Genome Alignment
MapReduce
Accurate
Fast
Cloud Computing
Mapping
SNP calling
command line
large SV detection
C++ Commercial Linux
Mac OS X
Windows
SpliceGrapher SpliceGrapher is a package for creating splice graphs from RNA-Seq data, guided by gene models and EST data (when available). Alternative Splicing Visualization
SpliceMap Detects splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50–100 nt) and can exploit paired-read information to improve mapping accuracy. RNA-Seq Alignment Mapping Python
C++
Linux
SplicePlot RNA sequencing has provided unprecedented resolution of alternative splicing and splicing quantitative trait loci (sQTL). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command line utility that produces intuitive visualization of sQTLs and their effects. SplicePlot takes mapped RNA sequencing reads in BAM format and genotype data in VCF format as input and outputs publication-quality Sashimi plots, hive plots and structure plots, enabling better investigation and understanding of the role of genetics on alternative splicing and transcript structure Alternative Splicing Alternative Splicing
Visualization
SpliceTrap SpliceTrap is a statistical tool for quantifying exon inclusion ratios in paired-end RNA-seq data. Instead of full transcript quantification, SpliceTrap approaches exon inclusion level estimation as a Bayesian inference problem. For every exon it quantifies the extent to which it is included, skipped or subjected to size variations due to alternative 3’/5’ splice sites or intron retention. Alternative Splicing
RNA-Seq Quantitation
RNA-Seq
Statistics Splicing ratio estimation C++
Perl
UNIX
Linux
SplicingViewer SplicingViewer is an integrated tool developed to enable users to detect splice junctions, annotate alternative splicing events, and visualize the patterns of alternative splicing events. RNA-Seq
Genomics
Mapping GUI
command line
Data Visualisation
Java GPL
Commercial
BioLicense
LGPL
BSD License
Linux
Windows
Mac OS X
SPLINTER Identification of indel variants in pooled DNA with spike-in controls InDel discovery
SNP discovery
Pooled samples finds rare indels in pooled samples
error model profiler for sequencing library
C
C++
Commercial
Freeware
SplitSeek de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts. RNA-Seq Alignment Perl GPL
SPP R-scripts for ChIP-seq analysis. Genomics
ChIP-Seq
Peak calling multi-threading R
SR-ASM SR-ASM algorithm is designed for DNA assembly of the short sequences coming from 454 sequencers. De-novo assembly Assembly C++ Linux
UNIX
SRAdb R tool to query Short Read Archive and download data from it Database R
SRMA SRMA is a short read micro re-aligner for next-generation high throughput sequencing data. SNP discovery
InDel discovery
Localized reassembly/realignment Java
C
GPL
SSA SSA (Signal Search Analysis) is a software package for the analysis of nucleic acid sequence motifs that are positionally correlated with a functional site (e.g. a transcription or translation initiation site). Motif analysis Sequence analysis Fortran F77
C
Perl
GPL Linux
Mac OS X
SSAHA SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. Alignment
Smith-Waterman
Commercial
Freeware
Linux
Mac OS X
SsahaSNP Sequence Search and Alignment by Hashing Algorithm SNP discovery Mac OS X
Linux 64
Linux
Solaris
Compaq Alpha
SSAKE The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.

Please note that paired reads need to be on one pseudo-FASTA line, separated by ':'

 >readpair:1000
ACGATAGCTTCG:ACGCGATAGATC
Assembly Perl GPLv2 Linux
SSPACE Stand-alone scaffolder of pre-assembled contigs using paired-read data. Genomics Scaffolding Scaffold contigs using paired reads. Extension of unmapped reads. Visualisation and tracking of contigs on scaffolds. Perl Windows
Linux
Mac OS X
STADEN Includes GAP4, GAP5, SPIN, TREV, and numerous smaller tools. Assembly
Alignment
Visualization
Integrated solution C
C++
Tcl
Fortran
BSD Linux
Windows
Mac OS X
UNIX
Stampy Uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. Mapping Python Commercial
Freeware
Linux
Standalone hamming Example software for decoding error-correcting barcodes Sample Barcoding Python
Standardized Velvet Assembly Report A set of scripts and a Sweave report used to iterate through parameters and generate a report on Velvet-generated sequence assemblies Quality Control Visualization R
Perl
GPLv3
STAR Ultrafast universal RNA-seq aligner RNA-Seq
Transcriptome
Sequence alignment to a reference genome C++ GPLv3 Linux
Unix
Mac OS X; x86_64
Strelka Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples. Somatic mutations Somatic variant calling Multicore Perl Linux
Subjunc The Subread read aligner and Subjunc junction detector employ a novel read mapping paradigm called "seed-and-vote" to achieve a fast mapping speed and a high mapping accuracy. The seed-and-vote paradigm is particularly powerful in detecting indels. Subjunc can be used to discover exon-exon junctions from RNA-seq data. It takes Subread less than 20 minutes to map 10 million 100bp reads using one thread. Its running time remains nearly the same when mapping longer reads thanks to the high scalability of the seed-and-vote paradigm. Subread and Subjunc can be used to map reads generated from all major sequencing platforms including Illumina GA/HiSeq, Roche 454, ABI SOLiD and Ion Torrent. They can run on both Linux/unix and Mac computers. Subread and Subjunc were published in Nucleic Acids Research in 2013. Alternative Splicing
Next-generation sequencing
RNA-Seq Alignment
Alternative Splicing
Read Alignment
Subread Subread is a general-purpose read aligner which can be used to map both genomic DNA-seq reads and RNA-seq reads. It uses a new mapping paradigm called "seed-and-vote" to achieve fast, accurate and scalable read mapping. It automatically determines whether a read should be globally or locally aligned, and is therefore particularly powerful in mapping RNA-seq reads. It supports indel detection and can map reads with both fixed and variable lengths. Next Generation Sequencing
RNA-Seq Alignment
Read alignment
Read mapping
Whole genome resequencing
Gapped alignment
Local alignment
Memory efficient and fast
Paired read support
RNAseq analysis
SAM format output
SNP/indel discovery in any format
can run on major computer platforms
capable of using very small seeds for splice mapping
low cost for high-error matching
paired-end mapping
parallel execution
short and long reads
C GPL v3 Linux 64
Mac OS X
Mac OS X; x86 64
SuccinctAssembly Tools to build & analyze compact versions of de Bruijn graphs. De-novo assembly Linux
SUDS genome browser Compressed suffix tree implementation to browse genome sequences Genome browser C++ GPLv2
Suffixerator Compute enhanced suffix array Suffix arrays Part of GenomeTools Linux
Supersplat Using a genomic reference and RNA-seq high-throughput sequencing datasets, supersplat empirically identifies potential splice junctions at a rate of ~11.4 million reads per hour. RNA-Seq Alignment Assembly C++
SUTTA De novo assembly algorithm for assembling bacterial genomes from second generation sequencing data De-novo assembly Commercial
Freeware
SVDetect Identifies genomic structural variations from paired-end and mate-pair next-generation sequencing data produced by the Illumina GA and ABI SOLiD platforms. Applying both sliding-window and clustering strategies, we use anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions-deletions, inversions, duplications and balanced or unbalanced inter-chromosomal translocations Structural variation Perl
SVMerge Pipeline for the detection of structural variants by integrating calls from multiple structural variant callers. Structural variation Perl
Swalign A simple Smith-Waterman alignment implementation in C Alignment C MIT
SWAP454 A program for calling SNPs using 454 read data. SNP discovery
SwDMR swDMR: a sliding window approach to identify differentially methylated regions based on bisulfite sequencing Bisulfite Sequencing
Differential methylated regions identification
Differentially methylated regions identification and annotation Differentially methylated regions identification and annotation Perl
R
GPL v3 Linux
UNIX
Swift Primary Data Analysis for the Illumina Solexa Sequencing Platform. Basecaller C++ LGPL Linux
SWT WashU Sliding Window Tool for detecting copy number variants from Illumina/Solexa data. Copy number estimation
SXOligoSearch SXOligoSearch is a commercial platform offered by the Malaysia-based Synamatix. It aligns Illumina reads against a range of RefSeq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent. Alignment
Syapse Syapse is a platform and application suite for bringing together omics and clinical data. Allele-specific transcription
DNA methylation
DNA-Seq
InDel discovery
SNP discovery
Structural variation
RNA-Seq
Small RNA transcriptome
ChIP-Seq
Comparative genomics
Comparative transcriptomics
Epigenomics
Genomics
Personal genomics
Population genomics
Regulatory genomics
Viral genomics
In vitro selection
Metagenomics
Metatranscriptomics
MiRNA-Seq
Transcriptomics
Biological Contextualization
Differentially expressed gene identification
Exome analysis
Sample Barcoding
Sequence analysis
Variant Classification
Visualization
Gene ontology analysis
API
Cloud Computing
GUI
Syzygy Software to identify variants from pooled sequencing data SNP discovery
InDel discovery
Python
R
T-lex Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated Transposable Element (TE) copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. Transposable Elements Perl
Ta-si prediction ta-siRNA (trans-acting short interfering RNA): prediction of phased ta-siRNAs in plant sRNA datasets. General bioinformatics (pipeline) Phase pattern prediction Java Custom Licence Linux 64
Windows
Mac OS X
Tablet Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments. Genomics
Genotyping
Comparative genomics
Assembly visualization Assembly visualization Java BSD Windows
OS X
Linux
TagCleaner TagCleaner can be used to automatically detect and efficiently remove tag sequences (e.g. WTA or MID tags) from metagenomic datasets. TagCleaner is available as both standalone and web-based versions. Metatranscriptomics
Metagenomics
Viral metagenomics
Trimming GPLv3 UNIX
Mac OS X
Windows
TagDust TagDust, a program identifying artifactual sequences in large sequencing runs. Given a user-defined cutoff for the false discovery rate (FDR), TagDust identifies all reads explainable by combinations and partial matches to known sequences used during library preparation. Sequencing Quality Control C GPL Solaris
UNIX
Linux
Taipan Taipan uses greedy extensions for contig construction but at each step realizes enough of the corresponding read graph to make better decisions as to how assembly should continue. We show that this approach can achieve an assembly quality at least as good as the graph-based approaches used in the popular Edena and Velvet assembly tools using a moderate amount of computing resources. De-novo assembly Assembly
Tally Tally is a program for deduplicating sequence fragments for both single and paired end input. Single reads, paired-end reads. Sequencing Deduplication
Uniquifying
Read pre-processing
Filtering
Paired End
Memory efficient and fast. C GPLv3 Linux
UNIX
Mac OS X
Tallymer A collection of flexible and memory-efficient programs for k-mer counting and indexing of large sequence sets. K-mer analysis
Suffix arrays
Part of GenomeTools Linux
TAPyR Efficient BWT-based read aligner supporting multiple sequencing platforms Whole Genome Resequencing Read mapping C GPL
TASE Rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. RNA-Seq Quantitation Java
TASR Targeted assembly of short read data to identify the presence of variants Targeted assembly Assembly Perl GPLv2
TEQC Quality assessment of target enrichment experiments. Targeted resequencing Sequencing Quality Control R
TileQC TileQC: a system for tile-based quality control of Solexa data. Sequencing Quality Control R
TiMat2 TiMAT2 contains tools for genomic tiling microarray analysis Tilling Java BSD
TMAP TMAP is a short read aligner specifically tuned for data from the Ion Torrent PGM Mapping
TopHat TopHat is a fast splice junction mapper for RNA-Seq reads. RNA-Seq Alignment Alignment
Mapping
C++ Boost Linux
Unix
TopHat-Fusion Detection of fusion genes in RNA-Seq data Fusion transcripts
TOTALRECALLER Improves sequence quality of reads and reduces ambiguous mappings Basecaller Commercial
Freeware
Linux
Tracembler Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations. Chromosome Walking Assembly Linux
Trans-ABySS Trans-ABySS is a software package that is designed to analyze ABySS-assembled whole-genome shotgun transcriptome data. RNA-Seq
SNP discovery
Fusion genes
InDel discovery
Fusion transcripts
BCCA (academic use)
Trimmomatic A flexible read trimming tool for Illumina NGS data Trimming
Conversion
Sequencing Quality Control
Trinity Trinity is a transcriptome assembler that relies on three tools: Inchworm, an assembler; Chrysalis, which pools contigs; and Butterfly, which among other things compacts the graph produced by Chrysalis using the reads. De novo transcriptome assembly Transcriptome assembly Java
C++
Linux
Tripal Tripal is a collection of open-source, freely-available Drupal modules that serves as a web interface for a GMOD Chado database. It is designed to allow anyone with genomic data to quickly create an online genomic database using community supported tools. Tripal is part of the open-source tool collection available through the Generic Model Organism Database (GMOD) project. Genomics
Genetics
Visualization
Database interface
PHP Open Source
Tview tview is a lightweight curses based assembly viewer Visualization C BSD
MIT
UGENE UGENE is a free cross-platform genome analysis suite that combines popular bioinformatics tools within a single user friendly interface. Phylogenetics
Genomics
Sequence analysis
Protein structure analysis
Sequence parsing
Command line tool wrappers
Programming Library
C++ GPLv3 Linux
Windows
Mac OS X
UnoSeq UnoSeq is a Java library to analyze next generation sequencing data (e.g. data generated by Illumina's mRNAseq method) and especially perform expression profiling in organisms where no well-annotated reference genome exists. RNA-Seq Alignment
De novo transcriptome assembly
Java
USeq Collection of software tools for both low- and high-level analysis of next-generation, ultra-high-throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms. Initial emphasis: ChIP-seq and RNA-Seq with FDR estimations. ChIP-Seq
RNA-Seq Alignment
Java BSD
V-Xtractor V-Xtractor uses Hidden Markov Models to locate, verify, and extract defined hypervariable sequence segments (V1-V9) from bacterial, archaeal, and fungal small-subunit rRNA sequences. Pyrotags
Metagenomics
Hidden Markov Model Perl GPL UNIX
VAAL VAAL is a variant ascertainment algorithm that can be used to detect SNPs, indels, and more complex genetic variants. Structural variation
SNP discovery
InDel discovery
Alignment
VAAST Variant Annotation, Analysis and Search Tool Structural variation Variant Prioritization Commercial
Freeware
Variant Effect Predictor Tool for predicting effects of variants for any genome in Ensembl. The web version is limited to 750 variants, but an API and Perl script are provided as well. SNP Annotation Perl
VariantClassifier The VariantClassifier is a software tool for hierarchically classifying variants based on the genome annotation that is provided. Instead of looking at a region of the genome and seeing all the features relative to each other on the genomic axis, the VariantClassifier inverts the process so that novel variants can be tested for interest, based on the known features on the genomic axis. Furthermore, our hierarchical classification provides a prioritization of the variants that should be considered for more intensive study. SNP Annotation
Variation toolkit A set of C++ tools for the interpretation of VCF data. Genomics
Exome and whole genome variant detection
SNP Annotation
C++ GPLv3 Linux
VariationHunter Detection of structural rearrangements Structural variation Mapping
Variant Calling
C UNIX
VARiD VARiD is a variation detection framework for both color-space and letter-space platforms Genomics
SNP discovery
InDel discovery
Hidden Markov Model SNP/indel discovery in any format
even combining Colorspace with Letterspace
C GPLv3
VarScan VarScan, an open source tool for variant detection that is compatible with several short read aligners. SNP discovery SNP calling Java
VCAKE De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE. De-novo assembly Assembly
K-mer analysis
Perl
C
GPL Linux
Mac OS X
Vcflib API and command line utilities for the manipulation of VCF files. Genomics C++ MIT
VCFtools Package for dealing with VCF (variant call format) files. Both command line and Perl API. Programming Library C++ GPLv3
Vectorfriends VectorFriends is an advanced, integrated, and user-friendly sequence analysis software for molecular biologists. It combines various types of in silico cloning, sequence analysis and data management into a single application. Alignment
Assembly
Annotation
Multiple sequence alignment viewer
Visualization
Isothermal Assembly
gateway and multisite gateway cloning
TOPO PCR cloning
restriction cloning
cloning history
multiple sequence alignment
Phylogenetic tree visualization
Sanger sequence assembly
ORF analysis
Translation analysis
GC plot analysis
Java Free for academic use; Commercial Windows
Velvet Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa, 454, or SOLiD. De-novo assembly Assembly
De Bruijn graph
GPL
VelvetOptimiser VelvetOptimiser is a multi-threaded Perl script for automatically optimising the parameter options for the Velvet de novo sequence assembler. De-novo assembly Perl GPLv2
Vicuna De novo assembly of viral populations De novo assembly
Viral genomics
Population Genomics
C++ POSIX
VIP A complete package designed for next-generation diagnostics using 454 sequencing. Genomics
SNP discovery
SNP Annotation
Mapping Mapping
SNP calling
coverage analysis
SNP annotation
Perl
MySQL
LGPL Linux
ViralFusionSeq Accurately discover viral integration events and fusion transcripts by the use of soft-clipping information, read-pair analysis, and targeted de novo assembly Genomics
Fusion genes
Fusion transcripts
Viral genomics
Read Alignment
Read mapping
Read pre-processing
Split-read
De-novo assembly
Targeted de novo assembly
Alignment
Assembly
Accurately discover viral integration events and fusion transcripts Perl GPLv3 Linux
VirusHunter Next generation sequencing (NGS) technologies allow us to explore virus interactions with host genomes that lead to carcinogenesis or other diseases; however, this effort is largely hindered by the dearth of efficient computational tools. Here, we present a new tool, VirusHunter, for the identification of viruses and their integration sites in host genomes using any type of existing NGS data. VirusHunter's unique features include the characterization of insertion loci of any type of virus in the host genome and high accuracy and computational efficiency as a result of its well-designed pipeline.
VirusSeq We developed a new algorithmic method, VirusSeq, for detecting known viruses and their integration sites in the human genome using next-generation sequencing data. We evaluated VirusSeq on RNA-Seq data of 256 TCGA human cancer samples. Using these data, we showed that VirusSeq accurately detects the known viruses and their integration sites with high sensitivity and specificity. VirusSeq can also perform this function using whole genome sequencing data of human tissue. Viral genomics Mapping
Read mapping
Read Alignment
Perl Unknown
VisSR VisSR (Visualisation of sRNAs): generate a visual representation of sRNAs and user-imported genomic features. Small RNA transcriptome Visualization Desktop sRNA Visualisation Java Custom Licence Linux 64
Windows
Mac OS X
Vmatch A versatile software tool for efficiently solving large scale sequence matching tasks Mapping Commercial Linux
WebApollo WebApollo is a browser-based tool for distributed community annotation of sequences. Genomics
Sequence annotation
Sequence functional annotation
Annotation Real time updating; lazy-loading; sequence visualization; sequence annotation JavaScript; Perl; Java Open Source Requires Tomcat server
WebPrInSeS WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C, which performs automated sequence verification of user-defined open reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Clone verification
Yeast two-hybrid
Yeast one-hybrid
browser based
WHAM WHAM is a high-throughput sequence alignment tool. Mapping GPLv3
XMatchView A visual tool for analyzing cross_match alignments. Viewer Python GPLv3 Windows
Linux
YASS YASS is a genomic similarity search tool, for nucleic (DNA/RNA) sequences in fasta or plain text format (it produces local pairwise alignments).
ZINBA Identifies genomic regions enriched in a variety of ChIP-seq and related next-generation sequencing experiments ChIP-Seq
DNA-Seq
GPLv3
ZOOM ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, produced by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly, with speed being a critical priority. Mapping Linux
Windows
ZORRO ZORRO is a hybrid sequencing technology assembler. It takes two sets of pre-assembled contigs and merges them into a more contiguous and consistent assembly. The main characteristic of ZORRO is the treatment before and after assembly to avoid errors. Genomic Assembly
Genomics
Assembly
Hybrid assembly
Perl GPL Linux