MUMmer

Application data

Created by: Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL
Biological application domain(s): Genomics, Transcriptomics
Principal bioinformatics method(s): Alignment, Mapping
Technology: Any
Created at: The Institute for Genomic Research (TIGR) and the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park. Now maintained at the Center for Computational Biology, Johns Hopkins University
Maintained: Yes
Input format(s): FASTA
Output format(s): delta
Licence: Artistic License
Operating system(s): Linux
Contact: [email protected]

Summary: MUMmer is a modular system for the rapid whole-genome alignment of finished or draft sequence. It provides ultra-fast alignment of large-scale DNA and protein sequences.

MUMmer is released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools.

MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the hundreds or thousands of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, the PROmer program can generate alignments based on the six-frame translations of both input sequences. The original MUMmer system, version 1.0, is described in the 1999 Nucleic Acids Research paper (reference 1). Version 2.1 appeared a few years later and is described in the 2002 Nucleic Acids Research paper (reference 2), while MUMmer 3.0 is described in the 2004 Genome Biology paper (reference 3).
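
The six-frame translation that PROmer relies on can be illustrated with a short Python sketch. This is not PROmer's own code: the function names `reverse_complement`, `translate` and `six_frame_translations` are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their amino acid translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Comparing translated frames rather than nucleotides lets similarity survive synonymous substitutions, which is why this approach suits divergent species.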

A 2009 paper by another research group (reference 4) describes a tool that claims to be interface-equivalent to MUMmer and more memory-efficient. See http://compbio.cs.princeton.edu/mems


USAGE NOTES

nucmer

nucmer -h 

  USAGE: nucmer  [options]  <Reference>  <Query>

  DESCRIPTION:
    nucmer generates nucleotide alignments between two multi-FASTA input
    files. Two output files are generated. The .cluster output file lists
    clusters of matches between each sequence. The .delta file lists the
    distance between insertions and deletions that produce maximal scoring
    alignments between each sequence.

  MANDATORY:
    Reference       Set the input reference multi-FASTA filename
    Query           Set the input query multi-FASTA filename

  OPTIONS:
    --mum           Use anchor matches that are unique in both the reference
                    and query
    --mumcand       Same as --mumreference
    --mumreference  Use anchor matches that are unique in the reference
                    but not necessarily unique in the query (default behavior)
    --maxmatch      Use all anchor matches regardless of their uniqueness
    -b|breaklen     Set the distance an alignment extension will attempt to
                    extend poor scoring regions before giving up (default 200)
    -c|mincluster   Sets the minimum length of a cluster of matches (default 65)
    --[no]delta     Toggle the creation of the delta file (default --delta)
    --depend        Print the dependency information and exit
    -d|diagfactor   Set the clustering diagonal difference separation factor
                    (default 0.12)
    --[no]extend    Toggle the cluster extension step (default --extend)
    -f|forward      Use only the forward strand of the Query sequences
    -g|maxgap       Set the maximum gap between two adjacent matches in a
                    cluster (default 90)
    -h|help         Display help information and exit
    -l|minmatch     Set the minimum length of a single match (default 20)
    -o|coords       Automatically generate the original NUCmer1.1 coords
                    output file using the 'show-coords' program
    --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                    extension reaches the end of a sequence, it will backtrack
                    to optimize the alignment score instead of terminating the
                    alignment at the end of the sequence (default --optimize)
    -p|prefix       Set the prefix of the output files (default "out")
    -r|reverse      Use only the reverse complement of the Query sequences
    --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                    this option off if aligning a sequence to itself to look
                    for repeats (default --simplify)
    -V|version      Display the version information and exit

delta-filter

Filters a delta alignment file produced by either nucmer or promer, leaving only the desired alignments, which are written to STDOUT in the same delta format as the input. Its primary function is the LIS algorithm, which calculates the longest increasing subset of alignments. This allows for the calculation of a global set of alignments (i.e. 1-to-1 and in mutually consistent order) with the -g option, or a locally consistent set with -1 or -m. Reference sequences can be mapped to query sequences with -r, or queries to references with -q.

This allows the user to exclude chance and repeat induced alignments, leaving only the "best" alignments between the two data sets. Filtering can also be performed on length, identity, and uniqueness.

USAGE:
        delta-filter  [options]  <deltafile>

        [options]    type 'delta-filter -h' for a list of options.
        <deltafile>  the .delta output file from either nucmer or promer.

        OUTPUT:
        stdout  The same delta alignment format as output by nucmer and promer.

        NOTES:
        For most cases the -m option is recommended, however -1 is
        useful for applications that require a 1-to-1 mapping, such as
        SNP finding. Use the -q option for mapping query contigs to
        their best reference location.

-1            1-to-1 alignment allowing for rearrangements
              (intersection of -r and -q alignments)
-g            1-to-1 global alignment not allowing rearrangements
-h            Display help information
-i float      Set the minimum alignment identity [0, 100], default 0
-l int        Set the minimum alignment length, default 0
-m            Many-to-many alignment allowing for rearrangements
              (union of -r and -q alignments)
-q            Maps each position of each query to its best hit in
              the reference, allowing for reference overlaps
-r            Maps each position of each reference to its best hit
              in the query, allowing for query overlaps
-u float      Set the minimum alignment uniqueness, i.e. percent of
              the alignment matching to unique reference AND query
              sequence [0, 100], default 0
-o float      Set the maximum alignment overlap for -r and -q options
              as a percent of the alignment length [0, 100], default 100
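
The idea behind the longest increasing subset used for -g can be sketched as a weighted chaining problem. The Python below is an illustrative simplification, not delta-filter's implementation: it handles forward-strand alignments only, scores each alignment by its reference length, forbids any overlap, and uses a quadratic dynamic program. The name `best_colinear_chain` is hypothetical.

```python
def best_colinear_chain(alignments):
    """Pick the highest-scoring set of alignments that is in the same
    order in both reference and query.

    Each alignment is (ref_start, ref_end, qry_start, qry_end) with
    start <= end in both sequences (forward strand only).
    """
    alns = sorted(alignments)  # order by reference start
    if not alns:
        return []
    lengths = [a[1] - a[0] + 1 for a in alns]
    score = lengths[:]            # best chain score ending at each alignment
    prev = [-1] * len(alns)       # back-pointer to the previous chain member
    for i, cur in enumerate(alns):
        for j in range(i):
            earlier = alns[j]
            # earlier must end before cur starts in BOTH sequences
            if earlier[1] < cur[0] and earlier[3] < cur[2]:
                candidate = score[j] + lengths[i]
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(len(alns)), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]
```

An alignment whose query position is out of order relative to its neighbours (a rearrangement) is dropped from the chain, which matches the "not allowing rearrangements" behaviour described for -g.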

show-aligns

USAGE: show-aligns  [options]  <deltafile>  <ref ID>  <qry ID>

-h            Display help information
-q            Sort alignments by the query start coordinate
-r            Sort alignments by the reference start coordinate
-w int        Set the screen width - default is 60
-x int        Set the matrix type - default is 2 (BLOSUM 62),
              other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
              note: only has effect on amino acid alignments

Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line. Output is to STDOUT, and consists of all the alignments between the query and reference sequences identified on the command line. NOTE: No sorting is done by default, therefore the alignments will be ordered as found in the <deltafile> input.
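
For scripts that need the alignment coordinates without running show-aligns, the .delta file can be read directly. The Python sketch below assumes the layout given in the MUMmer manual: a line with the two input paths, a line naming the program type, `>` header lines carrying the reference and query IDs, alignment header lines of seven integers, and delta values terminated by a line containing 0. The function name `parse_delta` and the sample text are made up for illustration.

```python
SAMPLE_DELTA = """\
/data/ref.fasta /data/qry.fasta
NUCMER
>ref1 qry1 5000 4800
1 100 201 300 3 3 0
45
-10
0
1900 2000 150 50 0 0 0
0
"""


def parse_delta(text):
    """Parse delta-format text into (program, list of alignment dicts)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    program = lines[1]            # e.g. NUCMER or PROMER
    alignments = []
    ref_id = qry_id = None
    i = 2                         # skip the two file-level header lines
    while i < len(lines):
        if lines[i].startswith(">"):
            # Sequence header: >ref_id qry_id ref_len qry_len
            ref_id, qry_id = lines[i][1:].split()[:2]
            i += 1
            continue
        # Alignment header: first five of seven integers are used here
        ref_start, ref_end, qry_start, qry_end, errors = map(
            int, lines[i].split()[:5])
        i += 1
        deltas = []
        while lines[i] != "0":    # indel distances until the terminating 0
            deltas.append(int(lines[i]))
            i += 1
        i += 1                    # step over the terminating 0
        alignments.append({
            "ref_id": ref_id, "qry_id": qry_id,
            "ref_start": ref_start, "ref_end": ref_end,
            "qry_start": qry_start, "qry_end": qry_end,
            "errors": errors, "deltas": deltas,
            # Assumption: reversed query coordinates mark a reverse-strand hit
            "reverse": qry_start > qry_end,
        })
    return program, alignments
```

The sketch does no validation; a malformed file will raise an exception rather than produce partial results.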

mummerplot

mummerplot -h 

  USAGE: mummerplot  [options]  <match file>

  DESCRIPTION:
    mummerplot generates plots of alignment data produced by mummer, nucmer,
    promer or show-tiling by using the GNU gnuplot utility. After generating
    the appropriate scripts and datafiles, mummerplot will attempt to run
    gnuplot to generate the plot. If this attempt fails, a warning will be
    output and the resulting .gp and .[frh]plot files will remain so that the
    user may run gnuplot independently. If the attempt succeeds, either an x11
    window will be spawned or an additional output file will be generated
    (.ps or .png depending on the selected terminal). Feel free to edit the
    resulting gnuplot script (.gp) and rerun gnuplot to change line thickness,
    labels, colors, plot size etc.

  MANDATORY:
    match file      Set the alignment input to 'match file'
                    Valid inputs are from mummer, nucmer, promer and
                    show-tiling (.out, .cluster, .delta and .tiling)

  OPTIONS:
    -b|breaklen     Highlight alignments with breakpoints further than
                    breaklen nucleotides from the nearest sequence end
    --[no]color     Color plot lines with a percent similarity gradient or
                    turn off all plot color (default color by match dir)
                    If the plot is very sparse, edit the .gp script to plot
                    with 'linespoints' instead of 'lines'
    -c
    --[no]coverage  Generate a reference coverage plot (default for .tiling)
    --depend        Print the dependency information and exit
    -f
    --filter        Only display .delta alignments which represent the "best"
                    hit to any particular spot on either sequence, i.e. a
                    one-to-one mapping of reference and query subsequences
    -h
    --help          Display help information and exit
    -l
    --layout        Layout a .delta multiplot in an intelligible fashion,
                    this option requires the -R -Q options
    --fat           Layout sequences using fattest alignment only
    -p|prefix       Set the prefix of the output files (default 'out')
    -rv             Reverse video for x11 plots
    -r|IdR          Plot a particular reference sequence ID on the X-axis
    -q|IdQ          Plot a particular query sequence ID on the Y-axis
    -R|Rfile        Plot an ordered set of reference sequences from Rfile
    -Q|Qfile        Plot an ordered set of query sequences from Qfile
                    Rfile/Qfile Can either be the original DNA multi-FastA
                    files or lists of sequence IDs, lens and dirs [ /+/-]
    -r|rport        Specify the port to send reference ID and position on
                    mouse double click in X11 plot window
    -q|qport        Specify the port to send query IDs and position on mouse
                    double click in X11 plot window
    -s|size         Set the output size to small, medium or large
                    --small --medium --large (default 'small')
    -S
    --SNP           Highlight SNP locations in each alignment
    -t|terminal     Set the output terminal to x11, postscript or png
                    --x11 --postscript --png (default 'x11')
    -t|title        Specify the gnuplot plot title (default none)
    -x|xrange       Set the xrange for the plot '[min:max]'
    -y|yrange       Set the yrange for the plot '[min:max]'
    -V
    --version       Display the version information and exit

Links


References

  1. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. 1999. Nucleic Acids Research
  2. Delcher AL, Phillippy A, Carlton J, Salzberg SL. 2002. Nucleic Acids Research
  3. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. 2004. Genome Biology
  4. Khan Z, Bloom JS, Kruglyak L, Singh M. 2009. Bioinformatics

