ATAC

From SEQwiki

Application data

Created by Walenz B
Principal bioinformatics method(s) Assembly validation, Alignment
Maintained? Maybe
Input format(s) FASTA
Output format(s) Custom
Operating system(s) Linux

Summary: ATAC is a computational process for comparative mapping between two genome assemblies, or between two different genomes.

A2Amapper, ATAC, Assembly To Assembly Comparison


Description

Assembly To Assembly Comparison (ATAC) is a fast way to compare whole genome assemblies or whole genomes.

NOTE: ATAC is currently available as part of the kmer package.

Method

Taken from the paper (Istrail et al., 2004):

A2Amapper is based on the identification of seed alignments, in this case unique exact matches, followed by a more aggressive local alignment phase between seeds within nonoverlapping chains of seeds. Cutoffs were carefully tuned to balance sensitivity (finding all correlations), specificity (finding only the true ones), and computational requirements (see Data Set 1). Details about A2Amapper will be presented elsewhere (H.S., J.R.M., C.M.M., M.J.F., S.Y., and G.G.S., unpublished work; R.L., X. Zhao, L.F., C.M.M., and S.I., unpublished work). A2Amapper produces a set of one-to-one matches that are alignments of nearly identical pairs of segments imputed to be analogous up to polymorphisms. Each match aligns a segment of the target genome against a segment of NCBI-34. The segments are nonoverlapping by construction, and we consider the coverage of NCBI-34 to be the sum of the lengths of these segments. This set of matches is the basis for further analysis regarding correctness of order and orientation for which we develop three concepts: runs, heaviest common subsequence, and clumps. One match is consistent with another if in each assembly the segments of the matches are in the same relative order and orientation with no intervening matches between them. A run is a maximal chain of consistent matches. The heaviest common subsequence between two genomes is a subset of the matches for which the sum of the lengths of the matches is maximal and removing all other matches from consideration leaves a single run. Intuitively, the heaviest common subsequence is a global measure of the largest subset of the two assemblies that agree with each other. A clump is a run of 50 kbp or more that can be obtained by eliminating out-of-order matches, giving a local equivalent of the heaviest common subsequence (Supporting Text 1).
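
The seed step described above (unique exact matches) can be illustrated with a toy sketch. It is not ATAC's implementation. It assumes both sequences are plain uppercase strings, looks only at the forward strand, and uses a hypothetical helper, `unique_kmer_seeds`, that joins overlapping k-mers occurring exactly once in each sequence into longer exact matches.

```python
from collections import Counter


def unique_kmer_seeds(a: str, b: str, k: int) -> list[tuple[int, int, int]]:
    """Hypothetical seed finder: return (start_a, start_b, length) exact
    matches built from k-mers that occur exactly once in each sequence.
    Forward strand only; a toy, not ATAC's seed finder."""

    def unique_positions(s: str) -> dict[str, int]:
        counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        return {s[i:i + k]: i for i in range(len(s) - k + 1)
                if counts[s[i:i + k]] == 1}

    ua, ub = unique_positions(a), unique_positions(b)
    # Shared unique k-mers, ordered by their position in a.
    hits = sorted((pa, ub[kmer]) for kmer, pa in ua.items() if kmer in ub)

    seeds: list[tuple[int, int, int]] = []
    for pa, pb in hits:
        if seeds:
            sa, sb, length = seeds[-1]
            # Extend the previous seed if this k-mer starts one base after
            # the seed's last k-mer and lies on the same diagonal.
            if pa - sa == pb - sb and pa == sa + length - k + 1:
                seeds[-1] = (sa, sb, length + 1)
                continue
        seeds.append((pa, pb, k))
    return seeds


print(unique_kmer_seeds("ACGTACGGTCA", "TTACGGTCAGG", 4))
```

Here the shared segment "TACGGTCA" is reported as a single seed, (3, 1, 8). A real seed finder also has to handle reverse complements and repeats, which this sketch ignores.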

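The run and coverage definitions in the quoted method can be sketched as follows, assuming each match is reduced to a hypothetical `Match` record (start in each assembly, a single length, and strand) and that each assembly is one coordinate space. Two matches adjacent in assembly A count as consistent when they share an orientation and are also adjacent in assembly B, in increasing order for forward matches and decreasing order for reverse ones. Coverage is the summed match length, which is valid only because the segments do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical one-to-one match between assemblies A and B."""
    a_start: int
    b_start: int
    length: int
    forward: bool  # True if both segments are on the same strand


def coverage(matches: list[Match]) -> int:
    # Segments are non-overlapping, so coverage is the sum of their lengths.
    return sum(m.length for m in matches)


def find_runs(matches: list[Match]) -> list[list[Match]]:
    """Split matches into runs: maximal chains of consecutive matches that
    keep the same relative order and orientation in both assemblies."""
    by_a = sorted(matches, key=lambda m: m.a_start)
    b_rank = {m: r for r, m in
              enumerate(sorted(matches, key=lambda m: m.b_start))}

    runs: list[list[Match]] = []
    for m in by_a:
        if runs:
            prev = runs[-1][-1]
            # Neighbours in A must also be neighbours in B (no intervening
            # match): one step forward for forward matches, back otherwise.
            step = 1 if m.forward else -1
            if m.forward == prev.forward and b_rank[m] - b_rank[prev] == step:
                runs[-1].append(m)
                continue
        runs.append([m])
    return runs


ms = [Match(0, 0, 100, True), Match(100, 100, 50, True),
      Match(150, 400, 30, True), Match(200, 300, 20, False),
      Match(250, 250, 10, False)]
print([len(r) for r in find_runs(ms)], coverage(ms))
```

For this toy input the matches split into three runs of sizes 2, 1 and 2, with a coverage of 210 bases. The third match breaks the first run because it jumps past two other matches in B.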

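Under the same simplifying assumptions, the heaviest common subsequence defined in the method description can be computed as a heaviest increasing subsequence. Sort the matches by position in A, then pick the maximum-weight subset of same-orientation matches whose B positions increase (forward) or decrease (reverse). Removing every other match leaves that subset as a single run. The quadratic dynamic program below is an illustrative sketch with hypothetical names, not the algorithm used by A2Amapper. An O(n log n) variant exists that uses a search tree keyed on B position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical one-to-one match between assemblies A and B."""
    a_start: int
    b_start: int
    length: int
    forward: bool


def heaviest_chain(matches: list[Match], forward: bool) -> tuple[int, list[Match]]:
    """O(n^2) DP: heaviest subset of same-orientation matches whose B order
    agrees with A order (increasing if forward, decreasing otherwise)."""
    ms = sorted((m for m in matches if m.forward == forward),
                key=lambda m: m.a_start)
    if not ms:
        return 0, []
    best = [m.length for m in ms]  # heaviest chain ending at ms[i]
    prev = [-1] * len(ms)          # back-pointers for reconstruction
    for i, mi in enumerate(ms):
        for j in range(i):
            mj = ms[j]
            ordered = (mj.b_start < mi.b_start) if forward else (mj.b_start > mi.b_start)
            if ordered and best[j] + mi.length > best[i]:
                best[i] = best[j] + mi.length
                prev[i] = j
    i = max(range(len(ms)), key=best.__getitem__)
    weight, chain = best[i], []
    while i != -1:
        chain.append(ms[i])
        i = prev[i]
    return weight, chain[::-1]


def heaviest_common_subsequence(matches: list[Match]) -> tuple[int, list[Match]]:
    # A single run has one orientation, so try both and keep the heavier.
    return max(heaviest_chain(matches, True), heaviest_chain(matches, False),
               key=lambda result: result[0])


ms = [Match(0, 0, 100, True), Match(100, 500, 10, True),
      Match(200, 100, 100, True)]
print(heaviest_common_subsequence(ms)[0])
```

In this example the small out-of-order match at B position 500 is dropped, and the two 100-base matches form a subsequence of weight 200.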
Usage

  • Download from SourceForge [1]
    • There is no packaged download yet, so check out the source from SVN:
      svn co https://kmer.svn.sourceforge.net/svnroot/kmer
  • Build the package; for instructions, see [2]
    • You will need the Python development headers, which provide include/Python.h. Where the instructions call for gmake, plain make will probably work.




References

  1. Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, Flanigan MJ, Edwards NJ, Bolanos R, Fasulo D, Halldorsson BV, Hannenhalli S, Turner R, Yooseph S, Lu F, Nusskern DR, Shue BC, Zheng XH, Zhong F, Delcher AL, Huson DH, Kravitz SA, Mouchard L, Reinert K, Remington KA, Clark AG, Waterman MS, Eichler EE, Adams MD, Hunkapiller MW, Myers EW, Venter JC. 2004. PNAS


