Samscope

From SEQwiki

Application data

Created by: Kris Popendorf
Biological application domain(s): ChIP-seq, RNA-Seq, Genomics
Principal bioinformatics method(s): Visualisation, Read mapping
Technology: Any
Created at: Keio University
Maintained: Yes
Input format(s): SAM/BAM, BED, BEDGraph, WIG, CSV, GTF, GFF
Output format(s): BIP
Programming language(s): C++
Software libraries: Boost, OpenGL, GLUT, DevIL
Licence: AGPL
Operating system(s): POSIX, Linux

Summary: Samscope is a lightweight SAM/BAM file viewer that makes visually exploring next-generation sequencing data as intuitive as Google Maps. Samscope uses multiple layers to view, simultaneously or sequentially, SAM/BAM-related features such as coverage or allele frequency, or ChIP-seq features such as polarity, from as many files as you like. The paging-friendly binary file layout makes it feasible to browse data sets far larger than the system's available RAM.

Description

Samscope is an interactive OpenGL-based viewer for examining aggregate statistics from SAM/BAM files. Unlike read-centric viewers such as IGV or Tablet, Samscope operates primarily on aggregate statistics (1 value per base of sequence) for examining features like coverage, polarity, or minor allele frequency. Samscope adopts a layer-based display model, where each layer reflects a SAM mapping feature, such as coverage. Layers are stored on disk as BAM MIP Maps ("BIPs"), a binary format that allows instantaneous loads and seeks with minimal memory requirements. Multiple layers can then be displayed simultaneously in different colors, and in multiple synchronized windows. This layer-based design makes it simple to display results from multiple SAM files as different layers, and to visually compare results from different experiments. Annotation can be displayed from GTF/GFF files. Individual reads can also be rendered with per-base mutation/consensus calling if BAM indexes are available.
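To make the "1 value per base" idea concrete, here is a minimal sketch of how a coverage layer could be computed from read intervals. It is an illustration only, not Samscope's implementation: the function name coverage_layer and the representation of reads as 0-based, half-open (start, end) pairs are assumptions made for this example.

```python
def coverage_layer(ref_len, reads):
    """Return per-base coverage for a reference of ref_len bases.

    reads is an iterable of (start, end) pairs, 0-based and half-open.
    This input shape is an assumption for the sketch, not Samscope's API.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_len + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, ref_len)  # clip reads that overhang the reference
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the differences gives the depth at each base.
    coverage = []
    depth = 0
    for delta in diff[:ref_len]:
        depth += delta
        coverage.append(depth)
    return coverage
```

The difference-array approach touches each read once and each reference base once, which matches the O(N + M) flavour of the per-layer cost discussed under Time Requirements.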

The viewer itself loads BIP files using mmap(2) and only needs access to the pages describing the area in view, so the viewer is very lightweight. However, before running as a viewer, Samscope needs to generate the requisite BIP layer files. The layers generated are chosen based on runtime options, so examine the OPTIONS section for more details.
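The effect of mmap(2) described above can be sketched in a few lines. The file layout used here (a flat run of little-endian unsigned 16-bit values, one per base, with no header) is a hypothetical stand-in chosen for illustration; the real BIP layout is not described on this page. The function name read_window is likewise an assumption.

```python
import mmap
import struct

VALUE_BYTES = 2  # assumed: one unsigned 16-bit value per base, no header


def read_window(path, start, end):
    """Return the values for bases [start, end) from a flat uint16 file.

    Only the pages backing the requested slice need to be resident;
    the rest of the file is never read into memory.
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = mm[start * VALUE_BYTES:end * VALUE_BYTES]
    count = len(raw) // VALUE_BYTES
    return list(struct.unpack("<%dH" % count, raw))
```

Because the operating system faults pages in on demand, the cost of opening a view depends on the size of the window being read rather than on the size of the file.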

Space Requirements

The space requirements at BIP generation are dictated primarily by the size of the reference sequence, as the index includes an entry for each MIP column. Raw layer values are stored 1 per reference base using the precision specified at build time (from 1 to 8 bytes; 2 by default). However, BIP files are sparse, so any untouched spans of genome (larger than your system's page size) require no disk or RAM for raw value storage. As layers are downsampled for building MIP maps, having sufficient RAM to hold the previous resolution in memory helps, but is not necessary. Because BIP files are generated sequentially and entirely in place, and are backed by disk using mmap(2), if there is insufficient memory for both resolutions at once, the former resolution can be paged to and from disk (because generation is sequential, each page is faulted at most twice, eliminating "thrashing"). Even a laptop with a gigabyte of RAM can generate BIPs for 8Gbp of reads across a 3Gbp genome (it will be bound by the speed of its disk, but it will get the job done).
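The downsampling step can be sketched as a pyramid in which each level has half the columns of the one before it. This is a sketch under stated assumptions: the page does not say how Samscope combines adjacent columns, so taking the maximum of each pair is a choice made here purely for illustration, and build_mip_levels is a hypothetical name.

```python
def build_mip_levels(values):
    """Return a list of levels, finest first.

    Level 0 is the input; each later level has half as many columns
    (rounded up) as the level before it, down to a single column.
    """
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        # Walk the previous level two columns at a time, in order.
        for i in range(0, len(prev), 2):
            pair = prev[i:i + 2]
            nxt.append(max(pair))  # assumed reduction; the real one may differ
        levels.append(nxt)
    return levels
```

Each level is produced by a single sequential pass over the previous one, which is the access pattern that keeps paging predictable when the previous resolution does not fit in RAM.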

On disk, approximately (2M)(ValuePrecision + CoordinateSize) bytes are required per layer, where M is the size of the reference genome in bases. Note that BIP files generated by Samscope are "sparse" and only require disk space for pages that have actually been used to store data.
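As a worked example of the formula above, the following sketch computes the per-layer upper bound. ValuePrecision defaults to the 2 bytes stated earlier; the page does not give CoordinateSize, so the 4 bytes used here is a hypothetical value, and the function name estimate_bip_bytes is an assumption. The factor of 2 is consistent with a pyramid that halves at each level (M + M/2 + M/4 + ... approaches 2M), though the page does not state that derivation.

```python
def estimate_bip_bytes(ref_bases, value_precision=2, coordinate_size=4):
    """Upper-bound bytes per layer: (2M)(ValuePrecision + CoordinateSize).

    coordinate_size=4 is an assumed placeholder, not a documented value.
    Sparse files will usually occupy less than this on disk.
    """
    return 2 * ref_bases * (value_precision + coordinate_size)


# Example: a 3 Gbp reference with the assumed sizes above.
human_like = estimate_bip_bytes(3_000_000_000)
```

With these assumed sizes the bound for a 3Gbp reference is 36,000,000,000 bytes per layer before any savings from sparseness.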

Time Requirements

The time to run Samscope is effectively O((N + M)(L)), where N is the size of the input data (in bases), M is the size of the reference genome, and L is the number of layers being generated. All bases are read in and statistics are counted for each base, requiring O(N) time. This works even with insufficient RAM if reads are sorted. If ValuePrecision*M memory is available, it completes in O(N) time regardless of whether or not the reads are sorted. Each MIP level has half the columns of the previous one, so building all levels finishes in O(M) time. Finally, while some layers can be generated concurrently and don't require additional passes over the O(N) read data, some layers depend on prior layers (e.g. polarity = forward - reverse) and some layer types are not presently generated concurrently. Furthermore, some users may wish to limit concurrency (see --maxconcur) to limit memory consumption, which requires up to O(L) additional passes over the data.
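The dependency between layers mentioned above (polarity = forward - reverse) can be illustrated with a short sketch. The function name polarity_layer and the use of plain lists of per-base counts are assumptions for this example; it shows only why the polarity layer cannot be finished until both strand layers exist.

```python
def polarity_layer(forward, reverse):
    """Return per-base polarity as forward coverage minus reverse coverage.

    Both inputs are per-base count lists over the same reference, so they
    must already be complete before this derived layer can be computed.
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse layers must have equal length")
    return [f - r for f, r in zip(forward, reverse)]
```

A derived layer of this kind costs one O(M) pass over its inputs and no extra pass over the reads themselves.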
