

Old 10-16-2010, 03:00 AM   #1
RSS Posting Maniac

Join Date: Feb 2008
Posts: 1,443
ChIP-Seq: Data structures and compression algorithms for high-throughput sequencing t

Syndicated from PubMed RSS Feeds

Data structures and compression algorithms for high-throughput sequencing technologies.

BMC Bioinformatics. 2010 Oct 14;11(1):514

Authors: Daily K, Rigor P, Christley S, Xie X, Baldi P

ABSTRACT:

BACKGROUND: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data.

RESULTS: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the short sequence absolute or relative addresses, their length, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g. Golomb, Elias Gamma, MOV) and variable codes (e.g. Huffman). The general methodology is illustrated and applied to several HTS data sets. Results show that the information contained in HTS files can be compressed by a factor of 10 or more, depending on the statistical properties of the data sets and various other choices and constraints. Our algorithms fare well against general purpose compression programs such as gzip, bzip2 and 7zip; timing results show that our algorithms are consistently faster than the best general purpose compression programs.

CONCLUSIONS: It is not likely that exactly one encoding strategy will be optimal for all types of HTS data. Different experimental conditions are going to generate various data distributions whereby one encoding strategy can be more effective than another. We have implemented some of our encoding algorithms into the software package GenCompress which is available upon request from the authors. With the advent of HTS technology and increasingly new experimental protocols for using the technology, sequence databases are expected to continue rising in size. The methodology we have proposed is general, and these advanced compression techniques should allow researchers to manage and share their HTS data in a more timely fashion.
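
To make the idea of compressing "relative addresses" with a fixed code more concrete, here is a minimal Python sketch. It is not the authors' GenCompress implementation, which the abstract does not describe in detail. It assumes reads have already been mapped to non-negative integer start positions on a reference, sorts them, stores each position as the gap to the previous one, and writes each gap with an Elias gamma code. All function names are hypothetical, and bit strings are used instead of packed bytes purely for readability.

```python
def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> list[int]:
    """Decode a concatenation of Elias gamma codes back into integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_positions(positions: list[int]) -> str:
    """Sort mapped read positions, take gaps, and gamma-code each gap.

    Assumption: positions are non-negative integers. Each gap is shifted
    by +1 because gamma codes cannot represent 0 (two reads may share a
    start position).
    """
    out = []
    prev = 0
    for pos in sorted(positions):
        out.append(elias_gamma_encode(pos - prev + 1))
        prev = pos
    return "".join(out)


def decompress_positions(bits: str) -> list[int]:
    """Invert compress_positions, returning the sorted positions."""
    positions = []
    prev = 0
    for code in elias_gamma_decode(bits):
        prev += code - 1  # undo the +1 shift, then accumulate the gap
        positions.append(prev)
    return positions


if __name__ == "__main__":
    reads = [1200, 1000, 1003, 1003, 1050]  # made-up mapped positions
    packed = compress_positions(reads)
    assert decompress_positions(packed) == sorted(reads)
    print(len(packed), "bits for", len(reads), "positions")
```

Small gaps get short codes, so densely mapped reads compress well under this scheme. When gaps are large or roughly geometrically distributed, a different code (for example Golomb) may do better, which is in line with the abstract's point that no single encoding is likely to be optimal for every data set.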

PMID: 20946637 [PubMed - as supplied by publisher]

Newsbot!

