Syndicated from PubMed RSS Feeds
Characteristics of 454 pyrosequencing data--enabling realistic simulation with flowsim.
Bioinformatics. 2010 Sep 15;26(18):i420-i425
Authors: Balzer S, Malde K, Lanzén A, Sharma A, Jonassen I
MOTIVATION: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to approximately 500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments.
RESULTS: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields.
AVAILABILITY: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/
CONTACT: [email protected]; [email protected].
PMID: 20823302 [PubMed - in process]
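To make the idea of simulating flow values concrete, here is a minimal Python sketch. It is not Flowsim's code and does not use the empirical distributions derived in the paper: the cyclic flow order `TACG`, the Gaussian noise model, and the parameters `base_sd` and `sd_per_base` are all illustrative assumptions, as are the function names.

```python
import random
from itertools import groupby

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Return the noise-free homopolymer length seen at each flow."""
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = []
    i = 0  # index of the current flow
    for base, group in groupby(seq):
        run_length = len(list(group))
        # Flows of non-matching nucleotides incorporate nothing.
        while flow_order[i % len(flow_order)] != base:
            flows.append(0)
            i += 1
        # The matching flow incorporates the whole homopolymer run.
        flows.append(run_length)
        i += 1
    return flows


def noisy_flowgram(seq, rng, base_sd=0.05, sd_per_base=0.03):
    """Perturb each ideal flow value with Gaussian noise (placeholder model).

    The spread grows with homopolymer length; this stands in for the
    empirical distributions, which are not reproduced here.
    """
    noisy = []
    for n in ideal_flowgram(seq):
        sd = base_sd + sd_per_base * n
        noisy.append(max(0.0, rng.gauss(n, sd)))
    return noisy


def call_homopolymers(flow_values):
    """Naive base calling: round each flow value to the nearest integer."""
    return [int(round(v)) for v in flow_values]


if __name__ == "__main__":
    rng = random.Random(0)
    read = "TTACGGGA"
    print(ideal_flowgram(read))
    print([round(v, 2) for v in noisy_flowgram(read, rng)])
```

Rounding the noisy values back to integers with `call_homopolymers` shows where such a model would introduce homopolymer-length errors; with a wider noise spread, longer runs are miscalled more often.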