SEQanswers

06-04-2019, 03:30 AM   #1
costanza
Junior Member
 
Location: Belgium

Join Date: Oct 2018
Posts: 4
Comparison of programming languages for NGS tools

We have just published a paper in BMC Bioinformatics that compares Go, Java, and C++ as implementation languages for our sequencing tool elPrep.

elPrep is available at https://github.com/exascience/elprep

Title: A comparison of three programming languages for a full-fledged next-generation sequencing tool

Authors: Pascal Costanza, Charlotte Herzeel, Wilfried Verachtert

URL: https://doi.org/10.1186/s12859-019-2903-5

Background:
elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file for multiple preparation steps, and keeps sequencing data as much as possible in main memory. Similar to other SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity bottleneck in its original implementation language during recent further development of elPrep. We therefore investigated three alternative programming languages: Go and Java using a concurrent, parallel garbage collector on the one hand, and C++17 using reference counting on the other hand for handling large amounts of heap objects. We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use.
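
To make the single-pass idea concrete, here is a minimal sketch in Go. It is not elPrep's code or API: the `Alignment` type, the `Filter` type, and the functions `dropUnmapped`, `keepReference`, `parseAlignment`, and `singlePass` are hypothetical names made up for this illustration, and only the first four SAM columns are parsed. It shows several preparation steps being applied to each record while the input is read once, with the surviving records kept in main memory.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Alignment holds the few SAM columns this sketch needs (hypothetical type).
type Alignment struct {
	QName string
	Flag  int
	RName string
	Pos   int
}

// Filter reports whether an alignment should be kept (hypothetical type).
type Filter func(*Alignment) bool

// dropUnmapped rejects reads whose SAM flag has the "unmapped" bit (0x4) set.
func dropUnmapped(a *Alignment) bool { return a.Flag&0x4 == 0 }

// keepReference builds a filter that keeps only reads on one reference sequence.
func keepReference(name string) Filter {
	return func(a *Alignment) bool { return a.RName == name }
}

// parseAlignment reads the first four tab-separated columns of a SAM record.
func parseAlignment(line string) (*Alignment, error) {
	f := strings.Split(line, "\t")
	if len(f) < 4 {
		return nil, fmt.Errorf("short record: %q", line)
	}
	flag, err := strconv.Atoi(f[1])
	if err != nil {
		return nil, fmt.Errorf("bad flag in %q: %v", line, err)
	}
	pos, err := strconv.Atoi(f[3])
	if err != nil {
		return nil, fmt.Errorf("bad position in %q: %v", line, err)
	}
	return &Alignment{QName: f[0], Flag: flag, RName: f[2], Pos: pos}, nil
}

// singlePass scans the input once, runs every filter on each record,
// and keeps the surviving records in memory for later steps.
func singlePass(input string, filters ...Filter) ([]*Alignment, error) {
	var kept []*Alignment
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "@") {
			continue // skip blank lines and SAM header lines
		}
		a, err := parseAlignment(line)
		if err != nil {
			return nil, err
		}
		keep := true
		for _, f := range filters {
			if !f(a) {
				keep = false
				break
			}
		}
		if keep {
			kept = append(kept, a)
		}
	}
	return kept, sc.Err()
}

func main() {
	// Toy input: one header line and four truncated SAM records.
	input := "@HD\tVN:1.6\n" +
		"r1\t0\tchr1\t100\n" +
		"r2\t4\t*\t0\n" +
		"r3\t16\tchr2\t50\n" +
		"r4\t0\tchr1\t200\n"
	kept, err := singlePass(input, dropUnmapped, keepReference("chr1"))
	if err != nil {
		panic(err)
	}
	for _, a := range kept {
		fmt.Println(a.QName, a.RName, a.Pos)
	}
}
```

Every kept record is a separate heap object that stays alive until the run ends, which is the kind of large object heap the paper's comparison of garbage collection and reference counting is concerned with.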

Results:
The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs. Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.

Conclusions:
Based on our benchmark results, we selected Go as our new implementation language for elPrep, and recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM data as well.
Tags
runtime performance, sam/bam files, sequence analysis
