SEQanswers

Old 03-17-2011, 02:36 AM   #1
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27
SOLVED - C++ libraries for NGS data and such

Hi there,

I am a Delphi/PHP guy and I am new to using C++ in bioinformatics.
I am looking for the most mature C++ library that can deal with Next Generation Sequencing data and common file formats used in biology (gbk, ptt, fasta, sam, etc.).

Do you have any suggestions?

Thank you for your help,

cheers,

toni

Last edited by pasta; 03-18-2011 at 02:29 AM.
Old 03-17-2011, 03:31 AM   #2
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27

Partially solved for SAM format: a C library is available on the SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml), dumb me!
Any other good libraries for gbk, fasta and ptt?
Old 03-17-2011, 05:05 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

The NCBI Toolbox? I'm not a C++ programmer myself, so I can't say much more about it.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi
Old 03-17-2011, 05:31 AM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Try http://www.seqan.de/
Old 03-17-2011, 06:42 AM   #5
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27

Thank you guys for your answers.
The NCBI Toolbox is maybe a bit limited, but SeqAn looks pretty cool.

Thanks++

toni
Old 03-17-2011, 12:20 PM   #6
n00c
Member
 
Location: Boston, MA

Join Date: Nov 2009
Posts: 12

SeqAn is a well-rounded and well-designed library (there's even a book about it: http://www.amazon.com/Biological-Seq.../dp/142007623X, which I found to be a useful reference), but if you want to work with the SAM/BAM formats specifically, I would seriously recommend studying the C functions in Samtools for manipulating SAM/BAM-formatted data. Looking at the Samtools code is useful because (a) it was written by the author of the SAM format, and (b) it contains many examples of how to manipulate the exposed data structures.

There is also a C++ library called Bamtools (http://sourceforge.net/projects/bamtools/), but so far my preferred way has been to use a custom-written lightweight C++ wrapper for the C interface exposed by "bam.h" from Samtools.
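To illustrate what a "lightweight C++ wrapper for a C interface" can look like, here is a minimal sketch of the general RAII pattern. It is not the wrapper described above and it does not call Samtools: the standard `std::FILE` handle stands in for the C handle from "bam.h", and the names `FileCloser` and `LineReader` are made up for the example. With the real library you would swap in its own handle type and its open/read/close functions.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter that releases the C handle. std::FILE is only a stand-in here
// (assumption): with Samtools this would call the library's close function.
struct FileCloser {
    void operator()(std::FILE* fp) const noexcept {
        if (fp) std::fclose(fp);
    }
};

// Hypothetical wrapper class: owns the C handle and closes it automatically
// when the object goes out of scope, even if an exception is thrown.
class LineReader {
public:
    explicit LineReader(const std::string& path)
        : fp_(std::fopen(path.c_str(), "r")) {
        if (!fp_) throw std::runtime_error("cannot open " + path);
    }

    // Reads the next line (without the trailing newline) into `out`.
    // Returns false once the end of the file has been reached.
    bool next(std::string& out) {
        out.clear();
        int c;
        while ((c = std::fgetc(fp_.get())) != EOF) {
            if (c == '\n') return true;
            out.push_back(static_cast<char>(c));
        }
        return !out.empty();  // last line may lack a trailing newline
    }

private:
    std::unique_ptr<std::FILE, FileCloser> fp_;
};
```

Typical use would be `LineReader r("reads.sam"); std::string line; while (r.next(line)) { ... }`, where the file name is just a placeholder. The point of the design is that callers never have to remember to close the handle themselves.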

License-wise, Samtools and Bamtools are under MIT licenses, and SeqAn is under BSD.

Last edited by n00c; 03-17-2011 at 12:29 PM.
Old 03-18-2011, 02:28 AM   #7
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27

Thank you guys, and n00c, for your answers.

Cheers,

Toni
Old 03-18-2011, 08:05 AM   #8
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

Quote:
Partially solved for SAM format: A C library is available on SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml), dumb me !!
In case you get angry while trying to read the "documentation" of the C API, have a look at bamtools instead (on samtools.com, on the right, there's a section "other language bindings" - bamtools is the one in C++). I found it way easier to understand and use.


[EDIT - oops - sorry for the double post...]
Old 03-18-2011, 10:49 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Any specific reason you want to use C++? If you want to develop general-use, high-performance bioinformatics tools, it is the way to go, of course -- but in most cases a scripting language might be the better choice, as it allows more rapid development and gives performance that is sufficient for most use cases.

Hence, maybe have a look at our Python framework, HTSeq: http://www-huber.embl.de/users/anders/HTSeq/
Old 03-21-2011, 12:18 AM   #10
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

I would agree with Simon - I have used and still use Python quite a lot (there's also a SAM/BAM interface). However, personally I like to write certain things in C++ instead of Python: my code is better organised and I normally think more before writing it. Regarding memory efficiency and speed, C++ is way better than Python (at least in my programs and scripts).
Old 03-21-2011, 02:31 AM   #11
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27

Quote:
Originally Posted by schmima
In case you get angry while trying to read the "documentation" of the C API, have a look at bamtools instead (on samtools.com, on the right, there's a section "other language bindings" - bamtools is the one in C++). I found it way easier to understand and use.


[EDIT - oops - sorry for the double post...]
You read my mind - thanks for your tip !


@Simon: Thanks, but I am working on large files and I want my code to be very memory efficient.
Old 03-25-2011, 02:42 AM   #12
pasta
Member
 
Location: Alps

Join Date: Jan 2011
Posts: 27

*Just for the community*
I also found TIGR++: http://www.cbcb.umd.edu/software/pirate/tigr++.shtml

C++ class library used by several TIGR genefinders and other packages. Covers string & sequence processing, math/statistics, many efficient data structures, GFF parsing, sorting, and I/O.

Cheers

toni