  • pasta
    Member
    • Jan 2011
    • 27

    C++ libraries for NGS data and such

    Hi there,

    I am a Delphi/PHP guy and I am new to using C++ in bioinformatics.
    I am looking for the most mature C++ library that can deal with Next Generation Sequencing data and the common file formats used in biology (GBK, PTT, FASTA, SAM, etc.).

    Do you have any ideas?

    Thank you for your help,

    cheers,

    toni
    Last edited by pasta; 03-18-2011, 02:29 AM.
  • pasta
    Member
    • Jan 2011
    • 27

    #2
    Partially solved for SAM format: a C library is available on the SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml), dumb me!!
    Are there any other good libraries for GBK, FASTA and PTT?


    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      The NCBI Toolbox? I'm not a C++ programmer myself, so I can't say much more about it.

      The National Center for Biotechnology Information (NCBI) provides an integrated approach to the use of gene and protein sequence information, the scientific literature (MEDLINE), molecular structures, and related resources, in biomedicine.


      • colindaven
        Senior Member
        • Oct 2008
        • 417

        #4
        Try http://www.seqan.de/


        • pasta
          Member
          • Jan 2011
          • 27

          #5
          Thank you guys for your answers.
          The NCBI Toolbox is maybe a bit limited, but SeqAn looks pretty cool.

          Thanks++

          toni


          • n00c
            Member
            • Nov 2009
            • 12

            #6
            SeqAn is a well-rounded and well-designed library (there's even a book about it: http://www.amazon.com/Biological-Seq.../dp/142007623X, which I found to be a useful reference), but if you want to work with the SAM/BAM format specifically, I would seriously recommend studying the C functions in Samtools for manipulating SAM-/BAM-formatted data. Looking at the Samtools code is useful because (a) it was written by the author of the SAM format, and (b) in the Samtools code you can find many examples of how to manipulate the exposed data structures.

            There is also a C++ library called Bamtools (http://sourceforge.net/projects/bamtools/), but so far my preferred way has been to use a custom-written lightweight C++ wrapper for the C interface exposed by "bam.h" from Samtools.
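A minimal sketch of the "lightweight C++ wrapper around a C interface" idea, for readers who have not written one. It deliberately does not use "bam.h": the C handle here is the standard library's FILE*, standing in for whatever handle type a C library exposes. The class name CFile and its methods are hypothetical, chosen only for illustration. The point is the pattern: the constructor acquires the C handle, and a std::unique_ptr with a custom deleter releases it, so the handle is closed even when an exception is thrown.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter that releases the C handle. With a real C library this would
// call that library's own close function instead of fclose.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

// Hypothetical wrapper: owns a FILE* and closes it on destruction.
class CFile {
public:
    CFile(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }

    // Writes the text as-is; no newline is appended.
    void write(const std::string& text) {
        std::fputs(text.c_str(), handle_.get());
    }

    // Reads one line without its trailing newline.
    // Returns false once the end of the file has been reached.
    bool read_line(std::string& out) {
        out.clear();
        int c;
        while ((c = std::fgetc(handle_.get())) != EOF) {
            if (c == '\n') return true;
            out.push_back(static_cast<char>(c));
        }
        return !out.empty();
    }

private:
    std::unique_ptr<std::FILE, FileCloser> handle_;
};
```

Because the handle lives in a unique_ptr, the wrapper is movable but not copyable, which matches the usual ownership rule for C handles: exactly one owner closes it.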

            License-wise, Samtools and Bamtools are under MIT licenses, and SeqAn is under BSD.
            Last edited by n00c; 03-17-2011, 12:29 PM.


            • pasta
              Member
              • Jan 2011
              • 27

              #7
              Thank you guys, and n00c for your answer.

              Cheers,

              Toni


              • schmima
                Member
                • Apr 2010
                • 56

                #8
                Originally posted by pasta
                Partially solved for SAM format: a C library is available on the SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml), dumb me!!

                In case you get angry while trying to read the "documentation" of the C API, have a look at Bamtools instead (on the SAMtools website, on the right, there's a section "other language bindings"; Bamtools is the one in C++). I found it way easier to understand and use.

                [EDIT - oops - sorry for the double post...]


                • Simon Anders
                  Senior Member
                  • Feb 2010
                  • 995

                  #9
                  Any specific reason you want to use C++? If you want to develop general-use high-performance bioinformatics tools, it is the way to go, of course -- but in most cases, a scripting language might be the better choice, as it allows more rapid development and gives performance sufficient for most use cases.

                  Hence, maybe have a look at our Python framework, HTSeq: http://www-huber.embl.de/users/anders/HTSeq/


                  • schmima
                    Member
                    • Apr 2010
                    • 56

                    #10
                    I would agree with Simon. I use Python quite a lot (there's also a SAM/BAM interface). However, personally I like to write certain things in C++ instead of Python: my code is better organised, I normally think more before writing it, and in terms of memory efficiency and speed, C++ is way better than Python (at least in my programs and scripts).


                    • pasta
                      Member
                      • Jan 2011
                      • 27

                      #11
                      Originally posted by schmima
                      In case you get angry while trying to read the "documentation" of the C API, have a look at Bamtools instead (on the SAMtools website, on the right, there's a section "other language bindings"; Bamtools is the one in C++). I found it way easier to understand and use.

                      [EDIT - oops - sorry for the double post...]

                      You read my mind - thanks for the tip!


                      @Simon: Thanks but I am working on large files and I want to make it very memory efficient.


                      • pasta
                        Member
                        • Jan 2011
                        • 27

                        #12
                        *Just for the community*
                        I also found TIGR++: http://www.cbcb.umd.edu/software/pirate/tigr++.shtml

                        C++ class library used by several TIGR genefinders and other packages. Covers string & sequence processing, math/statistics, many efficient data structures, GFF parsing, sorting, and I/O.

                        Cheers

                        toni
