  • C++ libraries for NGS data and such

    Hi there,

    I am a Delphi/PHP guy and I am new to using C++ in bioinformatics.
    I am looking for the most mature C++ library that can deal with Next Generation Sequencing data and the common file formats used in biology (gbk, ptt, fasta, sam, etc.).

    Do you have any ideas?

    Thank you for your help,

    cheers,

    toni
    Last edited by pasta; 03-18-2011, 02:29 AM.

  • #2
    Partially solved for the SAM format: a C library is available on the SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml). Dumb me!!
    Are there any other good libraries for gbk, fasta and ptt?
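
    For FASTA in particular, the format is simple enough that a streaming reader fits in a few lines. The following is only a minimal sketch, not a replacement for a mature library: `FastaRecord` and `read_fasta_record` are hypothetical names, and it assumes plain, uncompressed FASTA text in which every record starts with a `>` header line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// One FASTA record: the header line (without '>') and the joined sequence.
struct FastaRecord {
    std::string id;
    std::string seq;
};

// Reads the next record from `in`; returns false when no record is left.
// Hypothetical helper for illustration, not taken from any existing library.
bool read_fasta_record(std::istream& in, FastaRecord& rec) {
    rec.id.clear();
    rec.seq.clear();
    std::string line;
    bool found_header = false;
    // Skip anything that precedes the next header line.
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty() && line[0] == '>') {
            rec.id = line.substr(1);
            found_header = true;
            break;
        }
    }
    if (!found_header) return false;
    // Append sequence lines until the next header or the end of input.
    while (in.peek() != '>' && std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        rec.seq += line;
    }
    return true;
}
```

    Only one record is held in memory at a time, so the same loop works on a `std::ifstream` over a large file.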

    • #3
      The NCBI Toolbox? I'm not a C++ programmer myself, so I can't say much more about it.

      The National Center for Biotechnology Information (NCBI) provides an integrated approach to the use of gene and protein sequence information, the scientific literature (MEDLINE), molecular structures, and related resources, in biomedicine.

      • #4
        Try http://www.seqan.de/

        • #5
          Thank you guys for your answers.
          The NCBI Toolbox may be a bit limited, but SeqAn looks pretty cool.

          Thanks++

          toni

          • #6
            SeqAn is a well-rounded and well-designed library (there's even a book about it: http://www.amazon.com/Biological-Seq.../dp/142007623X, which I found to be a useful reference), but if you want to work with the SAM/BAM format specifically, I would seriously recommend studying the C functions in Samtools for manipulating SAM/BAM-formatted data. Looking at the Samtools code is useful because (a) it was written by the author of the SAM format, and (b) it contains many examples of how to manipulate the exposed data structures.

            There is also a C++ library called Bamtools (http://sourceforge.net/projects/bamtools/), but so far my preferred way has been to use a custom-written lightweight C++ wrapper for the C interface exposed by "bam.h" from Samtools.
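
            To illustrate the wrapper idea, here is a minimal sketch of the general pattern, not the actual wrapper mentioned above: a small class owns a C handle through `std::unique_ptr` with a custom deleter, so the handle is closed automatically. To keep the example self-contained, `std::FILE*` from `<cstdio>` stands in for the handle type that a C library such as Samtools would expose; `CFileReader` and `FileCloser` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter that closes the C handle. std::fclose is a stand-in (assumption):
// substitute the close function of the C API being wrapped.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        if (f) std::fclose(f);
    }
};

// Hypothetical wrapper: owns the handle and releases it in its destructor.
class CFileReader {
public:
    explicit CFileReader(const std::string& path)
        : handle_(std::fopen(path.c_str(), "r")) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }

    // Reads one line without its trailing newline; returns false at EOF.
    bool next_line(std::string& line) {
        line.clear();
        char buf[4096];
        while (std::fgets(buf, sizeof buf, handle_.get())) {
            line += buf;
            if (!line.empty() && line.back() == '\n') {
                line.pop_back();
                return true;
            }
        }
        return !line.empty();  // final line that lacks a newline
    }

private:
    std::unique_ptr<std::FILE, FileCloser> handle_;
};
```

            The point of the design is that no caller ever has to remember the matching close call, even when an exception is thrown between opening and closing.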

            License-wise, Samtools and Bamtools are under MIT licenses, and SeqAn is under BSD.
            Last edited by n00c; 03-17-2011, 12:29 PM.

            • #7
              Thank you guys, and n00c in particular, for your answers.

              Cheers,

              Toni

              • #8
                Partially solved for SAM format: A C library is available on SAMtools website (http://samtools.sourceforge.net/samtools-c.shtml), dumb me !!
                In case you get angry while trying to read the "documentation" of the C API, have a look at Bamtools instead (on the SAMtools website, on the right, there's a section called "other language bindings"; Bamtools is the C++ one). I found it way easier to understand and use.


                [EDIT - oops - sorry for the double post...]

                • #9
                  Any specific reason you want to use C++? If you want to develop general-use, high-performance bioinformatics tools, it is the way to go, of course. In most cases, though, a scripting language might be the better choice, as it allows more rapid development and gives performance that is sufficient for most use cases.

                  Hence, maybe have a look at our Python framework, HTSeq: http://www-huber.embl.de/users/anders/HTSeq/

                  • #10
                    I would agree with Simon: I have used, and still use, Python quite a lot (there's also a SAM/BAM interface). However, I personally like to write certain things in C++ instead of Python. My code is better organised, I normally think more before writing it, and in terms of memory efficiency and speed C++ is way better than Python (at least in my programs and scripts).

                    • #11
                      Originally posted by schmima
                      In case you get angry while trying to read the "documentation" of the C API, have a look at Bamtools instead (on the SAMtools website, on the right, there's a section called "other language bindings"; Bamtools is the C++ one). I found it way easier to understand and use.


                      [EDIT - oops - sorry for the double post...]
                      You read my mind - thanks for your tip !


                      @Simon: Thanks, but I am working with large files and I want my code to be very memory efficient.
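
                      One way to keep memory use flat on large files is to stream the SAM text one record at a time instead of loading it all. A minimal sketch under that assumption follows; `SamStats` and `count_reads` are hypothetical names, and the only format details it relies on are that SAM header lines start with `@` and that FLAG is the second tab-separated column, with bit 0x4 marking an unmapped read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Running totals; their size does not depend on the number of reads.
struct SamStats {
    std::size_t total = 0;
    std::size_t unmapped = 0;
};

// Streams SAM text from `in`, holding a single line in memory at a time.
// Hypothetical helper for illustration; it inspects only the FLAG column.
SamStats count_reads(std::istream& in) {
    SamStats stats;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '@') continue;  // skip header lines
        const std::size_t tab1 = line.find('\t');
        if (tab1 == std::string::npos) continue;       // skip malformed lines
        // FLAG starts right after the first tab; strtoul stops at the next tab.
        const unsigned long flag =
            std::strtoul(line.c_str() + tab1 + 1, nullptr, 10);
        ++stats.total;
        if (flag & 0x4UL) ++stats.unmapped;            // 0x4 = read unmapped
    }
    return stats;
}
```

                      Because the function takes any `std::istream`, it can read from a file or from standard input in a pipeline without changes.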

                      • #12
                        *Just for the community*
                        I also found TIGR++: http://www.cbcb.umd.edu/software/pirate/tigr++.shtml

                        C++ class library used by several TIGR genefinders and other packages. Covers string & sequence processing, math/statistics, many efficient data structures, GFF parsing, sorting, and I/O.

                        Cheers

                        toni
