  • perencia
    Junior Member
    • Jun 2010
    • 6

    Performance improvements

    Hi,

    I am looking for an open-source bioinformatics tool whose performance I could improve. I was thinking of cluster parallelization, but I am open to other suggestions (multi-core utilization, threads, etc.). As a newcomer to bioinformatics, I've seen that many of the tools already have some kind of parallelization, but I don't know whether there is something worth improving, or which projects would benefit most from it. Actually, I'm a little bit lost.

    So, any suggestion is greatly appreciated

    Thanks!
  • dawe
    Senior Member
    • Apr 2009
    • 258

    #2
    I believe it's quite the opposite: the level of parallelism is quite low (and it has only started increasing in the last year).
    BTW, take a look at GNU parallel:

    http://savannah.gnu.org/projects/parallel

    d


    • perencia
      Junior Member
      • Jun 2010
      • 6

      #3
      Thanks dawe,

      Actually, I was thinking more at the source level: an MPI cluster approach along the lines of mpiblast (http://www.mpiblast.org/), or OpenMP for shared memory.


      • dawe
        Senior Member
        • Apr 2009
        • 258

        #4
        I think improvements are (or should be) very welcome, although patches are not easily merged into mainstream code, especially when an application has a single developer rather than a team behind it.
        I wrote a patch myself to add OpenMP support to clover, but it was never accepted :-(
        Also, consider that most of the code is written by bio-experts, and it may sometimes be hard to parallelize, mainly for two reasons:
        1- poorly commented code
        2- obscure blocks which may need code refactoring...

        BTW, there are examples of optimization for NGS; take a look at mummer-gpu (by the same authors as bowtie and tophat).

        d


        • NGSfan
          Senior Member
          • Apr 2009
          • 181

          #5
          Well, if you are looking for a problem, I would say that a parallelization of multiple sequence realignment of the reads (either CPU or GPU based) would be an opportunity. The idea of going back to realign reads to form a consensus around an indel is relatively new compared to the regular read-to-reference mapping problem, which is already saturated. Of course, multiple sequence alignment is an old problem, so the algorithms to do it are out there; only the parallelization is needed.

          Right now the two leading programs are GATK and SRMA.

          SRMA (Short Read Micro re-Aligner) is a post-alignment micro re-aligner for next-generation high-throughput sequencing data.


          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            In the area of alignment, there are not many parallelization problems. For alignment, you can just split your read set. Realignment can be done region by region on an indexed BAM. You need scripts to automate the process, but this is of little academic interest.
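
To make the two splitting strategies above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names `chunk_fastq` and `tile_regions` are hypothetical, the FASTQ input is assumed to use exactly four lines per read, and the contig lengths are made-up values rather than anything read from a real BAM header. A real realignment pipeline would probably also need overlapping windows so that indels near a window boundary are not cut in half.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def chunk_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines holding up to reads_per_chunk reads each.

    Assumes unwrapped FASTQ, i.e. exactly 4 lines per read.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


def tile_regions(contig_lengths: Dict[str, int], window: int) -> List[str]:
    """Return 1-based inclusive region strings (e.g. 'chr1:1-1000') covering every contig."""
    regions = []
    for name, length in contig_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append(f"{name}:{start}-{end}")
    return regions


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    for region in tile_regions({"chr1": 2500, "chr2": 800}, window=1000):
        print(region)
```

Each chunk of reads, or each region string, can then be handed to an independent job (one aligner or realigner process per chunk), which is the kind of scripting the post refers to.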

            In addition, if you write your programs for big sequencing centers, you should not use MPI when it can be avoided, as is the case for alignment. For assembly, MPI is a reasonable solution, as you can hardly split the read set. GPU is even worse. You cannot expect us to put an expensive GPU in every computer purely for the sake of one or two programs.

            I see SSE as a more reasonable approach to parallelization, although it is algorithm dependent (this is also true for GPU). In addition, if you are mainly interested in research rather than practical applications, you may also try MPI/GPU, but few people will use your product.
