**dawe** · 06-30-2010, 08:37 PM

Originally posted by perencia

Hi,

I was looking for an open-source bioinformatics tool whose performance I could improve. I was thinking of cluster parallelization, but I'm open to suggestions: multi-core utilization, threads, etc. As a newcomer to bioinformatics, I've seen that many of the tools already have some kind of parallelization, but I don't know whether there's anything worth improving or which projects would benefit most from it. Actually, I'm a little bit lost.

So, any suggestion is greatly appreciated.

Thanks!

I believe it's quite the opposite: the level of parallelism is quite low (and it has only started increasing in the last year).
BTW, take a look at GNU parallel:

http://savannah.gnu.org/projects/parallel

d
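
The idea behind a job-level tool like the one linked above can be sketched in a few lines of Python. This is only an illustration, not how GNU parallel is implemented: `run_one` and `run_parallel` are hypothetical names, and the demo commands are stand-ins for real tool invocations (for example, one aligner run per input file).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_one(command: Sequence[str]) -> str:
    """Run one external command and return its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def run_parallel(commands: List[Sequence[str]], jobs: int = 4) -> List[str]:
    """Run independent commands, at most `jobs` at a time.

    Threads are enough here because the real work happens in the child
    processes; outputs are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in jobs: each one just squares a number in a separate process.
    commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(5)]
    print(run_parallel(commands, jobs=2))
```

This only helps when the jobs are truly independent, which is the case when the same program is run on many separate inputs.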

**perencia** · 06-30-2010, 10:51 PM

Thanks dawe,

Actually, I was thinking more at the source level: an MPI cluster approach along the lines of mpiBLAST (http://www.mpiblast.org/), or OpenMP for shared memory.

**dawe** · 06-30-2010, 11:23 PM

I think improvements are (or should be) very welcome, although patches are not easily merged into mainline code, especially when an application has a single developer rather than a team behind it.
I wrote a patch myself to add OpenMP support to clover, but it was never accepted :-(
Also, consider that most of the code is written by biology experts, and it can be hard to parallelize, mainly for two reasons:
1- poorly commented code
2- obscure blocks that may need refactoring

BTW, there are examples of optimization for NGS; take a look at mummer-gpu (by the same authors as bowtie and tophat).

d

**NGSfan** · 07-01-2010, 05:08 AM

Well, if you are looking for a problem: I would say that a parallelization of multiple sequence realignment of the reads (either CPU- or GPU-based) would be an opportunity. The idea of going back to realign reads to form a consensus around an indel is relatively new compared to the regular read-to-reference mapping problem, which is already saturated. Of course, multiple sequence alignment is an old problem, so the algorithms to do it are out there; only the parallelization is needed.

Right now the two leading programs are GATK and SRMA:

GATK (local realignment around indels):
http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels

SRMA (Short Read Micro re-Aligner), a post-alignment micro re-aligner for next-generation high-throughput sequencing data:
http://sourceforge.net/apps/mediawiki/srma/index.php?title=Main_Page

**lh3** · 07-02-2010, 06:08 AM

In the area of alignment, there are not many parallelization problems. For alignment, you can just split your read set. Realignment can be done region by region on an indexed BAM. You need scripts to automate the process, but this is of little academic interest.
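
As a rough illustration of the two splitting strategies mentioned above (not code from any of the tools in this thread), the bookkeeping could look like the sketch below. The names `fastq_chunks` and `tile_regions` are hypothetical. The code assumes plain 4-line FASTQ records and produces region strings in the 1-based `name:start-end` form that samtools accepts. A real realignment script would also have to deal with reads that straddle a window boundary.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def fastq_chunks(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding up to `reads_per_chunk` reads.

    Assumes the simple layout of exactly 4 lines per record.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record")
        yield chunk


def tile_regions(contig_lengths: Dict[str, int], window: int) -> List[str]:
    """Cut each contig into 1-based, inclusive windows such as 'chr1:1-1000'."""
    if window < 1:
        raise ValueError("window must be positive")
    regions = []
    for name, length in contig_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append(f"{name}:{start}-{end}")
    return regions
```

Each chunk or region can then be handed to a separate process or cluster job, and the per-chunk outputs merged afterwards.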

In addition, if you write your programs for big sequencing centers, you should not use MPI when it can be avoided, as it can for alignment. For assembly, MPI is a reasonable solution, as you can hardly split the read set. GPU is even worse: you cannot expect us to put an expensive GPU in every computer purely for the sake of one or two programs.

I see SSE as a more reasonable route to parallelization, although it is algorithm dependent (this is also true for GPU). That said, if you are mainly interested in research rather than practical applications, you may also try MPI/GPU; just be aware that few people will use your product.

Performance improvements