  • GPU/Cluster/Server based systems for NGS

    Hi all,
    I am new to the NGS field, but I understand some of the basics and would now like to know what hardware is best suited for NGS data analysis.
    Please explain with respect to these categories:
    1) Whole genome sequencing
    2) Exome sequencing
    3) Targeted sequencing

    My question in particular is: which hardware is good for NGS data analysis?
    A Tesla GPU, servers, or a compute cluster?

    Nowadays GPUs are attracting a lot of interest, but does anyone have experience using GPUs in NGS analysis? If so, please share it.
    Are servers with multiple cores sufficient, or is a cluster of 10-20 nodes needed?
    Also, please comment on cost/performance in each case.

  • #2
    As for GPU computing, you may want to check out this post first:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

    • #3
      I already checked that. MUMmer and UGENE have been ported to GPGPU, but my question is not limited to GPUs; I want a comparison. A workstation/server with a multi-core processor and a single/dual HDD will run into trouble because of extensive I/O on a single disk. A cluster has disks in each node and so avoids that bottleneck, but then the software needs to be MPI-compatible. A GPU is much faster, but it faces the same I/O problem as a server/workstation. So I just wanted some opinions, and to hear about people's experience on the different platforms.

      • #4
        There is no single answer to a question like this beyond "it depends on what exactly you want to do".

        From my experience, you can't go wrong with a small cluster (< 10 nodes) for doing all of the analyses that you have mentioned (if you need to do "de novo" assemblies of large genomes then that may require special consideration).

        You will need to develop some patience depending on the hardware you choose, since a job may take an hour or two longer to finish if you went with slightly slower CPUs or less RAM.

        You have mentioned having a disk or two in each node, but you would not want to move files in and out of these, since that will cause major overhead, especially if you intend to analyze hundreds of samples. Your storage subsystem will become important if you go with a cluster, since your processing will likely become I/O-bound beyond a certain number of simultaneous processes. You would also want to invest in a better switch to tie all of this together.

        • #5
          1) For GPU usage you obviously need algorithms that have been ported to the GPU, and as you know that list is very small. We have a small test set of Tesla GPUs, but nothing to run on them.
          2) If you are looking at human whole-genome alignment, then at least 10 nodes are required, and even then only if the aligner is MPI-enabled.
          3) We started off 9 months ago with 20 nodes, each with 48 GB RAM and a 12-core Intel Xeon CPU. We had to upgrade to 30 nodes in less than six months. Same for storage: we started with 25 TB and have now upgraded to 50 TB.
          4) Even so, 90% of the time the cluster is running at full capacity.
          5) The storage has to be on the network; a distributed storage system is a good idea. Panasas is a good choice; it is MPI-enabled.
          6) You will also need tape storage to archive the raw and analyzed data from time to time.

          • #6
            Thanks GenoMax and gprakhar.
            I understand that GPUs are not suitable for NGS data analysis at this point because no good software has been ported to GPU.
            About the cluster: what you suggest is a network drive. I was thinking a dedicated drive in each node would be better, but following your suggestion I checked out Panasas and it seems good.
            We are going to analyze hundreds of samples of human whole exome, so if you have any idea of the computational requirements, that would be very useful.

            For the cluster, I have one more question: has anyone compiled NGS analysis software with MPI? I googled for Bowtie and other software compiled with MPI but didn't find anything good. So if you have suggestions for MPI-enabled software, that would be much appreciated.

            • #7
              It's not worth writing GPU/FPGA-oriented code for general bioinformatics; what we really want is MPI code for clusters. The problem with single-chip code is that it generally doesn't scale: it can sometimes get great speedups on a gigabase of data, but when you have an exabase you still have to use a cluster. Meanwhile, there are many embarrassingly parallel applications that run perfectly well on clusters but don't take advantage of single-chip speedups.

              There might be applications for FPGA/GPU-type code in the sequencing machines themselves, since those push the limits of what an on-board signal analysis pipeline can do, need to be tightly coupled to the sequencer itself, can't be disrupted by the eccentricities of a cluster, and aren't expected to do a wide range of computation.

              • #8
                For many NGS applications, I do not see much use for MPI. Many NGS tasks can be naturally parallelized by simply splitting the reads into several chunks and then launching separate jobs on the computing nodes. I agree that supporting MPI would be more convenient, but I do not think it deserves the extra development time. Also, MPI is not available everywhere; the configuration alone will push some users away. In the few genome centers I am/was working at, MPI support is also very limited. These centers happily process terabytes of data every week across hundreds of nodes without MPI at all.

                MPI is only necessary for a few applications (such as some de novo assembly algorithms); beyond those, it increases development time for little reward. The very lack of MPI-supported tools suggests this speculation is likely to be true.
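                The split-and-launch pattern described above can be sketched in a few lines of shell. This is a minimal illustration, not code from this thread: the input FASTQ is synthetic, GNU coreutils split is assumed, and the aligner invocation is a placeholder echo.

                ```shell
                #!/bin/sh
                # Demo of the split-and-launch pattern. The 8-read FASTQ here is
                # synthetic; in practice sample.fastq is your real input file.
                printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' 1 2 3 4 5 6 7 8 > sample.fastq

                # FASTQ records are 4 lines each, so 2 reads per chunk = 8 lines.
                # For real data you would use something like -l 4000000 (1M reads per chunk).
                split -l 8 -d sample.fastq chunk_

                # One independent job per chunk; the "aligner" command is a placeholder.
                for chunk in chunk_*; do
                    echo "submit: aligner $chunk > $chunk.sam"
                done
                ```

                Each chunk job is fully independent, which is why no MPI is needed for this style of parallelism.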
                Last edited by lh3; 09-12-2011, 08:07 AM.

                • #9
                  Originally posted by lh3 View Post
                  For many NGS applications, I do not see much use for MPI. Many NGS tasks can be naturally parallelized by simply splitting the reads into several chunks and then launching separate jobs on the computing nodes. I agree that supporting MPI would be more convenient, but I do not think it deserves the extra development time. Also, MPI is not available everywhere; the configuration alone will push some users away. In the few genome centers I am/was working at, MPI support is also very limited. These centers happily process terabytes of data every week without MPI at all.

                  MPI is only necessary for a few applications (such as some de novo assembly algorithms); beyond those, it increases development time for little reward. The very lack of MPI-supported tools suggests this speculation is likely to be true.
                  What I said is that between MPI and embarrassingly parallel jobs, there is little that GPUs can bring to the party. If you aren't aware, "embarrassingly parallel" refers to jobs that can be "chunked" and run on separate processors with no interaction.
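                  A minimal shell sketch of that definition (my own illustration, with made-up file names): each background job handles its own chunk, and the only synchronization point is the final wait.

                  ```shell
                  #!/bin/sh
                  # Embarrassingly parallel: independent jobs on independent chunks,
                  # with no communication between them -- only a final barrier ("wait").
                  printf 'a\n'       > part1.txt
                  printf 'a\nb\n'    > part2.txt
                  printf 'a\nb\nc\n' > part3.txt

                  for part in part1.txt part2.txt part3.txt; do
                      wc -l < "$part" > "$part.count" &   # each chunk processed in the background
                  done
                  wait                                    # join: no interaction until here

                  cat part1.txt.count part2.txt.count part3.txt.count
                  ```

                  Because the jobs never exchange data, the same pattern scales from one multi-core box to a whole cluster with nothing more than a job scheduler.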

                  • #10
                    @rskr: I was not replying to you. I was replying to PratikC about why he sees so few MPI programs and why he should not rely on them. The thread I pointed out above has already concluded the discussion related to GPU computing.
                    Last edited by lh3; 09-12-2011, 08:13 AM.

                    • #11
                      Thanks rskr and lh3. I understand the limitations of GPUs at this time; maybe in the near future more software will be ported to GPU, but right now a cluster is better.
                      I still don't understand the difference between using MPI and running jobs by splitting. What I understand is that if we have the source code, we can compile an executable using an MPI library, and we do have the Bowtie source code. And if we cannot compile Bowtie using MPI, then how do we split jobs and run them on the cluster? I mean, how do we distribute jobs to each node from a single terminal?
                      If that task is tedious, should we go for a multi-processor workstation instead? And if we have a workstation with 2 or more Intel Xeon processors (quad core), do we need to do anything to utilize all cores for Bowtie?

                      • #12
                        Have a look at this site: http://www.rocksclusters.org/wordpress/ This is a popular cluster OS distribution.

                        Jobs in a cluster are generally handled by a job queuing system (SGE (Sun Grid Engine) or PBS are available for Rocks). There are also commercial job schedulers (e.g. LSF from Platform Computing) that are generally pretty expensive.

                        Some aligners will let you use multiple threads for a single job, but it may be easiest to start multiple alignment jobs in parallel for multiple files (or for a single sequence file split into pieces). You will need to combine and reconcile the multiple BAM files if you choose to split the original file.
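                        With SGE, the one-job-per-chunk submission could look roughly like this. A dry-run sketch, not a tested recipe: the qsub commands are written to a file for inspection rather than executed, and the index name (hg19_index) and chunk file names are made up for illustration.

                        ```shell
                        #!/bin/sh
                        # Build one SGE submission per input chunk (dry run).
                        # -cwd: run in the current directory; -N: job name;
                        # -pe smp 4: request 4 slots on one node; -b y: submit a
                        # plain command rather than a job script.
                        for chunk in chunk_00 chunk_01 chunk_02; do
                            echo "qsub -cwd -N aln_$chunk -pe smp 4 -b y 'bowtie -S -p 4 hg19_index $chunk $chunk.sam'"
                        done > submit_all.sh
                        cat submit_all.sh
                        ```

                        On a real cluster you would run submit_all.sh (or pipe the loop straight to sh) and let the scheduler spread the jobs across nodes.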

                        Originally posted by PratikC View Post
                        I mean, how do we distribute jobs to each node from a single terminal?
                        If that task is tedious, should we go for a multi-processor workstation instead? And if we have a workstation with 2 or more Intel Xeon processors (quad core), do we need to do anything to utilize all cores for Bowtie?

                        • #13
                          We happily process NGS data using MPI and/or a scheduling system. The two work quite well together if the configuration is done correctly.
                          Go for simple commodity hardware in the form of boxes or blade servers, depending on how many lanes/plates you need to process per week. A 96-200 core system will be sufficient to support most non-de-novo-assembly work on large eukaryote genomes. And with the cost of memory diminishing, 96 GB per node is nice to have.
                          Be sure to have enough storage available for your analysis work. We usually recommend a minimum of 12 TB for a small lab supporting one Illumina sequencer.

                          • #14
                            Originally posted by PratikC View Post
                            I still don't understand the difference between using MPI and running jobs by splitting. What I understand is that if we have the source code, we can compile an executable using an MPI library, and we do have the Bowtie source code. And if we cannot compile Bowtie using MPI, then how do we split jobs and run them on the cluster? I mean, how do we distribute jobs to each node from a single terminal?
                            Just having an MPI library and the source code of the program is not enough: the program has to be written using MPI functions. So e.g. Bowtie, which is not MPI-enabled, cannot be run across multiple nodes. Hence, for a big input, divide the total reads into 'n' parts and run each part as a separate input to Bowtie on a different node.
                            But if you have something like Novoalign MPI, which is MPI-enabled, the aligner itself can run on multiple nodes. We use it for most of our alignment-related work; it is accurate and very fast, but it is commercial software.
                            So the dependency for using MPI is the code, not the computing environment. To use Bowtie with MPI, someone would have to rewrite the code.

                            If that task is tedious, should we go for a multi-processor workstation instead? And if we have a workstation with 2 or more Intel Xeon processors (quad core), do we need to do anything to utilize all cores for Bowtie?
                            Yes, that will give you a speedup. On a single node with multiple cores, run Bowtie on one part of the split input; hence for 'n' parts, run on 'n' nodes, using all the cores available on each node. At the end, merge all the output files using e.g. samtools.

                            Regards
                            --
                            pg
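                            The merge step at the end of gprakhar's recipe can be sketched with samtools. Again a dry run, not code from the thread: the commands are echoed to a file rather than executed, the chunk/output file names are illustrative, and modern samtools "sort -o" syntax is assumed.

                            ```shell
                            #!/bin/sh
                            # Merge per-node alignment outputs back into a single BAM.
                            # Dry run: remove the leading "echo" on a machine with samtools.
                            {
                                for c in chunk_00 chunk_01 chunk_02; do
                                    echo "samtools sort -o $c.sorted.bam $c.bam"   # sort each chunk
                                done
                                echo "samtools merge merged.bam chunk_00.sorted.bam chunk_01.sorted.bam chunk_02.sorted.bam"
                                echo "samtools index merged.bam"                   # index the final BAM
                            } > merge_steps.sh
                            cat merge_steps.sh
                            ```

                            Sorting each chunk first matters because samtools merge expects coordinate-sorted inputs and then produces a sorted, indexable result.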

                            • #15
                              Originally posted by GenoMax View Post
                              Have a look at this site: http://www.rocksclusters.org/wordpress/ This is a popular cluster OS distribution.

                              Jobs in a cluster are generally handled by a job queuing system (SGE (Sun Grid Engine) or PBS are available for Rocks). There are also commercial job schedulers (e.g. LSF from Platform Computing) that are generally pretty expensive.

                              Some aligners will let you use multiple threads for a single job, but it may be easiest to start multiple alignment jobs in parallel for multiple files (or for a single sequence file split into pieces). You will need to combine and reconcile the multiple BAM files if you choose to split the original file.
                              Thanks GenoMax.
                              I checked out Rocks; it is really good. I will try it ASAP!

                              Originally posted by zee View Post
                              We happily process NGS data using MPI and/or a scheduling system. The two work quite well together if the configuration is done correctly.
                              Go for simple commodity hardware in the form of boxes or blade servers, depending on how many lanes/plates you need to process per week. A 96-200 core system will be sufficient to support most non-de-novo-assembly work on large eukaryote genomes. And with the cost of memory diminishing, 96 GB per node is nice to have.
                              Be sure to have enough storage available for your analysis work. We usually recommend a minimum of 12 TB for a small lab supporting one Illumina sequencer.
                              Thanks zee,
                              as I said, we are not doing whole-genome analysis but exome and targeted sequencing, so I am thinking of starting with 48 GB/node and 8 TB of storage. Please correct me if I am underestimating the exome workload.

                              Originally posted by gprakhar View Post
                              Just having an MPI library and the source code of the program is not enough: the program has to be written using MPI functions. So e.g. Bowtie, which is not MPI-enabled, cannot be run across multiple nodes. Hence, for a big input, divide the total reads into 'n' parts and run each part as a separate input to Bowtie on a different node.
                              But if you have something like Novoalign MPI, which is MPI-enabled, the aligner itself can run on multiple nodes. We use it for most of our alignment-related work; it is accurate and very fast, but it is commercial software.
                              So the dependency for using MPI is the code, not the computing environment. To use Bowtie with MPI, someone would have to rewrite the code.


                              Yes, that will give you a speedup. On a single node with multiple cores, run Bowtie on one part of the split input; hence for 'n' parts, run on 'n' nodes, using all the cores available on each node. At the end, merge all the output files using e.g. samtools.

                              Regards
                              --
                              pg
                              Thanks gprakhar,
                              I got your point. So for Bowtie, splitting the job and using a scheduler will be suitable.
