**Chipper** · 03-07-2008, 11:32 AM

Hi,
Thanks for a nice benchmark. So far I have only tried the SOAP program for Solexa reads, but it seems to be very fast. How can Eland be so much faster, even when SOAP is doing ungapped alignments? As I understand it, the algorithm is more or less the same. What system was this run on, and are the programs using the same number of cores?

cheers,
Ola

**bioinfosm** · 03-10-2008, 09:12 AM

Great work!!

**Malarky67** · 03-10-2008, 10:27 AM

Hi
SOAP does seem slow. You haven't set it to work on more than one processor. Was this simulation done on a single processor machine?

e.g. from the SOAP usage message:

-p <int> number of processors to use, default=1

Does Eland automatically have the same behaviour?

**Chipper** · 03-10-2008, 02:12 PM

Originally posted by Malarky67

Hi
SOAP does seem slow. You haven't set it to work on more than one processor. Was this simulation done on a single processor machine?

e.g. from the SOAP usage message:

-p <int> number of processors to use, default=1

Does Eland automatically have the same behaviour?

I tried SOAP some more and it seems to adjust the seed size to the available memory, which is likely to affect the run time. As I understand it, Eland holds the reads, not the reference, in memory.
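To see why seed length and memory are tied together, here is a rough back-of-the-envelope sketch. It assumes a hypothetical direct-address table with one slot for every possible DNA seed of length k (4^k slots) and a hypothetical 4 bytes per slot. This is not SOAP's actual data structure, and it ignores the per-occurrence position lists, which add memory proportional to the reference length.

```python
def seed_table_slots(seed_len: int) -> int:
    """Number of distinct DNA seeds of length seed_len (A/C/G/T only)."""
    return 4 ** seed_len


def seed_table_gib(seed_len: int, bytes_per_slot: int = 4) -> float:
    """Size in GiB of a hypothetical table with one fixed-size slot per seed."""
    return seed_table_slots(seed_len) * bytes_per_slot / 2 ** 30


if __name__ == "__main__":
    # Each extra base multiplies the table size by four.
    for k in (10, 12, 14, 16):
        print(f"seed length {k:2d}: {seed_table_slots(k):>13,d} slots, "
              f"{seed_table_gib(k):.4f} GiB")
```

Under these assumptions a 12-base table needs 64 MiB but a 16-base table needs 16 GiB, so a program that sizes its index to the available RAM is pushed towards shorter seeds, which in turn produce more candidate hits that have to be verified.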

**Malarky67** · 03-10-2008, 02:37 PM

Originally posted by Chipper

I tried SOAP some more and it seems to adjust the seed size to the available memory, which is likely to affect the run time. As I understand it, Eland holds the reads, not the reference, in memory.

Yes. That is what I have heard. Does anyone understand how these algorithms are parallelised across multiple processors, or even across the nodes of a cluster? (In the cluster case especially, is the reference table built separately on each node?)

**lh3** · 03-13-2008, 03:47 PM

Eland supports multithreading in its source code, but apparently it is not activated by default (if I am right). In fact, as Eland is fast and small, we can run several Eland instances on a node at the same time. Parallelization on a large cluster can easily be managed by LSF or SGE.
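A minimal sketch of the "several instances per node" approach: deal the reads round-robin into N files and hand each file to its own single-threaded aligner job (on a cluster, one LSF/SGE job per chunk). It assumes 4-line FASTQ input with no wrapped sequence lines; the output naming scheme `<prefix>.<i>.fastq` is hypothetical, and whichever aligner you run must accept the chunk format.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records: header, sequence, '+' line, qualities."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def split_round_robin(lines: Iterable[str], n_chunks: int) -> List[List[List[str]]]:
    """Deal records into n_chunks lists so each job gets roughly equal work."""
    chunks: List[List[List[str]]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(fastq_records(lines)):
        chunks[i % n_chunks].append(rec)
    return chunks


def split_fastq_file(path: str, n_chunks: int, prefix: str) -> List[str]:
    """Stream a FASTQ file into n_chunks files without loading it into memory."""
    out_paths = [f"{prefix}.{i}.fastq" for i in range(n_chunks)]
    outs = [open(p, "w") for p in out_paths]
    try:
        with open(path) as fh:
            for i, rec in enumerate(fastq_records(fh)):
                outs[i % n_chunks].writelines(rec)
    finally:
        for f in outs:
            f.close()
    return out_paths
```

Because every chunk is aligned independently, the total CPU time stays about the same as one big run, while the wall-clock time drops roughly with the number of jobs that can run at once.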

On a node, SOAP can be parallelized with -p (the total CPU time should be similar with or without -p), and on a cluster you can also use LSF/SGE. However, as SOAP may require a huge amount of memory, you will have to design a clever strategy, based on the hardware configuration, to run it efficiently without breaking your cluster. In addition, LSF/SGE usually prefers single-threaded jobs. Multi-threaded jobs may reduce the overall efficiency of a cluster unless all the nodes in the cluster have dozens of processors.

In my view, indexing the reference helps to get faster speed but tends to be memory-demanding for human alignment. Indexing the reads is more scalable but may sacrifice some speed. Eland is still faster, firstly because Tony Cox at Illumina is one of the best programmers in this field, and secondly because SOAP has to trade speed for memory: using long seeds for human alignment is impractical.
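To make the "index the reads" idea concrete, here is a toy sketch: hash the reads by a seed, stream the reference once, and verify each seed hit with an ungapped mismatch count. This is a simplification for illustration, not Eland's or SOAP's actual algorithm. It uses only each read's first `seed_len` bases as the seed and searches the forward strand only, whereas real aligners use several (often spaced) seeds so that mismatches anywhere in the read can be tolerated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def index_reads(reads: Dict[str, str], seed_len: int) -> Dict[str, List[str]]:
    """Map each read's leading seed to the names of reads that start with it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name, seq in reads.items():
        if len(seq) >= seed_len:
            index[seq[:seed_len]].append(name)
    return index


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_reference(ref: str, reads: Dict[str, str], seed_len: int,
                   max_mm: int) -> List[Tuple[str, int, int]]:
    """Slide over the reference once; extend every seed hit to an ungapped
    full-read comparison and keep hits with at most max_mm mismatches.
    Returns (read name, 0-based reference position, mismatches)."""
    index = index_reads(reads, seed_len)
    hits: List[Tuple[str, int, int]] = []
    for pos in range(len(ref) - seed_len + 1):
        for name in index.get(ref[pos:pos + seed_len], ()):
            seq = reads[name]
            window = ref[pos:pos + len(seq)]
            if len(window) == len(seq):
                mm = hamming(seq, window)
                if mm <= max_mm:
                    hits.append((name, pos, mm))
    return hits
```

The memory used by `index_reads` grows with the number of reads rather than with the genome, which is why this strategy scales to a human reference; indexing the reference instead inverts that trade-off.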

Anyway, all the software in this benchmark is great. Sometimes which one to use is simply a matter of taste.

**Chipper** · 03-14-2008, 07:24 AM

I just tried SOAP with and without -p 4 on a quad-core machine; it was about 3 times faster with all four cores active (2.5 M reads, chr1).

**lh3** · 03-14-2008, 08:41 AM

Originally posted by Chipper

I just tried SOAP with and without -p 4 on a quad-core machine; it was about 3 times faster with all four cores active (2.5 M reads, chr1).

You must be counting the wall-clock time, or the time SOAP printed out. Although wall-clock time should be considered, it is more important to evaluate processor time. If you split your read file into 4 chunks and align them with 4 Eland instances, I am sure Eland will still be several times faster.

Here is the follow-up. When '-p 4 -g 0 -c 0 -r 0' is applied:

real 6m12.443s
user 23m37.292s (=1417.292 sec)
sys 0m3.799s

As we would expect, it takes more user time (vs. 1228.12 sec in the single-threaded run). Still, multithreading is useful if you do not have many reads and want to get the result quickly; that is the real gain from multithreading. SOAP is also great software. Thank you for pointing this out.
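The difference between the two clocks can be checked with a little arithmetic on the numbers above. The sketch below parses bash `time` fields and compares the -p 4 run with the 1228.12 s single-threaded user time. The one extra assumption, flagged in a comment, is that the single-threaded run's wall-clock time roughly equals its user time (a CPU-bound job on one core).

```python
import re


def bash_time_seconds(field: str) -> float:
    """Convert a bash `time` field such as '23m37.292s' into seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field.strip())
    if m is None:
        raise ValueError(f"unrecognised time field: {field!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Figures from the post: SOAP with -p 4, plus the single-threaded user time.
real = bash_time_seconds("6m12.443s")
user = bash_time_seconds("23m37.292s")
sys_time = bash_time_seconds("0m3.799s")
single_user = 1228.12

cpu_overhead = user / single_user - 1        # extra CPU work caused by threading
busy_cores = (user + sys_time) / real        # average number of cores kept busy
# Assumption: the single-threaded run's wall-clock time is close to its user time.
est_speedup = single_user / real

print(f"CPU overhead: {cpu_overhead:.1%}")
print(f"average busy cores: {busy_cores:.2f}")
print(f"estimated wall-clock speedup: {est_speedup:.2f}x")
```

With these figures, -p 4 costs roughly 15% more CPU time in exchange for a wall-clock speedup of a little over 3x, which is consistent with the "about 3 times faster" reported earlier in the thread.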

**Chipper** · 03-14-2008, 09:36 AM

I was not aware that there is more than one clock to watch...

Anyway, the main advantage of SOAP over Eland is that you can actually download and use it, plus the ability to use longer reads, or to trim them if no match is found, which happens quite frequently...

**brudno** · 03-21-2008, 07:50 AM

Hi --

Any interest in trying this with SHRiMP? I am guessing it'll be much slower than the other tools, though we are working to improve this. While the main use for SHRiMP is probably going to be SOLiD, we do support Solexa as well.

Cheers, -Mike

**zee** · 04-12-2008, 11:37 PM

Dataset

Nice work lh3,

I think this is an invaluable exercise amid all the emerging tools. Is the benchmark dataset available anywhere? I'd like to test similar metrics on the tools we have in our lab.

Thanks for your help.

**lh3** · 04-14-2008, 01:39 AM

The simulated data are free to use, of course, but unfortunately I could not find an FTP site to upload them to. For the moment, you may try maq to simulate the reads yourself. The read positions are encoded in the read names, so you can write your own script to evaluate the accuracy.
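A minimal sketch of such an evaluation script. The read-name layout `<chrom>_<pos>_<serial>` is purely hypothetical (check what your simulator actually writes), and collecting `(read name, mapped chromosome, mapped position)` tuples from each aligner's output format is left out. A small position tolerance helps absorb differences between 0-based and 1-based coordinates.

```python
from typing import Iterable, Optional, Tuple


def true_origin(read_name: str) -> Tuple[str, int]:
    """Parse a HYPOTHETICAL simulated-read name '<chrom>_<pos>_<serial>'.
    rsplit keeps underscores inside the chromosome name intact."""
    chrom, pos, _serial = read_name.rsplit("_", 2)
    return chrom, int(pos)


def evaluate(alignments: Iterable[Tuple[str, Optional[str], Optional[int]]],
             tolerance: int = 0) -> Tuple[int, int, int]:
    """Return (total, mapped, correct). A read counts as correct when it maps
    to its true chromosome within `tolerance` bp of its true position."""
    total = mapped = correct = 0
    for name, chrom, pos in alignments:
        total += 1
        if chrom is None or pos is None:
            continue  # unmapped read
        mapped += 1
        true_chrom, true_pos = true_origin(name)
        if chrom == true_chrom and abs(pos - true_pos) <= tolerance:
            correct += 1
    return total, mapped, correct
```

Reporting both mapped/total and correct/mapped keeps sensitivity and error rate separate, which matters when aligners differ in how many reads they are willing to place.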

**zee** · 04-14-2008, 06:48 AM

Thanks, I just got the maq package and I'd like to simulate data according to the same rules you used.
Is it correct to assume that when I use Eland I won't be able to use the FASTQ format, and will instead need to convert to FASTA with a script?
The simulation steps seem quite straightforward.
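Assuming, as zee does, that Eland wants FASTA rather than FASTQ input, the conversion is only a few lines. A minimal sketch for 4-line FASTQ; the script name in the usage comment is hypothetical.

```python
import sys
from itertools import islice
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Turn 4-line FASTQ records into FASTA records, dropping the qualities."""
    it = (line.rstrip("\r\n") for line in lines if line.strip())  # skip blank lines
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting {rec[0]!r}")
        yield f">{rec[0][1:]}\n{rec[1]}\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python fastq_to_fasta.py < reads.fastq > reads.fasta
    sys.stdout.writelines(fastq_to_fasta(sys.stdin))
```

Note that the base qualities are simply discarded, so an aligner fed the FASTA file cannot make use of them.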

**zee** · 05-02-2008, 07:08 AM

How does maq calculate mapping quality? I looked in the documentation and couldn't find anything. Is it based on the ASCII quality code in the query sequence?
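For reference, the sketch below shows only the generic Phred conventions involved, not maq's algorithm. Base qualities are decoded from ASCII by subtracting an offset (33 for Sanger-style FASTQ; some Solexa/Illumina files used 64, so the offset is a parameter), and a Phred value Q corresponds to an error probability of 10^(-Q/10). Mapping quality is conventionally expressed on the same scale, as the Phred-scaled probability that the read has been placed at the wrong position; how maq estimates that probability is not reproduced here.

```python
import math
from typing import List


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred values (offset 33 = Sanger)."""
    return [ord(c) - offset for c in qual]


def phred_to_error_prob(q: float) -> float:
    """Phred value Q -> probability of error, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float, cap: int = 99) -> int:
    """Probability of error -> rounded Phred value, capped for p near 0."""
    if p <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p)))


if __name__ == "__main__":
    quals = decode_qualities("I5+")
    print(quals, [phred_to_error_prob(q) for q in quals])
    # A mapping quality applies the same transform to P(wrong position):
    print(error_prob_to_phred(0.001))
```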


Preliminary benchmark of different alignment programs
