The manual suggests one day given enough CPUs. Anyone with experience here? What is the speed of your CPUs? How many do you use? How long does it take in your hands?
-
Unless I have missed something in the new release, each of the corona lite programs runs on a single processor. However, like many bioinformatics programs, the corona lite programs can be run in 'embarrassingly parallel' mode, i.e., break down your reference sequence by chromosome or other convenient segment, and/or split the SOLiD file, into enough parts to use up your processors.
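As a rough illustration of the splitting idea above, here is a minimal Python sketch that divides a multi-sequence FASTA reference into N parts of roughly equal total length. It is not part of corona lite: the function names, the output file naming, and the greedy largest-first balancing are all assumptions made for the example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_reference(records: Iterable[Record], n_parts: int) -> List[List[Record]]:
    """Assign sequences to n_parts bins, longest first, always filling
    the bin with the smallest total length so far (hypothetical helper)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    heap = [(0, i) for i in range(n_parts)]  # (total bases, bin index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        parts[idx].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), idx))
    return parts


def write_parts(parts: List[List[Record]], prefix: str) -> List[str]:
    """Write each bin to its own FASTA file; the naming scheme is an assumption."""
    paths = []
    for i, part in enumerate(parts):
        path = f"{prefix}.part{i:02d}.fasta"
        with open(path, "w") as fh:
            for header, seq in part:
                fh.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each resulting part file would then go to its own matching job, one per processor. How those jobs are launched depends on your corona lite setup and is not shown here.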
The matching part of the corona lite pipeline has 6 parts with the 1st and 3rd part being able to be split up. The other 4 parts are solely single processor but also are really just file copies and thus can be fairly fast.
As for overall time, it depends, obviously, on the size of your SOLiD data set -- those 14-20 GB files take a while to toss around -- and on your reference sequence. The time also goes up in a non-linear fashion depending on how many mismatches you wish to take into consideration.
A big consideration is having enough disk space, both temporary and permanent, to handle the files.
Since I usually work with partially assembled genomes (i.e., lots of contigs) or CDS or EST projects, it is quite often the case that I split up the reference into 64 parts and use all 64 CPUs that I have at my disposal. The ultimate speed of the CPUs really doesn't matter that much. Obviously the faster the better. But I would concentrate more on disk speed, physical memory, and exactly how many mismatches you want. 1 mismatch is trivial. 3 (the recommended setting) less so. 6 or more is almost impossible on any sizable dataset.
And, yes, I would say 1-2 days of processing given enough CPUs. My recent work on the bee assembly 4 took about 36 hours to go through the matching steps. But I didn't break down either the chromosomes or the SOLiD data set and so only used about 1/4 of my CPUs. There are other people on the machine and despite my hoggish nature I did want to play nicely (for once!). SNP calling added time to that process.
Last edited by westerman; 01-12-2009, 01:48 PM.
-
It should be possible to match using 1 CPU given enough memory (4 GB). Given my experience I would expect running times of about 3 weeks for a non-paired mapping of a SOLiD data set to the reference bee genome. SNP calling would probably take an extra week. But I may be pessimistic.
In any case it will take time, and you had better hope that your computer stays up and running during the process. Last week I had two instances of the computer or file server crashing on me. They were rare instances that should not occur, but irritating nevertheless.
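For a back-of-the-envelope feel for these numbers, the sketch below divides the 3-week single-CPU estimate by a CPU count. It assumes perfect linear scaling, which is optimistic: four of the six matching steps are single-processor, and disk speed matters a great deal. The function name is hypothetical and the result is only a lower bound on wall-clock time.

```python
def ideal_wall_clock_hours(single_cpu_hours: float, n_cpus: int) -> float:
    """Best-case wall-clock time, ASSUMING the work divides evenly
    across CPUs with no serial steps or I/O contention."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return single_cpu_hours / n_cpus


# Figures taken from this thread: about 3 weeks on 1 CPU, a 64-CPU machine,
# and roughly a quarter of it (16 CPUs) used for the bee run.
single_cpu_hours = 3 * 7 * 24

for n_cpus in (1, 16, 64):
    hours = ideal_wall_clock_hours(single_cpu_hours, n_cpus)
    print(f"{n_cpus:>2} CPUs: about {hours:.1f} hours at best")
```

Under that assumption, 16 CPUs come out at about 31.5 hours, which is in the same range as the roughly 36 hours reported for the bee matching run.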
-
You could also try a program I have authored called BFAST: the Blat-like Fast Accurate Search Tool. You can find download instructions at:
Nils Homer
-
If your time is precious, try ISAS. It is native colorspace and, as far as we know, it's the fastest. If I'm wrong and there is a faster solution, please enlighten me!
100 million 25mers on one computer in 30 minutes.
3G human reference, 2 mismatches.
Results are identical to corona (just 100 times faster) and in the same format.
See the ISAS thread for more info.