The manual suggests one day given enough CPUs. Anyone with experience here? What is the speed of your CPUs? How many do you use? How long does it take in your hands?
-
Unless I have missed something in the new release, each of the corona lite programs runs on a single processor. However, like many bioinformatics programs, the corona lite programs can be run in 'embarrassingly parallel' mode. That is, break down your reference sequence by chromosome or other convenient segment, and/or the SOLiD file, into enough parts to use up your processors.
The matching part of the corona lite pipeline has 6 parts, of which the 1st and 3rd can be split up. The other 4 parts are single-processor only, but they are really just file copies and thus can be fairly fast.
As for overall time it depends, obviously, on the size of your SOLiD data set -- those 14-20 GB files take a while to toss around -- and your reference sequence. The time also goes up in a non-linear fashion depending on how many mismatches you wish to take into consideration.
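For anyone who wants to script the reference split, here is a minimal sketch, assuming the reference is a multi-FASTA file with one record per contig or chromosome. The names `read_fasta` and `split_records` and the greedy size balancing are hypothetical illustrations, not part of corona lite.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Distribute records over n_parts so total bases per part are similar.

    Greedy heuristic: place the largest remaining record into the part
    that currently holds the fewest bases.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    sizes = [0] * n_parts
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        smallest = sizes.index(min(sizes))
        parts[smallest].append(rec)
        sizes[smallest] += len(rec[1])
    return parts
```

Balancing by total bases rather than by record count matters when contig sizes vary a lot; otherwise one part holding the largest contigs ends up setting the total run time.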
A big consideration is having enough disk space, both temporary and permanent, to handle the files.
Since I usually work with partially assembled genomes (i.e., lots of contigs) or CDS or EST projects, it is quite often the case that I split up the reference into 64 parts and use all 64 CPUs that I have at my disposal. The ultimate speed of the CPUs really doesn't matter that much. Obviously the faster the better. But I would concentrate more on disk speed and physical memory and exactly how many mismatches you want. 1 mismatch is trivial. 3 (the recommended setting) less so. 6 or more is almost impossible on any sizable dataset.
And, yes, I would say 1-2 days of processing given enough CPUs. My recent work on the bee assembly 4 took about 36 hours to go through the matching steps. But I didn't break down the chromosomes nor the SOLiD data set and so only used about 1/4 of my CPUs. There are other people on the machine and despite my hoggish nature I did want to play nicely (for once!). SNP calling added time to that process.
Last edited by westerman; 01-12-2009, 01:48 PM.
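Running one job per part across the available CPUs can be sketched roughly as follows. This assumes each part is processed by some external command; `build_command` is a hypothetical callable you would supply yourself, since the actual corona lite command lines are not given here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def run_parts(
    part_files: Sequence[str],
    build_command: Callable[[str], List[str]],
    max_workers: int,
) -> Dict[str, int]:
    """Run one external process per part, at most max_workers at a time.

    Threads are enough here because each one only waits on its child
    process. Returns the exit code for every part file.
    """

    def run_one(path: str) -> int:
        # build_command is user-supplied (hypothetical); it returns argv.
        return subprocess.run(build_command(path)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run_one, part_files))
    return dict(zip(part_files, codes))
```

Setting `max_workers` below the CPU count leaves room for other users on a shared machine, and checking the returned exit codes shows which parts need a rerun after a crash.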
-
It should be possible to match using 1 CPU given enough memory (4 GB). Given my experience I would expect running times of about 3 weeks for a non-paired mapping of a SOLiD data set to the reference bee genome. SNP calling would probably take an extra week. But I may be pessimistic.
In any case it will take time, and you had better hope that your computer stays up and running during the process. Last week I had two instances of the computer or file server crashing on me. They were rare events that should not occur, but irritating nevertheless.
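As a rough sanity check on these numbers, here is a back-of-the-envelope sketch. It assumes the 3-week single-CPU estimate is 504 hours, that "1/4 of my CPUs" means 16 of 64, and an Amdahl-style model in which only a fraction of the work parallelizes. None of these assumptions come from the corona lite manual, and the function name is hypothetical.

```python
def estimate_wall_hours(
    single_cpu_hours: float,
    workers: int,
    parallel_fraction: float = 1.0,
) -> float:
    """Amdahl-style wall-clock estimate.

    parallel_fraction is the share of the single-CPU time that can be
    spread over the workers; the rest runs serially.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = single_cpu_hours * (1.0 - parallel_fraction)
    parallel = single_cpu_hours * parallel_fraction / workers
    return serial + parallel


three_weeks = 3 * 7 * 24  # 504 hours, the assumed single-CPU figure
print(estimate_wall_hours(three_weeks, 16))  # 31.5
print(estimate_wall_hours(three_weeks, 64))  # 7.875
```

With perfect scaling on 16 CPUs this gives 31.5 hours, in the same range as the roughly 36 hours reported for the bee assembly run. Both inputs are rough estimates, so treat the agreement as a plausibility check only.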
-
You could also try a program I have authored called BFAST: the Blat-like Fast Accurate Search Tool. You can find download instructions at:
Nils Homer
-
If your time is precious, try ISAS. Native colorspace and, as far as we know, it's the fastest. If I'm wrong and there is a faster solution, please enlighten me!
100 million 25mers on one computer in 30 minutes.
3G human reference, 2 mismatches.
Results identical to corona (just 100 times faster) and same format
See the ISAS thread for more info.