Ondov, thanks for the notes!
-
Hi ondovb,
Many thanks for the notes! That was actually my concern. My team had a discussion over the running time, as we were not looking at Arabidopsis samples.
Which brings me to something else I just thought of: what is the difference between running on multiple threads and running on multiple nodes? I currently have threads=1 and total nodes=10.
On a side note, I realised that my processes, which have been split across 10 nodes, are all going into sleep mode. Could this be because I didn't allocate enough RAM?
-
Threads: should be the number of cores you want to use on each node. You mentioned you have 8 cores per node, so you'll want 8 threads to use them all.
Running time: will be linear with respect to genome length. Our data took 480 cpu hours, so yours (assuming a similar # of reads) should take 480 * 30 = 14400 cpu hours. If you use all 40 * 8 cores on your cluster, you're looking at about 45 hours.
Sleeping: if you remembered to include the -p flag, I'm not sure what else could cause this. Have you tried running it locally with the same settings and watching the output?
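The running-time estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `estimate_wall_hours` is made up for illustration, and it assumes that running time scales linearly with genome length and that the work divides perfectly across all cores, which a real run will not quite achieve.

```python
def estimate_wall_hours(baseline_cpu_hours, genome_ratio, nodes, cores_per_node):
    """Return (total CPU hours, ideal wall-clock hours).

    Assumes linear scaling with genome length and perfect parallel efficiency.
    """
    total_cpu_hours = baseline_cpu_hours * genome_ratio
    total_cores = nodes * cores_per_node
    return total_cpu_hours, total_cpu_hours / total_cores


# Figures quoted in the post: 480 CPU hours baseline, 30x longer genome,
# 40 nodes with 8 cores each.
cpu_hours, wall_hours = estimate_wall_hours(480, 30, 40, 8)
print(cpu_hours, wall_hours)
```

Under the same assumptions, the configuration from the question (threads=1 on 10 nodes) uses only 10 cores, so `estimate_wall_hours(480, 30, 10, 1)` gives 1440 hours of wall-clock time.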
-
Hi ondovb,
Yes, I'm running it locally, but it seems to be stuck at the aligning stage:
Round 1 / 4 (2101986 reads):
Sensitivity 4:
EDIT: I have used strace on the process and found it to be at the following state:
futex(0x40dd79d0, FUTEX_WAIT, 25312, NULL
Last edited by Haneko; 07-01-2010, 07:06 PM.
-
I think each instance might appear to be sleeping to the OS because the parent thread just sits and waits for the child threads to finish their computation (even if only one thread is chosen). What does the CPU usage look like?
Sensitivity 4 will take a pretty long time (even on your cluster), which could make it appear to be stuck. I wouldn't recommend going higher than 3. If you set the trim to at least 3, that should get rid of a lot of the errors and you should still be able to align a lot of reads.
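The "parent waits for children" pattern described above can be illustrated with a small Python sketch. This is not SOCS-B's code; `worker` and `run_parallel` are hypothetical names, and the squared-sum workload is a stand-in for alignment. The point is only that the parent thread blocks in `join()` while the workers compute, even when a single worker is requested, so a tool that inspects the parent thread alone will see it waiting.

```python
import threading


def worker(chunk, results, index):
    # Stand-in for real work (e.g. aligning a share of the reads).
    results[index] = sum(x * x for x in chunk)


def run_parallel(data, n_threads):
    # Split the data into one interleaved chunk per worker thread.
    chunks = [data[i::n_threads] for i in range(n_threads)]
    results = [0] * n_threads
    threads = [
        threading.Thread(target=worker, args=(chunks[i], results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        # The parent blocks here until each worker finishes; it does no
        # computation of its own while it waits.
        t.join()
    return sum(results)


print(run_parallel(list(range(10)), 1))
```

Checking per-thread CPU usage (for example with `top -H` on Linux) rather than the state of the parent alone is one way to tell a waiting parent from a genuinely stalled job.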
-
I have aligned a subset of the reads on my machine and have some questions.
I received several warnings (e.g. '5719579 substrings of chr1.fa ignored due to 5718003 character(s) other than [ACGTacgt]'). The Ns in the reference file(s) cause this problem and I don't know the impact of the warnings on the overall analysis.
At the end of the aligning part it says 'computing error frequencies'. What does this mean?
Would SOCS-B run faster if all reference files were merged into one multi-FASTA reference file?
I struggle to understand the difference between the mismatch sensitivity (s) and the tolerance (t). Could you briefly explain these two parameters? Can I set them independently?
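To get a feel for the scale of the warning about characters other than [ACGTacgt], the sketch below counts both the offending characters and the fixed-length substrings that contain at least one of them. It rests on assumptions: the function name `count_ignored_substrings` and the window length `k` are hypothetical, and how SOCS-B actually defines and counts ignored substrings is not stated in this thread.

```python
def count_ignored_substrings(sequence, k):
    """Return (non-ACGT characters, length-k substrings containing one).

    Illustrative only; k is a hypothetical substring length.
    """
    valid = set("ACGTacgt")
    is_bad = [c not in valid for c in sequence]
    bad_chars = sum(is_bad)
    if len(sequence) < k:
        return bad_chars, 0
    # Slide a window of length k and track how many bad characters it holds.
    in_window = sum(is_bad[:k])
    ignored = 1 if in_window else 0
    for i in range(k, len(sequence)):
        in_window += is_bad[i] - is_bad[i - k]
        if in_window:
            ignored += 1
    return bad_chars, ignored


print(count_ignored_substrings("ACGTNNACGT", 3))
```

Under this model a long run of Ns costs roughly one substring per N plus a few at each edge, which would explain why the two numbers in the warning are so close to each other.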
-
Originally posted by fwessely: I received several warnings (e.g. '5719579 substrings of chr1.fa ignored due to 5718003 character(s) other than [ACGTacgt]'). The Ns in the reference file(s) cause this problem and I don't know the impact of the warnings on the overall analysis.
Originally posted by fwessely: At the end of the aligning part it says 'computing error frequencies'. What does this mean?
Originally posted by fwessely: Would SOCS-B run faster if all reference files were merged into one multi-FASTA reference file?
Originally posted by fwessely: I struggle to understand the difference between the mismatch sensitivity (s) and the tolerance (t). Could you briefly explain these two parameters? Can I set them independently?