Hi,
This has been cross-posted on the TGICL SourceForge page (https://sourceforge.net/p/tgicl/support-requests/3/).
I'm using the TGICL clustering pipeline, but I'm stuck at the assembly (cap3) step for just one cluster, which is holding up my whole project. I'm hoping someone has seen this before and can suggest a solution.
I have no problem with the clustering phase, but I'm running into a segmentation fault for one of my asm directories during assembly.
I've installed the 64-bit Intel version of CAP3 as part of the TGICL pipeline. My server runs on Intel Xeon processors (SUSE Linux Enterprise Server), and I can allocate up to 256 GB of memory per node, so I don't think a memory limit is the problem.
Using 16 CPUs, all but one of the assembly directories assembled OK. In the asm_1 directory, which hits the error, this is what I get:
####
sh: line 1: 63326 Segmentation fault cap3 CL1 -p 93 > CL1.align
Error! cap3 failure detected (code=35584) on: CL1
####
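For anyone trying to read the "code=" values: here is a minimal Python sketch of how I decode them. It assumes (I have not checked the TGICL source) that the number is a raw 16-bit wait status, as Perl's `$?` would report after `system()`. The function names are my own, just for illustration.

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF  # high byte: exit code of the child
    sig = status & 0x7F               # low 7 bits: signal that killed the child
    return exit_code, sig


def describe(status):
    exit_code, sig = decode_wait_status(status)
    if sig:
        return f"killed by signal {sig}"
    if exit_code > 128:
        # Shell convention: exit code 128 + N means the command died from signal N.
        return f"exit {exit_code} (shell convention: signal {exit_code - 128})"
    return f"exit {exit_code}"


# The two codes from my logs (assumed to be raw wait statuses).
for code in (35584, 256):
    print(code, "->", describe(code))
```

Under that assumption, 35584 is exit code 139 from the shell, i.e. 128 + 11, and signal 11 is SIGSEGV on Linux, which matches the "Segmentation fault" line. The 256 from the Opteron run would be a plain exit code of 1.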
It segfaults almost immediately when cap3 tries to assemble this cluster (the other asm dirs are OK).
I have tried it independently (cap3 standalone) on just this directory as well, with the same error.
In contrast, I have another cluster that assembles to completion just fine. The only difference I can see is the size of the cluster.
Failed cluster:
1,359,975,661 bytes (1.36 GB)
962,144 sequences
Successful cluster:
1,135,345,720 bytes (1.14 GB)
731,031 sequences
The successful cluster ran on a 64 GB node, but the failed cluster segfaulted even on a 256 GB node. As an aside, I also tried the 64-bit Opteron version of cap3; it actually started running but died with a memory error after about 60 hours (this happened even on a 256 GB node).
####
Ran out of memory: 52067036860 bytes requested.
Error! cap3 failure detected (code=256) on: CL1
####
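To put that request in perspective, here is a quick Python conversion of the byte count from the log into GiB. Only the number comes from the log; the helper name is mine.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


requested = 52_067_036_860  # "bytes requested" from the Opteron run
print(f"{to_gib(requested):.1f} GiB requested in a single allocation")
# prints "48.5 GiB requested in a single allocation"
```

So the allocation that failed was about 48.5 GiB on its own. That is well under the 256 GB on the node, but I don't know how much cap3 was already holding at that point.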
So I suspect that something in the cap3 (Intel, 64-bit) code is not allocating memory correctly when the input is above a certain size.
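As a sanity check on the "above a certain size" idea, I compared both input sizes against the usual 32-bit limits. This is only arithmetic on the file sizes above; it assumes nothing about cap3's internals, and the names are mine.

```python
INT32_MAX = 2**31 - 1    # largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value

# Input sizes in bytes, taken from the cluster listing above.
clusters = {
    "failed": 1_359_975_661,
    "successful": 1_135_345_720,
}


def fits(n, limit):
    """True if n can be stored without exceeding limit."""
    return n <= limit


for name, size in clusters.items():
    print(f"{name}: {size} bytes, "
          f"fits int32={fits(size, INT32_MAX)}, "
          f"fits uint32={fits(size, UINT32_MAX)}")
```

Both files are below 2^31 bytes, so the raw input size does not cross a 32-bit boundary between the working and the failing cluster. That doesn't rule out an overflow in some quantity derived from the input (for example something that grows with the number of sequences), which I can't check without the cap3 source.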
I might have to contact the cap3 author about this, but has anyone else encountered it and come up with a solution? Any suggestions or comments are greatly appreciated.