Hi All,
I have some PacBio long reads that I am attempting to error-correct using the Celera 7 pipeline. It is a 2 Mb genome. I have ~50x coverage in long reads in a .fastq file, and I'm using ~50x coverage of circular consensus (CCS) reads to correct the long reads. I'm running this on a single server with 16 logical cores and 72 GB of RAM, using the default "high memory" spec file found at:
The only thing I changed in it was the "merylMemory" variable, from 128,000 to 72,000 (commas added here for readability; they are not in the .spec file). I was able to follow along with the SourceForge wiki:
and it's up and running, but it has now been running for 8 days. Additionally, memory usage on the machine is only ~3,300/72,000 and has barely changed (although all 16 processors have been at 100% this entire time). I feel like I'm underutilizing the system's resources, and that this process shouldn't take this long on this system.
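To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It only restates the numbers from this post; the helper names are made up for illustration, and it assumes the ~3,300/72,000 memory figures are in MB.

```python
# Rough arithmetic on the numbers quoted in the post.
# Assumption: memory figures are in MB (72,000 MB ~= 72 GB).
GENOME_SIZE_BP = 2_000_000   # ~2 Mb genome
LONG_READ_COVERAGE = 50      # ~50x long reads
CCS_COVERAGE = 50            # ~50x circular consensus reads
MEMORY_USED_MB = 3_300       # observed memory usage
MEMORY_TOTAL_MB = 72_000     # total RAM on the server


def total_bases(genome_size_bp: int, coverage: int) -> int:
    """Approximate number of bases in a read set at the given coverage."""
    return genome_size_bp * coverage


def memory_fraction(used_mb: float, total_mb: float) -> float:
    """Fraction of available RAM currently in use."""
    return used_mb / total_mb


if __name__ == "__main__":
    long_bases = total_bases(GENOME_SIZE_BP, LONG_READ_COVERAGE)
    ccs_bases = total_bases(GENOME_SIZE_BP, CCS_COVERAGE)
    used = memory_fraction(MEMORY_USED_MB, MEMORY_TOTAL_MB)
    print(f"Long-read bases: ~{long_bases / 1e6:.0f} Mb")
    print(f"CCS bases:       ~{ccs_bases / 1e6:.0f} Mb")
    print(f"RAM in use:      {used:.1%}")
```

So the input is on the order of 100 Mb of sequence per read set, and the job is using under 5% of the available RAM while saturating all 16 cores.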
Has anyone run a data set similar to this, on a machine like this? Or does it seem reasonable that it is taking as long as it is to complete this process?
Thanks!