Hi all,
I'm working on aligning bisulfite-converted sequence reads using Bismark and Bowtie (v1). The samples are being aligned to the hg19 genome from UCSC. I believe the FASTQ files I'm working with (i.e. submitting to Bismark) contain about 140-150 million sequence pairs in total.
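For reference, my invocation is essentially the following (the index path and read file names here are placeholders, not my actual ones):

```shell
# Sketch of the Bismark call -- paths/file names are placeholders.
# Bowtie 1 is the aligner used by these older Bismark releases;
# the genome folder must already contain the Bismark-prepared index.
bismark /path/to/hg19_bismark_index \
    -1 sample_R1.fastq.gz \
    -2 sample_R2.fastq.gz \
    -o bismark_output/
```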
These alignments are being run on a central cluster at my research institute, but I'm getting odd errors that I've been unable to interpret. Most of them seem to be related to running out of memory, but the cluster has a large amount of RAM (as far as I can tell, about 18 GB per node).
Most alignments are failing before they ever start processing any sequences. Typical errors are:
Failed to open zcat pipe to [$filename] Cannot allocate memory
Error while flushing and closing output
terminate called after throwing an instance of 'int'
Also:
Out of memory!
Error while flushing and closing output
terminate called after throwing an instance of 'int'
And:
Out of memory allocating the ebwt[] array for the Bowtie index. Please try
again on a computer with more memory.
Others have failed during the sequence processing:
gzip: stdout: Broken pipe
Error while flushing and closing output
The odd thing is that a lot of these alignments failed on the first or second attempt, but then ran perfectly well on the next attempt without any change in parameters.
From what I've read, Bismark and/or Bowtie hold the entire reference genome in memory (i.e. RAM) during the alignment process, so there needs to be enough space to hold that information (for hg19, I've read this is about 8–10 GB). So I'm wondering if I need to request a specific amount of memory for each job when I submit it to the cluster, but I have no idea how much I should ask for. The cluster uses PBS as its resource manager, and my understanding is that pmem is memory per processor, so I tried pmem=4gb across 4 processors (16 GB total), but that failed. I then tried pmem=9gb, but that job never ran — I'm guessing because 4 × 9 GB = 36 GB exceeds what any single node has, so it could never be scheduled.
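To make the setup concrete, here is a sketch of the kind of PBS submission script I've been using (the resource numbers are the ones I tried; the index path and read files are placeholders):

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4        # one node, four processors
#PBS -l pmem=4gb             # per-processor memory: 4 x 4gb = 16gb total
#PBS -l walltime=48:00:00

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Placeholders below -- not my actual paths/file names
bismark /path/to/hg19_bismark_index \
    -1 sample_R1.fastq.gz \
    -2 sample_R2.fastq.gz \
    -o bismark_output/
```

Would requesting total job memory (e.g. mem=16gb) instead of per-processor pmem be the more appropriate way to do this?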
Does anyone have experience running Bismark (or even just Bowtie) on a scientific cluster, or have suggestions for what I could try next? Any help at all would be appreciated.
Thanks,
Daniel