SEQanswers

Old 09-26-2013, 09:55 AM   #1
daniel_g
Junior Member
 
Location: Canada

Join Date: Sep 2013
Posts: 7
Memory issues running Bismark/Bowtie alignment on a cluster

Hi all,

I'm working on aligning bisulfite converted sequence reads using Bismark and Bowtie (v1). The samples that I'm running are being aligned to the hg19 genome from UCSC. I believe that the Fastq files that I'm working with (i.e. submitting to Bismark) have about 140-150 million sequence pairs in total.

These alignments are being run on a central cluster at my research institute but I'm getting odd errors that I've been unable to interpret. Most of them seem to be related to running out of memory, but the cluster has a large amount of RAM (as far as I can tell, about 18 Gb per node).

Most alignments are failing before they ever start processing any sequences. Typical errors are:

Failed to open zcat pipe to [$filename] Cannot allocate memory
Error while flushing and closing output
terminate called after throwing an instance of 'int'


Also:

Out of memory!
Error while flushing and closing output
terminate called after throwing an instance of 'int'


And:

Out of memory allocating the ebwt[] array for the Bowtie index. Please try
again on a computer with more memory.


Others have failed during the sequence processing:

gzip: stdout: Broken pipe
Error while flushing and closing output


The odd thing is that a lot of these alignments failed the first or second time, but then worked perfectly well on the next attempt without changing any parameters.

From what I've read, Bismark and/or Bowtie hold the entire reference genome in memory (i.e. RAM) during the alignment process, so there needs to be enough space to hold that information (for hg19, I've read that this is about 8–10 Gb). So I'm wondering if I need to allocate a specific amount of memory to each script when I submit it to the cluster, but I have no idea how much I should give. The cluster is using the PBS job resource manager, so I tried using pmem=4gb (across each of 4 processors), but that failed, and I tried again using pmem=9gb, but that never ran (I'm guessing because the system didn't have enough resources to ever run it).
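For reference, the submission I've been trying is shaped roughly like this (script and file names are placeholders, and the memory line is the part I'm unsure about):

```shell
#!/bin/bash
# Hypothetical PBS job script -- names and resource values are placeholders.
#PBS -l nodes=1:ppn=4
#PBS -l pmem=4gb              # per-process memory, so ~4gb x 4 cores requested together
#PBS -l walltime=20:00:00:00

cd "$PBS_O_WORKDIR"
bismark -n 2 --non_directional -1 reads_1.fastq.gz -2 reads_2.fastq.gz &> bismark.log
```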

Does anyone have experience running Bismark (or even just Bowtie) on a scientific cluster, or have suggestions for what I could try next? Any help at all would be appreciated.

Thanks,
Daniel
Old 09-26-2013, 10:09 AM   #2
winsettz
Member
 
Location: US

Join Date: Sep 2012
Posts: 91

Have you tried subsampling your data? Take ten percent of your reads, or something like that, and run them through the workflow?

I'm more of a bowtie2 user, but in either case it may help to post the commands you're using.
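As a rough sketch of what I mean by subsampling (untested; file names and the read count are placeholders, and FASTQ records are assumed to be exactly 4 lines each):

```shell
# Toy input: six 4-line FASTQ records, gzipped (stand-in for a real file).
printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 4 5 6 | gzip > reads_1.fastq.gz

# Keep only the first N reads' worth of lines (4 lines per read).
N=2
zcat reads_1.fastq.gz | head -n $((4 * N)) | gzip > sub_1.fastq.gz
zcat sub_1.fastq.gz | wc -l   # 8 lines = 2 reads
```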
Old 09-26-2013, 10:16 AM   #3
daniel_g
Junior Member
 
Location: Canada

Join Date: Sep 2013
Posts: 7

Yes, I have, using the -u flag that Bismark provides for reading the first x reads. As far as I can remember this worked quite reliably.

Actually I should also mention that the jobs which are currently failing are chunked files, each containing 1/4 of the reads in the original Fastq file.

The command that I'm running is just a shell script containing the command
bismark -n 2 --non_directional -1 [$gzipped fastq file] -2 [$gzipped fastq file] &> [$log file]

It's being submitted to the cluster with
qsub -l nodes=1: ppn=4,walltime=20:00:00:00 [$shell script]
Old 09-26-2013, 11:00 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015

Daniel: If this is a shared-use cluster then you are likely running out of memory because other jobs may be running on the same node (18GB per node is not a large amount, BTW).

Have you tried to run the job requesting exclusive access (so the only job on that node will be yours)?

Are you running PBS or SGE (looks like PBS but want to confirm)?
Old 09-26-2013, 11:05 AM   #5
daniel_g
Junior Member
 
Location: Canada

Join Date: Sep 2013
Posts: 7

Yes, it is a shared cluster. Perhaps that's the issue then.

It is indeed PBS. I understood though that if I use a command like:

qsub -l nodes=1: ppn=4

then I would get a node entirely to myself—is that not the case?
Old 09-26-2013, 11:21 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015

Quote:
Originally Posted by daniel_g View Post
Yes, it is a shared cluster. Perhaps that's the issue then.

It is indeed PBS. I understood though that if I use a command like:

qsub -l nodes=1: ppn=4

then I would get a node entirely to myself—is that not the case?
Yes, that looks like you are requesting exclusive access (not a PBS user myself, but the command looks right). Is there a space between the ":" and "ppn=4"? That should not be there.

Also, can you verify that the bowtie you are using has been compiled for 64-bit?
Old 09-26-2013, 12:07 PM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

18 gigs may simply not be enough for a non-directional library (I was running into issues with non-directional libraries on my desktop computer when I had about that much RAM). Remember that in addition to storing the entire genome in memory, you're also loading a bowtie index for that genome 4 times (plus all of the memory for buffering). That can occupy a fair bit of space.
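As a back-of-envelope estimate (the per-index size here is my guess for hg19, not a measurement):

```shell
# ~3.1 GB genome held in RAM, four bowtie index instances for a
# non-directional run (~3.5 GB each is an assumption), ~1 GB buffering.
awk 'BEGIN { printf "%.1f GB\n", 3.1 + 4 * 3.5 + 1.0 }'
```

Which lands right around the 18 GB available on one of those nodes, so any other job on the node would push you over.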

If you can use more than one node on that cluster at a time, are fine with using bowtie2, and are generally comfortable with compiling code, you might try bison. It should have a lower per-node memory requirement, since it splits the instances of bowtie2 onto individual nodes.
Old 09-26-2013, 12:37 PM   #8
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622

When running Bismark for a human genome on the cluster (default mode), I personally tend to request 7 cores and ~12-14GB of RAM (the cores are 1 for Bismark, 2 for Bowtie, and 4 for streaming/writing to gzipped files). Bowtie2 might use slightly more than that.

Just for the record, the number of reads in the input files doesn't have any significant impact on the amount of memory used, it really is what the others have described above.
Old 09-26-2013, 12:40 PM   #9
daniel_g
Junior Member
 
Location: Canada

Join Date: Sep 2013
Posts: 7

The space between ":" and "p" was just to prevent the forum from turning it into an emoticon.

Yes, I'm using 64-bit bowtie.

Hm, that might explain why I've had fewer issues with directional alignment compared to non-directional. Good to know, thanks.

I think we're pretty set on using bowtie1 but perhaps I'll take a look at bison. Thank you.
Old 09-26-2013, 06:04 PM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015

Quote:
Originally Posted by daniel_g View Post
The space between ":" and "p" was just to prevent the forum from turning it into an emoticon.
For future reference: when you edit a post, use the "Go Advanced" button. That gives you additional tools (at the top of the edit box) for marking command lines as "quotes" or "code", which prevents the translation into emoticons and improves readability.
Old 10-04-2013, 08:18 PM   #11
gturco
Junior Member
 
Location: Davis

Join Date: Oct 2013
Posts: 1

This happened to me; it turned out I was out of space on my hard drive and had to remove/zip some files.
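A quick way to check (run it from whatever directory the output files are being written to):

```shell
# Show free space on the filesystem holding the current directory.
df -h .
```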

Tags
alignment, bismark, bisulfite sequencing, bowtie, cluster
