  • Memory issues running Bismark/Bowtie alignment on a cluster

    Hi all,

    I'm working on aligning bisulfite converted sequence reads using Bismark and Bowtie (v1). The samples that I'm running are being aligned to the hg19 genome from UCSC. I believe that the Fastq files that I'm working with (i.e. submitting to Bismark) have about 140-150 million sequence pairs in total.

    These alignments are being run on a central cluster at my research institute, but I'm getting odd errors that I've been unable to interpret. Most of them seem to be related to running out of memory, but the cluster has a large amount of RAM (as far as I can tell, about 18 GB per node).

    Most alignments are failing before they ever start processing any sequences. Typical errors are:

    Failed to open zcat pipe to [$filename] Cannot allocate memory
    Error while flushing and closing output
    terminate called after throwing an instance of 'int'


    Also:

    Out of memory!
    Error while flushing and closing output
    terminate called after throwing an instance of 'int'


    And:

    Out of memory allocating the ebwt[] array for the Bowtie index. Please try
    again on a computer with more memory.


    Others have failed during the sequence processing:

    gzip: stdout: Broken pipe
    Error while flushing and closing output


    The odd thing is that a lot of these alignments failed the first or second time, but then worked perfectly well on the next attempt without changing any parameters.

    From what I've read, Bismark and/or Bowtie hold the entire reference genome in memory (i.e. RAM) during the alignment process, so there needs to be enough space to hold that information (for hg19, I've read that this is about 8–10 GB). So I'm wondering if I need to allocate a specific amount of memory to each script when I submit it to the cluster, but I have no idea how much I should give. The cluster uses the PBS job resource manager, so I tried pmem=4gb (across each of 4 processors), but that failed. I tried again with pmem=9gb, but that job never ran (I'm guessing because the system never had enough resources to run it).
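A quick arithmetic check may help explain the two pmem attempts. This is only a sketch: it assumes, as the post describes, that the scheduler reserves pmem for each of the ppn processors and that a node has about 18 GB. The function name `fits_on_node` is made up for illustration.

```shell
# Hypothetical helper: does a per-process memory request fit on one node?
# Assumes the scheduler reserves pmem_gb for each of the ppn processors.
fits_on_node() {
    pmem_gb=$1; ppn=$2; node_gb=$3
    total_gb=$((pmem_gb * ppn))
    if [ "$total_gb" -le "$node_gb" ]; then
        echo "fits: ${total_gb} GB requested of ${node_gb} GB"
    else
        echo "too large: ${total_gb} GB requested of ${node_gb} GB"
    fi
}

fits_on_node 4 4 18   # pmem=4gb, ppn=4
fits_on_node 9 4 18   # pmem=9gb, ppn=4
```

Under that assumption, pmem=9gb with ppn=4 asks for 36 GB on an 18 GB node, which would be consistent with the job never being scheduled.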

    Does anyone have experience running Bismark (or even just Bowtie) on a scientific cluster, or have suggestions for what I could try next? Any help at all would be appreciated.

    Thanks,
    Daniel

  • #2
    Have you tried subsampling your data? Take ten percent of your reads or something like it through the workflow?

    I'm more a bowtie2 user...but in either case it may help to post the commands you're using.
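One way to take roughly ten percent of the reads is to keep every tenth FASTQ record. The sketch below assumes plain 4-line FASTQ records; the helper name `every_nth_read` and the file names in the comments are placeholders, not anything from the thread.

```shell
# Keep every Nth 4-line FASTQ record from stdin (hypothetical helper).
# Assumes plain 4-line records; run it with the same N on both mate files
# so that read pairs stay in sync.
every_nth_read() {
    awk -v n="$1" 'int((NR - 1) / 4) % n == 0'
}

# Example usage (file names are placeholders):
# zcat reads_1.fastq.gz | every_nth_read 10 | gzip > sub_1.fastq.gz
# zcat reads_2.fastq.gz | every_nth_read 10 | gzip > sub_2.fastq.gz
```

Because the selection depends only on record position, applying it to both files of a pair keeps mates matched.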



    • #3
      Yes, I have, using the -u flag that Bismark provides for reading the first x reads. As far as I can remember this worked quite reliably.

      Actually I should also mention that the jobs which are currently failing are chunked files, each containing 1/4 of the reads in the original Fastq file.

      The command that I'm running is just a shell script containing the command
      bismark -n 2 --non_directional -1 [$gzipped fastq file] -2 [$gzipped fastq file] &> [$log file]

      It's being submitted to the cluster with
      qsub -l nodes=1: ppn=4,walltime=20:00:00:00 [$shell script]



      • #4
        Daniel: If this is a "shared" use cluster then you are likely running out of memory because there may be other jobs that are running on the same node (18GB per node is not a large amount BTW).

        Have you tried to run the job requesting exclusive access (so the only job on that node will be yours)?

        Are you running PBS or SGE (looks like PBS but want to confirm)?



        • #5
          Yes, it is a shared cluster. Perhaps that's the issue then.

          It is indeed PBS. I understood though that if I use a command like:

          qsub -l nodes=1: ppn=4

          then I would get a node entirely to myself—is that not the case?



          • #6
            Originally posted by daniel_g:
            "Yes, it is a shared cluster. Perhaps that's the issue then.

            It is indeed PBS. I understood though that if I use a command like:

            qsub -l nodes=1: ppn=4

            then I would get a node entirely to myself. Is that not the case?"

            Yes, that looks like you are requesting exclusive access (I'm not a PBS user myself, but the command looks right). Is there a space between the ":" and "ppn=4"? There should not be one.

            Also, can you verify that the bowtie you are using has been compiled for 64-bit?



            • #7
              18 gigs may simply not be enough for a non-directional library (I was running into issues with non-directional libraries on my desktop computer when I had about that much RAM). Remember that in addition to storing the entire genome in memory, you're also loading a bowtie index for that genome 4 times (plus all of the memory for buffering). That can occupy a fair bit of space.

              If you can use more than one node on that cluster at a time, are fine with using bowtie2, and are generally comfortable with compiling code, you might try bison. It should have a lower per-node memory requirement, since it splits the instances of bowtie2 onto individual nodes.
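The memory reasoning above can be turned into a rough back-of-the-envelope estimate. This sketch assumes four Bowtie instances for a non-directional library (as stated above) and two for a directional one; the helper name `estimate_gb` and every size passed to it are illustrative placeholders, not measured values.

```shell
# Rough per-node memory estimate in GB (hypothetical helper).
# All sizes are placeholders to be replaced with measured values.
estimate_gb() {
    instances=$1; index_gb=$2; genome_gb=$3; buffer_gb=$4
    echo $((instances * index_gb + genome_gb + buffer_gb))
}

# Illustrative inputs only: 3 GB per index copy, 3 GB genome, 2 GB buffers.
echo "directional (2 instances):     $(estimate_gb 2 3 3 2) GB"
echo "non-directional (4 instances): $(estimate_gb 4 3 3 2) GB"
```

The point is only that the index term scales with the number of instances, so a non-directional run needs noticeably more memory than a directional one on the same genome.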



              • #8
                When running Bismark for a human genome on the cluster (default mode) I personally tend to request 7 cores and ~12-14GB of RAM (the many cores are 1 for Bismark, 2 for Bowtie, and 4 for streaming/writing to gzipped files). Bowtie2 might use slightly more than that.
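As a sketch, the core count above (1 + 2 + 4) and a memory figure can be assembled into a PBS resource string. The helper name `pbs_resources` is hypothetical, and the `mem` resource name is an assumption about the local PBS/Torque setup; some sites expect `pmem` or `vmem` instead. The walltime value is the one used earlier in the thread.

```shell
# Build a PBS/Torque -l resource string (hypothetical helper).
# cores: 1 for Bismark + 2 for Bowtie + 4 for gzip streams, per the post.
# The "mem" resource name is an assumption; check the local cluster docs.
pbs_resources() {
    mem_gb=$1; walltime=$2
    cores=$((1 + 2 + 4))
    echo "nodes=1:ppn=${cores},mem=${mem_gb}gb,walltime=${walltime}"
}

# Prints the string only; pass it to qsub -l on a real cluster.
pbs_resources 14 20:00:00:00
```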

                Just for the record, the number of reads in the input files doesn't have any significant impact on the amount of memory used; it really comes down to what the others have described above.



                • #9
                  The space between ":" and "p" was just to prevent the forum from turning it into an emoticon.

                  Yes, I'm using 64-bit bowtie.

                  Hm, that might explain why I've had fewer issues with directional alignment compared to non-directional. Good to know, thanks.

                  I think we're pretty set on using bowtie1 but perhaps I'll take a look at bison. Thank you.



                  • #10
                    Originally posted by daniel_g:
                    "The space between ":" and "p" was just to prevent the forum from turning it into an emoticon."

                    For future reference: when you edit a post, use the "Go Advanced" button. It gives you additional tools (look for them at the top of the edit box) that can mark command lines as "quotes" or "code", which prevents them from being translated into emoticons. It also improves their readability.



                    • #11
                      This happened to me; it turned out I was out of space on my hard drive. I had to remove/zip some files.
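A quick way to rule out a full disk before resubmitting is to check free space where the job writes its output and temporary files. This is a minimal sketch: the helper name `free_gb` is made up, and which directories matter on a given cluster is an assumption.

```shell
# Free space, in whole GB, on the filesystem holding a directory
# (hypothetical helper; relies on the POSIX `df -P -k` column layout,
# where column 4 of the second line is the available 1024-byte blocks).
free_gb() {
    df -P -k "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

echo "free in current directory: $(free_gb .) GB"
echo "free in temp directory:    $(free_gb "${TMPDIR:-/tmp}") GB"
```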
