  • Bowtie2 alignment error

    Using Bowtie2 I am currently trying to align only short scaffolds (<1kb) of a draft genome (C. chinense) to a reference genome (C. annuum Zunla). At first, I tried to accomplish the job by piping the output of Bowtie2 directly to SAMtools like this:

    $ bowtie2-build -f Capsicum.annuum.L_Zunla-1_Release_2.0.fasta Zunla_index
    $ bowtie2 --local --time --threads 8 -D 15 -R 2 -N 0 -L 20 -i S,1,0.65 -I 0 -X 500 -f -x Zunla_index -U Capsicum.chinense_PI159236_Release_0.5.fasta | samtools view -bSu - | samtools sort -m 90000000000 - Chinense_Zunla.sorted
    $ samtools index Chinense_Zunla.sorted.bam


    After roughly one hour the job gets kicked out with the following error message:

    Error: Out of memory allocating 17179877436 __m128i's for DP matrix: 'std::bad_alloc'
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc
    (ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)


    Alternatively I tried the same, but this time only the alignment step without piping:

    $ bowtie2 --local --time --threads 8 -D 15 -R 2 -N 0 -L 20 -i S,1,0.65 -I 0 -X 500 -f -x Zunla_index -U Capsicum.chinense_PI159236_Release_0.5.fasta -S Chinense_Zunla.sam

    This too resulted in the same error message as above. After this I tried running the job with fewer threads (4, 2 and even 1) and a smaller memory allocation for SAMtools (as low as the default 500M). No matter what settings I use, the same error is thrown. The resulting SAM output file contains 13 fully aligned scaffolds, but nothing appears to be wrong or unusual about the scaffolds that remain to be aligned.

    The cluster I use has over 140GB of RAM, so that should not be the problem, especially not when using 2 or even 1 thread. There is also plenty of disk space present for the resulting output and temporary files.
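
    For scale, the element count in the error message can be converted to bytes. This is only a rough sketch: it assumes each __m128i element occupies 16 bytes (it is a 128-bit SSE vector type) and says nothing about why Bowtie2 asked for a matrix of that size. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Convert a count of __m128i elements into whole GiB (rounded down).
# Assumption: one __m128i is 128 bits = 16 bytes.
m128i_to_gib() {
    local count=$1
    echo $(( count * 16 / 1024 / 1024 / 1024 ))
}

# Element count copied from the error message above.
m128i_to_gib 17179877436
```

    This prints 256, so that single request is about 256 GiB. That is more than the 140GB on the cluster, which would be consistent with the error appearing even when only one thread is used.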

    Some additional information:
    - The reference input genome is 3.18GB (fasta format)
    - The draft genome input file is 2.75GB and consists of scaffolds (fasta format)
    - The cluster is shared, but almost full RAM capacity was present during the run
    - Both fasta input files were checked: neither was corrupted during download from the source, and neither contains disallowed characters or formatting
    - The Bowtie2 settings used have been successful for creating exactly the same alignment, as reported in a publication on sequencing of the Capsicum annuum genome

    Does anybody have any idea what might be causing this std::bad_alloc error?
    Last edited by MichielMunckhof; 02-06-2015, 04:31 AM.

  • #2
    I don't think I'd use bowtie2, a short read aligner, to try to align scaffolds to a reference genome. I suspect that that's what's blowing up the RAM usage. I suspect MUMmer would work better.

    • #3
      When faced with a mysterious bug, I always start by upgrading to the latest software version available.
      In this case, it would be Bowtie 2.2.4.

      • #4
        Thank you for your replies. @dpryan I have previously tried to use MUMmer for the job, but it also failed with a bad memory allocation/out-of-memory error. Apart from MUMmer I also tried LAST and Mugsy, but both failed to align the scaffolds. After reading the paper on the sequencing of Capsicum, I switched to Bowtie2, since the sequencing team used exactly the same scaffolds and reference genome with success.

        @blancha our cluster is currently running Bowtie 2.2.3, but I have also tried to run the job with the latest Bowtie2 version on another server, to make sure this is not an issue caused by an old version. This however seems not to be the case since the same error is thrown.
        Last edited by MichielMunckhof; 12-08-2014, 08:14 AM.

        • #5
          Incidentally, your pipeline command looks questionable.
          You seem to be trying to sort a file before it is complete.
          You're also asking for 90GB of memory for sorting, which seems huge, but I suppose you could if you have the memory available. On the other hand, you're only using one thread for sorting whereas samtools does allow multithreaded sorting, at least the newer versions do.

          I didn't point this out in the previous post since you specified that you had also tried without the piping command, and the error still occurred.

          • #6
            Thank you for the hint about the pipeline. I will check this command with some smaller fasta files, but I assumed it would work since others have posted about piping to samtools view and sort simultaneously in this way. It should however not solve the bad allocation error, since that indeed also persists when running the Bowtie2 alignment without a pipe.

            Concerning the memory, you are right that 90GB might be huge for 1 thread, but this amount of memory can be allocated on the cluster and did seem appropriate for the 8 threads.

            Today I will check whether the errors are specific to the reference or draft genome fasta files by splitting them shortly after scaffold 13, where the alignment fails.
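
            One possible way to do such a split is sketched below. It assumes a plain multi-FASTA file in which every record starts with a ">" header line; the function name and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Split a multi-FASTA file after the first N records.
# Records 1..N go to head_out, everything after that goes to tail_out.
split_fasta_after() {
    local n=$1 in_fasta=$2 head_out=$3 tail_out=$4
    # Create (or empty) both outputs so they exist even if one stays empty.
    : > "$head_out"
    : > "$tail_out"
    awk -v n="$n" -v head="$head_out" -v tail="$tail_out" '
        /^>/ { rec++ }                        # a header line starts a new record
        { print > (rec <= n ? head : tail) }  # route each line by record number
    ' "$in_fasta"
}

# Example call (file names are placeholders):
#   split_fasta_after 13 draft.fasta first13.fasta remainder.fasta
```

            Running the aligner on each part separately should show whether the failure follows a particular scaffold or occurs regardless of the input.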

            • #7
              You may be right about the piping.
              I don't actually use piping very much because I find it complicated to recover the output and error messages, which I keep for logging purposes even when the command has run correctly.

              Anyway, here is the argument for multi-threaded sorting with samtools.
              I'll get back to work now.
              Code:
                -@ INT     Set number of sorting and compression threads [1]
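
              On the logging point, one way to keep a separate log per stage while still piping is sketched below. It is not the command from the original post: it assumes samtools 1.3 or newer (where sort takes -o and reads SAM from standard input) and bowtie2 on the PATH, and it leaves out the alignment tuning options for brevity. The function name, log file names and the 2G value are illustrative.

```shell
#!/usr/bin/env bash
# Align and coordinate-sort in one pipeline, writing one log file per stage.
# Arguments: index prefix, query FASTA, output prefix, threads (default 4).
align_and_sort() {
    local index=$1 query_fasta=$2 out_prefix=$3 threads=${4:-4}
    (
        # Report failure if any stage fails, not only the last one.
        set -o pipefail
        bowtie2 --local --threads "$threads" -f -x "$index" -U "$query_fasta" \
                2> "${out_prefix}.bowtie2.log" \
            | samtools sort -@ "$threads" -m 2G -T "${out_prefix}.tmp" \
                -o "${out_prefix}.sorted.bam" - \
                2> "${out_prefix}.samtools.log"
    )
}
```

              A call might look like `align_and_sort Zunla_index Capsicum.chinense_PI159236_Release_0.5.fasta Chinense_Zunla 8`. Note that -m in samtools sort is a per-thread limit, so the total sorting memory is roughly the -m value multiplied by the -@ value.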

              • #8
                After abandoning this alignment for quite a while, I figured out why MUMmer and the other, more obvious software packages for aligning scaffolds failed on this dataset. The length of the reference genome far exceeded the maximum value of a 32-bit integer, causing the job to be kicked out of the cluster.

                Re-compiling these packages using CPPFLAGS="-O3 -DSIXTYFOURBITS" on Linux completely solved the problem. Not all packages reported the integer size as the error that caused the job to be cancelled, so this was causing some confusion.
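
                A quick way to check whether a reference is likely to run into this limit is to count its bases and compare the total with 2147483647, the largest signed 32-bit integer. The sketch below assumes a plain FASTA file with Unix line endings; the function name is hypothetical, and the optional second argument only exists so the comparison can be tried on a small file.

```shell
#!/usr/bin/env bash
# Count the bases in a FASTA file (header lines excluded) and report whether
# the total is above a limit, by default the signed 32-bit maximum.
report_fasta_length() {
    local fasta=$1 limit=${2:-2147483647}
    awk -v limit="$limit" '
        !/^>/ { total += length($0) }   # add up sequence lines only
        END {
            printf "%.0f bases, %s %.0f\n", total, (total > limit ? "above" : "within"), limit
        }' "$fasta"
}

# Example call on the reference from this thread:
#   report_fasta_length Capsicum.annuum.L_Zunla-1_Release_2.0.fasta
```

                If the report says "above", tools built with 32-bit sequence offsets may fail on that file, and not necessarily with an error message that mentions integer size.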

                Thank you for all the help.
