  • Bowtie2 alignment error

    Using Bowtie2 I am currently trying to align only short scaffolds (<1kb) of a draft genome (C. chinense) to a reference genome (C. annuum Zunla). At first, I tried to accomplish the job by piping the output of Bowtie2 directly to SAMtools like this:

    $ bowtie2-build -f Capsicum.annuum.L_Zunla-1_Release_2.0.fasta Zunla_index
    $ bowtie2 --local --time --threads 8 -D 15 -R 2 -N 0 -L 20 -i S,1,0.65 -I 0 -X 500 -f -x Zunla_index -U Capsicum.chinense_PI159236_Release_0.5.fasta | samtools view -bSu - | samtools sort -m 90000000000 - Chinense_Zunla.sorted
    $ samtools index Chinense_Zunla.sorted.bam


    After roughly one hour the job gets kicked out with the following error message:

    Error: Out of memory allocating 17179877436 __m128i's for DP matrix: 'std::bad_alloc'
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc
    (ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
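The size of the failed request can be estimated from the error message itself. An `__m128i` is a 128-bit SSE value, so each one occupies 16 bytes. The sketch below multiplies that by the element count reported in the error; it assumes the message counts elements rather than bytes and ignores any allocator overhead.

```shell
# Estimate the size of the DP-matrix allocation that failed.
elements=17179877436      # element count copied from the error message
bytes_per_element=16      # one __m128i is 128 bits = 16 bytes
bytes=$(( elements * bytes_per_element ))
gib=$(( bytes / 1024 / 1024 / 1024 ))   # integer division, rounds down
echo "requested: ${bytes} bytes (about ${gib} GiB)"
```

Under that assumption a single allocation asks for roughly 256 GiB, which is more than the 140GB of RAM on the cluster regardless of the number of threads.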


    Alternatively I tried the same, but this time only the alignment step without piping:

    $ bowtie2 --local --time --threads 8 -D 15 -R 2 -N 0 -L 20 -i S,1,0.65 -I 0 -X 500 -f -x Zunla_index -U Capsicum.chinense_PI159236_Release_0.5.fasta -S Chinense_Zunla.sam

    This too resulted in the same error message as above. After this I tried running the job with fewer threads (4, 2 and even 1) and a smaller memory allocation for SAMtools (as low as the default 500M). No matter what settings I use, the same error is thrown. The resulting SAM output file contains 13 fully aligned scaffolds, and nothing appears to be wrong or unusual about the scaffolds still to be aligned.

    The cluster I use has over 140GB of RAM, so that should not be the problem, especially not when using 2 or even 1 thread. There is also plenty of disk space present for the resulting output and temporary files.

    Some additional information:
    - The reference input genome is 3.18GB (fasta format)
    - The draft genome input file is 2.75GB and consists of scaffolds (fasta format)
    - The cluster is shared, but almost full RAM capacity was present during the run
    - Both fasta input files are validated not to be corrupted during downloading from the source, or to contain any characters/formatting not allowed
    - The Bowtie2 settings used have been successful for creating exactly the same alignment, as reported in a publication on sequencing of the Capsicum annuum genome

    Does anybody have any idea what might be causing this std::bad_alloc error?
    Last edited by MichielMunckhof; 02-06-2015, 04:31 AM.

  • #2
    I don't think I'd use bowtie2, a short read aligner, to try to align scaffolds to a reference genome. I suspect that that's what's blowing up the RAM usage. I suspect MUMmer would work better.


    • #3
      When faced with a mysterious bug, I always start by upgrading to the latest software version available.
      In this case, it would be Bowtie 2.2.4.


      • #4
        Thank you for your replies. @dpryan I have previously tried to use MUMmer for the job, but it also failed due to bad memory allocation/out of memory. Apart from MUMmer I also tried LAST and Mugsy, but both failed to align the scaffolds. After reading the paper on the sequencing of Capsicum, I switched to Bowtie2, since the sequencing team used exactly the same scaffolds and reference genome with success.

        @blancha our cluster is currently running Bowtie 2.2.3, but I have also tried to run the job with the latest Bowtie2 version on another server, to make sure this is not an issue caused by an old version. This however seems not to be the case since the same error is thrown.
        Last edited by MichielMunckhof; 12-08-2014, 08:14 AM.


        • #5
          Incidentally, your pipeline command looks questionable.
          You seem to be trying to sort a file before it is complete.
          You're also asking for 90GB of memory for sorting, which seems huge, but I suppose you could if you have the memory available. On the other hand, you're only using one thread for sorting whereas samtools does allow multithreaded sorting, at least the newer versions do.

          I didn't point this out in the previous post since you did specify that you tried without the piping command, and the error still occurred.


          • #6
            Thank you for the hint about the pipeline. I will check this command with some smaller fasta files, but I assumed it would work since others have posted about piping to samtools view and sort simultaneously in this way. It should however not solve the bad allocation error, since that indeed also persists when running the Bowtie2 alignment without a pipe.

            Concerning the memory, you are right that 90GB might be huge for 1 thread, but this amount of memory can be allocated on the cluster and did seem appropriate for the 8 threads.

            Today I will check whether the errors are specific to the reference or draft genome fasta files by splitting them shortly after scaffold 13, where the alignment fails.


            • #7
              You may be right about the piping.
              I don't actually use piping very much because I find it complicated to recover the output and error messages, which I keep for logging purposes even when the command has run correctly.

              Anyway, here is the argument for multi-threaded sorting with samtools.
              I'll get back to work now.
              Code:
                -@ INT     Set number of sorting and compression threads [1]
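As a rough illustration of how `-@` interacts with the memory setting: `samtools sort` documents `-m` as the maximum memory per thread, so a total budget has to be divided by the thread count. The variable names below are hypothetical, and the sketch only prints the two options, because the input and output arguments of `samtools sort` differ between versions.

```shell
# Split a total memory budget across sorting threads,
# assuming -m is interpreted per thread.
total_budget_gb=90        # overall budget mentioned in the thread
threads=8                 # value intended for -@
per_thread_gb=$(( total_budget_gb / threads ))   # integer division, rounds down
echo "samtools sort -@ ${threads} -m ${per_thread_gb}G"
```

With these numbers each thread would get 11G. Passing the full 90G to `-m` together with `-@ 8` would allow far more memory in total than intended.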


              • #8
                After abandoning this alignment for quite a while, I figured out why MUMmer and the other, more obvious software packages for scaffolds failed on the dataset. The length of the reference genome far exceeded the 32-bit maximum integer size, causing the job to be kicked off the cluster.

                Re-compiling these packages using CPPFLAGS="-O3 -DSIXTYFOURBITS" on Linux completely solved the problem. Not all packages reported the integer size as the error that caused the job to be cancelled, so this was causing some confusion.
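A quick way to check whether an input is likely to hit this limit is to sum the sequence lengths in the FASTA file and compare the total with the largest signed 32-bit integer (2147483647). The following is a minimal sketch; the function names are hypothetical, and it assumes the tool in question stores positions in a signed 32-bit integer, which may not hold for every package.

```shell
# Total number of residues in a FASTA file (header lines are skipped).
fasta_total_length() {
    # %.0f avoids the 32-bit clamping some awk builds apply to %d
    awk '!/^>/ { total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# Succeeds (exit status 0) if the given length does not fit in a signed 32-bit integer.
exceeds_int32() {
    [ "$1" -gt 2147483647 ]
}
```

Usage might look like this, with the reference file from the first post:

```shell
len=$(fasta_total_length Capsicum.annuum.L_Zunla-1_Release_2.0.fasta)
if exceeds_int32 "$len"; then
    echo "total length ${len} exceeds the signed 32-bit limit"
fi
```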

                Thank you for all the help.
