  • BWA "Error generating alignments"

    I'm new to bioinformatics and I'm having an issue with BWA.

    I'm using BWA (the Map with BWA for Illumina tool in Galaxy) to align some reads to the human hg19 reference. The alignment is failing with the following error:

    The alignment failed.
    Error generating alignments. [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (270, 370, 464)
    [infer_isize] low and high boundaries: 251 and 852 for estimating avg and std
    [infer_isize] inferred external isize from 175111 pairs: 422.966 +/- 112.280
    [infer_isize] skewness: 0.783; kurtosis: 0.372; ap_prior: 1.65e-05
    [infer_isize] inferred maximum insert size: 1192 (6.85 sigma)
    Killed


    The command to run BWA was:

    python /mnt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/bwa_wrappers/ffa8aaa14f7c/bwa_wrappers/bwa_wrapper.py --threads="${GALAXY_SLOTS:-4}" --fileSource="history" --ref="/mnt/galaxy/files/000/dataset_488.dat" --dbkey="?" --input1="/mnt/galaxy/files/000/dataset_530.dat" --input2="/mnt/galaxy/files/000/dataset_531.dat" --output="/mnt/galaxy/files/000/dataset_534.dat" --genAlignType="paired" --params="pre_set" --suppressHeader="false"

    I had previously, within the same Galaxy instance, successfully aligned some bacterial reads to a bacterial reference using the same workflow. However, when I try to align some reads from a human sample to hg19 I get this error. So it must have something to do with either the reference or the reads themselves. (I should note that the human and bacterial sequence data came off the same run on the same sequencer.)

    Can anyone help me figure out what the problem is?

    Mike

  • #2
    How long does the job run before it gets killed? The "Killed" line in the log suggests that your Galaxy job is running up against some limit (memory or disk) on your Galaxy instance (or is this on the public Galaxy at PSU?).
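One way to check whether the "Killed" line came from the Linux out-of-memory (OOM) killer is to look through the kernel log, for example the output of `dmesg`. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the patterns reflect wording the kernel commonly uses, which varies between kernel versions.

```python
import re

# Wording commonly seen in kernel logs when the OOM killer acts.
# Assumption: exact phrasing differs between kernel versions, so
# treat these patterns as a starting point, not a complete list.
OOM_PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"oom-killer", re.IGNORECASE),
    re.compile(r"killed process \d+", re.IGNORECASE),
]

# Captures the PID and process name from a "Killed process" line.
KILLED_RE = re.compile(r"killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the log lines that look like OOM-killer activity."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


def killed_processes(log_text):
    """Return a list of (pid, process_name) for each 'Killed process' entry."""
    return [
        (int(match.group(1)), match.group(2))
        for match in KILLED_RE.finditer(log_text)
    ]
```

Passing the text of the kernel log to `killed_processes` and finding an entry named `bwa` would point to memory rather than disk as the limit being hit.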


    • #3
      The job ran for about 30 minutes. There is definitely not a disk problem; the disk is only about 5% full. The instance has 16 GB of memory, but I don't know how to tell whether it's running out of RAM. (Theoretically, it should swap if RAM is full.) I'm running my own instance of Galaxy/Cloudman on Amazon EC2.


      • #4
        See this answer from Heng Li about memory requirements for bwa for aligning against human genome: http://sourceforge.net/p/bio-bwa/mai...sage/32268544/

        I am not sure how many threads you are running but it is possible that you are running out of memory/swap space (how much swap is configured on this instance?).
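On a Linux instance, the installed RAM and configured swap can be read from `/proc/meminfo`. The sketch below is one hedged way to do that in Python; the function names are hypothetical, and it assumes the usual `Key:   value kB` layout of that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; a few (such as HugePages_Total)
    are plain counts and are stored as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


def swap_configured(info):
    """True if the system reports any swap space at all."""
    return info.get("SwapTotal", 0) > 0


def kb_to_gib(kilobytes):
    """Convert a kB figure from /proc/meminfo to GiB."""
    return kilobytes / (1024 * 1024)
```

For example, `kb_to_gib(read_meminfo()["MemTotal"])` gives the RAM visible to the operating system, and `swap_configured(read_meminfo())` returning `False` means nothing will be swapped out when RAM fills up.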


        • #5
          It looks like you were probably right about memory. I was using a very small instance (c3.large) on Amazon Web Services. That instance only had 4 GB of RAM. Couple that with the fact that whoever designed the Galaxy Cloudman AMIs didn't include a swap partition, and the instance is severely memory-limited. I had been using it for small bacterial genomes previously, and it worked perfectly, so I didn't really think about it when I tried to run a human alignment, especially since the human genomes I'm running are extremely low depth (same number of reads as the bacterial genomes). I fired up a c3.8xlarge instance and ran the human alignments on that, and they completed without error. I assume it's the reference taking up all the RAM.

          Thanks for your help!
          Last edited by cheezemeister; 02-28-2015, 03:01 PM.


          • #6
            Originally posted by cheezemeister:
            I assume it's the reference taking up all the RAM.
            Correct, short-read aligners use memory proportional to the reference size.
