  • bwa aln Segmentation fault

    Hi

    I'd like to use the bwa alignment tool to align SOLiD color-space reads (SREK) to reference sequences (human miRNAs).
    After converting the csfasta and qual files to fastq with solid2fastq, which is provided with the bwa software, I ran bwa aln but got a segmentation fault. The fastq file contains 15'643'846 reads. When I run it under gdb, I get the following output:

    [SREK] gdb /apps/bi/bwa-0.5.7/bwa
    GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.2rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

    (gdb) run aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    Starting program: /apps/bi/bwa-0.5.7/bwa aln -n 2 -c /data/hum_miRNA2DNA.fa 6A1_T.single.fastq.gz > test.sai
    [Thread debugging using libthread_db enabled]
    [New Thread 182894247456 (LWP 30893)]
    [bwa_aln_core] calculate SA coordinate... 5.42 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 262144 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.55 sec
    [bwa_aln_core] write to the disk... 0.02 sec
    [bwa_aln_core] 524288 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.83 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 786432 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 4.93 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1048576 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.62 sec
    [bwa_aln_core] write to the disk... 0.04 sec
    [bwa_aln_core] 1310720 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 6.11 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1572864 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate... 5.84 sec
    [bwa_aln_core] write to the disk... 0.03 sec
    [bwa_aln_core] 1835008 sequences have been processed.
    [bwa_aln_core] calculate SA coordinate...
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 182894247456 (LWP 30893)]
    0x00000000004038f6 in bwt_cal_width (rbwt=0x532a20, len=0, str=0x532ec0 "", width=0x0) at bwtaln.c:76
    76 bwtaln.c: No such file or directory.
    in bwtaln.c
    (gdb)


    Source code from bwtaln.c:
    ####################
    // width must be filled as zero
    static int bwt_cal_width(const bwt_t *rbwt, int len, const ubyte_t *str, bwt_width_t *width)
    {
        bwtint_t k, l, ok, ol;
        int i, bid;
        bid = 0;
        k = 0; l = rbwt->seq_len;
        for (i = 0; i < len; ++i) {
            ubyte_t c = str[i];
            if (c < 4) {
                bwt_2occ(rbwt, k - 1, l, c, &ok, &ol);
                k = rbwt->L2[c] + ok + 1;
                l = rbwt->L2[c] + ol;
            }
            if (k > l || c > 3) { // then restart
                k = 0;
                l = rbwt->seq_len;
                ++bid;
            }
            width[i].w = l - k + 1;
            width[i].bid = bid;
        }
        width[len].w = 0;
        width[len].bid = ++bid; // ###### line 76 #####
        return bid;
    }

    I have no clue how to solve the problem.
    Any help is greatly appreciated!

    Many thanks!

  • #2
    Please check out the latest SVN. Someone has spotted that 0.5.7 may use excessive memory due to a bug/typo.



    • #3
      Thanks for the advice!
      I tried it, but nothing changed, even when using only 1.5 million reads.

      Memory usage always stayed below 70 MB.



      • #4
        Lately I have been encountering inexplicable segmentation faults in the 'aln' command with SOLiD reads, and I came across this thread.

        This is identical to the problem I've been experiencing, and I found a solution.

        The problem occurs when the first read of a 262144-read block has a length of zero, which is why it is so rare and so hard to reproduce. The w[0] and w[1] buffers in the bwa_cal_sa_reg_gap function are only allocated when the current sequence length strictly exceeds the current maximum, which is initialized to 0. If the first read encountered has length zero, no memory is allocated, so bwt_cal_width receives a null width pointer (len=0 and width=0x0 in the gdb output above) and segfaults at the line the original poster identified.

        I was able to fix this by initializing the max_l variable at the beginning of the bwa_cal_sa_reg_gap function to -1 instead of 0.
        Last edited by dp05yk; 02-26-2011, 08:14 AM.
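
The allocation pattern described in #4 can be illustrated with a small, self-contained sketch. This is not the bwa source: `width_t` and `widths_always_allocated` are hypothetical names made up for illustration, and the loop only mimics the "grow the buffer when a read is strictly longer than max_l" logic as the post describes it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for bwa's bwt_width_t; only its shape matters here. */
typedef struct { unsigned w; int bid; } width_t;

/*
 * Mimics the grow-on-demand pattern from post #4: the buffer is
 * (re)allocated only when a read is strictly longer than max_l.
 * Returns 1 if every read gets a usable buffer, 0 if some read would
 * have been handed a NULL buffer (the segfault case) or allocation fails.
 */
static int widths_always_allocated(const int *read_len, int n_reads,
                                   int initial_max_l)
{
    width_t *w = NULL;
    int max_l = initial_max_l;
    int ok = 1;

    assert(n_reads >= 0);
    for (int i = 0; i < n_reads; ++i) {
        int len = read_len[i];
        if (len > max_l) {            /* strict comparison, as described */
            max_l = len;
            width_t *tmp = realloc(w, (size_t)(max_l + 1) * sizeof *tmp);
            if (tmp == NULL) { free(w); return 0; }
            w = tmp;
        }
        if (w == NULL) {              /* real code would dereference here */
            ok = 0;
            break;
        }
        w[len].w = 0;                 /* same kind of write as width[len].w = 0 */
        w[len].bid = 0;
    }
    free(w);
    return ok;
}
```

With read lengths {0, 35, 35}, starting from max_l = 0 leaves the buffer unallocated for the first read, while starting from -1 allocates one element for it. A zero-length read that comes later in a block is harmless because the buffer already exists by then.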



        • #5
          I experienced a segmentation fault too. When I inspected the input file(s) around the line where the seg fault occurred, I found some stray command-line output (e.g., "0.11 sec") in the input files, left there by a previous script. Removing that text solved the problem for me.
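
Both causes reported in this thread (zero-length reads and stray text in the input) can be looked for before running bwa aln. The following is a rough sketch, not part of bwa: `first_bad_fastq_record` and `read_line` are hypothetical helpers, and the code assumes an uncompressed, strictly four-line-per-record FASTQ file with lines shorter than 4096 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* assumed upper bound on line length */

/* Read one line and strip the trailing newline/CR.
 * Returns the line length, or -1 at end of file. */
static long read_line(FILE *fp, char *buf)
{
    if (fgets(buf, MAX_LINE, fp) == NULL) return -1;
    size_t n = strcspn(buf, "\r\n");
    buf[n] = '\0';
    return (long)n;
}

/*
 * Returns the 1-based number of the first suspicious record, or 0 if
 * none is found. A record is flagged when it is truncated, its header
 * does not start with '@', its separator does not start with '+', its
 * sequence is empty, or sequence and quality lengths differ.
 */
long first_bad_fastq_record(FILE *fp)
{
    char head[MAX_LINE], seq[MAX_LINE], plus[MAX_LINE], qual[MAX_LINE];
    long rec = 0;

    assert(fp != NULL);
    for (;;) {
        long hl = read_line(fp, head);
        if (hl < 0) return 0;                       /* clean end of file */
        ++rec;
        long sl = read_line(fp, seq);
        long pl = read_line(fp, plus);
        long ql = read_line(fp, qual);
        if (sl < 0 || pl < 0 || ql < 0) return rec; /* truncated record */
        if (head[0] != '@' || plus[0] != '+') return rec;
        if (sl == 0 || sl != ql) return rec;
    }
}
```

A gzipped file such as the one in the first post would need to be decompressed first. Once a stray line shifts the four-line framing, every later record looks wrong too, so only the first reported record number is meaningful.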
