  • adkostic
    Junior Member
    • Apr 2009
    • 4

    BWA Alignment Segmentation Fault

    I'm trying to align a set of 51bp paired-end Illumina reads from a human cell line cDNA library. BWA does a great job aligning my reads to the indexed human cDNA database (Homo_sapiens.NCBI36.53.cdna.all.fa) from Ensembl or to individual chromosomes (for example, Homo_sapiens.NCBI36.53.dna.chromosome.1.fa), but when I try to align to the full human DNA database (Homo_sapiens.NCBI36.53.dna.toplevel.fa) the 'index' step works fine, but the 'aln' step gives a 'Segmentation fault'.

    The output looks like this:

    [bwa_aln] 17bp reads: max_diff = 2
    [bwa_aln] 38bp reads: max_diff = 3
    [bwa_aln] 64bp reads: max_diff = 4
    [bwa_aln] 93bp reads: max_diff = 5
    [bwa_aln] 124bp reads: max_diff = 6
    [bwa_aln] 157bp reads: max_diff = 7
    [bwa_aln] 190bp reads: max_diff = 8
    [bwa_aln] 225bp reads: max_diff = 9
    Segmentation fault


    This happens even when I run the command on machines on my cluster with 32GB of RAM, so I don't think memory is an issue.
    Maybe the database is too big for BWA to handle (I doubt that)? Maybe there's something about the way this database is indexed that BWA doesn't like (I don't understand enough about the way BWA indexes and reads SA coordinates and chromosomal coordinates to know if this is the issue)?
    Does anyone have any ideas?

    Thanks!
  • lh3
    Senior Member
    • Feb 2008
    • 686

    #2
    I will try that reference file myself to see if I can recreate the segfault. I have never used that toplevel contig file. The weird thing is that bwa segfaults at a step where very little computation has been done.

    PS: At least on my machine, it does not segfault before you see "calculate SA coordinate...". Could you check whether your cluster imposes a memory limit by default, or whether you are using a 32-bit build?
    Last edited by lh3; 04-19-2009, 06:12 AM.
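
A minimal sketch of the two checks suggested above, assuming a Linux host and an ELF `bwa` binary. The function names are hypothetical. It reads the ELF header to tell a 32-bit build from a 64-bit one and reports the address-space limit of the current process. Run it inside the cluster job itself, since limits set by a scheduler may differ from those of a login shell.

```python
import resource


def elf_word_size(path):
    """Return 32 or 64 for an ELF binary, or None if the file is not ELF."""
    with open(path, "rb") as fh:
        header = fh.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])


def describe_limit(value):
    """Format an rlimit value, which is either a byte count or RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def report(bwa_path):
    """Print the word size of the binary and the address-space limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print(f"{bwa_path}: {elf_word_size(bwa_path)}-bit")
    print(f"address space limit: soft={describe_limit(soft)}, hard={describe_limit(hard)}")
```

Calling `report("/path/to/bwa")` with the real location of the binary prints both values. A 32-bit build or a soft limit below the memory needed for the index would each be a plausible cause worth ruling out.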


    • adkostic
      Junior Member
      • Apr 2009
      • 4

      #3
      I appreciate your looking into this.

      When I run the job with a higher memory allocation, it takes a little more time to process, outputs 'calculate SA coordinate...', and runs for about a minute before it segfaults. The version of BWA I'm using is 0.4.6 (the latest version on SourceForge); I'm not sure whether that's a 32-bit build.

      It's not so important for me to use this toplevel contig database - is there a specific human genome database that you use which works?

      Thanks


      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        Is it possible for you to put the first 256k reads on some FTP? I cannot recreate the segfault with my data. Many thanks.
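
One way to cut out the first 256k reads (262144, the block size that appears in bwa's "sequences have been processed" messages) is sketched below. It assumes a plain, uncompressed FASTQ with exactly four lines per record; the function names and file paths are hypothetical.

```python
import itertools


def first_fastq_lines(lines, n_reads=262144):
    """Return an iterator over the lines of the first n_reads FASTQ records.

    Assumes every record occupies exactly four lines (no wrapped sequences).
    """
    return itertools.islice(lines, 4 * n_reads)


def write_first_reads(src_path, dst_path, n_reads=262144):
    """Copy the first n_reads records from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(first_fastq_lines(src, n_reads))
```

For example, `write_first_reads("reads.fastq", "first256k.fastq")` produces a file small enough to share.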


        • adkostic
          Junior Member
          • Apr 2009
          • 4

          #5
          I've posted one paired set at the address below:

          ftp://ftp.broad.mit.edu/outgoing/DFC...ublic/adkostic

          This is about 40,000 paired reads that are left from my original set (they did not map using my previous aligning methods). But using the cDNA database mentioned above a good portion of these are mapped by BWA, so this should probably also be the case for the full DNA database.

          Thanks again.


          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            Thanks for posting this. Unfortunately, I cannot download the "F" reads. The error is "550 30BV1.1.F.unmapped.reads.fastq: Permission denied".

            I tried the "R" reads and they map fine against the toplevel fasta. A debugger such as valgrind did not report any hidden bugs (bugs that cause a segfault on some machines but not on others). I do not think these are the reads causing the segfaults. They are probably reads bridging exon boundaries.

            The first 256k reads of the fastq that causes the segfault might be more helpful for finding the reason. Thanks.


            • adkostic
              Junior Member
              • Apr 2009
              • 4

              #7
              Thank you for trying those reads out for yourself. Both the "F" and "R" reads cause the segfault on my machine, so it seems that it must be a machine-specific cause. I personally have not tried to map more than these sets of reads using BWA; I received these reads from my coworker who got the rest of the reads to align using Arachne (I requested these reads so that I can get used to using the aligners before my sequence data comes in, and when it does I'll be using BWA (as well as Maq and Bowtie) to align them and I'll post if I'm still having trouble with my complete data set).

              Thanks for your help Heng.

              Alex


              • lh3
                Senior Member
                • Feb 2008
                • 686

                #8
                I guess this is caused by the configuration of your machines. When valgrind reports nothing, a hidden bug in bwa is less likely. Another test would be to index half of the genome (say, the first 5 chromosomes) and see whether the segfault still occurs. Note that for that toplevel fasta, bwa requires 2.7GB of memory; maybe this is the problem. Anyway, this is a wild guess and probably does not help.
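
A rough sketch of how a reduced reference for that test could be written, assuming an uncompressed FASTA whose first records are the chromosomes of interest. That ordering is an assumption about the toplevel file, so check the headers first. The function names and paths are hypothetical.

```python
def first_fasta_lines(lines, n_records=5):
    """Yield the lines belonging to the first n_records FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n_records:
                break
        if seen:  # ignore anything that precedes the first header
            yield line


def write_first_records(src_path, dst_path, n_records=5):
    """Copy the first n_records sequences from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(first_fasta_lines(src, n_records))
```

The resulting file can then be indexed and aligned against in the usual way, to see whether the segfault depends on the size of the reference.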

                By the way, about your other questions.

                1. This toplevel fasta contains different haplotypes for chr6 and chr22 and is not a good reference for read mapping. You can find the reference genome used by the 1000 Genomes Project somewhere on its FTP site (I do not know the exact location). I have reasons to believe that it is the best reference genome for human mapping.

                2. For "R" reads, with the default option, bwa maps 433 reads; a more sensitive mode maps 597. Nearly all of the mapped reads contain short indels (the vast majority) or >=3 mismatches.
                Last edited by lh3; 04-22-2009, 04:33 AM.


                • Fabien Campagne
                  Member
                  • Feb 2010
                  • 39

                  #9
                  similar error on different machine

                  We observe a segmentation fault with bwa 0.5.7 (and 0.5.5) at approximately the same step of alignment on a different machine. The bug seems to be triggered by some datasets only (reference or input reads).

                  Here's a valgrind output (shown for version 0.5.7):

                  gobyweb@spanky FSMIQXN-solid-HBR $ valgrind /home/gobyweb/goby/nextgen-tools/bwa/bwa aln -c -l 35 -o 1 -e -1 /scratchLocal/gobyweb/input-data/reference-db/Transcript-GRCh37.57/homo_sapiens/colorspace/bwa/index.00T 14.fastq
                  ==12461== Memcheck, a memory error detector.
                  ==12461== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
                  ==12461== Using LibVEX rev 1658, a library for dynamic binary translation.
                  ==12461== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
                  ==12461== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
                  ==12461== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
                  ==12461== For more details, rerun with: -v
                  ==12461==
                  [bwa_aln] 17bp reads: max_diff = 2
                  [bwa_aln] 38bp reads: max_diff = 3
                  [bwa_aln] 64bp reads: max_diff = 4
                  [bwa_aln] 93bp reads: max_diff = 5
                  [bwa_aln] 124bp reads: max_diff = 6
                  [bwa_aln] 157bp reads: max_diff = 7
                  [bwa_aln] 190bp reads: max_diff = 8
                  [bwa_aln] 225bp reads: max_diff = 9
                  [bwa_aln_core] calculate SA coordinate...
                  ==12461== Invalid read of size 4
                  ==12461== at 0x402E4B: bwt_2occ (bwt.c:100)
                  ==12461== by 0x404E26: bwt_cal_width (bwtaln.c:63)
                  ==12461== by 0x40511B: bwa_cal_sa_reg_gap (bwtaln.c:122)
                  ==12461== by 0x405631: bwa_aln_core (bwtaln.c:185)
                  ==12461== by 0x4059A1: bwa_aln (bwtaln.c:297)
                  ==12461== by 0x3CF4E1D993: (below main) (in /lib64/libc-2.5.so)
                  ==12461== Address 0x1580D100 is not stack'd, malloc'd or (recently) free'd
                  ==12461==
                  ==12461== Process terminating with default action of signal 11 (SIGSEGV)
                  ==12461== Access not within mapped region at address 0x1580D100
                  ==12461== at 0x402E4B: bwt_2occ (bwt.c:100)
                  ==12461== by 0x404E26: bwt_cal_width (bwtaln.c:63)
                  ==12461== by 0x40511B: bwa_cal_sa_reg_gap (bwtaln.c:122)
                  ==12461== by 0x405631: bwa_aln_core (bwtaln.c:185)
                  ==12461== by 0x4059A1: bwa_aln (bwtaln.c:297)
                  ==12461== by 0x3CF4E1D993: (below main) (in /lib64/libc-2.5.so)
                  ==12461==
                  ==12461== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 5 from 1)
                  ==12461== malloc/free: in use at exit: 114,637,923 bytes in 1,048,662 blocks.
                  ==12461== malloc/free: 1,048,672 allocs, 10 frees, 114,639,241 bytes allocated.
                  ==12461== For counts of detected errors, rerun with: -v
                  ==12461== searching for pointers to 1,048,662 not-freed blocks.
                  ==12461== checked 111,967,840 bytes.
                  ==12461==
                  ==12461== LEAK SUMMARY:
                  ==12461== definitely lost: 0 bytes in 0 blocks.
                  ==12461== possibly lost: 0 bytes in 0 blocks.
                  ==12461== still reachable: 114,637,923 bytes in 1,048,662 blocks.
                  ==12461== suppressed: 0 bytes in 0 blocks.
                  ==12461== Reachable blocks (those to which a pointer was found) are not shown.
                  ==12461== To see them, rerun with: --show-reachable=yes
                  Segmentation fault

                  We have a small dataset that triggers the problem. Let me know if you are interested.

                  Program: bwa (alignment via Burrows-Wheeler transformation)
                  Version: 0.5.5 (r1273)
                  Contact: Heng Li <[email protected]>

                  Program: bwa (alignment via Burrows-Wheeler transformation)
                  Version: 0.5.7 (r1310)
                  Contact: Heng Li <[email protected]>


                  • lh3
                    Senior Member
                    • Feb 2008
                    • 686

                    #10
                    Yes, please send me the example file. Thank you.


                    • dawe
                      Senior Member
                      • Apr 2009
                      • 258

                      #11
                      Hi all, I'm now getting a segfault with bwa. Apparently it happens only if I enable threaded alignment on an NFS file system.
                      Code:
                      $ bwa aln -t 2 /db/bwa/hg19/hg19.fa s_1_2.fastq > s_1_2.sai
                      [bwa_aln] 17bp reads: max_diff = 2
                      [bwa_aln] 38bp reads: max_diff = 3
                      [bwa_aln] 64bp reads: max_diff = 4
                      [bwa_aln] 93bp reads: max_diff = 5
                      [bwa_aln] 124bp reads: max_diff = 6
                      [bwa_aln] 157bp reads: max_diff = 7
                      [bwa_aln] 190bp reads: max_diff = 8
                      [bwa_aln] 225bp reads: max_diff = 9
                      [bwa_aln_core] calculate SA coordinate... Segmentation fault
                      while

                      Code:
                      $ bwa aln  /db/bwa/hg19/hg19.fa s_1_1.fastq > s_1_1.sai
                      [bwa_aln] 17bp reads: max_diff = 2
                      [bwa_aln] 38bp reads: max_diff = 3
                      [bwa_aln] 64bp reads: max_diff = 4
                      [bwa_aln] 93bp reads: max_diff = 5
                      [bwa_aln] 124bp reads: max_diff = 6
                      [bwa_aln] 157bp reads: max_diff = 7
                      [bwa_aln] 190bp reads: max_diff = 8
                      [bwa_aln] 225bp reads: max_diff = 9
                      [bwa_aln_core] calculate SA coordinate... 114.28 sec
                      [bwa_aln_core] write to the disk... 0.08 sec
                      [bwa_aln_core] 262144 sequences have been processed.
                      [bwa_aln_core] calculate SA coordinate... 118.51 sec
                      [bwa_aln_core] write to the disk... 0.07 sec
                      [bwa_aln_core] 524288 sequences have been processed.
                      [bwa_aln_core] calculate SA coordinate...
                      works

                      Code:
                      Program: bwa (alignment via Burrows-Wheeler transformation)
                      Version: 0.5.8 (r1442)


                      • golharam
                        Member
                        • Dec 2009
                        • 55

                        #12
                        resolution?

                        I'm seeing this as well. Has there been any resolution to this? I'm running BWA 64-bit v0.5.7


                        • dp05yk
                          Member
                          • Dec 2010
                          • 66

                          #13
                          Hi golharam,

                          Try upgrading to the most recent version. If that doesn't help, here's a reply I made on another thread:

                          Lately I had been encountering inexplicable segmentation faults during the 'aln' command for SOLiD reads. The problem occurs when the first read of a 262144 block has a length of zero. This is why it's so rare and so hard to reproduce. I was able to fix this by initializing the max_l variable at the beginning of the bwa_cal_sa_reg_gap function to -1 instead of 0.
                          It makes sense that it would occur more often with threading enabled, since the chances are multiplied by the number of threads running.

                          I'd recommend:

                          1. Upgrading to BWA 0.5.9.
                          2. If you are still getting segmentation faults, modify line 82 of bwtaln.c: change "max_l = 0", to "max_l = -1", and recompile. This is what fixed it for me.
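
To check whether a given input could hit the condition described above, the following sketch scans a FASTQ for zero-length reads and flags those that are the first read of a 262144-read block. It assumes four-line records. The function name is hypothetical, and the script only inspects the raw file; it does not reproduce any trimming bwa may apply, nor how reads are divided among threads, so every empty read is reported rather than only the block-initial ones.

```python
def zero_length_reads(lines, block_size=262144):
    """Yield (read_index, read_name, starts_block) for every empty read.

    read_index is 0-based; starts_block is True when the read is the first
    one of a block of block_size reads.
    """
    name = ""
    for i, line in enumerate(lines):
        record, field = divmod(i, 4)
        if field == 0:
            # Header line: drop the leading '@' and the newline.
            name = line.rstrip("\n")[1:]
        elif field == 1 and not line.strip():
            # Sequence line is empty, so the read has length zero.
            yield record, name, record % block_size == 0
```

Typical use would be `with open("reads.fastq") as fh: hits = list(zero_length_reads(fh))`. If any hit has `starts_block` set, the upgrade or the one-line change above is worth trying first.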


                          • golharam
                            Member
                            • Dec 2009
                            • 55

                            #14
                            solved

                            I upgraded bwa to 0.5.9. I rebuilt the index with the latest version and re-aligned with the latest version and everything seems okay.


                            • oiiio
                              Senior Member
                              • Jan 2011
                              • 105

                              #15
                              Originally posted by golharam:
                              I upgraded bwa to 0.5.9. I rebuilt the index with the latest version and re-aligned with the latest version and everything seems okay.
                              Did you need to modify bwtaln.c as suggested? I am getting similar segfault errors, even though I am using the latest version of BWA.
