  • soap segmentation fault

    Hi Guys

    I am running SOAPaligner on FASTQ files and I get a "segmentation fault" at the end of the process. I created the input files myself with a script that extracted the reads from an old alignment file, so I suspect there may be a problem with them, although at first sight they look OK. The command I used is in a file called "go" and is the following:

    Code:
    ./soap -D ./index/genome.fasta.index -a exp_47_s_A1.fastq  -b exp_47_s_A2.fastq -o paired_mapped_v2g3r1_1 -u unpaired_v2g3r1_1 -2 single_mapped_v2g3r1_1 -v 2 -g 3 -m 50 -x 400 -r 1 -t  -p 14

    I get the following output:

    Code:
    Begin Program SOAPaligner/soap2
    Wed Sep 29 15:34:14 2010
    Reference: ./index/genome.fasta.index
    Query File a: exp_47_s_A1.fastq
    Query File b: exp_47_s_A2.fastq
    Output File: paired_mapped_v2g3r1_1
                 single_mapped_v2g3r1_1
                 unpaired_v2g3r1_1
    Load Index Table ...
    lsLoad Index Table OK
    Begin Alignment ...
     131072 ok    3.36 sec
    ..................................
    ..................................
    24510464 ok    3.96 sec
    24641536 ok    3.73 sec
    24772608 ok    3.79 sec
    24903680 ok    3.63 sec
    25034752 ok    3.86 sec
    ./go: line 1: 25344 Segmentation fault

    The last lines of my input files are:

    Code:
    tail exp_47_s_A1.fastq 
    +ILLUMINA-C3C24B_0047:1:120:18879:21119#0/1
    abb\bb_]__]]]]]KKDOOWZWWWbbabbbbbbbbbbbbb_bbbOODDOOONNNb\bbba`]Xa`Ya^``_[bb
    @ILLUMINA-C3C24B_0047:1:120:18877:8210#0/1
    agcagatcatgtggtganggactcggctggtcacagtcaggctgtgagccgatggtttgcccctcccccagggat
    +ILLUMINA-C3C24B_0047:1:120:18877:8210#0/1
    bbbbbbbbbbbbbbb``F^`aaaaabbbbb_ba`baab_a``_`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @ILLUMINA-C3C24B_0047:1:120:18872:16339#0/1
    CTTGAAAACCTAGAATCAACACAAAATGAAAAAAAAAAAAAGCCCAAAAAAATGGCTTCCAAACCAGAAAactga
    +ILLUMINA-C3C24B_0047:1:120:18872:16339#0/1
    bbbbbbbbbbbbabbbabb_bbbbbbbbbbbbbb______ZL_[]`_`__b]Y\`]\\`O^]]W^bVb]bBBBBB
    
    
    
    tail exp_47_s_A2.fastq 
    +ILLUMINA-C3C24B_0047:1:120:18879:21119#0/2
    bb^bbbbbbbabbbbS^W[^bbb_bbbbbbVUFZVOIKKO[ZVXWLTWWTT^^^[]RRR__BBBBBBBBBBBBBB
    @ILLUMINA-C3C24B_0047:1:120:18877:8210#0/2
    GTATTATCTACTGTGAGAGGAGTTGAGATCCGATTGAGTCCCGAGAGTATCTgtcgcattctcgacatcccttcg
    +ILLUMINA-C3C24B_0047:1:120:18877:8210#0/2
    bbbbbbbbbbbbbbbbbbbbbbbbb_bbbbabbbbbbbc`c__ababab^U^BBBBBBBBBBBBBBBBBBBBBBB
    @ILLUMINA-C3C24B_0047:1:120:18872:16339#0/2
    CAAATATGCAGCTCAAATGTCATCCCTGCATGCTCTAATACCAATTGATGAACTTTTAaacgacataggatcaca
    +ILLUMINA-C3C24B_0047:1:120:18872:16339#0/2
    bbbbb`bbbbbbbbbbbbb_bbbbbbbbbb_b^]b`abb^bbb`aaa`^`aaU]^^a_BBBBBBBBBBBBBBBBB

    Everything looks right to me... any ideas?

    Thanks a lot for your help.
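
    One way to test the suspicion about the script-generated files is a structural check of both inputs. Below is a minimal sketch in Python, assuming plain four-line records (no wrapped sequence lines) and read names that end in /1 and /2 as in the tails above. The function names are made up for illustration, and passing this check does not rule out other problems.

```python
import itertools


def read_records(path):
    """Yield (line_number, header, sequence, plus, quality) for each four-line record."""
    with open(path) as handle:
        line_number = 1
        while True:
            lines = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
            if not lines:
                return
            # Pad a truncated final record so that the caller can report it.
            lines += [""] * (4 - len(lines))
            yield (line_number, *lines)
            line_number += 4


def check_record(record):
    """Return a list of problems found in one record (empty if it looks fine)."""
    line_number, header, sequence, plus, quality = record
    problems = []
    if not header.startswith("@"):
        problems.append(f"line {line_number}: header does not start with '@'")
    if not plus.startswith("+"):
        problems.append(f"line {line_number + 2}: separator does not start with '+'")
    if not sequence:
        problems.append(f"line {line_number + 1}: empty sequence")
    if len(sequence) != len(quality):
        problems.append(f"line {line_number + 1}: sequence and quality lengths differ")
    return problems


def check_pair(path_a, path_b):
    """Check both files record by record and confirm that mates line up."""
    problems = []
    pairs = itertools.zip_longest(read_records(path_a), read_records(path_b))
    for rec_a, rec_b in pairs:
        if rec_a is None or rec_b is None:
            problems.append("the two files contain different numbers of records")
            break
        problems += check_record(rec_a) + check_record(rec_b)
        # Mates should share a name once the trailing /1 or /2 is dropped.
        if rec_a[1][:-2] != rec_b[1][:-2]:
            problems.append(f"line {rec_a[0]}: mate names differ")
    return problems
```

    Calling check_pair("exp_47_s_A1.fastq", "exp_47_s_A2.fastq") returns a list of messages, which is empty if both files pass.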

  • #2
    Did you ever find an answer to this? I'm having the exact same problem!


    • #3
      No, unfortunately! I used BWA for my alignments, which is quite good and fast. I have also read great things about Bowtie 2, which was released recently, and I will give it a try soon. Good luck!


      • #4
        Okay, well, let me get the full story here.

        What does the command "go" do? Are these the reads that you suspect to be the problem?

        From what I can tell, when this happens to me I get a number like 1310720 ok X.XX sec. and then I receive the segmentation fault.

        So far I have deduced that the number represents how many read pairs it has processed before failing.

        Now, it only reports that the alignments are okay for each batch of 131072, so if I take out that block of reads, it continues until it hits something else!

        I'm thinking it might be a q-score problem, but I'm having trouble wrapping my mind around the standards:

        FASTQ formats
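
        For reference, the main variants differ in the ASCII offset added to each quality score: Sanger and Illumina 1.8+ files use 33, while Solexa and Illumina 1.3 to 1.7 files use 64. A rough heuristic for guessing the offset from the characters that actually occur is sketched below. The function name and the thresholds (59 is the lowest code seen in offset-64 files, 74 is 'J', the usual top of Illumina 1.8+ data) are assumptions for illustration, and the guess can be wrong for unusual files.

```python
def guess_quality_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from quality strings, or None if unclear."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' do not occur in the offset-64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' are not expected in typical offset-33 Illumina data.
        return 64
    # Only mid-range characters were seen, so both offsets remain possible.
    return None
```

        The quality strings shown in the first post contain 'b' and long runs of 'B' and nothing below 'B', so this heuristic would report offset 64 for them.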


        • #5
          Hi there,

          so let me get this clear. When you write:

          Now, it only reports that the alignments are okay for each batch of 131072, so if I take out that block of reads, it continues until it hits something else!


          do you mean that you cut the file containing the reads from line 131072 to the end, and you got the same error?

          Generally, a "segmentation fault" error happens (at least in C and C++) when one of the following problems occurs (among others):
          1) The software tries to read from a file that could not be opened, for example because it does not exist. This is not our case, since all the files are recognised and opened.
          2) The available memory is not sufficient for the process to complete.
          3) A variable is used to store a value retrieved from a file, but the retrieved value is too big to fit in the memory allocated for that variable.

          If all the reads are OK, then point 3 should not happen. To check whether point 2 is happening, I would split the file containing the reads into sub-files 131072 lines long, launch the alignment on each of them, and see whether the software fails again.

          Let me know whether this suggestion is of any help.
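
          A possible way to do the splitting is sketched below in Python. Since a FASTQ record spans four lines, the sketch counts reads rather than lines so that no record is cut in half, and both mate files would need to be split with the same reads_per_chunk so that the chunks stay paired. The function name and the output naming scheme are made up for illustration.

```python
import itertools


def split_fastq(path, reads_per_chunk, prefix):
    """Write consecutive chunks of `reads_per_chunk` records; return the chunk paths."""
    chunk_paths = []
    with open(path) as handle:
        for index in itertools.count():
            # Four lines per record, so one chunk is 4 * reads_per_chunk lines.
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}.{index:04d}.fastq"
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

          For example, split_fastq("exp_47_s_A1.fastq", 32768, "A1_chunk") followed by the same call on the second file gives matching chunk pairs that can be aligned one at a time.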


          • #6
            A FASTQ file has 4 lines per read. When it segfaults after reporting 25034752, it means that it has gone through 12517376 reads from reads_1 and 12517376 from reads_2, and something in the next 131072 reads makes it choke. At least, that is what I think I am getting from the evidence here.
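
            The arithmetic can be written down as a small helper. This is only a sketch under the interpretation above, namely that the counter adds up reads from both files and that each "ok" line covers 131072 reads; the function name is made up, and neither assumption is confirmed by the SOAP documentation here.

```python
def suspect_block(last_ok_count, block_size=131072, lines_per_read=4):
    """Locate the block that follows the last 'ok' line, per input file.

    Returns (first_pair, last_pair, first_line, last_line), all 1-based.
    """
    # Assumption from the post: the counter is the total over both files.
    pairs_done = last_ok_count // 2
    pairs_in_block = block_size // 2
    first_pair = pairs_done + 1
    last_pair = pairs_done + pairs_in_block
    # Each record occupies `lines_per_read` lines in its file.
    first_line = pairs_done * lines_per_read + 1
    last_line = last_pair * lines_per_read
    return first_pair, last_pair, first_line, last_line
```

            For the log in the first post, suspect_block(25034752) gives pairs 12517377 to 12582912, which is lines 50069505 to 50331648 of each file.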

            When I removed the first chunk it did continue further than it had before and then dropped out again.

            I thought it was a memory issue, but then I saw it was failing in the same place no matter how much memory I threw at it.

            Luckily, I think I have a small dataset that triggers this. I will try the idea of splitting it into files of 65536 reads from each of the reads files and seeing if there is something there.
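
            Once a failing block is isolated, halving it repeatedly would narrow things down to one read pair in about 16 halving steps for 65536 pairs. The sketch below shows the idea in Python. It assumes that a single record triggers the crash regardless of what surrounds it, and `crashes` is a hypothetical callback that would write the subset to temporary FASTQ files, run the aligner on them, and report whether it segfaulted.

```python
def find_failing_record(records, crashes):
    """Return the index of one record that makes `crashes` return True, or None.

    `records` is a list of read pairs; `crashes(subset)` is a hypothetical
    callback that runs the aligner on `subset` and returns True on a crash.
    """
    if not crashes(records):
        return None
    low, high = 0, len(records)
    # Invariant: records[low:high] still triggers the crash.
    while high - low > 1:
        middle = (low + high) // 2
        if crashes(records[low:middle]):
            high = middle
        elif crashes(records[middle:high]):
            low = middle
        else:
            # Neither half crashes alone, so the single-record assumption fails.
            return None
    return low
```

            If several records are bad, this finds only one of them, so it would have to be repeated after removing each hit.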

            Now, I'm really new to soapaligner and I'm not familiar with all of the options, so it is not out of the realm of possibility that my problem is a maximum insert size problem.

            Could it be that soapaligner expects a certain insert size and segfaults if that expectation isn't met (point 3)?

            I hope I have more helpful info today.


            • #7
              This morning has been productive so far: it seems that removing the -g option allows the alignment to go through. So I'm wondering if my original allowed gap size (6 bp) is too large. I guess I'll keep chronicling things until I solve this headache.
              Last edited by cwisch88; 04-18-2012, 06:56 AM.
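
              To compare runs with and without -g in a repeatable way, a small wrapper can build one command per gap size and record how each run ended. This is a sketch in Python with made-up function names; it relies on the POSIX convention that subprocess reports a child killed by a signal as a negative return code, and it leaves the rest of the soap command line to the caller.

```python
import signal
import subprocess


def run_and_report(command):
    """Run `command` (a list of strings) and describe how it ended."""
    result = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if result.returncode < 0:
        # On POSIX, a negative return code is the signal that killed the child.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with status {result.returncode}"


def commands_for_gap_sizes(base_command, gap_sizes):
    """Build one command per -g value; a value of None leaves -g out."""
    commands = []
    for gap in gap_sizes:
        extra = [] if gap is None else ["-g", str(gap)]
        commands.append(base_command + extra)
    return commands
```

              With base_command holding the soap call without -g, looping over commands_for_gap_sizes(base_command, [None, 1, 3, 6]) and calling run_and_report on each would show which gap sizes end in SIGSEGV. The output file names would need to differ per run to avoid overwriting results.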
