SEQanswers


scami 09-29-2010 06:05 AM

soap segmentation fault
 
Hi Guys

I am using SOAPaligner on FASTQ files and I get a "segmentation fault" at the end of the process. I created the input files myself with a script that extracted the reads from an old alignment file, so I suspect there may be some problem with them, although at first sight they look OK. The command I used is in a file called "go" and is the following:

Code:

./soap -D ./index/genome.fasta.index -a exp_47_s_A1.fastq  -b exp_47_s_A2.fastq -o paired_mapped_v2g3r1_1 -u unpaired_v2g3r1_1 -2 single_mapped_v2g3r1_1 -v 2 -g 3 -m 50 -x 400 -r 1 -t  -p 14

I get the following output:

Code:


Begin Program SOAPaligner/soap2
Wed Sep 29 15:34:14 2010
Reference: ./index/genome.fasta.index
Query File a: exp_47_s_A1.fastq
Query File b: exp_47_s_A2.fastq
Output File: paired_mapped_v2g3r1_1
            single_mapped_v2g3r1_1
            unpaired_v2g3r1_1
Load Index Table ...
lsLoad Index Table OK
Begin Alignment ...
 131072 ok    3.36 sec
..................................
..................................
24510464 ok    3.96 sec
24641536 ok    3.73 sec
24772608 ok    3.79 sec
24903680 ok    3.63 sec
25034752 ok    3.86 sec
./go: line 1: 25344 Segmentation fault

The last lines of my input files are:
Code:

tail exp_47_s_A1.fastq
+ILLUMINA-C3C24B_0047:1:120:18879:21119#0/1
abb\bb_]__]]]]]KKDOOWZWWWbbabbbbbbbbbbbbb_bbbOODDOOONNNb\bbba`]Xa`Ya^``_[bb
@ILLUMINA-C3C24B_0047:1:120:18877:8210#0/1
agcagatcatgtggtganggactcggctggtcacagtcaggctgtgagccgatggtttgcccctcccccagggat
+ILLUMINA-C3C24B_0047:1:120:18877:8210#0/1
bbbbbbbbbbbbbbb``F^`aaaaabbbbb_ba`baab_a``_`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@ILLUMINA-C3C24B_0047:1:120:18872:16339#0/1
CTTGAAAACCTAGAATCAACACAAAATGAAAAAAAAAAAAAGCCCAAAAAAATGGCTTCCAAACCAGAAAactga
+ILLUMINA-C3C24B_0047:1:120:18872:16339#0/1
bbbbbbbbbbbbabbbabb_bbbbbbbbbbbbbb______ZL_[]`_`__b]Y\`]\\`O^]]W^bVb]bBBBBB



tail exp_47_s_A2.fastq
+ILLUMINA-C3C24B_0047:1:120:18879:21119#0/2
bb^bbbbbbbabbbbS^W[^bbb_bbbbbbVUFZVOIKKO[ZVXWLTWWTT^^^[]RRR__BBBBBBBBBBBBBB
@ILLUMINA-C3C24B_0047:1:120:18877:8210#0/2
GTATTATCTACTGTGAGAGGAGTTGAGATCCGATTGAGTCCCGAGAGTATCTgtcgcattctcgacatcccttcg
+ILLUMINA-C3C24B_0047:1:120:18877:8210#0/2
bbbbbbbbbbbbbbbbbbbbbbbbb_bbbbabbbbbbbc`c__ababab^U^BBBBBBBBBBBBBBBBBBBBBBB
@ILLUMINA-C3C24B_0047:1:120:18872:16339#0/2
CAAATATGCAGCTCAAATGTCATCCCTGCATGCTCTAATACCAATTGATGAACTTTTAaacgacataggatcaca
+ILLUMINA-C3C24B_0047:1:120:18872:16339#0/2
bbbbb`bbbbbbbbbbbbb_bbbbbbbbbb_b^]b`abb^bbb`aaa`^`aaU]^^a_BBBBBBBBBBBBBBBBB


Everything looks right to me. Any ideas?

Thanks a lot for helping.
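Since the input files came from a custom extraction script, a structural check is one way to rule out malformed records. The sketch below is only an illustration: the function name `check_fastq` is hypothetical, it assumes strict 4-line records with no blank or wrapped lines, and it says nothing about what SOAP itself accepts.

```python
import io
from itertools import zip_longest


def check_fastq(handle):
    """Yield (line_number, message) for each structural problem found.

    Assumes strict 4-line FASTQ records; a blank line anywhere is
    therefore reported as a problem rather than skipped.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    # Group the stream into tuples of 4 lines; a short final group is padded with None.
    for index, record in enumerate(zip_longest(lines, lines, lines, lines)):
        header, seq, plus, qual = record
        lineno = 4 * index + 1  # 1-based line number of the header line
        if qual is None:
            yield lineno, "truncated record (fewer than 4 lines)"
            return
        if not header.startswith("@"):
            yield lineno, "header does not start with '@'"
        if not plus.startswith("+"):
            yield lineno + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            yield lineno + 3, "sequence and quality lengths differ"


# Small in-memory demo: the second record has a quality string that is too short.
sample = "@r1\nACGT\n+r1\nbbbb\n@r2\nACGT\n+r2\nbbb\n"
for lineno, message in check_fastq(io.StringIO(sample)):
    print(f"line {lineno}: {message}")
```

For a real file, the same function could be called on `open("exp_47_s_A1.fastq")` and on the mate file; it streams the input, so it does not need to hold the whole file in memory.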

cwisch88 04-16-2012 09:04 AM

Did you ever find an answer to this? I'm having the exact same problem!

scami 04-16-2012 10:38 AM

No, unfortunately! I used bwa for my alignments, which is quite good and fast. I have also read great things about bowtie2, which was recently released; I will give it a try soon. Good luck!

cwisch88 04-16-2012 11:17 AM

Okay well let me get the full story here.

What does the command "go" do? Are these reads that you suspect to be the problem?

From what I can tell, when this happens to me I get a number like 1310720 ok X.XX sec. and then I receive the segmentation fault.

So far I have deduced that the number represents how many read pairs it has processed before failing.

Now, it only reports that the alignments are okay for each batch of 131072, so if I take out that block of reads, it continues until it hits something else!

I'm thinking it might be a q-score problem, but I'm having trouble wrapping my mind around the standards:

FASTQ formats
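To make the q-score question concrete, here is a rough sketch that guesses the ASCII offset of the quality strings from the characters that occur in them. It rests on the usual conventions, which are assumptions here: Phred+33 qualities normally stay at or below 'J' (ASCII 74), and the +64 encodings never go below ';' (ASCII 59). The function name `guess_phred_offset` is made up for this example.

```python
def guess_phred_offset(quality_lines):
    """Return 33, 64, or None (undecided) for an iterable of quality strings."""
    lowest, highest = None, None
    for line in quality_lines:
        for char in line.strip():
            code = ord(char)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)
    if lowest is None:
        return None  # no quality characters seen
    if lowest < 59:
        return 33  # characters this low cannot occur in a +64 encoding
    if highest > 74:
        return 64  # characters this high are not expected in Phred+33
    return None  # range is compatible with both encodings


# Quality strings copied from the first post of this thread.
thread_qualities = [
    "bbbbbbbbbbbbbbb``F^`aaaaabbbbb_ba`baab_a``_`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",
    "bbbbbbbbbbbbabbbabb_bbbbbbbbbbbbbb______ZL_[]`_`__b]Y\\`]\\\\`O^]]W^bVb]bBBBBB",
]
print(guess_phred_offset(thread_qualities))
```

On the strings quoted in the first post this heuristic points to a +64 encoding, which fits the runs of 'B' at the read ends. Whether the encoding has anything to do with the segfault is a separate question.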

scami 04-16-2012 09:58 PM

Hi there,

So let me get this clear. When you write:

"Now, it only reports that the alignments are okay for each batch of 131072, so if I take out that block of reads, it continues until it hits something else!"

do you mean that you cut the file containing the reads from line 131072 to the end? And you got the same error?

Generally, a "segmentation fault" happens (at least in C and C++) when the program accesses memory it is not allowed to use. Typical causes include, among others:
1) The software tries to open a file that does not exist and then uses the invalid file handle. This is not our case, since all the files are recognised and opened.
2) The available memory is not sufficient and a failed allocation goes unchecked.
3) A variable is used to store a value read from a file, but the value is too big to fit in the memory reserved for that variable, so the program writes past the end of it.

If all the reads are OK then point 3 should not happen. To check whether point 2 is happening, I would split the file containing the reads into sub-files 131072 lines long, launch the alignment on each, and see whether the software fails again.

Let me know whether this suggestion has been of any help.
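The splitting idea could be sketched as follows. This version cuts on whole reads (4 lines each) so that no record is split in half; note that 131072 lines would correspond to only 32768 reads. The function name `split_fastq`, the output naming scheme, and the default chunk size are assumptions made for illustration. Using the same chunk size on both mate files should keep the pairs in step.

```python
from itertools import islice
from pathlib import Path


def split_fastq(path, reads_per_chunk=131072, out_dir="."):
    """Split a FASTQ file into numbered chunks of whole 4-line records.

    Returns the list of chunk paths in order. The last chunk may be shorter.
    """
    path = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as source:
        index = 0
        while True:
            # Read the next block of reads_per_chunk records (4 lines per record).
            block = list(islice(source, 4 * reads_per_chunk))
            if not block:
                break
            chunk_path = out_dir / f"{path.stem}.chunk{index:04d}.fastq"
            chunk_path.write_text("".join(block))
            chunks.append(chunk_path)
            index += 1
    return chunks
```

Calling `split_fastq("exp_47_s_A1.fastq")` and `split_fastq("exp_47_s_A2.fastq")` would give matching chunk numbers for the two files, so each pair of chunks can be aligned separately to see which one triggers the crash.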

cwisch88 04-17-2012 04:38 AM

A FASTQ file has 4 lines per read. When it segfaults at 25034752, it means that it has gone through 12517376 reads from reads_1 and 12517376 from reads_2, and something in the next 131072 reads makes it choke. At least that is what I think I am getting from the evidence here.
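The arithmetic above can be turned into read and line ranges for the suspect batch. This is only a sketch under the interpretation just described, namely that the counter covers both files together, so half of each batch comes from each file; the function name `locate_next_batch` is hypothetical. With several threads (-p) the offending read might not sit exactly in this range.

```python
def locate_next_batch(last_ok_count, batch_size=131072, lines_per_read=4):
    """Return 1-based, inclusive read and line ranges (per file) of the
    batch that follows the last count reported as 'ok'."""
    reads_done = last_ok_count // 2    # reads already processed in each file
    reads_in_batch = batch_size // 2   # reads the next batch takes from each file
    first_read = reads_done + 1
    last_read = reads_done + reads_in_batch
    return {
        "first_read": first_read,
        "last_read": last_read,
        "first_line": (first_read - 1) * lines_per_read + 1,
        "last_line": last_read * lines_per_read,
    }


print(locate_next_batch(25034752))
```

For the log in the first post this gives reads 12517377 to 12582912 in each file, that is, lines 50069505 to 50331648.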

When I removed the first chunk it did continue further than it had before and then dropped out again.

I thought it was a memory issue, but then I saw it was failing in the same place no matter how much memory I threw at it.

Luckily, I think I have a small dataset that chokes on this. I will try the idea of splitting it into files of 65536 reads from each of the reads files and seeing if there is something there.

Now I'm really new to soapaligner and I'm not familiar with all of the options, so it is not out of the realm of possibility that my problem could be a maximum insert size problem.

Could it be that soapaligner expects a certain insert size and can segfault if that requirement isn't met (point 3)?

I hope I have more helpful info today.

cwisch88 04-17-2012 06:08 AM

This morning has been productive so far: it seems that removing the -g option allows the alignment to go through. So I'm wondering if my original allowed gap size (6 bp) is too large? I guess I'll keep chronicling things until I solve this headache.


All times are GMT -8.