SEQanswers

Old 07-08-2009, 10:20 AM   #1
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48
Default bwa samse segmentation fault

Hi, there,

I'm trying to use bwa to align SOLiD reads to the human genome. The alignment step runs fine with "bwa aln -c" after converting the color reads/quality files to fastq format and indexing the human genome. However, bwa samse fails with a segmentation fault.

[bwa_aln_core] convert to sequence coordinate... 4.27 sec
[bwa_aln_core] refine gapped alignments... Segmentation fault

If I use only the first 30 reads for the alignment, the sam file is generated without error, although no reads are mapped to the genome. The sam output is:
NB1001:1279_6_16 4 * 0 0 * * 0 0 ANGGGCNATGANGGTNNCGGANGTTGNAGCGNTGGGNGGGGNNGGGGNG ,-":#49-"2,0%-"8%8-"-"8'5$-":.4(-"+5''-"<(*+-"%)5
NB1001:1279_6_26 4 * 0 0 * * 0 0 GNACACNGGAGNTCGNNTTTANATCGNGGGGNAGAGNGGAGNNGAGGNG 7-"7+84-"95/3-"))0-"-"/&+&-"(+36-"#555-"&*(#-"+%*
..........

However, the error appears if I use the next 10 reads for the alignment. It seems that the sequence conversion doesn't work for the mapped reads.

Can anyone help me with this problem? Thanks a lot.

Xiang

SAIC-Frederick, Inc.
National Cancer Institute
Gaithersburg, MD.
xguo is offline   Reply With Quote
Old 07-17-2009, 10:18 AM   #2
dara
Member
 
Location: texas

Join Date: Apr 2009
Posts: 10
Default

I am also experiencing this issue: bwa samse generates a segmentation fault with a genome the size of the human reference and about 30 million reads.

Any help would be appreciated. Thanks.
dara is offline   Reply With Quote
Old 07-21-2009, 07:03 AM   #3
luisczul
Member
 
Location: Canada

Join Date: May 2009
Posts: 10
Default

I am having the same error, as early as converting the human genome FASTA file to binary (BFA) format with the fasta2bfa command.
luisczul is offline   Reply With Quote
Old 07-21-2009, 01:12 PM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by luisczul View Post
I am having the same error, as early as converting the human genome FASTA file to binary (BFA) format with the fasta2bfa command.
Are you running out of RAM?
nilshomer is offline   Reply With Quote
Old 07-24-2009, 11:21 AM   #5
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48
Default

The error I got is not related to memory, since I have even tried it on a machine with 512 GB of memory. I suspect that the conversion from SOLiD csfasta/quality format to fastq format may be the problem. Using bwa samse -n 2 ..., I can get a simplified alignment output. There are some weird records such as:

>-"8$ 2 1865904808
chr10 -90253347 0
chr10 -50629021 0

It seems that part of a quality string is mistaken for a new read record, which was then aligned to the genome millions of times. Most of the other reads look fine, with output like:

>test:1279_470_1023 1 1
chr22 +42910109 0
>test:1279_470_1108 1 1
chr18 -43820923 0
>test:1279_470_1122 0 0

A segmentation fault occurs if I use bwa samse -n -1 to disable outputting multiple hits.
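A quick way to test the suspicion above is to scan the fastq file for records whose sequence and quality lines differ in length. The following is only a minimal sketch: the function name `check_fastq` is hypothetical, and it assumes the simple four-line FASTQ layout (one line each for header, sequence, '+' and quality, no wrapped sequences).

```python
from itertools import zip_longest


def check_fastq(lines):
    """Return a list of (record_number, header, problem) tuples.

    Assumes four-line FASTQ records; `lines` can be an open file
    or any iterable of strings.
    """
    problems = []
    stripped = (ln.rstrip("\n") for ln in lines)
    # Passing the same iterator four times groups the input into 4-line records.
    records = zip_longest(stripped, stripped, stripped, stripped)
    for number, (header, seq, plus, qual) in enumerate(records, start=1):
        if seq is None or plus is None or qual is None:
            problems.append((number, header, "truncated record"))
            break
        if not header.startswith("@"):
            problems.append((number, header, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((number, header, "third line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (number, header,
                 f"sequence length {len(seq)} != quality length {len(qual)}")
            )
    return problems
```

For a real file this could be called as `check_fastq(open("reads.fastq"))`, where `reads.fastq` is a placeholder path. An empty list means no problems of these particular kinds were found; it does not prove the file is otherwise valid.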

Any help is greatly appreciated.

Xiang
xguo is offline   Reply With Quote
Old 07-24-2009, 12:03 PM   #6
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Try PMing Heng Li (lh3) who is the author of bwa. If you are in a bind, there are other SOLiD aligners (like my own BFAST), etc.
nilshomer is offline   Reply With Quote
Old 07-27-2009, 07:46 AM   #7
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48
Default missing value in phred Ascii representation

It seems that the solid2fastq.pl script doesn't handle missing quality values. It generates -" for a phred score of -1. Does anyone know how to transform a score of -1 to ASCII?

thanks
Xiang
xguo is offline   Reply With Quote
Old 07-27-2009, 09:35 AM   #8
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by xguo View Post
It seems that the solid2fastq.pl script doesn't handle missing quality values. It generates -" for a phred score of -1. Does anyone know how to transform a score of -1 to ASCII?

thanks
Xiang
Are missing quality values listed as blanks for you? I will update the code accordingly. If you have a blank quality score, you could always give it a phred score of 1 stating not to trust the color call, or you could give it a maximum value 255 stating that you should trust the uncalled color. Tailor it to your situation. Feel free to PM me to get your issues resolved.
nilshomer is offline   Reply With Quote
Old 07-27-2009, 09:45 AM   #9
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48
Default

The missing quality is encoded as -1 in the QV file generated by the SOLiD platform. The solid2fastq.pl script treats it as two values, so the read and quality fields in the resulting fastq have unequal lengths. I changed -1 to 0, and everything is fine now.
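The workaround described here can be sketched as follows. This is not the code of solid2fastq.pl; it is a hypothetical helper (`qv_line_to_phred33`, with made-up parameter names) that assumes each QV line holds whitespace-separated integers and that the fastq uses the Phred+33 ASCII encoding. Every value, including -1, becomes exactly one character, so the quality string keeps one character per input value.

```python
def qv_line_to_phred33(qv_line, missing_as=0, max_q=93):
    """Convert one line of integer quality values to a Phred+33 string.

    Negative values (SOLiD writes -1 for a missing call) are replaced by
    `missing_as`; values above `max_q` are capped so that the result
    stays within printable ASCII.
    """
    chars = []
    for token in qv_line.split():
        q = int(token)
        if q < 0:
            q = missing_as  # assumption: treat a missing call as lowest confidence
        q = min(q, max_q)
        chars.append(chr(q + 33))
    return "".join(chars)
```

For example, `qv_line_to_phred33("-1 20 5 -1")` yields a four-character string, one character per value. Passing `missing_as=1` gives the variant in which missing calls get a phred score of 1 instead of 0.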

thanks
Xiang
xguo is offline   Reply With Quote
Old 07-27-2009, 11:15 AM   #10
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by xguo View Post
The missing quality is encoded as -1 in the QV file generated by the SOLiD platform. The solid2fastq.pl script treats it as two values, so the read and quality fields in the resulting fastq have unequal lengths. I changed -1 to 0, and everything is fine now.

thanks
Xiang
I have changed this in BFAST's solid2fastq.pl script (which now is implemented in C for efficiency). I will release this script in an upcoming update but let me know if you want it earlier.
nilshomer is offline   Reply With Quote
Old 07-31-2009, 10:31 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

I would like to add that I am seeing the same bwa "samse" seg fault. The dataset is human Illumina reads (~500 million). BWA converted about 220 million reads before the seg fault. The machine I am running this on has 32 GB of RAM. The process was using only about 2.3 GB.

-- hk
GenoMax is offline   Reply With Quote
Old 07-31-2009, 01:15 PM   #12
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

I would like to build a color-space index with bwa. The input FASTA should be in nucleotide space, so I use the following command to index the whole human genome:

>bwa index -c human.fasta

But a segmentation fault occurs every time, like this:

[bwa_index] Pack nucleotide FASTA... 60.48 sec
[bwa_index] Convert nucleotide PAC to color PAC... 31.13 sec
[bwa_index] Reverse the packed sequence... 16.62 sec
[bwa_index] Construct BWT for the packed sequence...
Segmentation fault

Can anyone tell me why that happens?

thanks
totalnew is offline   Reply With Quote
Old 08-05-2009, 04:18 PM   #13
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103
Default

bwa sampe ../../genome/genome.fa aln_sa1.sai aln_sa2.sai 4_1.fq 4_2.fq > pairs.sam

This also gives a [1]+ Segmentation fault
baohua100 is offline   Reply With Quote
Old 08-24-2009, 11:30 AM   #14
zxl124
Junior Member
 
Location: university park, pa

Join Date: Aug 2009
Posts: 4
Default

Same problem here.

I tried the above-mentioned method and changed -1 to 0 in the qual file. Now seq and qual have the same length in the fastq file, but I still get the same segmentation fault, with the same symptoms as above. Using "bwa samse -n 2" I can get output, and I see some strange read names which are actually parts of quality strings.

Could anyone help fix that?
zxl124 is offline   Reply With Quote
Old 09-17-2009, 02:08 AM   #15
fpruzius
Junior Member
 
Location: Utrecht, Netherlands

Join Date: Sep 2009
Posts: 4
Default Probable solution for segfault

I had the same problem when converting the alignment files to SAM format. I have a solution that works for me.

I used BWA version 0.5.1.

I'm not convinced that changing the quality value from -1 to 0 helps because the quality values are log values. And zero is not a log value. So I change every -1 and 0 in the quality files to 1.
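For reference, the standard Phred definition relates a quality value Q to an error probability p by Q = -10 * log10(p). The sketch below only illustrates that formula (the function name `phred_to_error_prob` is hypothetical); under this definition Q = 0 corresponds to p = 1 and Q = 1 to p of roughly 0.79, so either replacement value marks the call as untrustworthy.

```python
def phred_to_error_prob(q):
    """Error probability implied by Phred quality q, i.e. p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Print the implied error probability for a few quality values.
    for q in (0, 1, 10, 20):
        print(q, phred_to_error_prob(q))
```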

I have written my own fastq transformation script in C and tested it: no segmentation faults with 'bwa samse'.
However, when I used the perl script on the same data, I got segmentation faults.

The C script can create multiple smaller fastq files, because we align on a large cluster.

And the C script is 10 to 20 times faster than the perl script.

csfastaToFastq.tar.gz

Just run 'make' in the extracted folder.

Last edited by fpruzius; 12-20-2009 at 08:05 AM.
fpruzius is offline   Reply With Quote
Old 12-18-2009, 12:38 PM   #16
jperin
Member
 
Location: Philadelphia

Join Date: Feb 2009
Posts: 10
Default

Has anyone found a solution to this problem? I've just tried this C program, which seems to work well, but I am still getting the segmentation fault. I have 32 GB of RAM on my system, so again it's not memory.

[p@c0-0]$ bwa samse /share/apps/genome/human/bowtie/hg18/hg18.fa /data/Mk/FpMb.sai /data/Mk/FpMb.part1.fastq > /data/Mk/FpMb.2.sam
[bwa_aln_core] convert to sequence coordinate... 5.31 sec
[bwa_aln_core] refine gapped alignments... Segmentation fault
jperin is offline   Reply With Quote
Old 12-18-2009, 12:50 PM   #17
luisczul
Member
 
Location: Canada

Join Date: May 2009
Posts: 10
Default solution

The fastq file is the problem.

You need to use a third-party script or program to convert your reads to a fastq file. For example, when processing reads from the SOLiD machine, in my case the MAQ to-fastq command didn't work. I had to use a third-party program or script, Tofasta in this case.

Hope this works.
luisczul is offline   Reply With Quote
Old 12-18-2009, 01:14 PM   #18
jperin
Member
 
Location: Philadelphia

Join Date: Feb 2009
Posts: 10
Default

I tried the provided solid2fastq.pl script from both bwa and maq (they're the same). I diff'd various versions, but they're all the same. They all led to segmentation faults during the bwa samse step. I saw the last response posted about using the attached C program; that was my last failed attempt. The fastq file is in a completely different order, so I can't quite tell whether they are much different. The file sizes, however, are quite different: the BWA version gives me roughly 6.4 GB and the C version 6.8 GB of data. I don't see how QValues alone could make such a difference...
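When two converters write reads in a different order, one way to compare their output is to index the records by read name. The sketch below is an illustration only: the names `read_fastq` and `diff_fastq` are hypothetical, it assumes four-line FASTQ records and that both converters use identical read names, and it holds everything in memory, so it is meant for a small sample (say, the first few thousand records of each file) rather than multi-gigabyte files.

```python
def read_fastq(lines):
    """Return {read_name: (sequence, quality)} for four-line FASTQ records."""
    stripped = [ln.rstrip("\n") for ln in lines]
    records = {}
    for i in range(0, len(stripped) - 3, 4):
        name = stripped[i][1:]  # header line without the leading '@'
        records[name] = (stripped[i + 1], stripped[i + 3])
    return records


def diff_fastq(a_lines, b_lines):
    """Compare two FASTQ inputs regardless of record order.

    Returns (names only in a, names only in b, names whose sequence
    or quality differs), each as a sorted list.
    """
    a, b = read_fastq(a_lines), read_fastq(b_lines)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
    return only_a, only_b, changed
```

The '+' line is deliberately ignored, so differences in that optional comment do not show up as changed records.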

What other third party tools are there that convert csfasta and qval files to fastq? The BWA tool and the C version posted on this thread are the only ones I have been able to find... Thanks!
jperin is offline   Reply With Quote
Old 12-20-2009, 08:17 AM   #19
fpruzius
Junior Member
 
Location: Utrecht, Netherlands

Join Date: Sep 2009
Posts: 4
Exclamation C script fixed

I made a 'mistake' in the C script. Every 3rd line in a FASTQ file begins with a '+', and the rest of that line is an optional comment. However, I put the name of the read there, but shorter than on the first line ('@'). During alignment this is no problem with BWA; however, with MAQ and the postprocessing with BWA/SAMtools it gives segmentation errors.

I fixed this in the script. The '+' line now contains only that. This is also the reason why the FASTQ file from this script is that much bigger than the one from the perl script. I have been using this script for weeks now and so far it has worked every time.
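For an existing fastq file, the same effect can be approximated by rewriting every '+' line to a bare '+'. This is a minimal sketch and not the code from the attached C script; the generator name `strip_plus_comments` is hypothetical and it assumes four-line FASTQ records, so that the '+' line is always the third line of each record.

```python
def strip_plus_comments(lines):
    """Yield FASTQ lines with the third line of each record reduced to '+'.

    Relies on the line position within the record rather than on the
    leading character alone, because a quality line may also start with '+'.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 2 and line.startswith("+"):
            yield "+"
        else:
            yield line
```

A typical use would be to iterate over an open input file and write each yielded line, followed by a newline, to a new output file.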

I changed the attachment above. However I'll repost it below too:

csfastaToFastq.tar.gz

And no, so far I also have not found any other means to convert (cs)fasta to Fastq.
fpruzius is offline   Reply With Quote
Old 01-28-2010, 06:45 AM   #20
javijevi
Member
 
Location: Spain

Join Date: Jan 2010
Posts: 38
Default solved for me

Quote:
Originally Posted by fpruzius View Post
I made a 'mistake' in the C script. Every 3rd line in a FASTQ file begins with a '+', and the rest of that line is an optional comment. However, I put the name of the read there, but shorter than on the first line ('@'). During alignment this is no problem with BWA; however, with MAQ and the postprocessing with BWA/SAMtools it gives segmentation errors.

I fixed this in the script. The '+' line now contains only that. This is also the reason why the FASTQ file from this script is that much bigger than the one from the perl script. I have been using this script for weeks now and so far it has worked every time.

I changed the attachment above. However I'll repost it below too:

Attachment 213

And no, so far I also have not found any other means to convert (cs)fasta to Fastq.
This is just to share that I was having the same segfault error when running 'bwa aln' using a fastq file produced by the solid2fastq (C version) script (BFAST 0.6.2a, downloaded Jan/2010), on a 32 GB RAM machine with 2 quad-core Intel Xeon processors, for an 800 MB reference genome and 300,000,000 25 bp-long SOLiD reads.

Using the latest csfastaToFastq script provided by fpruzius to produce the fastq file solved the problem.

(Please, nilshomer, fix it in your great tool package distribution; it does not deserve such a disturbing, although minor, trouble).
javijevi is offline   Reply With Quote