SEQanswers

Old 10-29-2012, 03:34 PM   #1
rwhet052
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 7
bwa 0.6.1-r104 segfault problem

Hi all,
I am working with a challenging genome (22 Gb haploid DNA content per nucleus, 18 Gb first-draft assembly in 31 million contigs), and I am using BWA 0.6.1 because it is the only short-read aligner I have found that can index a genome > 4 Gb. I used an AWS EC2 Cloudbiolinux instance (June 2012 release) with 68 GB of RAM to build the index, then saved it to an S3 bucket and terminated the instance. The version of BWA I used is the one available as an Ubuntu package via apt-get install.
I downloaded the index (as a .tgz archive) to my local Ubuntu 12.04 box (16 GB of RAM), unpacked it, and tried to align a FASTA file of 158,000 sample sequences, but bwa crashes after the line
[bwa_aln] 225bp reads: max_diff = 9
with the error
Segmentation fault (core dumped)

My local box is running the same version of BWA on essentially the same OS, but presumably on different hardware from the AWS EC2 instance. The .tgz archive itself does not seem to be the problem, because I can download it to another Cloudbiolinux instance, unpack it, and map the same set of sample sequences to the index without problems.

Any suggestions for how to solve this would be greatly appreciated.
Old 10-30-2012, 03:55 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

In the past, when I had a problem with segfaults in bwa, I ended up rebuilding the index on the machine where I was running bwa.

Have you tried that, or is 16 GB on your local box not enough to rebuild the index?
Old 10-30-2012, 04:11 AM   #3
rwhet052
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 7

I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?
Old 10-30-2012, 04:38 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

Quote:
Originally Posted by rwhet052
I used 'top' to monitor memory usage on the cloud instance where I built the index, and it showed > 50 GB of memory in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome to be indexed, and I don't have that on my local machine.
I was hopeful that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?
At least that was my experience in the past.

I suppose you could try to build indexes by splitting your genome into parts.
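One way to do the splitting is sketched below in Python. This is only a rough illustration, not a tested pipeline: the function names `read_fasta` and `split_by_bases` and the `max_bases` limit are made up for this example, and it assumes the contigs can simply be grouped in file order so that each part stays under the chosen number of bases. Each part would then be written to its own FASTA file and indexed separately.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_bases(records: Iterable[Record], max_bases: int) -> Iterator[List[Record]]:
    """Group records in order so each group holds at most max_bases bases.

    A single record longer than max_bases still gets a group of its own.
    """
    batch: List[Record] = []
    total = 0
    for header, seq in records:
        if batch and total + len(seq) > max_bases:
            yield batch
            batch, total = [], 0
        batch.append((header, seq))
        total += len(seq)
    if batch:
        yield batch
```

For a multi-gigabase assembly you would probably not want to hold a whole part in memory as this sketch does; writing each record to the current output file as it is read, and switching files when the running total would exceed the limit, follows the same logic.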
Old 10-30-2012, 05:23 AM   #5
rwhet052
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 7

Splitting the genome into 6 files, each < 3 Gb, works fine, with the caveat that the "unique" flags for each mapped read apply only within the subset index file, so some additional merging and consolidation of results is required.

Unfortunately the samtools merge function is not intended for this sort of problem, because it assumes the same reference sequences are used to map different sets of reads, instead of different reference sequences being used to map the same set of reads.
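For anyone facing the same consolidation step, here is a minimal Python sketch of one possible approach. It is a heuristic under stated assumptions, not an established method: it assumes single-end reads, one SAM file per sub-index, and that each mapped record carries the NM (edit distance) and X0 (number of best hits) tags that bwa samse normally writes. The names `Hit`, `parse_hit`, and `consolidate` are invented for this example. For each read it keeps the hits with the lowest edit distance across all parts and calls the read unique only if the X0 counts of those hits sum to 1.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    read: str
    ref: str
    pos: int
    nm: int  # edit distance, from the NM:i tag
    x0: int  # best-hit count within one sub-index, from the X0:i tag


def parse_hit(line: str) -> Optional[Hit]:
    """Parse one SAM line; return None for headers, unmapped or untagged records."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None
    if int(fields[1]) & 0x4:  # FLAG bit 0x4 means the read is unmapped
        return None
    tags = {}
    for tag in fields[11:]:
        parts = tag.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    if "NM" not in tags:
        return None
    # Assumption: a missing X0 tag is treated as a single best hit.
    return Hit(fields[0], fields[2], int(fields[3]),
               int(tags["NM"]), int(tags.get("X0", "1")))


def consolidate(sam_parts: Iterable[Iterable[str]]) -> Dict[str, dict]:
    """Combine per-part SAM lines into one best-hit summary per read."""
    best: Dict[str, List[Hit]] = {}
    for part in sam_parts:
        for line in part:
            hit = parse_hit(line)
            if hit is None:
                continue
            current = best.get(hit.read)
            if current is None or hit.nm < current[0].nm:
                best[hit.read] = [hit]      # strictly better than anything so far
            elif hit.nm == current[0].nm:
                current.append(hit)         # tie with the current best
    summary = {}
    for read, hits in best.items():
        count = sum(h.x0 for h in hits)
        summary[read] = {"hit": hits[0], "best_count": count, "unique": count == 1}
    return summary
```

Each element of `sam_parts` can be an open file handle for one sub-index's SAM output. Mapping qualities from the separate runs are not comparable across parts, which is why this sketch ignores MAPQ and relies on NM and X0 alone.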
Tags
bwa index, cloudbiolinux, large genome, short-read aligner
