#1
Junior Member
Location: USA | Join Date: Jan 2011
Posts: 7
Hi all,

I am working with a challenging genome (22 Gb haploid DNA content per nucleus, 18 Gb first-draft assembly in 31 million contigs), using BWA 0.6.1 because it is the only short-read aligner I have found that can index a genome larger than 4 Gb.

I built the index on an AWS EC2 CloudBioLinux instance (June 2012 release) with 68 GB RAM, saved it to an S3 bucket, and terminated the instance. The BWA version I used is the one available as an Ubuntu package via apt-get install. I then downloaded the index (as a .tgz archive) to my local Ubuntu 12.04 box (16 GB RAM), unpacked it, and tried to align a FASTA file of 158,000 sample sequences, but bwa crashes after the line

[bwa_aln] 225bp reads: max_diff = 9

with the error

Segmentation fault (core dumped)

My local box runs the same version of BWA on essentially the same OS, but presumably on different hardware than the EC2 instance. The problem does not seem to be the .tgz archive itself, because I can download it to another CloudBioLinux instance, unpack it, and map the same set of sample sequences to the index without problems.

Any suggestions for how to solve this would be greatly appreciated.
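For reference, this is roughly the sequence of commands involved; the file names and the index prefix here are placeholders, not my actual paths:

```bash
# On the EC2 instance (68 GB RAM): build the index with the bwtsw
# algorithm, which is required for references larger than ~4 Gb.
bwa index -a bwtsw -p genome genome.fa

# Pack the index files for transfer to S3.
tar czf genome_index.tgz genome.amb genome.ann genome.bwt genome.pac genome.sa

# On the local box (16 GB RAM): unpack and align. This is the step
# that dies with "Segmentation fault (core dumped)".
tar xzf genome_index.tgz
bwa aln genome sample_seqs.fa > sample_seqs.sai
```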
#2
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,143
In the past, when I ran into segfaults with bwa, I ended up rebuilding the index on the machine where I was actually running bwa.

Have you tried that, or is 16 GB on your local box not enough to rebuild the index?
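If you want to rule out a damaged transfer first, something like this should do it (assuming the standard 0.6.x index file extensions and a prefix of "genome"):

```bash
# Compare checksums of the index files on the EC2 instance and the
# local box; any mismatch means the transfer, not bwa, is the problem.
md5sum genome.amb genome.ann genome.bwt genome.pac genome.sa

# If the checksums match and bwa aln still segfaults, rebuild locally:
bwa index -a bwtsw -p genome genome.fa
```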
#3
Junior Member
Location: USA | Join Date: Jan 2011
Posts: 7
I used top to monitor memory usage on the cloud instance where I built the index, and it showed more than 50 GB in use. My impression is that the bwtsw indexing algorithm requires at least as much memory as the size of the genome being indexed, and I don't have that on my local machine.

I had hoped that using the same version of bwa and the same OS would overcome any platform-specific issues. Is the program sensitive to hardware configuration?
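In case it helps anyone reproduce the numbers: peak memory can be captured more precisely than by watching top, e.g. with GNU time (this is how I would measure it now, not what I originally ran):

```bash
# GNU time with -v reports "Maximum resident set size" when the
# process exits, giving the true peak rather than a sampled value.
/usr/bin/time -v bwa index -a bwtsw -p genome genome.fa
```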
#4
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,143
I suppose you could try building separate indexes by splitting your genome into parts.
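Roughly along these lines; the awk split is only a sketch and assumes a plain multi-FASTA, distributing contigs round-robin into six part files:

```bash
# Distribute the contigs round-robin into 6 part files, so each part
# stays well under the memory limit of the local box.
awk -v parts=6 '
    /^>/ { n++ }                                  # new contig record
    { print > ("genome_part" (n % parts) ".fa") } # route to its part file
' genome.fa

# Build a separate bwtsw index for each part.
for p in genome_part*.fa; do
    bwa index -a bwtsw "$p"
done
```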
#5
Junior Member
Location: USA | Join Date: Jan 2011
Posts: 7
Splitting the genome into six files, each under 3 Gb, works fine, with the caveat that the "unique" flags for each mapped read apply only within that subset index, so some additional merging and consolidation of results is required.

Unfortunately, samtools merge is not intended for this sort of problem: it assumes the same reference sequences are used to map different sets of reads, rather than different reference sequences being used to map the same set of reads.
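For anyone hitting the same issue, this is the kind of consolidation I mean. It is only a rough sketch: MAPQ values are computed per sub-index, so comparing them across parts is approximate, and the output would still need a merged header to be valid SAM:

```bash
# Align the same read set against each part index (bwa 0.6.x aln/samse).
for p in genome_part*.fa; do
    bwa aln "$p" reads.fa > "$p.sai"
    bwa samse "$p" "$p.sai" reads.fa > "$p.sam"
done

# Keep the single best record per read across all parts, ranking by
# mapping quality (SAM column 5); a read flagged "unique" in one part
# may still have hits in another, hence the consolidation step.
grep -hv '^@' genome_part*.fa.sam \
    | sort -k1,1 -k5,5nr \
    | awk '!seen[$1]++' > best_hits.sam
```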
Tags: bwa index, cloudbiolinux, large genome, short-read aligner