I'm trying out the BFAST aligner for the first time today, and I'm having trouble getting the 'bfast index' step to run. I was able to run 'bfast fasta2brg' successfully on my fasta reference db, but each time I run 'bfast index' on that file, I end up with 0-size .bif files.
Here are some details about what I am doing:
BFAST version: 0.6.3c
Linux OS, 64-bit CPU, 8+ GB memory
Reference db: Ensembl build 37 human genome (+ some novel regions from 2 of the BGI human genomes + some other stuff). Size ~4.5Gb. This fasta file is a little 'lopsided' in that 24 of the sequences (chr 1-22, X, Y) are VERY large, and then there are thousands of other, much smaller fasta sequences for the other components. Those 24 chromosomes make up ~4.4Gb of the total sequence in the file.
What I'm trying to run is:
bfast index -f <my fasta> -m 1111111111111111111111 -w 14 -i 1 -d 1
*note: The 'fasta.brg' file is sitting right next to the fasta I enter on the command line
When I run that, the program starts with no problems, but it doesn't seem to finish. Here is a snippet of the output I see on my screen:
************************************************************
Checking input parameters supplied by the user ...
Validating fastaFileName Human_build37_expanded_screening_db.100323.fna.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: Human_build37_expanded_screening_db.100323.fna
space: [NT Space]
mask: 1111111111111111111111
depth: 1
hashWidth: 14
indexNumber: 1
repeatMasker: [Not Using]
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: [Not Using]
numThreads: 1
tmpDir: ./
timing: [Not Using]
************************************************************
************************************************************
Reading in reference genome from Human_build37_expanded_screening_db.100323.fna.nt.brg.
In total read 21291 contigs for a total of 4691431462 bases
************************************************************
Creating the index...
************************************************************
Warning: startContig was less than zero.
Defaulting to contig=1 and position=1.
************************************************************
************************************************************
Warning: endContig was greater than the number of contigs in the reference genome.
Defaulting to reference genome's end contig=21291 and position=1112.
************************************************************
[---21291,-------1112]
************************************************************
Creating index (bin 1/4)
Sorting...
0.263 percent complete
At that point the program seems to halt. I notice a few unsettling warning messages in there, but not being familiar with BFAST, I didn't know if they were anything I should be worried about.
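One thing I can check from the log alone is how the reported base count compares with the usual 32-bit integer limits. This is only a rough sketch: whether BFAST 0.6.3c actually stores genome positions in 32-bit integers is an assumption on my part, not something I have confirmed, and the names below (`fits`, `TOTAL_BASES`, etc.) are just for illustration.

```python
# Counts copied from the 'bfast index' log output above.
TOTAL_BASES = 4_691_431_462
NUM_CONTIGS = 21_291

# Common integer limits. INT32_MAX is the same value the log prints
# as the default endContig/endPos (2147483647).
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def fits(value, limit):
    """Return True if value does not exceed limit."""
    return value <= limit


# ASSUMPTION: 32-bit storage is hypothetical; this only compares numbers.
print("total bases fit in signed 32-bit:  ", fits(TOTAL_BASES, INT32_MAX))
print("total bases fit in unsigned 32-bit:", fits(TOTAL_BASES, UINT32_MAX))
print("contig count fits in signed 32-bit:", fits(NUM_CONTIGS, INT32_MAX))
```

The total base count is larger than both 32-bit limits, so if any genome-wide offset is kept in a 32-bit integer, my expanded reference could overflow it. That is only a guess at this point, and it may have nothing to do with the stall.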
Any advice would be appreciated.