**nilshomer** · 04-06-2010, 05:28 PM

Originally posted by jmartin

I'm trying out the BFAST aligner for the first time today, and I'm having trouble getting the 'bfast index' step to run. I was able to successfully run 'bfast fasta2nrg' on my fasta reference db, but then each time I've tried running 'bfast index' on that file, I end up with 0-size .bif files.

Here are some details about what I am doing:

BFAST version: 0.6.3c
Linux OS, 64bit cpu, 8+ Gb memory

Reference db: Ensembl build 37 human genome (+ some novel regions from 2 of the BGI human genomes + some other stuff). Size ~4.5Gb. This fasta file is a little 'lopsided' in that 24 of the sequences (chr 1-22, X, Y) are VERY large, and then there are thousands of other, much smaller fasta sequences for the other components. Those 24 chromosomes make up ~4.4Gb of the total sequence in the file.

What I'm trying to run is:

bfast index -f <my fasta> -m 1111111111111111111111 -w 14 -i 1 -d 1

*note: The 'fasta.brg' file is sitting right next to the fasta I enter in the command line

When I run that, the program starts with no problems, but it doesn't seem to finish. Here is a snippet of the output I see on my screen:

************************************************************
Checking input parameters supplied by the user ...
Validating fastaFileName Human_build37_expanded_screening_db.100323.fna.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: Human_build37_expanded_screening_db.100323.fna
space: [NT Space]
mask: 1111111111111111111111
depth: 1
hashWidth: 14
indexNumber: 1
repeatMasker: [Not Using]
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: [Not Using]
numThreads: 1
tmpDir: ./
timing: [Not Using]
************************************************************
************************************************************
Reading in reference genome from Human_build37_expanded_screening_db.100323.fna.nt.brg.
In total read 21291 contigs for a total of 4691431462 bases
************************************************************
Creating the index...
************************************************************
Warning: startContig was less than zero.
Defaulting to contig=1 and position=1.
************************************************************
************************************************************
Warning: endContig was greater than the number of contigs in the reference genome.
Defaulting to reference genome's end contig=21291 and position=1112.
************************************************************
[---21291,-------1112]
************************************************************
Creating index (bin 1/4)
Sorting...
0.263 percent complete

At that point the program seems to halt. I notice a few unsettling warning messages in there, but not being familiar with BFAST, I didn't know if they were anything I should be worried about.

Any advice would be appreciated.

You won't be able to make an index from a reference genome larger than 2^32 (4294967296) bases in length. This is true of all the aligners I can think of, or at least the most popular ones. hg18 does not have this problem, and if you search around this site, Heng Li (lh3) gave a link to the 1000 Genomes website, which has an hg19 reference that is < 2^32 bases in length.
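A rough way to check a reference against this limit before indexing is to count the bases in the FASTA file. The sketch below is only an illustration: `count_bases` and `fits_index_limit` are hypothetical helper names, not part of BFAST, and it assumes a plain, uncompressed FASTA in which every line that does not start with `>` is sequence. Treating exactly 2^32 bases as acceptable is also an assumption taken from the "larger than" wording.

```shell
# Hypothetical helpers for illustration; not part of BFAST.

# Count sequence characters in a FASTA file, skipping '>' header lines
# and ignoring line terminators.
count_bases() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Succeed if the given base count is no larger than 2^32.
# (Whether exactly 2^32 is accepted by the indexer is an assumption.)
fits_index_limit() {
    [ "$1" -le 4294967296 ]
}
```

With the totals reported in this thread, `fits_index_limit 4691431462` fails (the expanded Ensembl db), while `fits_index_limit 3095693983` succeeds (the 25-contig hg19).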

Also, for the human genome, you need 24GB of memory to create the indexes. If you have only 8GB, then use "-d 1". How much RAM is "8+"?
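The memory rule of thumb above could be scripted along these lines. This is a sketch under stated assumptions, not BFAST's own logic: `total_mem_gb` and `pick_depth` are hypothetical names, the memory figure is read from the Linux `/proc/meminfo` file, and the 24GB threshold and the two depth values (0 as the setting when memory is sufficient, 1 otherwise) are taken only from what is said in this thread.

```shell
# Hypothetical helpers for illustration; not part of BFAST.

# Total physical memory rounded to the nearest GB.
# Reads /proc/meminfo by default (Linux only); a different file can be
# passed as the first argument. MemTotal is reported in kB.
total_mem_gb() {
    awk '/^MemTotal:/ { print int($2 / 1048576 + 0.5) }' "${1:-/proc/meminfo}"
}

# Suggest a value for the -d option from a memory size in GB:
# 0 when at least 24 GB is available, otherwise 1.
# (Thresholds come from the advice in this thread, not from BFAST docs.)
pick_depth() {
    if [ "$1" -ge 24 ]; then
        echo 0
    else
        echo 1
    fi
}
```

For example, `pick_depth "$(total_mem_gb)"` prints the suggested depth for the current machine. The kernel usually reports slightly less than the installed RAM, which is why the value is rounded rather than truncated.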

**jmartin** · 04-06-2010, 08:14 PM

I was trying -d 1. I do have access to machines with 32Gb memory, but they are usually busy and difficult to reserve.

But my problem must be my db size. Thanks for the quick reply.

**nilshomer** · 04-06-2010, 08:16 PM

Let me know how it goes; I would be happy to help further.

**Fabrice ODEFREY** · 08-25-2010, 04:25 PM

Dear Nils,

I have a similar problem, except that the hg19 I'm using is below 2^32 bases in size, so that shouldn't be the issue. But the bif file stays empty after a few hours of running. We are working on a cluster whose nodes have 8 cores and 24gb of RAM. Here is the script:

#! /bin/bash
#parameter for PBS
#PBS -q smp
#PBS -l walltime=10:00:00
#PBS -l mem=24gb
#PBS -M [email]
#PBS -m abe
#PBS -N indexhg19mask1

#start of BFAST
module load bfast-gcc
cd $PBS_O_WORKDIR

#create the index from the ref genome
bfast index -f hg19.fa -n 8 -m 1111111111111111111111 -w 14 -i 1 -A 1

And here is the output from the run:

************************************************************
Checking input parameters supplied by the user ...
Validating fastaFileName hg19.fa.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: hg19.fa
space: [Color Space]
mask: 1111111111111111111111
depth: 0
hashWidth: 14
indexNumber: 1
repeatMasker: [Not Using]
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: [Not Using]
numThreads: 8
tmpDir: ./
timing: [Not Using]
************************************************************
************************************************************
Reading in reference genome from hg19.fa.cs.brg.
In total read 25 contigs for a total of 3095693983 bases
************************************************************
Creating the index...
************************************************************
Warning: startContig was less than zero.
Defaulting to contig=1 and position=1.
************************************************************
************************************************************
Warning: endContig was greater than the number of contigs in the reference genome.
Defaulting to reference genome's end contig=25 and position=16571.
************************************************************
Currently on [contig,pos]:
^M[-------0,----------0]^M[-------1,----1000000]^M[-------1,----2000000]^M[-------1,----3000000]
^M[-------1,----4000000]^M[-------1,----5000000]^M[-------1,----6000000]^M[-------1,----7000000]
^M[-------1,----8000000]^M[-------1,----9000000]^M[-------1,---10000000]^M[-------1,---11000000]
^M[-------1,---12000000]^M[-------1,---13000000]^M[-------1,---14000000]^M[-------1,---15000000]
^M[-------1,---16000000]^M[-------1,---17000000]^M[-------1,---18000000]^M[-------1,---19000000]
^M[-------1,---20000000]^M[-------1,---21000000]^M[-------1,---22000000]^M[-------1,---23000000]
^M[-------1,---24000000]^M[-------1,---25000000]^M[-------1,---26000000]^M[-------1,---27000000]
^M[-------1,---28000000]^M[-------1,---29000000]^M[-------1,---30000000]^M[-------1,---31000000]
^M[-------1,---32000000]^M[-------1,---33000000]^M[-------1,---34000000]^M[-------1,---35000000]
^M[-------1,---36000000]^M[-------1,---37000000]^M[-------1,---38000000]^M[-------1,---39000000]
^M[-------1,---40000000]^M[-------1,---41000000]^M[-------1,---42000000]^M[-------1,---43000000]
^M[-------1,---44000000]^M[-------1,---45000000]^M[-------1,---46000000]^M[-------1,---47000000]
^M[-------1,---48000000]^M[-------1,---49000000]^M[-------1,---50000000]^M[-------1,---51000000]
^M[-------1,---52000000]^M[-------1,---53000000]^M[-------1,---54000000]^M[-------1,---55000000]
^M[-------1,---56000000]^M[-------1,---57000000]^M[-------1,---58000000]^M[-------1,---59000000]
^M[-------1,---60000000]^M[-------1,---61000000]^M[-------1,---62000000]^M[-------1,---63000000]
^M[-------1,---64000000]^M[-------1,---65000000]^M[-------1,---66000000]^M[-------1,---67000000]
^M[-------1,---68000000]^M[-------1,---69000000]^M[-------1,---70000000]^M[-------1,---71000000]
^M[-------1,---72000000]^M[-------1,---73000000]^M[-------1,---74000000]^M[-------1,---75000000]
^M[-------1,---76000000]^M[-------1,---77000000]^M[-------1,---78000000]^M[-------1,---79000000]
^M[-------1,---80000000]^M[-------1,---81000000]^M[-------1,---82000000]^M[-------1,---83000000]
^M[-------1,---84000000]^M[-------1,---85000000]^M[-------1,---86000000]^M[-------1,---87000000]
^M[-------1,---88000000]^M[-------1,---89000000]^M[-------1,---90000000]^M[-------1,---91000000]
^M[-------1,---92000000]^M[-------1,---93000000]^M[-------1,---94000000]^M[-------1,---95000000]
^M[-------1,---96000000]^M[-------1,---97000000]^M[-------1,---98000000]^M[-------1,---99000000]
^M[-------1,--100000000]^M[-------1,--101000000]^M[-------1,--102000000]^M[-------1,--103000000]
^M[-------1,--104000000]^M[-------1,--105000000]^M[-------1,--106000000]^M[-------1,--107000000]
^M[-------1,--108000000]^M[-------1,--109000000]^M[-------1,--110000000]^M[-------1,--111000000]
^M[-------1,--112000000]^M[-------1,--113000000]^M[-------1,--114000000]^M[-------1,--115000000]
^M[-------1,--116000000]^M[-------1,--117000000]^M[-------1,--118000000]^M[-------1,--119000000]
^M[-------1,--120000000]^M[-------1,--121000000]^M[-------1,--122000000]^M[-------1,--123000000]
^M[-------1,--124000000]^M[-------1,--125000000]^M[-------1,--126000000]^M[-------1,--127000000]
^M[-------1,--128000000]^M[-------1,--129000000]^M[-------1,--130000000]^M[-------1,--131000000]
^M[-------1,--132000000]^M[-------1,--133000000]^M[-------1,--134000000]^M[-------1,--135000000]
^M[-------1,--136000000]^M[-------1,--137000000]^M[-------1,--138000000]^M[-------1,--139000000]
^M[-------1,--140000000]^M[-------1,--141000000]^M[-------1,--142000000]^M[-------1,--143000000]
^M[-------1,--144000000]^M[-------1,--145000000]^M[-------1,--146000000]^M[-------1,--147000000]
^M[-------1,--148000000]^M[-------1,--149000000]^M[-------1,--150000000]^M[-------1,--151000000]
^M[-------1,--152000000]^M[-------1,--153000000]^M[-------1,--154000000]^M[-------1,--155000000]
^M[-------1,--156000000]^M[-------1,--157000000]^M[-------1,--158000000]^M[-------1,--159000000]
^M[-------1,--160000000]^M[-------1,--161000000]^M[-------1,--162000000]^M[-------1,--163000000]
^M[-------1,--164000000]^M[-------1,--165000000]^M[-------1,--166000000]^M[-------1,--167000000]
^M[-------1,--168000000]^M[-------1,--169000000]^M[-------1,--170000000]^M[-------1,--171000000]
^M[-------1,

Thanks in advance for your help!

**Fabrice ODEFREY** · 08-25-2010, 09:17 PM

Problem solved!

**nilshomer** · 08-26-2010, 08:49 AM

Just to help others, what was the solution?

**Fabrice ODEFREY** · 08-26-2010, 01:17 PM

Yes, of course. The solution was patience :-).
Let me elaborate a bit more. The first time I tested bfast I ran a demo on a small genome (E. coli), and I could see my bif file being created in real time (the size of the file was increasing). But when running on the human genome the size stayed at 0, so I assumed that nothing was happening, and after 10 hours I stopped the process. I had also assumed that since I was specifying 1 node with 8 cores (smp) in PBS, this information would be passed on to bfast... but no. So by setting the -n option to 8 and being patient, it did work, even though I couldn't see my file being created. I assume that because of the size, everything happens in a temp dir on the nodes...
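The two lessons here (pass the core count to bfast explicitly, and don't judge progress by the size of the bif file) could be scripted roughly as follows. This is a minimal sketch with assumptions: `bfast_threads` and `is_running` are hypothetical helper names, and `PBS_NUM_PPN` is an environment variable that some PBS/Torque installations set for a job; other schedulers may use a different name or none at all.

```shell
# Hypothetical helpers for illustration; not part of BFAST or PBS.

# Number of threads to hand to bfast's -n option.
# Assumes the scheduler exports PBS_NUM_PPN (true on some PBS/Torque
# setups); falls back to 1 when it is unset or empty.
bfast_threads() {
    echo "${PBS_NUM_PPN:-1}"
}

# Succeed if a process with the given PID is still alive.
# Sends signal 0, which checks for existence without affecting the process.
is_running() {
    kill -0 "$1" 2>/dev/null
}
```

In the job script above, the index step would then read `bfast index -f hg19.fa -n "$(bfast_threads)" -m 1111111111111111111111 -w 14 -i 1 -A 1`. If it is started in the background, `is_running $!` tells you whether it is still working even while the bif file stays at size 0.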

**yasashiku** · 08-19-2011, 09:05 AM

Thanks! I was having the same problem, and this helped a ton.


Trouble indexing reference db for BFAST
