SEQanswers

Old 12-01-2011, 08:39 AM   #1
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
A question about BFAST indexing

Hi guys,

This is Carlos from Valencia. I am a new member (congratulations, the site is fantastic).

Although I have some modest experience in bioinformatics, I am just starting out in next-generation sequencing and there are still some things I am unfamiliar with. My first question is a beginner's question about the BFAST indexing process, using the human genome hg19 as a case study.

I downloaded this genome, joined all the chromosomes into a single file, installed BFAST, and ran fasta2brg to create the reference genome. I did that without trouble.

Then I tried the indexing. I first read about how to do that here, and this is where my question comes in (in fact there are two).

At first sight nothing went wrong. I tried two things regarding masks, and I believe I was able to create the bif file in both cases. However, I do not see any apparent difference between the command outputs of the two runs. In short, the indexing was successful, but I am not sure it was right.

First I tried with one mask:

..assembling/tools/bfast/bfast-0.7.0a/bfast> ./bfast index -f hg19.fa -m 1111111111111111 -w 14 -i 1

and then I tried the 10 masks recommended in another post:

..assembling/tools/bfast/bfast-0.7.0a/bfast> ./bfast index -f hg19.fa -m 1111111111111111 11111110011110011111 111111111001111111 1111000111111111111 111101000110010011111111 1111111111110001111 1111100100111110011111 1111111110011001100111 1100111001110011111111 11110011011110010011111 -w 14 -i 10

Below is the output of the second case (10 masks). Apart from taking much longer than the run with one mask (I only used 1 thread), the output is more or less the same. The tool prints warning messages that apparently do not affect the run.

Question 1) Are these messages a normal part of the output, or must I check or amend something in my genome file to avoid them?

Question 2) The program parameters listing (below) shows only the first mask, although this is the output of the 10-mask run. Does this mean that the tool only recognizes the first mask string, or does it simply print only the first string (for simplicity's sake)?

Thank you in advance,

Carlos

................................................


Checking input parameters supplied by the user ...
Validating fastaFileName hg19.fa.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: hg19.fa
space: [NT Space]
mask: 1111111111111111
depth: 0
hashWidth: 14
indexNumber: 10
repeatMasker: [Not Using]
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: [Not Using]
numThreads: 1
tmpDir: ./
timing: [Not Using]
************************************************************
************************************************************
Reading in reference genome from hg19.fa.nt.brg.
In total read 93 contigs for a total of 3137161264 bases
************************************************************
Creating the index...
************************************************************
Warning: startContig was less than zero.
Defaulting to contig=1 and position=1.
************************************************************
************************************************************
Warning: endContig was greater than the number of contigs in the reference genome.
Defaulting to reference genome's end contig=93 and position=59373566.
************************************************************
Currently on [contig,pos]:
[------93,---59373566]
Sorting... 100.00 percent complete 100.000 percent complete
Sorted.
Creating a hash.
Pass 1 out of 2. Out of 2897303003, currently on:
2897303003
Pass 2 of 2. Out of 268435456, currently on:
268435456
Hash created.
Index created.
Index size is 14.492GB.
Terminating successfully!
************************************************************
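
A quick sanity check on two numbers in the log above may help when reading it. With hashWidth 14, the count in "Pass 2 of 2" (268435456) equals 4^14, and the endContig/endPos values in the parameter listing (2147483647) equal 2^31 - 1, the largest signed 32-bit integer. That would be consistent with the end values being placeholder defaults that trigger the "Defaulting to ..." warnings, rather than something read from the genome file. This reading is an inference from the arithmetic, not taken from the BFAST documentation. The helper names below are made up for illustration.

```shell
#!/bin/sh
# Numbers are copied from the log above; what they mean is an assumption.

# 4^width, computed as 2^(2*width) with a left shift.
four_to_the() {
    echo $(( 1 << (2 * $1) ))
}

# Largest signed 32-bit integer, 2^31 - 1.
int32_max() {
    echo $(( (1 << 31) - 1 ))
}

echo "4^14     = $(four_to_the 14)"   # compare with "Out of 268435456" in pass 2
echo "2^31 - 1 = $(int32_max)"        # compare with endContig and endPos
```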
Old 12-01-2011, 11:30 AM   #2
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

You need to run bfast index independently for each mask. Each index file is the same size, ~14 GB. In your example only the first mask was used to create index 10.
Old 12-01-2011, 12:02 PM   #3
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44

OK Chipper, got it, thank you for the clarification. I will see whether the tool lets me create a small command file to automate this step for each mask and save the results in distinct files. If not, I'll run it independently for each mask, as you say.
Old 12-01-2011, 04:35 PM   #4
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Here's a shell script that I hope works:
Code:
#!/bin/sh
bfast fasta2brg -f hg19.fasta
I=1;
for MASK in 1111111111111111 11111110011110011111 111111111001111111 1111000111111111111 111101000110010011111111 1111111111110001111 1111100100111110011111 1111111110011001100111 1100111001110011111111 11110011011110010011111
do
    bfast index -f hg19.fasta -i ${I} -w 14 -m ${MASK} -n <num threads>;
    I=`echo ${I} + 1`;
done
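
One detail in the loop above is worth checking before a long run: the backtick form `echo ${I} + 1` only prints text, so I ends up holding the string "1 + 1" rather than the number 2, and the next `-i ${I}` would not receive index 2. A minimal sketch comparing that form with POSIX arithmetic expansion follows. The variable name ECHO_RESULT is made up for illustration, and how bfast reacts to the extra arguments is not verified here.

```shell
#!/bin/sh
# Increment as written in the loop: echo performs no arithmetic.
I=1
I=`echo ${I} + 1`
ECHO_RESULT="${I}"
echo "echo form:       ${ECHO_RESULT}"   # the literal text "1 + 1"

# Increment with POSIX arithmetic expansion.
I=1
I=$((I + 1))
echo "arithmetic form: ${I}"             # the number 2
```

Replacing the increment line with `I=$((I + 1))` keeps the script valid under plain `/bin/sh`.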
Old 12-02-2011, 07:59 AM   #5
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44

Nils,
It is still running, but it seems to work.
Thank you
Old 12-08-2011, 07:32 AM   #6
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44

Hi Nils,
Just a correction for users with the same starting need. As written, the script above failed because the bif index file created with the first mask had the same name as the one expected for the second mask. Since a file with that name already existed when the second index was about to be created, the script aborted with the message:
file "blablabla" already exists.

To solve this, I paste here a simple modification of nilshomer's script that lets it finish the loop. At least on my computers it works. I hope it is useful to others.

-----------------------

#!/bin/bash
bfast fasta2brg -f hg19.fasta;
I=1;
for MASK in 1111111111111111 11111110011110011111 111111111001111111 1111000111111111111 111101000110010011111111 1111111111110001111 1111100100111110011111 1111111110011001100111 1100111001110011111111 11110011011110010011111
do
bfast index -f hg19.fasta -i $I -w 14 -m $MASK -n 4;
mv hg19.fasta.nt.1.1.bif hg19.fasta.nt.$I.bif
let I=I+1;
done

-----------------
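
Since each index takes a long time to build, it can be worth printing the commands first and checking that every mask is paired with its own index number before running anything. The following is a minimal dry-run sketch, not part of BFAST itself. It uses only the flags already shown in this thread (-f, -i, -w, -m, -n). The variable names, the function name, and the shortened three-mask list are placeholders for illustration.

```shell
#!/bin/sh
# Dry run: print one "bfast index" command per mask instead of executing it.
# FASTA, THREADS and the shortened MASKS list are illustrative placeholders.
FASTA=hg19.fasta
THREADS=4
MASKS="1111111111111111 11111110011110011111 111111111001111111"

print_index_commands() {
    i=1
    for mask in ${MASKS}; do
        echo "bfast index -f ${FASTA} -i ${i} -w 14 -m ${mask} -n ${THREADS}"
        i=$((i + 1))
    done
}

print_index_commands
```

Once the printed lines look right, the same loop can run the commands by dropping the `echo` and the surrounding quotes.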