Hi guys,
This is Carlos from Valencia. I am a new member (congratulations, the site is fantastic).
Although I have some modest experience in bioinformatics, I am just starting with next-generation sequencing and there are some things I am still unfamiliar with. My first question is a beginner's question about the BFAST indexing process, using the human genome hg19 as a case study.
I downloaded this genome, joined all the chromosomes into a single file, installed BFAST and ran fasta2brg to create the reference genome. That worked without trouble.
Then I tried the indexing. I first read something here about how to do it, and here is my question (in fact there are two).
At first sight nothing went wrong. I tried two things regarding masking and I think I was able to create the bif file in both cases. However, I do not find any apparent differences between the command outputs of the two runs. In short, the indexing was successful but I am not sure it was right.
First I tried with one mask:
..assembling/tools/bfast/bfast-0.7.0a/bfast> ./bfast index -f hg19.fa -m 1111111111111111 -w 14 -i 1
and then I tried the 10 masks recommended in another post:
..assembling/tools/bfast/bfast-0.7.0a/bfast> ./bfast index -f hg19.fa -m 1111111111111111 11111110011110011111 111111111001111111 1111000111111111111 111101000110010011111111 1111111111110001111 1111100100111110011111 1111111110011001100111 1100111001110011111111 11110011011110010011111 -w 14 -i 10
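In case it helps anyone checking the masks themselves, here is a small Python sketch that summarizes the ten mask strings from the command above. It only counts characters; the names `MASKS` and `summarize` are mine, not part of BFAST, and reading the number of 1s as the number of key positions is my assumption.

```python
# The ten mask strings exactly as passed to -m in the command above.
MASKS = [
    "1111111111111111",
    "11111110011110011111",
    "111111111001111111",
    "1111000111111111111",
    "111101000110010011111111",
    "1111111111110001111",
    "1111100100111110011111",
    "1111111110011001100111",
    "1100111001110011111111",
    "11110011011110010011111",
]


def summarize(mask):
    """Return (length, number of 1s) for a mask string of 0s and 1s."""
    if set(mask) - {"0", "1"}:
        raise ValueError("mask may only contain 0 and 1: " + mask)
    return len(mask), mask.count("1")


if __name__ == "__main__":
    for number, mask in enumerate(MASKS, start=1):
        length, ones = summarize(mask)
        print(f"mask {number:2d}: length={length:2d} ones={ones:2d} {mask}")
```

Running it prints one line per mask, which makes it easy to spot a mistyped or truncated string before starting a long indexing run.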
Below is the output of the second case (10 masks). Apart from taking much longer than the run with one mask (I only used 1 thread), the output is more or less the same. The tool prints warning messages that apparently do not affect the run.
Question 1) Are these messages a normal part of the output, or must I check/amend something in my genome file in order to avoid them?
Question 2) The program parameters list (below) shows only the first mask, although this is the output of the 10-mask run. Does this mean that the tool only recognizes the first mask string, or does it just print the first string for simplicity's sake?
Thank you in advance,
Carlos
................................................
Checking input parameters supplied by the user ...
Validating fastaFileName hg19.fa.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: [ExecuteProgram]
fastaFileName: hg19.fa
space: [NT Space]
mask: 1111111111111111
depth: 0
hashWidth: 14
indexNumber: 10
repeatMasker: [Not Using]
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: [Not Using]
numThreads: 1
tmpDir: ./
timing: [Not Using]
************************************************************
************************************************************
Reading in reference genome from hg19.fa.nt.brg.
In total read 93 contigs for a total of 3137161264 bases
************************************************************
Creating the index...
************************************************************
Warning: startContig was less than zero.
Defaulting to contig=1 and position=1.
************************************************************
************************************************************
Warning: endContig was greater than the number of contigs in the reference genome.
Defaulting to reference genome's end contig=93 and position=59373566.
************************************************************
Currently on [contig,pos]:
[------93,---59373566]
Sorting... 100.00 percent complete 100.000 percent complete
Sorted.
Creating a hash.
Pass 1 out of 2. Out of 2897303003, currently on:
2897303003
Pass 2 of 2. Out of 268435456, currently on:
268435456
Hash created.
Index created.
Index size is 14.492GB.
Terminating successfully!
************************************************************
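One thing I could check by myself is the arithmetic in the log. The sketch below is only my guess at where the numbers come from: I assume the "Pass 2" count is the number of possible keys of length hashWidth over the 4 nucleotides (A, C, G, T). The variable names are mine and the values are copied from the output above.

```python
# Values copied from the log output above.
HASH_WIDTH = 14                 # the -w option / "hashWidth: 14"
PASS2_TOTAL = 268435456         # "Pass 2 of 2. Out of 268435456"
TOTAL_BASES = 3137161264        # "for a total of 3137161264 bases"
PASS1_TOTAL = 2897303003        # "Pass 1 out of 2. Out of 2897303003"

# Assumption: one hash slot per possible nucleotide string of length HASH_WIDTH.
possible_keys = 4 ** HASH_WIDTH

# Bases of the reference that are not counted in pass 1.
not_counted = TOTAL_BASES - PASS1_TOTAL

if __name__ == "__main__":
    print("4^hashWidth         :", possible_keys)
    print("matches pass 2 total:", possible_keys == PASS2_TOTAL)
    print("bases not in pass 1 :", not_counted)
```

If the guess is right, the pass 2 number depends only on -w and not on the mask, which would fit with the two runs printing similar output.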