SEQanswers

10-12-2009, 12:00 PM   #1
jsun529
Member
Location: US
Join Date: Apr 2009
Posts: 52
Bfast

Can anyone share a quick guide on how to install and run BFAST for SOLiD data? Neither the book nor the README gives clear instructions. If we do targeted resequencing of genes rather than the complete genome, how should the reference file be prepared? Thanks much in advance.

10-12-2009, 04:07 PM   #2
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by jsun529
Can anyone share a quick guide on how to install and run BFAST for SOLiD data? Neither the book nor the README gives clear instructions. If we do targeted resequencing of genes rather than the complete genome, how should the reference file be prepared?
If you find something unclear, please let us know so we can improve the manual (email: bfast-help@lists.sourceforge.net). For targeted resequencing, you must decide whether to use the full reference or only the targeted regions. The former is more computationally expensive, while the latter may introduce biases if your targeting was imperfect (reads from off-target sequence have no true location in the index and may be placed on similar targeted sequence instead).

The easiest way to run BFAST in a "targeted" mode is to create indexes that cover only the targeted regions. This can be done with the "-x" option when running "bfast index". Example input files can be found in the manual.
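As a rough illustration of preparing a targeted-region list for the -x option (the bpreprocess log later in the thread shows a matching exonsFileName parameter in 0.5.6), here is a minimal Python sketch that converts BED intervals (0-based, half-open) into 1-based inclusive regions, optionally padded with flanking sequence and merged. The four-column output (start contig, start position, end contig, end position, with contigs numbered by their order in the reference FASTA) is an assumption, not a confirmed BFAST format; check the example file in the manual for your version first. The function names and the contig_index mapping are hypothetical.

```python
from typing import Dict, List, Tuple

# (start_contig, start_pos, end_contig, end_pos), all 1-based and inclusive.
Region = Tuple[int, int, int, int]


def bed_to_regions(bed_lines: List[str], contig_index: Dict[str, int],
                   flank: int = 0) -> List[Region]:
    """Convert BED intervals (0-based, half-open) to 1-based inclusive regions.

    ASSUMPTION: contigs are identified by their 1-based order in the reference
    FASTA, and each target lies on a single contig. Padding past the end of a
    contig is not clamped here because contig lengths are not passed in.
    """
    regions = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        name, start, end = line.split()[:3]
        contig = contig_index[name]
        first = max(1, int(start) + 1 - flank)  # BED start is 0-based
        last = int(end) + flank  # exclusive BED end == 1-based last base
        regions.append((contig, first, contig, last))
    return regions


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping or adjacent regions on the same contig."""
    merged: List[Region] = []
    for contig, first, _, last in sorted(regions):
        if merged and merged[-1][0] == contig and first <= merged[-1][3] + 1:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], contig, max(prev[3], last))
        else:
            merged.append((contig, first, contig, last))
    return merged


def format_regions(regions: List[Region]) -> str:
    """One tab-separated region per line (assumed layout, see lead-in)."""
    return "".join(f"{c1}\t{p1}\t{c2}\t{p2}\n" for c1, p1, c2, p2 in regions)


if __name__ == "__main__":
    bed = ["chr1\t99\t200", "chr1\t150\t300", "chr2\t0\t50"]
    index = {"chr1": 1, "chr2": 2}
    print(format_regions(merge_regions(bed_to_regions(bed, index, flank=10))), end="")
```

Merging before writing avoids indexing the same bases twice when padded targets overlap.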

10-13-2009, 07:06 AM   #3
jsun529

Neither the BFAST book nor the README says how to install the program. Also, for the ABI SOLiD example in section 7.1.2, I could not find the bpreprocess or bmatches commands anywhere in the source code of the latest version downloaded from SourceForge. Where are they?

10-13-2009, 07:36 AM   #4
jsun529

The INSTALL page says to refer to the man pages for documentation:

doc/bpreprocess.1

But where is this doc folder? This is really confusing and not user-friendly; it would be better to have everything in one place, say the BFAST book.

Thanks,

10-13-2009, 12:34 PM   #5
jsun529

I have bpreprocess running, but I got this error message when trying to create the indexes. Does anyone know how to fix this? (I have fragment data, not mate-pair.) Thanks.

bfast-0.5.6/bpreprocess/bpreprocess -r bfast.rg.file.OPA1.1.brg -i layouts.txt -a 1 -A 1 -o OPA1 -d ./ -T ./
************************************************************
Checking input parameters supplied by the user ...
Validating rgFileName bfast.rg.file.OPA1.1.brg.
Validating indexLayoutFileName layouts.txt.
Validating exonsFileName Default.txt.
Validating outputID OPA1.
Validating outputDir ./.
Validating tmpDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: 1 [ExecuteProgram]
rgFileName: bfast.rg.file.OPA1.1.brg
algorithm: 1
space: 1
indexLayoutFileName: layouts.txt
repeatMasker: 0
startContig: 0
startPos: 0
endContig: 2147483647
endPos: 2147483647
exonsFileName: Default.txt
numThreads: 1
outputID: OPA1
outputDir: ./
tmpDir: ./
timing: 0
************************************************************
************************************************************
Reading in reference genome from bfast.rg.file.OPA1.1.brg.
In total read 1 contigs for a total of 104204 bases
************************************************************
************************************************************
In function "RGIndexLayoutRead": Fatal Error[OpenFileError]. Message: Could not open index layout file reading.
***** Exiting due to errors *****
************************************************************

10-13-2009, 12:48 PM   #6
nilshomer

Quote:
Originally Posted by jsun529
I have bpreprocess running, but I got this error message when trying to create the indexes. [...]
In function "RGIndexLayoutRead": Fatal Error[OpenFileError]. Message: Could not open index layout file reading.
It looks like your layout file is not properly formatted (there are examples in the manual!). Please post the layout file when you get the chance.

I would suggest upgrading to version 0.6.0, as the user interface has been improved.

10-13-2009, 01:02 PM   #7
jsun529

Thanks. What would be a good index layout for 50 bp SOLiD reads?

10-13-2009, 01:11 PM   #8
nilshomer

Recommended settings can be found in the manual.

10-14-2009, 06:06 AM   #9
jsun529

Thanks. If I were looking for a 20-60 bp deletion or insertion, how should that be reflected in the scoring matrix (gap penalties, etc.)? I assume the example is generic, since it does not say.

Best,
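To see why long indels are hard to catch with 50 bp reads (point 2 of the reply further down), a back-of-the-envelope check helps: for an insertion to sit entirely inside one read, the read must also contain enough flanking sequence on each side to be placed. The minimum anchor length here is a hypothetical parameter for illustration, not a BFAST setting.

```python
def max_contained_insertion(read_length: int, min_anchor: int) -> int:
    """Longest insertion that fits inside one read while leaving at least
    `min_anchor` aligned bases on each side (simplified, hypothetical model)."""
    return max(0, read_length - 2 * min_anchor)


def min_read_length_for_insertion(insertion: int, min_anchor: int) -> int:
    """Read length needed to contain `insertion` plus an anchor on each side."""
    return insertion + 2 * min_anchor


if __name__ == "__main__":
    for anchor in (10, 15):
        print(f"50 bp read, {anchor} bp anchors: insertions up to "
              f"{max_contained_insertion(50, anchor)} bp")
    print("a 60 bp insertion needs reads of at least",
          min_read_length_for_insertion(60, 10), "bp with 10 bp anchors")
```

Deletions are not bounded by read length in the same way, since the deleted bases are simply absent from the read, but they still require the aligner to accept a long gap, which is where the gap-penalty settings come in.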

10-14-2009, 06:42 AM   #10
jsun529

Also, what would be good/valid offsets for 50 bp SOLiD data? The example uses
1 3 5 7 9 11 13 15 17 19
Will this be good enough?
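Nils recommends using all possible offsets in his reply further down. A minimal sketch for listing them, assuming an offset is a start position in the read at which an index mask of a given span still fits entirely inside the read. Whether BFAST counts offsets from 0 or 1, and the mask span of 22 used below, are assumptions to check against the manual.

```python
from typing import List


def all_offsets(read_length: int, mask_span: int, zero_based: bool = True) -> List[int]:
    """Every start position at which a mask of `mask_span` fits inside the read.

    ASSUMPTION: a mask starting at offset o covers positions o .. o+mask_span-1.
    """
    if mask_span <= 0 or mask_span > read_length:
        return []
    first = 0 if zero_based else 1
    return list(range(first, first + read_length - mask_span + 1))


if __name__ == "__main__":
    # Hypothetical values: 50-position SOLiD read, mask spanning 22 positions.
    offsets = all_offsets(50, 22)
    print(" ".join(map(str, offsets)))  # space-separated, like the example line
```

More offsets mean more lookups per read (slower) but more chances that some key avoids every error.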

10-14-2009, 06:55 AM   #11
jsun529

OK, when running bmatches with the example settings, I got this error message:

sudo bfast-0.5.6/bmatches/bmatches -r bfast.rg.file.OPA1.1.brg -i main.indexes.txt -I secondary.indexes.txt -R OPA1F3.1.fastq -O offsets.txt -A 1 -K 8 -M 384 -o OPA1.1 -d ./ -T ./
************************************************************
Checking input parameters supplied by the user ...
Validating rgFileName bfast.rg.file.OPA1.1.brg.
Validating bfastMainIndexesFileName main.indexes.txt.
Validating bfastSecondaryIndexesFileName secondary.indexes.txt.
Validating readsFileName OPA1F3.1.fastq.
Validating offsetsFileName offsets.txt.
Validating outputID OPA1.1.
Validating outputDir ./.
Validating tmpDir path ./.
**** Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: 1 [ExecuteProgram]
rgFileName: bfast.rg.file.OPA1.1.brg
bfastMainIndexesFileName main.indexes.txt
bfastSecondaryIndexesFileName secondary.indexes.txt
readsFileName: OPA1F3.1.fastq
offsetsFileName: offsets.txt
space: 1
startReadNum: -1
endReadNum: -1
keySize: 0
maxKeyMatches: 8
maxNumMatches: 384
whichStrand: 0 [BothStrands]
numThreads: 1
queueLength: 100000
outputID: OPA1.1
outputDir: ./
tmpDir: ./
timing: 0
************************************************************
************************************************************
Reading in reference genome from bfast.rg.file.OPA1.1.brg.
In total read 1 contigs for a total of 104204 bases
************************************************************
Reading OPA1F3.1.fastq into temp files.
Will process 1933855 reads.
************************************************************
Will output to ./bfast.matches.file.OPA1.1.bmf.
************************************************************
Processing 1933855 reads using 20 main indexes.
************************************************************
Searching index 1 out of 20...
Reading index from 14.
************************************************************
In function "RGIndexRead": Fatal Error[OpenFileError]. Variable/Value: 14.
Message: Could not open rgIndexFileName for reading.
***** Exiting due to errors *****
************************************************************

Any suggestions? Thanks,

10-14-2009, 09:43 AM   #12
nilshomer

Quote:
Originally Posted by jsun529
OK, when running bmatches with the example settings, I got this error message: [...]
Reading index from 14.
In function "RGIndexRead": Fatal Error[OpenFileError]. Variable/Value: 14.
Message: Could not open rgIndexFileName for reading.
Again, please upgrade to bfast-0.6.0d.

1. I would recommend using all possible offsets.

2. Long gaps (e.g. 60 bp) will be difficult to detect directly with short reads.

3. It looks like the "main.indexes.txt" file is incorrectly formatted: it should contain only file paths. (Note the line "Reading index from 14." in your log: bmatches took "14" as the name of an index file.)
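The bmatches log in the question shows "Reading index from 14." just before the failure, which fits point 3. A small pre-flight check along these lines (a hypothetical helper, not part of BFAST) can flag lines in an index-list file that are not paths to existing, readable files before a long run starts. The one-path-per-line rule is taken from the reply; the file names in the demo are made up.

```python
import os
from typing import Callable, List, Optional, Tuple


def _readable_file(path: str) -> bool:
    """True if `path` is an existing regular file we are allowed to read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def check_index_list(text: str, base_dir: str = ".",
                     is_readable: Optional[Callable[[str], bool]] = None,
                     ) -> List[Tuple[int, str, str]]:
    """Return (line number, entry, problem) for each suspicious line.

    ASSUMPTION: the file should hold exactly one index file path per line.
    Blank lines are skipped here; BFAST itself may be stricter.
    """
    check = is_readable or _readable_file
    problems: List[Tuple[int, str, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue
        if len(entry.split()) > 1:
            problems.append((lineno, entry, "more than one field; expected a single path"))
            continue
        path = entry if os.path.isabs(entry) else os.path.join(base_dir, entry)
        if not check(path):
            problems.append((lineno, entry, "not a readable file"))
    return problems


if __name__ == "__main__":
    # Hypothetical file names; a real run would read the list with open(...).read().
    demo = "indexes/ref.1.bif\n14\nref.2.bif 22\n"
    present = {os.path.join(".", "indexes/ref.1.bif")}
    for lineno, entry, problem in check_index_list(demo, is_readable=present.__contains__):
        print(f"line {lineno}: {entry!r}: {problem}")
```

The is_readable hook exists only so the check can be exercised without touching the filesystem; by default it uses os.path.isfile and os.access.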

10-14-2009, 11:59 AM   #13
jsun529

Thanks. This will be my last question for the test run, I think :-)
When I run bpostprocess as recommended by the book, I get this error message:

ocalhost:~/Desktop/OPA1.BFAST # bfast-0.5.6/bpostprocess/bpostprocess -i bfast.aligned.file.OPA1.baf -r bfast.rg.file.OPA1.0.brg -a 3 -o OPA1 -d ./ -O 3
************************************************************
Checking input parameters supplied by the user ...
Validating rgFileName bfast.rg.file.OPA1.0.brg.
Validating alignFileName bfast.aligned.file.OPA1.baf.
Validating outputID OPA1.
Validating outputDir path ./.
Input arguments look good!
************************************************************
************************************************************
Printing Program Parameters:
programMode: 1 [ExecuteProgram]
rgFileName: bfast.rg.file.OPA1.0.brg
alignFileName: bfast.aligned.file.OPA1.baf
algorithm: 3 [Best Score]
queueLength: 10000
outputID: OPA1
outputDir: ./
outputFormat: 6
timing: 0
************************************************************
************************************************************
Reading in reference genome from bfast.rg.file.OPA1.0.brg.
In total read 1 contigs for a total of 104204 bases
************************************************************
Processing reads, currently on:
0************************************************************
In function "ConvertReadFromColorSpace": Fatal Error[OutOfRange]. Variable/Value: read.
Message: Could not convert base and color.
***** Exiting due to errors *****
************************************************************
I'm not sure which part causes this. Thanks

10-14-2009, 12:03 PM   #14
nilshomer

Quote:
Originally Posted by jsun529
When I run bpostprocess as recommended by the book, I get this error message: [...]
In function "ConvertReadFromColorSpace": Fatal Error[OutOfRange]. Variable/Value: read.
Message: Could not convert base and color.
This seems to be a bug in BFAST version 0.5.6. I am really trying to get you to upgrade to the latest version (0.6.0d), as most of these problems and pitfalls are gone there!
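For context on what "Could not convert base and color" refers to: SOLiD reads are reported as a known primer base followed by colors (0-3), and each color encodes the transition between two adjacent bases. With A=0, C=1, G=2, T=3, the next base is the bitwise XOR of the previous base and the color. The sketch below decodes a color-space read this way and raises an error on any symbol that is not a valid color (for example the '.' often used for a missing call). It is an illustration of the encoding, not BFAST's code, and Nils attributes this particular failure to a bug in 0.5.6.

```python
BASES = "ACGT"  # position of each base is its 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(read: str) -> str:
    """Decode a SOLiD color-space read such as 'T0123' into bases.

    The first character is the primer base, which is not part of the
    returned sequence; each following digit is a color in 0..3.
    """
    if not read or read[0].upper() not in BASES:
        raise ValueError(f"read must start with a primer base: {read!r}")
    prev = BASES.index(read[0].upper())
    bases = []
    for pos, color in enumerate(read[1:], start=1):
        if color not in "0123":
            raise ValueError(f"cannot convert color {color!r} at position {pos}")
        prev ^= int(color)  # next base = previous base XOR color
        bases.append(BASES[prev])
    return "".join(bases)


if __name__ == "__main__":
    print(decode_colorspace("T0123"))  # TGAT
```

Because each base depends on the one before it, a single bad color corrupts every base after it, which is why aligners such as BFAST work in color space and convert only after alignment.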

10-15-2009, 10:15 AM   #15
jsun529

When I use samtools.pl varFilter to filter the variants and SNPs, the program seems to have a cap on the maximum read depth: it filtered out most of the high-coverage candidates, which should not have been filtered. However, when I try to redefine -D on the command line, the program returns nothing after filtering. Does anyone know how to fix this?

Thanks

10-15-2009, 01:40 PM   #16
nilshomer

Quote:
Originally Posted by jsun529
When I use samtools.pl varFilter to filter the variants and SNPs, the program seems to have a cap on the maximum read depth [...]
Try posting to the samtools help list (samtools-help@lists.sourceforge.net) or creating a new thread, since this is not related to the original question.
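One plausible reason a maximum-depth cap removes most candidates in a targeted run is simply that coverage is very high. Using the numbers from the bmatches log earlier in the thread (1,933,855 reads against a 104,204 bp reference) and assuming 50 bp reads that all land on target, mean depth is reads × read length / reference length, roughly 928x. A -D value meant to catch only abnormal pile-ups would presumably need to sit well above that. This is an upper-bound estimate; unmapped and duplicate reads lower the real figure.

```python
def mean_coverage(num_reads: int, read_length: int, reference_length: int) -> float:
    """Expected mean depth if every read aligns to the reference (upper bound)."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return num_reads * read_length / reference_length


if __name__ == "__main__":
    # Read count and reference size from the bmatches log; 50 bp is the stated read length.
    depth = mean_coverage(1_933_855, 50, 104_204)
    print(f"mean depth ~ {depth:.0f}x")  # mean depth ~ 928x
```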

10-16-2009, 06:27 AM   #17
jsun529

Thank you very much for all your help; I am very new to BFAST. The program runs pretty fast, and with the new version it runs smoothly. For my SOLiD test run, the variants called with samtools were not what I expected. I think there are two possible causes: one is samtools, and the other, which I think is more critical, is how to create proper indexes for SNPs and indels. With the new version, it looks like this is the only parameter left to the user? Thanks

10-16-2009, 08:43 AM   #18
nilshomer

Quote:
Originally Posted by jsun529
[...] how to create proper indexes for SNPs and indels? With the new version, it looks like this is the only parameter left to the user?
The same parameters are available, they just now default to the recommended settings for whole-genome human resequencing. The indexes are important, and depend on read length, error rates, and polymorphism rate (snp and indel). What read length, error rate, platform, and polymorphism rate are you considering?
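To get a feel for why the indexes depend on read length, error rate and polymorphism rate, consider a simplified model (an assumption, not BFAST's own calculation): each read position mismatches the reference independently with probability e, lumping sequencing errors and true SNPs together. A contiguous key of k positions is then error-free with probability (1 - e)^k, and with n non-overlapping keys from the same read, at least one is error-free with probability 1 - (1 - (1 - e)^k)^n. In color space a single SNP changes two adjacent colors, so the effective mismatch rate is higher than the SNP rate alone.

```python
def p_key_exact(error_rate: float, key_length: int) -> float:
    """Probability that a contiguous key of `key_length` positions has no mismatch."""
    return (1.0 - error_rate) ** key_length


def p_any_key_exact(error_rate: float, key_length: int, num_keys: int) -> float:
    """Probability that at least one of `num_keys` non-overlapping keys is exact
    (the keys are independent in this model only because they do not overlap)."""
    miss = 1.0 - p_key_exact(error_rate, key_length)
    return 1.0 - miss ** num_keys


if __name__ == "__main__":
    # Hypothetical settings: 50-position read, 22-position keys, two non-overlapping keys.
    for e in (0.01, 0.03, 0.05):
        print(f"e={e:.2f}: one key {p_key_exact(e, 22):.3f}, "
              f"two keys {p_any_key_exact(e, 22, 2):.3f}")
```

Shorter or more numerous keys raise sensitivity at the cost of more candidate hits to check; BFAST's multiple indexes with spaced masks are, roughly, a more elaborate version of the same trade-off.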

11-12-2009, 07:33 AM   #19
colindaven
Senior Member
Location: Germany
Join Date: Oct 2008
Posts: 415

Hi Nils,

I found BFAST challenging to get into as well. I say that as someone who has been using Maq, Mosaik, Bowtie and so on for quite some time now.

There is plenty of detail in the manual, but the program expects you to set parameters right from the off, without any suggestion of defaults. It may well be that this is complicated and reference-dependent, but to encourage users I would strongly recommend providing

a) a short sample reference, say a genomic island or 500 kb of a genome, and
b) a few reads that can be mapped onto it.
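A small test reference of the kind suggested above (for example a 500 kb window) can be cut from a larger FASTA with a few lines of Python. This is a generic sketch; the window coordinates, record names and output naming scheme are placeholders.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record name (first word of the header) to its sequence."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record with fixed-width sequence lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genome = parse_fasta(">chrA test\nACGTACGT\nGGCC\n>chrB\nTTTT\n")
    start, length = 2, 6  # 0-based start; e.g. length = 500_000 for a 500 kb window
    window = genome["chrA"][start:start + length]
    print(to_fasta(f"chrA_{start + 1}_{start + length}", window, width=4), end="")
```

For a real genome, pass open(path).read() to parse_fasta; holding the whole file in memory is fine for bacterial-size references but not ideal for a full human genome.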

There is plenty of public data about now.

Then I would have a group of command lines in place of the "quick tutorial" links.

Installation instructions could also be provided for users unfamiliar with building from C source.

I'm sure potential users would appreciate the effort involved in doing this.

Colin

11-12-2009, 09:32 AM   #20
nilshomer

Quote:
Originally Posted by colindaven
There is plenty of detail in the manual, but the program expects you to set parameters right from the off, without any suggestion of defaults. [...]
Great suggestions! What was the last version you used? I say that because an overhaul was made between 0.5.x and 0.6.x to improve the user interface, manual, and set default/recommended parameters.

There is a quick tutorial at the end of the manual giving examples and the recommended parameters (the program now has defaults). Does this address the issue?

All times are GMT -8.