SEQanswers, Bioinformatics forum

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Building bowtie index with mirBase hairpin.fa file | Gators | RNA Sequencing | 6 | 05-07-2015 12:43 PM
Building bfast index with btestindexes | bre | Bioinformatics | 35 | 09-29-2011 05:34 AM
strange bowtie index building and mapping problem | Gangcai | Bioinformatics | 0 | 08-04-2010 06:02 PM
tophat-bowtie building index | repinementer | Bioinformatics | 1 | 07-17-2010 11:53 PM
BWA building index of full human (ensembl) fails | inijman | Bioinformatics | 4 | 12-23-2009 06:00 AM

09-07-2011, 12:39 PM   #1
Location: USA

Join Date: Sep 2011
Posts: 11
Building bfast/btestindexes index for 15% divergence

I am trying to determine whether BFAST is appropriate for mapping 101 bp unpaired Illumina reads to a reference with an expected divergence of 15% ( has related info on my problem). Primarily, I'm hoping Nils can advise me on whether BFAST is suited to this type of job. Assuming that it is, below I describe what I have tried and where I am stuck.

I am at the stage where I am trying to build an index for BFAST using btestindexes, but I am stuck because I don't understand the output of btestindexes in "evaluate" mode. I understand the concept of spaced seeds; what I don't understand is how to interpret the output of btestindexes. I have looked at the other four threads here that mention btestindexes, and I have read the supplementary info from the BFAST paper.
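For readers less familiar with spaced seeds, here is a minimal Python sketch of how a single mask finds (or misses) a read. It is an illustration only, not BFAST code: `seed_hits` is a hypothetical helper, a mask is a string of 1s (must-match positions) and 0s (don't-care positions), and the read is modelled only by the set of positions where it differs from the reference.

```python
def seed_hits(mask, mismatches, read_length):
    """Hypothetical helper: True if some placement of `mask` inside the read
    covers no mismatch at any '1' position, i.e. the read yields an exact key."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    for offset in range(read_length - len(mask) + 1):
        if not any(offset + i in mismatches for i in care):
            return True
    return False

# Toy 8 bp read with mismatches at 0-based positions 2 and 5.
mm = frozenset({2, 5})
print(seed_hits("1111", mm, 8))  # False: every contiguous window hits a mismatch
print(seed_hits("1101", mm, 8))  # True: at offset 0 the '0' lands on position 2
```

The contrast between the two calls is the whole point of the 0s: the contiguous seed misses this read, while the spaced seed with the same span still produces an exact key.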

Following the advice in section 6.1 of the bfast-book, I get k+2=21 for a genome of size 2.4G.

I then ran an index search:
btestindexes -A 0 -a 0 -S 10000 -s 10 -r 101 -M 20 -n 10 -l 21 -w 31
I used -M 20 because I think my data will contain some unique matches out to 20% divergence (roughly 20 mismatches in a 101 bp read). I used -n 10 to get 10 masks, expecting that the evaluation run of btestindexes would indicate how many I actually need.
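As a rough picture of what a mask search has to explore, the sketch below only proposes random candidate masks with 21 ones in a span of 31 positions, matching the -l 21 and -w 31 values above. Several things here are assumptions, not documented btestindexes behaviour: reading -l as key size and -w as mask width, forcing the first and last positions to 1, and sampling masks uniformly at random. `random_mask` is a hypothetical helper.

```python
import random

def random_mask(key_size, width, rng):
    """Hypothetical: a spaced-seed mask of `width` positions containing exactly
    `key_size` '1's, with the first and last positions forced to '1' (assumption)."""
    if not 2 <= key_size <= width:
        raise ValueError("need 2 <= key_size <= width")
    # Pick the remaining '1' positions uniformly from the interior positions.
    inner = rng.sample(range(1, width - 1), key_size - 2)
    bits = ["0"] * width
    for i in (0, width - 1, *inner):
        bits[i] = "1"
    return "".join(bits)

rng = random.Random(42)  # fixed seed so the candidate list is reproducible
candidates = [random_mask(21, 31, rng) for _ in range(10)]
for mask in candidates:
    print(mask)
```

A real search would then score each candidate (or each set of candidates) against simulated reads and keep the best, which is the part the evaluation mode appears to report on.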

The resulting masks are

I then ran the evaluation:
btestindexes -A 1 -a 1 -S 10000 -r 101 -M 10 -f filename

Looking at the output of the evaluation is where I am stuck. Clearly it is a table with one row per mask and one column per mismatch count from 0 to 20 (plus a column for a deletion, but let's ignore that). There is also a column labeled "CE", which is always zero (perhaps "cumulative error"?). The values are undoubtedly probabilities, but probabilities of what? My initial assumption was that row m gave the probability, for the combination of masks 1 through m, that a homologous read would be discovered using that set of seeds. This assumption is apparently wrong, because if I shuffle the list of masks, I don't get the same values in the final row.
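To make the "row m = masks 1 through m" interpretation concrete, here is a Monte Carlo sketch of it: the fraction of simulated reads with exactly k substitutions that at least one of the first m masks detects. This is a guess at the semantics, not a description of btestindexes internals; the function names and the uniformly random placement of mismatches are assumptions. Because every row is scored on the same simulated reads, the last row is the union over all masks and cannot change when the masks are shuffled. A tool that draws fresh random reads per row or per run could show different last rows from sampling noise alone.

```python
import random

def seed_hits(mask, mismatches, read_length):
    """True if some placement of `mask` avoids every mismatch at its '1' positions."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    return any(
        not any(offset + i in mismatches for i in care)
        for offset in range(read_length - len(mask) + 1)
    )

def cumulative_table(masks, read_length, max_mismatches, samples, seed=0):
    """Row m-1, column k: estimated fraction of reads with exactly k mismatches
    (positions drawn uniformly at random, an assumption) detected by masks[:m]."""
    rng = random.Random(seed)
    # One fixed set of simulated reads per mismatch count, shared by every row.
    patterns = {
        k: [frozenset(rng.sample(range(read_length), k)) for _ in range(samples)]
        for k in range(max_mismatches + 1)
    }
    table = []
    for m in range(1, len(masks) + 1):
        prefix = masks[:m]
        row = []
        for k in range(max_mismatches + 1):
            found = sum(
                any(seed_hits(mask, p, read_length) for mask in prefix)
                for p in patterns[k]
            )
            row.append(found / samples)
        table.append(row)
    return table

masks = ["1111", "1101", "1011"]
table = cumulative_table(masks, read_length=12, max_mismatches=4, samples=500)
for m, row in enumerate(table, start=1):
    print(m, " ".join(f"{p:.3f}" for p in row))
```

For the real case one would pass read_length=101, max_mismatches=20 and the ten masks from the search; that is slow in pure Python, so fewer samples are advisable for a quick check.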

Looking back at section 6.1, it advises that I "select the minimum number of masks sufficient to tolerate" my desired accuracy, but it gives no advice on how to interpret the output so as to make this decision.
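One possible way to turn "minimum number of masks sufficient" into a procedure is to fix a mismatch count and a target sensitivity, then add masks greedily until the target is met. This is a greedy set-cover heuristic swapped in for illustration, not the bfast-book's procedure, and greedy picks are not guaranteed to be the true minimum. `greedy_select` and its parameters are hypothetical, and mismatches are again assumed to fall uniformly at random.

```python
import random

def seed_hits(mask, mismatches, read_length):
    """True if some placement of `mask` avoids every mismatch at its '1' positions."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    return any(
        not any(offset + i in mismatches for i in care)
        for offset in range(read_length - len(mask) + 1)
    )

def greedy_select(candidates, read_length, num_mismatches, target, samples=2000, seed=0):
    """Hypothetical: greedily pick masks until the estimated fraction of reads with
    `num_mismatches` random substitutions found by at least one mask reaches `target`.
    Returns (chosen masks, estimated sensitivity of the chosen set)."""
    rng = random.Random(seed)
    patterns = [frozenset(rng.sample(range(read_length), num_mismatches))
                for _ in range(samples)]
    # For each candidate, the indices of the simulated reads it detects.
    detected = [
        {j for j, p in enumerate(patterns) if seed_hits(mask, p, read_length)}
        for mask in candidates
    ]
    chosen, covered = [], set()
    while candidates and len(covered) / samples < target:
        # Take the mask that detects the most reads not yet covered.
        best = max(range(len(candidates)), key=lambda i: len(detected[i] - covered))
        gain = detected[best] - covered
        if not gain:
            break  # no candidate adds anything: target unreachable with this pool
        chosen.append(candidates[best])
        covered |= gain
    return chosen, len(covered) / samples

chosen, sensitivity = greedy_select(["1111", "1101", "1011", "111"],
                                    read_length=12, num_mismatches=2, target=0.99)
print(len(chosen), sensitivity)
```

Applied to this thread's setting, one would pass the ten searched masks, read_length=101, and a mismatch count of interest (e.g. about 15 for 15% divergence), then read off how many masks the target requires.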

I have also hunted through the supplement and the distribution to see if there are any masks recommended for this type of divergence. The supplement states that the distribution includes mask sets for reads up to L=100. I have version 0.6.5a of the distribution from SourceForge, and I haven't been able to find them.

At this point, I'm just hoping to get some reassurance that BFAST will be useful for this problem.
Posted by feederbing
09-07-2011, 04:02 PM   #2

Originally Posted by feederbing
I then ran the evaluation:
btestindexes -A 1 -a 1 -S 10000 -r 101 -M 10 -f filename
Retracing my steps, I see that it should have been -A 0 (nucleotide space instead of color space). I've rerun the same masks now, and the output is in a different format. I am trying to see if it makes more sense now.
Posted by feederbing