SEQanswers

Old 05-30-2010, 08:06 PM   #21
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 203
Default

The supplementary Word doc is quite helpful in understanding BFAST.

To quote from it, in response to WHAT:
Instead of indexing the location of k-mer words in the genome, we generalize this concept to indexing the start positions of k-letter substrings that are obtained from a mask, which is slid along the reference genome at one base shifts to generate the index data. This is similar to spaced seeds introduced previously in homology search programs. For example, the letter selection mask suggested by the bit-pattern 0011001010, directly applied to the sequence "AAGATTACAG", selects the letter key "GAAA".

In response to WHY:
It is a way of indexing the reference genome to speed up lookups.

If you are asking why we need more than one....

Greater accuracy is to be achieved by using multiple indexes based on different masks to define the index keys, but keeping the number of letters in the key, k, large for uniqueness. Avoid using shorter keys (reducing k) to obtain accuracy, which results in exponential growth in spurious candidate locations.
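For anyone who wants to try the mask example from the quoted passage, here is a minimal Python sketch. The helper names `apply_mask` and `masked_keys`, the read with an error, and the second mask `1100110100` are all made up for illustration; this is not BFAST code, only one way to read the description.

```python
def apply_mask(mask, seq):
    """Return the letters of seq at the positions where mask has a '1'."""
    if len(mask) != len(seq):
        raise ValueError("mask and sequence window must have the same length")
    return "".join(base for bit, base in zip(mask, seq) if bit == "1")


def masked_keys(mask, reference):
    """Slide the mask along the reference one base at a time, yielding (start, key)."""
    width = len(mask)
    for start in range(len(reference) - width + 1):
        yield start, apply_mask(mask, reference[start:start + width])


# The example from the quoted passage.
print(apply_mask("0011001010", "AAGATTACAG"))  # GAAA

# A mismatch at a position the mask ignores (a '0') leaves the key unchanged,
# while a second, hypothetical mask that reads that position yields a different key.
read_with_error = "AAGACTACAG"  # T -> C at 0-based position 4
print(apply_mask("0011001010", read_with_error))  # GAAA
print(apply_mask("1100110100", read_with_error))  # AACTC
```

The last two lines hint at why several masks are used together: a read with an error can still produce a correct key under a mask that skips the erroneous base.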
Old 05-30-2010, 08:42 PM   #22
elinor
Junior Member
 
Location: Santa Cruz, CA

Join Date: Feb 2010
Posts: 2
Default

Thank you, Kevin!

I have one more question for the forum. Is there a way to decipher the 4th row of a fastq file? By 4th row, I mean the fastq version of the phred-like values found in a *.qual file. I would like to parse the 4th row but I don't understand what each ` or ! or ? means other than that it is some mysterious code for quality value digits. Thank you for your reply!
Old 05-31-2010, 03:19 PM   #23
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509
Default

Quote:
Originally Posted by elinor View Post
I have a question regarding the "masks" in the index. What are the masks and why do we need them? Thanks for the response!
At the risk of sounding cranky, I would encourage you to read the user manual, published paper, and/or supplemental material before asking this question. Nils does a nice job of explaining his program, which is not always the case for software developers, so take advantage of these resources.

-Harold
Old 05-31-2010, 05:20 PM   #24
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by elinor View Post
Thank you, Kevin!

I have one more question for the forum. Is there a way to decipher the 4th row of a fastq file? By 4th row, I mean the fastq version of the phred-like values found in a *.qual file. I would like to parse the 4th row but I don't understand what each ` or ! or ? means other than that it is some mysterious code for quality value digits. Thank you for your reply!
Please search around for a few minutes before asking such questions. You can find their decoding in many threads on this site. Use the search function here, or in google add the string "site:seqanswers.com" to your search. Wikipedia also has a FASTQ entry.
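For readers who land here from a search: each character on the 4th line of a FASTQ record encodes one quality score as an ASCII code minus a fixed offset. The sketch below assumes the Sanger-style offset of 33 by default; older Illumina pipelines (1.3 to 1.7) used an offset of 64, so check which one applies to your file. The function names are made up for this example.

```python
def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        raise ValueError("negative score: the offset is probably wrong for this file")
    return scores


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


print(decode_quality("!?"))            # [0, 30] with the default offset of 33
print(decode_quality("`", offset=64))  # [32] if the file uses an offset of 64
print(error_probability(30))           # roughly 0.001
```

Note that `!` and `?` have ASCII codes below 64, so a quality line containing them cannot be offset-64 data; the sketch raises an error in that case instead of returning negative scores.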
Old 07-04-2010, 02:29 PM   #25
abattenhouse
Junior Member
 
Location: Austin TX

Join Date: Jan 2010
Posts: 3
Default

Nils - I've been reading through all your documentation and it's really great -- thanks! However, I'm struggling to figure out how to build indices for sacCer, genome size ~ 12Mb. It would seem like a key size of 16 would be about right. I've looked at btestindexes, which is what I think should be used to generate an appropriate index set. Your post says they should be generated with btestindexes using "recommended" settings but I can't figure out what those should be. Suggestions?
Old 07-04-2010, 06:51 PM   #26
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by abattenhouse View Post
Nils - I've been reading through all your documentation and it's really great -- thanks! However, I'm struggling to figure out how to build indices for sacCer, genome size ~ 12Mb. It would seem like a key size of 16 would be about right. I've looked at btestindexes, which is what I think should be used to generate an appropriate index set. Your post says they should be generated with btestindexes using "recommended" settings but I can't figure out what those should be. Suggestions?
What read length(s) do you have? I would suggest using the recommended indexes in the manual as a first pass. If you don't like the results, you can then build custom indexes. The vast majority are satisfied with the recommended indexes.
Old 07-05-2010, 07:31 AM   #27
abattenhouse
Junior Member
 
Location: Austin TX

Join Date: Jan 2010
Posts: 3
Default

Nils - These are 36 bp reads. I have both SOLiD and Illumina data. Should I use the "25 bp" SOLiD mask set from your SOM? Also, it would be nice to know how to use btestindexes so alternative index sets could be generated and compared. Thanks, Anna
Old 07-05-2010, 08:18 AM   #28
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by abattenhouse View Post
Nils - These are 36 bp reads. I have both SOLiD and Illumina data. Should I use the "25 bp" SOLiD mask set from your SOM? Also, it would be nice to know how to use btestindexes so alternative index sets could be generated and compared. Thanks, Anna
Yes, use the 25bp indexes. I am not sure if there are other posts here where I go into detail about "btestindexes". If not, let me know and I will try to explain it.
Old 07-06-2010, 01:45 PM   #29
abattenhouse
Junior Member
 
Location: Austin TX

Join Date: Jan 2010
Posts: 3
Default

Nils - I've just tried the 25bp SOLiD mask set and I'm getting a lot of false positives, as determined by alignments showing up in a deleted gene. These reads don't show up in a BWA alignment of the same data. So I think I need a set of BFAST masks with a larger key size. I'm pretty sure I've looked everywhere for more info on btestindexes with no luck (although I've been reading so much stuff the last few days my head is about to explode). Thanks, Anna
Old 07-06-2010, 07:22 PM   #30
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by abattenhouse View Post
Nils - I've just tried the 25bp SOLiD mask set and I'm getting a lot of false positives, as determined by alignments showing up in a deleted gene. These reads don't show up in a BWA alignment of the same data. So I think I need a set of BFAST masks with a larger key size. I'm pretty sure I've looked everywhere for more info on btestindexes with no luck (although I've been reading so much stuff the last few days my head is about to explode). Thanks, Anna
Try the 50bp ones... if that doesn't map many reads, then I'll see what I can do.
Old 03-16-2011, 03:08 PM   #31
Sheila
Member
 
Location: Europe

Join Date: Jun 2009
Posts: 17
Default masks for SOLiD paired-ends reads 35+50nt

Quote:
Originally Posted by nilshomer View Post
See section 7.1.
Hi there,
I've gone through the BFAST manual and I found a suggestion for masks assuming reads of at least 50nt in section 7.1.2. The problem is that my paired reads do not have the same length (SOLiDv4, paired-end libraries, not mate-pair libraries; they have different orientation): one is 35nt long and the other is 50nt. Do I need to use different masks to index the reference for the 35nt reads, or is it OK to use the masks suggested in the manual for 50nt reads for the 35nt reads as well?

Thanks in advance.

Best regards,

S.
Old 03-16-2011, 04:33 PM   #32
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Sheila View Post
Hi there,
I've gone through the BFAST manual and I found a suggestion for masks assuming reads of at least 50nt in section 7.1.2. The problem is that my paired reads do not have the same length (SOLiDv4, paired-end libraries, not mate-pair libraries; they have different orientation): one is 35nt long and the other is 50nt. Do I need to use different masks to index the reference for the 35nt reads, or is it OK to use the masks suggested in the manual for 50nt reads for the 35nt reads as well?

Thanks in advance.

Best regards,

S.
Try the bfast+bwa code. BFAST will map the 50bp read, and BWA will map the 35bp read, then BFAST will merge the two. There are many discussions here about how to run bfast+bwa.
Old 09-28-2011, 12:25 PM   #33
Ramon Vidal
Junior Member
 
Location: Campinas

Join Date: Aug 2008
Posts: 4
Default

Quote:
Originally Posted by elinor View Post
I have a question regarding the "masks" in the index. What are the masks and why do we need them? Thanks for the response!
This question has intrigued me for some weeks. Do you already have an answer for it? Anyone?

I have created the 10 masks for the human genome (hg18). To run bfast with all 10 masks, I only need to point to the prefix of the index, right? Or do I somehow have to run bfast match for each one of the masks?

Thank you
Old 09-28-2011, 12:34 PM   #34
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Ramon Vidal View Post
This question has intrigued me for some weeks. Do you already have an answer for it? Anyone?

I have created the 10 masks for the human genome (hg18). To run bfast with all 10 masks, I only need to point to the prefix of the index, right? Or do I somehow have to run bfast match for each one of the masks?

Thank you
That's right, it will find all of them for you. You can use "-i" to specify a subset, for example "-i 2-3,6,9-10".
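To make the subset syntax concrete, here is a small Python sketch of how a string such as "2-3,6,9-10" could expand into individual index numbers. It assumes ranges are inclusive, which is the natural reading with 10 indexes; `parse_index_subset` is a hypothetical helper written for this post and is not BFAST's own parser.

```python
def parse_index_subset(spec):
    """Expand a comma-separated list of numbers and inclusive ranges."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "a-b" is treated as every number from a to b, both ends included.
            low, high = (int(x) for x in part.split("-"))
            if low > high:
                raise ValueError("range start is larger than range end: " + part)
            numbers.extend(range(low, high + 1))
        else:
            numbers.append(int(part))
    return numbers


print(parse_index_subset("2-3,6,9-10"))  # [2, 3, 6, 9, 10]
```

Under that reading, the example selects five of the ten indexes and skips numbers 1, 4, 5, 7 and 8.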
Old 09-29-2011, 12:26 AM   #35
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Hi,
I have found a potential error in Table 1 of the main BFAST manuscript (http://www.plosone.org/article/info:...l.pone.0007767). The M1 mask has only 18 "1"s, but the key size and key width are both 22. Is this a typo or am I misunderstanding something?

Code:
M1=111111111111111111 (k = 22, w= 22) M M M M
One more question:
I am working with a genome twice the size of the human genome and paired-end 100bp reads. Can I use the recommended mask set that was used for human PE50 data, or do I need to regenerate the mask set?

Based on the function:
Code:
# L = read length, k = key size, G = genome size term, A = alphabet size (4 for DNA)
f <- function(L=50,k=18,G=2*3.2*1e+9,A=4){
  # (L-k+1) key positions in the read, times G/A^k
  return((L-k+1)*(G/(A^k)))
}
f(L=100,k=22,G=2*7.2*1e+9,A=4) ## 0.06466507
Here f = 0.06466507, which is less than 1. Does this mean that I can use key size 22 and the recommended mask set? Thanks in advance.

Last edited by pengchy; 09-29-2011 at 12:36 AM.
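For those who prefer Python, the calculation in the post above can be sketched as follows, together with a scan for the smallest key size that brings the value below 1. The function names are made up for this example, and the "below 1" cutoff is simply the criterion used in the post; whether it is sufficient for choosing masks is a question for the manual.

```python
def expected_hits(read_len, key_size, genome_size, alphabet=4):
    """Same formula as the R function in the post: (L - k + 1) * G / A**k."""
    return (read_len - key_size + 1) * genome_size / alphabet ** key_size


def smallest_key_size(read_len, genome_size, alphabet=4):
    """Return the smallest k for which expected_hits drops below 1, or None."""
    for k in range(1, read_len + 1):
        if expected_hits(read_len, k, genome_size, alphabet) < 1:
            return k
    return None


G = 2 * 7.2e9  # the genome size term used in the post
print(expected_hits(100, 22, G))   # about 0.0647, matching the R output
print(smallest_key_size(100, G))   # 21
```

With these inputs the value is still above 1 at k = 20 and drops below 1 at k = 21, so a key size of 22 satisfies the criterion from the post.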
Old 09-29-2011, 05:34 AM   #36
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

See the recommended settings in the manual! Good catch on the typo.
Tags
bfast, btestindexes, indexes
