Old 10-05-2010, 11:56 PM   #1
Junior Member
Location: Australia

Join Date: Sep 2008
Posts: 1
BLAST+: creating a custom BLAST database and using BLAST+ filtering features


I would like to create a personal blast database of arbitrary sequences and be able to use all the features of BLAST+ to create subsets of databases based on identifiers or filter based on taxonomy.

It looks like the formatting of the definition line in the input FASTA files is crucial to assign proper sequence identifiers.

Using the general database identifier format gnl|database|identifier or the local identifier format lcl|identifier, I wasn't able to use blastdb_aliastool to create database subsets, as it expects a GI list as input. I also didn't have any luck assigning taxonomy identifiers with the -taxid_map option of makeblastdb.

What is the recommended way to format FASTA definition lines in order to be able to use all the filtering features of the BLAST+ tools?

I was thinking of creating pseudo GenBank definitions for all my sequences: gi|<gi-number>|gb|<AccessionVersion>|<Accession>, where <gi-number> is a generated numeric value, and <AccessionVersion> and <Accession> are my identifier. This works for GI-based filtering; however, it seems like an ugly hack and I would prefer something more straightforward.
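For illustration, here is a minimal Python sketch of the header rewriting described above. It is only one possible approach, not a recommended NCBI convention. The function name add_pseudo_gi_headers and the start_gi parameter are hypothetical, and it assumes the first whitespace-delimited token of each definition line is the sequence identifier, which is reused for both the <AccessionVersion> and <Accession> fields.

```python
from itertools import count


def add_pseudo_gi_headers(fasta_lines, start_gi=1):
    """Rewrite FASTA definition lines as gi|<n>|gb|<id>|<id> [description].

    Assumption: the first whitespace-delimited token after '>' is the
    user's own identifier; <n> is a generated running number.
    """
    gi_counter = count(start_gi)
    rewritten = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split the definition line into identifier and free-text description
            parts = line[1:].split(None, 1)
            ident = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            header = f">gi|{next(gi_counter)}|gb|{ident}|{ident}"
            if description:
                header += " " + description
            rewritten.append(header)
        else:
            # Sequence lines are passed through unchanged
            rewritten.append(line)
    return rewritten
```

Calling add_pseudo_gi_headers(open("my_sequences.fasta"), start_gi=1000) would return the rewritten lines, which can then be written to a new file before running makeblastdb. The file name here is a placeholder.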

How is the taxid_map file formatted? I've tried <gi-number>, gb|<AccessionVersion>, and gb|<Accession> as the sequence identifier; however, they don't seem to be assigned properly, and blastdbcmd with -outfmt %T gives me zero for all entries.

Thanks for any help,


PS: I've posted this already in but as it's still beta traffic is a bit low and I'm probably more likely to get an answer here in the forum.
deniz
Old 06-14-2012, 07:43 AM   #2
Location: New Jersey

Join Date: Aug 2010
Posts: 29

I was wondering if you found a solution for this? I have been having all kinds of problems creating subset blast databases.
nupurgupta
Old 10-26-2012, 12:04 PM   #3
Junior Member
Location: Ottawa

Join Date: Oct 2012
Posts: 2

GI lists are simple to create. If you have experience with bash, perl, or any programming language that you can use to format text, you can automatically pull the GIs from these files and put them into a GI list. A GI list is simply a text file with one number per line, where each number is a GI. There may be utilities that do this automatically (e.g. FASTA -> GI list), though I don't know of any.
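As a rough sketch of what pulling the GIs could look like in Python: the function names extract_gis and write_gi_list are made up for this example, and it assumes the definition lines start with gi|<number>| as in the original post. Headers in any other format are skipped.

```python
import re

# Matches definition lines of the assumed form ">gi|12345|..."
GI_PATTERN = re.compile(r"^>\s*gi\|(\d+)\|")


def extract_gis(fasta_lines):
    """Return the GI numbers (as strings) found in FASTA definition lines."""
    gis = []
    for line in fasta_lines:
        match = GI_PATTERN.match(line)
        if match:
            gis.append(match.group(1))
    return gis


def write_gi_list(gis, path):
    """Write one GI per line, i.e. the plain-text GI list format described above."""
    with open(path, "w") as handle:
        for gi in gis:
            handle.write(gi + "\n")
```

For example, write_gi_list(extract_gis(open("my_sequences.fasta")), "subset.gi") would produce a file that can be given to blastdb_aliastool through its -gilist option. Both file names are placeholders.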

Alternatively, a nice way I've found to create GI lists is to query NCBI for the data you want. Run a search in NCBI, click "Send to" in the top-right corner, and export the results as a GI list. This is an easy way of getting a GI list for BLAST subsets, though it's difficult to automate.
couttsbr
Old 07-07-2019, 09:04 AM   #4
Location: Bhopal

Join Date: Jul 2019
Posts: 19

GI lists are easy to make. If you have experience with bash, perl, or any programming language that you can use to format text, you can automatically pull the GIs from these files and put them into a GI list. A GI list is simply a text file with one number per line, where each number is a GI. There may be utilities that do this automatically (for example FASTA -> GI list), but I don't know of any.

On the other hand, a nice way I've found to make GI lists is to query NCBI for the data you need. Run a search in NCBI, click "Send to" in the upper-right corner, and export the results as a GI list. This is a simple way of getting a GI list for BLAST subsets, but it's hard to automate.
brojee

blast, ncbi-blast
