Old 08-10-2012, 03:13 PM   #1
ssully
BLAST+ blastdbcmd batch file formatting

The db definition lines look like:

>DS170424 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883
>DS170425 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883
>DS170426 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883

[db was created from fasta records using makeblastdb (with parse-seqids)]

Lines of batch input file (test.txt) to pull out subsequences look like:
DS113177 1-10 plus
DS113178 1-10 plus
DS113179 1-10 plus

[separator = tab (I have also tried spaces, commas, and semicolons)]

command line query:
blastdbcmd -db TvaginalisGenomic_TrichDB-1.3.fasta -dbtype nucl -entry_batch test.txt

The result is a series of 'OID not found' errors:
Error: DS113177 1-10 plus : OID not found
Error: DS113178 1-10 plus : OID not found
Error: DS113179 1-10 plus : OID not found
BLAST query/options error: Entry not found in database

The command-line query works if the batch file contains a list of JUST the sequence IDs (no range or strand info); in that case it returns the entire sequence for each ID. The query also works if I specify a single seqID with range and strand, e.g.:

blastdbcmd -db TvaginalisGenomic_TrichDB-1.3.fasta -dbtype nucl -entry DS113177 -range 1-10 -strand plus

So, what am I doing wrong? It seems to be something about the line formatting in the input file. I can find no guidance on this in the NCBI BLAST+ user manual.
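
Since the single-entry form with -entry, -range and -strand does work, one possible stopgap is to loop over the batch file and issue one call per line. This is only a rough sketch, not a fix for -entry_batch itself: the function name extract_ranges and the BLASTDBCMD override variable are made up for illustration, and it assumes every line has exactly three whitespace-separated fields (ID, range, strand) as in test.txt.

```shell
#!/usr/bin/env bash
# Workaround sketch: one blastdbcmd call per line of the batch file.
# BLASTDBCMD is a hypothetical override hook, not a BLAST+ setting.
BLASTDBCMD="${BLASTDBCMD:-blastdbcmd}"

# extract_ranges is a hypothetical helper name, not part of BLAST+.
# $1 = BLAST database name, $2 = batch file with lines "ID range strand"
extract_ranges() {
    local db="$1" batch="$2" id range strand
    # read splits on spaces and tabs, so either separator works;
    # the "|| [ -n ... ]" part handles a last line without a newline
    while read -r id range strand || [ -n "$id" ]; do
        [ -z "$id" ] && continue   # skip blank lines
        # stdin is redirected so the call cannot consume the loop's input
        "$BLASTDBCMD" -db "$db" -dbtype nucl \
            -entry "$id" -range "$range" -strand "$strand" < /dev/null
    done < "$batch"
}

# Example call, using the file names from this post:
# extract_ranges TvaginalisGenomic_TrichDB-1.3.fasta test.txt
```

This starts a new process for every line, so it will be slow for large batch files, but it relies only on the options already shown to work above.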