Hi everybody, I am a bit desperate and hope someone can help me. I need to create a subset of the nr database (for blastx) using a negative or positive gi list. I see several ways to do this (are there more?):
1) read the multifasta nr-file and remove some entries; however, the file is almost 8 GB in size and this takes a lot of time (and you have to create the database afterwards)
2) use blast+, which has a "-negative_gilist" option; however, another program's parser expects the old output format, which seems to differ from that of blast+
3) formatdb has a -L option to create a subset of the database based on a file with a positive gi-list
4) blastall has a -l option to perform the search based on a file with a positive gi-list, which should produce the same result
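For reference, option 1) could be sketched roughly as below. This is only a minimal sketch, not a tested pipeline: it assumes the nr headers contain identifiers of the form "gi|<number>|" and that the gi list holds one number per line. The names read_gi_list and filter_fasta are made up for illustration. It streams the file line by line, so memory use stays small, but the filtered FASTA would still have to be run through formatdb afterwards.

```python
import re

# Assumption: GIs appear in the header as "gi|<number>|"; a merged nr header
# may carry several of them, so all are collected.
GI_PATTERN = re.compile(r"gi\|(\d+)")


def read_gi_list(path):
    """Read a gi list with one number per line; blank lines are skipped."""
    with open(path) as handle:
        return {int(line) for line in handle if line.strip()}


def filter_fasta(lines, gis, negative=False):
    """Yield the lines of the FASTA records selected by the GI set.

    Positive mode keeps a record if any GI in its header is listed;
    negative mode keeps a record only if none of its GIs is listed.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            header_gis = {int(g) for g in GI_PATTERN.findall(line)}
            listed = not gis.isdisjoint(header_gis)
            keep = listed != negative
        if keep:
            yield line


# Tiny in-memory demonstration with made-up records.
records = [
    ">gi|111|ref|XP_1| protein A\n",
    "MKV\n",
    ">gi|222|ref|XP_2| protein B\n",
    "MLL\n",
]
positive_subset = list(filter_fasta(records, {111}))
negative_subset = list(filter_fasta(records, {111}, negative=True))
```

For the real nr file one would pass an open file handle instead of the list, e.g. filter_fasta(open("nr"), read_gi_list("gi.txt")), and write the yielded lines to a new file.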
Options 3) and 4) seem to be what I need. Unfortunately, neither works: formatdb and blastall both report that they cannot find or open the gi-list file, although it is in the working directory. I tried the latest blast version (2.2.25) as well as some other ones, on Fedora and on Ubuntu. Can someone reproduce this behavior? The problem looks like this:
my@computer:/tmp/blast/bin$ ls
. blastclust drosoph.aa.phr fastacmd impala query.fa
.. blastpgp drosoph.aa.pin formatdb makemat rpsblast
bl2seq copymat drosoph.aa.psq formatdb.log megablast seedtop
blastall drosoph.aa drosoph.gi.txt formatrpsdb .ncbirc
my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find drosoph.gi.txt
my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F ./drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find ./drosoph.gi.txt
my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F /tmp/blast/bin/drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find /tmp/blast/bin/drosoph.gi.txt
my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F whatever -L subset
[formatdb] FATAL ERROR: Unable to find whatever
my@computer:/tmp/blast/bin$ ./blastall -p blastx -d drosoph.aa -l drosoph.gi.txt -i query.fa
Searching[blastall] ERROR: query1[protein_gi:7290028]: Unable to open file drosoph.gi.txt
[blastall] WARNING: query1[protein_gi:7290028]: Intersection of gilist and BLAST database ID's empty