Hi all,
I'm new to BLAST (and particularly to the command line), so I have a few questions about issues whose significance I'm not sure of.
I'm trying to build a BLAST database that is a subset of nr containing only human records. I downloaded a GI list from the Entrez protein database and then ran
cat gi.txt | blastdbcmd -db nr_humans -entry_batch - -out human_sequences.txt
While it runs, I am getting a large number of errors about missing OIDs (e.g. "Error: 567316212: OID not found"). I've seen about 500 so far, and the command hasn't quite finished processing.
Is this expected (perhaps because Entrez has more proteins than the nr database does)? Or is it a problem that I should look into more closely?
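To gauge how widespread the errors are, here is a minimal sketch of how I could tally them. It assumes the error messages are written to stderr and saved to a file (blastdbcmd.err is a hypothetical name), and that every message has the same form as the one quoted above. The two function names are just illustrative helpers.

```shell
# Hypothetical file names; adjust to your own setup.
# Assumes stderr was captured when running the extraction, e.g.:
#   cat gi.txt | blastdbcmd -db nr_humans -entry_batch - -out human_sequences.txt 2> blastdbcmd.err

# Print each GI that was reported as missing, one per line, without repeats.
missing_gis() {
    # Expected line format: Error: 567316212: OID not found
    sed -n 's/^Error: \([0-9][0-9]*\): OID not found.*$/\1/p' "$1" | sort -u
}

# Print "missing/total": distinct missing GIs over non-empty lines in the GI list.
missing_summary() {
    missing=$(missing_gis "$1" | wc -l | tr -d ' ')
    total=$(grep -c '[0-9]' "$2")
    echo "$missing/$total"
}
```

Running `missing_summary blastdbcmd.err gi.txt` would then show whether the roughly 500 errors are a small fraction of the list or a large one, and `missing_gis blastdbcmd.err > missing_gi.txt` would save the affected GIs for looking up in Entrez.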
Long background: I'm planning on running DELTA-BLAST against more than 5,000 sequences, so I'm trying to set up a local BLAST system. I've downloaded and installed BLAST+ and the nr database. I've run a few blastp queries against nr, and they took an excessive amount of time. I also wanted only Homo sapiens results, so I created an alias following the instructions here. This results in a much faster query. However, I wanted to see if rebuilding the database would yield an even faster result, so I followed the instructions here (to some extent, since I already had my GIs from the first run).