SEQanswers

Old 08-07-2011, 05:40 AM   #1
sammy07
Member
 
Location: austria

Join Date: Nov 2010
Posts: 20
Blast problem

Hi everybody, I am a bit desperate and hope someone can help me. I need to create a subset of the nr database (for blastx) using a negative or positive gi list. I see several ways to do this (are there others?):

1) read the multifasta nr-file and remove some entries; however, the file is almost 8 GB in size and this takes a lot of time (and you have to create the database afterwards)
2) use blast+ which has a "negative_gi" option; however, another program's parser expects the old output format which seems to differ from blast+
3) formatdb has a -L option to create a subset of the database based on a file with a positive gi-list
4) blastall has a -l option to perform the search based on a file with a positive gi-list, which should produce the same result
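
For reference, option 1) could be sketched roughly as below. This is a minimal sketch, not a tested pipeline: it assumes the FASTA headers carry identifiers in the form `gi|<number>|`, and the function name `filter_fasta` is made up for illustration. It streams the input line by line, so the 8 GB file never has to fit in memory, but the filtered file would still have to be formatted as a database afterwards.

```python
import re
from typing import Iterable, Iterator, Set

# Assumption: gi numbers appear in headers as "gi|<digits>|".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def filter_fasta(lines: Iterable[str], negative_gis: Set[str]) -> Iterator[str]:
    """Yield the lines of all records whose header holds no excluded gi number."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # One nr header may list several gi numbers for identical sequences;
            # the record is dropped here if any of them is excluded.
            gis = GI_PATTERN.findall(line)
            keep = not any(gi in negative_gis for gi in gis)
        if keep:
            yield line
```

To use it on real files, open the input and output in text mode and pass the input file object as `lines`, writing each yielded line to the output. Dropping a whole record when only one of its merged gi numbers matches is a design choice of this sketch and may not match what the negative gi list in blast+ does.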

Options 3) and 4) seem to be what I need. Unfortunately, they don't work. The problem looks like this:

Quote:
my@computer:/tmp/blast/bin$ ls
. blastclust drosoph.aa.phr fastacmd impala query.fa
.. blastpgp drosoph.aa.pin formatdb makemat rpsblast
bl2seq copymat drosoph.aa.psq formatdb.log megablast seedtop
blastall drosoph.aa drosoph.gi.txt formatrpsdb .ncbirc
my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find drosoph.gi.txt

my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F ./drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find ./drosoph.gi.txt

my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F /tmp/blast/bin/drosoph.gi.txt -L subset
[formatdb] FATAL ERROR: Unable to find /tmp/blast/bin/drosoph.gi.txt

my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F whatever -L subset
[formatdb] FATAL ERROR: Unable to find whatever

my@computer:/tmp/blast/bin$ ./blastall -p blastx -d drosoph.aa -l drosoph.gi.txt -i query.fa

Searching[blastall] ERROR: query1[protein_gi:7290028]: Unable to open file drosoph.gi.txt
[blastall] WARNING: query1[protein_gi:7290028]: Intersection of gilist and BLAST database ID's empty
I tried the latest blast version (2.2.25) as well as some other ones, on Fedora and on Ubuntu. Can someone reproduce this behavior?
Old 08-07-2011, 07:01 AM   #2
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Hi,

Both formatdb and blastall complained about drosoph.gi.txt. Can you double-check the file? Could you post the "ls -l" output here?
Old 08-07-2011, 07:22 AM   #3
sammy07

Quote:
my@computer:/tmp/blast/bin$ ls -l
total 94040
...
-rwxr-xr-x 1 me me 8 2011-08-07 15:12 drosoph.gi.txt
...
my@computer:/tmp/blast/bin$ head drosoph.gi.txt
7290028
The file contains only one line, which is a gi number. I tried setting the permissions of this file to 777, but that didn't help.
Old 08-07-2011, 07:32 AM   #4
DZhang

1) This is a strange problem. The file belongs to me/me, but the login is my. Do you know why? It should not contribute to your problem; I am just curious.
2) Can you successfully run blast+ in this case? As for the output format, blast+ lets you customize the output fields. You may pursue this as an alternative.
Old 08-07-2011, 09:30 AM   #5
sammy07

1) Well, that's because I changed my real name and did it inconsistently.
2) Is it possible to change the blast+ output so that a parser written for the plain-text blast output can read it? I would be really happy if this were the case. Otherwise I can't use it. (Anyway, I haven't tried a blast+ run with a negative gi list yet; I will do so tomorrow.)
Old 08-07-2011, 10:49 AM   #6
DZhang

2) Yes. You can specify the fields in tab-delimited format. Check the blast+ manual.
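
To illustrate the tab-delimited route: the sketch below reads blast+ tabular output, assuming the default twelve columns produced by `-outfmt 6`. The function name `parse_tabular` is hypothetical. It does not turn the output into the old plain-text layout, so it only helps where the downstream parser can be replaced or adapted.

```python
from typing import Dict, Iterable, List, Sequence

# Default column order of BLAST+ tabular output (-outfmt 6).
DEFAULT_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def parse_tabular(
    lines: Iterable[str], fields: Sequence[str] = DEFAULT_FIELDS
) -> List[Dict[str, str]]:
    """Turn tab-delimited hit lines into dictionaries keyed by field name."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the comment lines some tabular variants add.
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError("expected %d columns, got %d" % (len(fields), len(values)))
        # Values stay strings; convert evalue or bitscore where needed.
        hits.append(dict(zip(fields, values)))
    return hits
```

If custom fields are requested on the blast+ command line, the same names in the same order would have to be passed as `fields`.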
Old 08-07-2011, 01:35 PM   #7
sammy07

Dear DZhang, thank you very much for your replies! But as far as I can see, the program I use expects plain-text blast output, not the tab-delimited format, and it cannot parse the plain-text blast+ output.

So I would like to use the old blast version, as it offers the option I need (according to the documentation, the -l parameter for blastall or the -L parameter for formatdb). Can someone reproduce my problem and make any suggestions?
Old 08-09-2011, 02:14 AM   #8
sammy07

Ok, we got it. It almost drove me crazy. Finally, my colleague found the cause using "strace", which as of today is my favorite command.

So, here is the solution: the .ncbirc file has to contain the following lines.

Quote:
[BLAST]
BLASTDB=/path/to/db
Then the BLASTDB variable is properly set and

Quote:
formatdb -i drosoph.aa -F drosoph.gil -L subset
works like a charm.
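
If the file has to be created on several machines, something like the following could write it. This is only a sketch: the helper name `write_ncbirc` is invented, and the target directory is an assumption (in the listing above, `.ncbirc` sat in the directory the commands were run from). `/path/to/db` remains a placeholder for the real database directory.

```python
from pathlib import Path


def write_ncbirc(directory: str, blastdb: str) -> Path:
    """Write a .ncbirc with a [BLAST] section that sets BLASTDB."""
    path = Path(directory) / ".ncbirc"
    # Same two lines as quoted above, written without extra whitespace.
    path.write_text("[BLAST]\nBLASTDB={}\n".format(blastdb))
    return path
```

For example, `write_ncbirc(".", "/path/to/db")` would create the file in the current directory, overwriting any existing `.ncbirc` there.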

Last edited by sammy07; 08-09-2011 at 05:39 AM.
Old 08-09-2011, 04:19 AM   #9
DZhang

sammy07, thank you for sharing the solution.

Tags
bioinformatics, blast, formatdb
