SEQanswers

Old 08-10-2012, 03:13 PM   #1
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
BLAST+ blastdbcmd batch file formatting

db definition lines look like:

>DS170424 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883
>DS170425 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883
>DS170426 | organism=Trichomonas_vaginalis_G3 | version=2007-01-11 | length=883

[db was created from FASTA records using makeblastdb (with -parse_seqids)]

Lines of batch input file (test.txt) to pull out subsequences look like:
DS113177 1-10 plus
DS113178 1-10 plus
DS113179 1-10 plus

[whitespace = tab (have also tried space, commas, and semicolon)]

command line query:
blastdbcmd -db TvaginalisGenomic_TrichDB-1.3.fasta -dbtype nucl -entry_batch test.txt

The result is a series of "OID not found" errors:
Error: DS113177 1-10 plus : OID not found
Error: DS113178 1-10 plus : OID not found
Error: DS113179 1-10 plus : OID not found
BLAST query/options error: Entry not found in database

The command-line query works if the batch file contains a list of JUST the sequence IDs (no range or strand info); in that case it returns the entire sequence for each ID. The query also works if I specify one sequence ID, range, and strand, e.g.:

blastdbcmd -db TvaginalisGenomic_TrichDB-1.3.fasta -dbtype nucl -entry DS113177 -range 1-10 -strand plus

So, what am I doing wrong? It seems to be something about line formatting in the input file. No guidance on this in the NCBI BLAST+ user manual.
Old 08-10-2012, 05:00 PM   #2
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by ssully
Commandline query works if the batch file contains a list of JUST the sequence IDs (no range or strand info). In this case it returns the entire sequence for that ID. So, what am I doing wrong? It seems to be something about line formatting in the input file. No guidance on this in the NCBI BLAST+ user manual.
Maybe I'm missing something, but I think the -entry_batch option is only MEANT to take one ID per line. That does work for me, and for you too.

What made you think it could handle extra range/strand info? It doesn't say it does in the docs. And how would it know which parameters to apply your extra data to?
Old 08-10-2012, 05:36 PM   #3
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

I would think pulling out subsequences by range and strand would be very common, such that columns two and three in an input file would be specified for range and strand. It didn't even occur to me that they would make the batch function so very limited as to ONLY work for sequence IDs.
Old 08-10-2012, 06:03 PM   #4
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

It's been that way since the batch mode was implemented for the old BLAST suite (via the "fastacmd" command). I can see your point about batch vs cmdline differences in capability.

It's not that limiting, as you can still do one at a time on the command line. So if you are able to create the 3 column batch file in "A B C" format, then you similarly should be able to create a batch file in "-entry A -range B -strand C" format and use a shell command to apply it:

% (for LINE in batch.txt ; do blastdbcmd -db mydb $LINE ; done) > output.fasta

Problem solved.
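
A rough sketch of the same one-call-per-line idea that does not depend on the shell, in case it helps. (In a Unix shell, `for LINE in batch.txt` as written loops once over the literal word batch.txt rather than over the file's lines, so the file would have to be read line by line instead.) The Python below turns each "ID range strand" line into its own blastdbcmd call. It uses only the options already shown in this thread (-db, -dbtype, -entry, -range, -strand); the function names and file names are placeholders assumed for illustration, not anything from the BLAST+ docs.

```python
import subprocess


def batch_line_to_args(line, db, dbtype="nucl"):
    """Turn one 'ID RANGE STRAND' line (tab or space separated)
    into a blastdbcmd argument list."""
    seq_id, seq_range, strand = line.split()
    return ["blastdbcmd", "-db", db, "-dbtype", dbtype,
            "-entry", seq_id, "-range", seq_range, "-strand", strand]


def extract_subsequences(batch_path, db, out_path):
    """Run blastdbcmd once per non-blank line of the batch file and
    collect all of the output in a single FASTA file."""
    with open(batch_path) as batch, open(out_path, "w") as out:
        for line in batch:
            if line.strip():  # skip blank lines
                subprocess.run(batch_line_to_args(line, db),
                               stdout=out, check=True)
```

With the three-column test.txt from the first post, this would be called as something like extract_subsequences("test.txt", "TvaginalisGenomic_TrichDB-1.3.fasta", "output.fasta"). It starts one blastdbcmd process per line, so it is only a workaround for the batch limitation, not a fast path.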
Old 08-16-2012, 02:02 PM   #5
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

I'm running this on a Windows command line, btw, so I wonder if the syntax would be different. I get "LINE was unexpected at this time" when I try to run that command on a file "temp.txt" I created with lines that look like:

-entry DS113177 -range 1-10 -strand plus
-entry DS113177 -range 558-1093 -strand plus
-entry DS113177 -range 1415-3062 -strand plus

So I replaced the tabs with commas and tried this on the command line:

for /F "tokens=*,delims=," %G IN temp.txt DO blastdbcmd -db [mydb] %G %H

The error is now:
"temp.txt was unexpected at this time"

Last edited by ssully; 08-16-2012 at 02:23 PM.
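
One likely cause of the "was unexpected" message is that cmd.exe's for /F expects the file name in parentheses after IN. A way to sidestep shell syntax altogether is sketched below, assuming Python is available on the Windows machine: it reads a file of "-entry A -range B -strand C" lines like temp.txt above and builds one blastdbcmd command per line. The function names, `mydb`, and the output file name are placeholders made up for this example.

```python
import shlex
import subprocess


def option_lines_to_commands(lines, db):
    """Build one blastdbcmd command per non-blank line of
    '-entry A -range B -strand C' options."""
    commands = []
    for line in lines:
        options = shlex.split(line)  # split on whitespace, like a shell would
        if options:                  # blank lines give an empty list
            commands.append(["blastdbcmd", "-db", db] + options)
    return commands


def run_commands(commands, out_path):
    """Run each command in turn, sending all output to one file."""
    with open(out_path, "w") as out:
        for command in commands:
            subprocess.run(command, stdout=out, check=True)
```

Usage would be along the lines of run_commands(option_lines_to_commands(open("temp.txt"), "mydb"), "output.fasta"). Because the arguments are passed as a list, nothing here depends on cmd.exe or Unix shell quoting rules.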
Old 08-22-2012, 10:42 PM   #6
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by ssully
Running this on a Windows command line, btw, so I wonder if the syntax would be different.
I expect the syntax will be different! I am unable to assist with Windows/DOS batch files, sorry.
All times are GMT -8.