SEQanswers

Old 03-09-2011, 01:57 PM   #1
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31
Default blastall output: NCBI vs command line

Sorry for the basic question. Are there any options in blastall that I can use to get output in the same format as the online BLAST on the NCBI website?
Specifically, I want the "Sequences producing significant alignments:" table with these columns: Accession, Description, Max score, Total score, Query coverage, E value, Max ident.


But using blastall from the command line,
e.g. blastall -p blastx -i input.fa -d /blast/db/nr -a 4 -b 5 -v 5 -e 1e-20 -o output.file
I get only the Accession, Description, Score (bits) and E value columns.


However, I also want the Query coverage and Max ident columns.
I didn't find a solution in the blastall manual. Perhaps it depends on the -m parameter, but there are many options...
Thanks in advance!

UPD: -m 9 (tabular with comment lines; post-processed, sorted view) produces almost what I want, but it gives only an accession ID without a description, and something like gi|66734174|gb|AAY53484.1| isn't very helpful.

Last edited by ElMichael; 03-09-2011 at 02:26 PM.
Old 03-10-2011, 02:41 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by ElMichael View Post
Sorry for the basic question. Are there any options in blastall that I can use to get output in the same format as the online BLAST on the NCBI website?
The NCBI website now uses BLAST+ rather than 'legacy' BLAST. So one thing to do would be to switch from the 'legacy' blastall binary to the BLAST+ blastx binary. Note that with BLAST+ you can request lots of extra columns in the tabular output, which may cover what you want.
Old 03-10-2011, 04:55 AM   #3
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415
Default

I haven't tried BLAST+ yet, but in the past we have used a combination of
blastx -m 8
and blastx (without the -m parameter)
to get coverage and protein hit names.
Old 03-10-2011, 08:01 AM   #5
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31
Default

maubp, colindaven, thanks for your advice!
I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
I think I have to use a combination of two blastx runs, as colindaven suggested.
(Though I still hope there is some magic option, unknown to me, that produces the required format.)
Old 03-10-2011, 08:41 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by ElMichael View Post
maubp, colindaven, thanks for your advice!
I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
I'd like to be able to have query length and subject length as output columns (which would make either percentage coverage easy to calculate).
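To make the "easy to calculate" point concrete, here is a minimal Python sketch of a per-hit query coverage calculation from the tabular q. start and q. end columns plus a query length. The function name `query_coverage` and the query length of 450 are assumptions for illustration only. Note that this covers a single alignment; the "Query coverage" column on the NCBI website combines all alignments for a subject, so the numbers may not match.

```python
def query_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Percentage of the query spanned by one alignment (HSP).

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    For blastx hits on a reverse frame, q_start is larger than q_end,
    so the two are ordered before the span is measured.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    low, high = sorted((q_start, q_end))
    return 100.0 * (high - low + 1) / query_length


# Example: a reverse-frame hit with q. start 360 and q. end 214.
# The query length of 450 is a made-up value, not taken from any real run.
print(round(query_coverage(360, 214, 450), 2))
```

For blastx both the coordinates and the query length are in nucleotides, so no conversion is needed on the query side.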
Quote:
Originally Posted by ElMichael View Post
I think I have to use a combination of two blastx runs, as colindaven suggested.
(Though I still hope there is some magic option, unknown to me, that produces the required format.)
You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to convert it into any of the output formats (text, HTML, XML, tabular).
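As a rough sketch of that two-step workflow, the Python below only builds the two command lines (it does not run BLAST). The helper name `blast_archive_commands` and the file names `results.asn` and `results.tsv` are assumptions; `-outfmt 11` requests the BLAST archive (ASN.1) format and `-outfmt 6` is plain tabular.

```python
import shlex


def blast_archive_commands(query, database,
                           archive="results.asn", table="results.tsv"):
    """Return (search, reformat) argument lists for the two-step workflow."""
    # Step 1: run the search once and keep the result as a BLAST archive.
    search = ["blastx", "-query", query, "-db", database,
              "-outfmt", "11", "-out", archive]
    # Step 2: convert the archive to tabular output without searching again.
    reformat = ["blast_formatter", "-archive", archive,
                "-outfmt", "6", "-out", table]
    return search, reformat


# Print the commands so they can be checked or pasted into a shell.
for command in blast_archive_commands("input.fa", "/blast/db/nr"):
    print(shlex.join(command))
```

With BLAST+ installed, each list could be passed to `subprocess.run(command, check=True)`. The second step can be repeated with a different `-outfmt` value to get another report from the same search.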
Old 03-10-2011, 09:35 AM   #7
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

This sounds like a job for Bio::SearchIO. However, you need to be comfortable with BioPerl already.
Old 03-10-2011, 12:41 PM   #8
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31
Default

Quote:
Originally Posted by maubp View Post
You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to convert it into any of the output formats (text, HTML, XML, tabular).
Thanks for the hint.

kmcarr, that works terrifically! Exactly what I wanted. Thank you.
Old 11-17-2011, 12:52 PM   #9
yifangt
Member
 
Location: Canada

Join Date: Feb 2011
Posts: 61
Default follow up blast+

Hello,
I have a similar case with BLAST. I am not familiar with BLAST+, though. Anyway, I tried:
Code:
blastall -p blastx -i all-EST-cleaned.fasta -d my-db -m 9 -B 3 -b 10  -o blast-output.txt
and I got this result:
Code:
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q6IBW4|CNDH2_HUMAN	34.69	49	32	0	360	214	258	306	1.0	32.7
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q5T655|CC147_HUMAN	20.73	82	65	0	260	15	39	120	1.3	32.3
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9BRQ6|CHCH6_HUMAN	29.87	77	43	2	420	223	63	139	2.9	31.2
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9Y3L3|3BP1_HUMAN	35.48	62	39	2	414	232	7	62	2.9	31.2
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9LEM8|NAC2_CHLRE	38.46	39	24	0	387	271	1313	1351	3.8	30.8
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|P33424|POLN_HEVPA	39.34	61	37	2	423	241	1034	1090	5.0	30.4
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q81862|POLN_HEVCH	39.34	61	37	2	423	241	1034	1090	5.0	30.4
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9UKP4|ATS7_HUMAN	39.47	38	23	0	423	310	1022	1059	5.0	30.4
1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q2PC93|SSPO_CHICK	39.47	38	22	1	286	176	4073	4110	8.4	29.6
Now,
1) how can I add the annotation to the end of each subject entry?
2) how can I reformat the subject entries as HTML links to NCBI, if not doing 1)?
Thanks!
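
One possible way to approach both questions, as a Python sketch under several assumptions: the database FASTA file is available, the first word of each FASTA header is identical to the Subject id column, and subject IDs follow a `db|accession|name` layout. The function names are hypothetical, the description text in the example is a placeholder, and the `https://www.ncbi.nlm.nih.gov/protein/` URL prefix is assumed to resolve the accession.

```python
import html


def parse_fasta_descriptions(fasta_lines):
    """Map each sequence ID (first word of a '>' header) to the rest of the header."""
    descriptions = {}
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, description = line[1:].strip().partition(" ")
            descriptions[seq_id] = description
    return descriptions


def annotate_hits(tabular_lines, descriptions):
    """Yield each hit line with the subject description appended as a last column."""
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # comment and blank lines pass through unchanged
            continue
        subject_id = line.split("\t")[1]  # column 2 is "Subject id"
        yield line + "\t" + descriptions.get(subject_id, "")


def subject_link(subject_id):
    """Return an HTML link for an ID such as sp|Q6IBW4|CNDH2_HUMAN."""
    parts = subject_id.split("|")
    # Assumption: the second '|'-separated field is the accession.
    accession = parts[1] if len(parts) > 1 else subject_id
    url = "https://www.ncbi.nlm.nih.gov/protein/" + accession
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(subject_id))


# Tiny in-memory example; the description is a placeholder, not real annotation.
fasta = [">sp|Q6IBW4|CNDH2_HUMAN placeholder description\n", "MSEQ\n"]
hits = ["# Fields: Query id, Subject id, ...\n",
        "query1\tsp|Q6IBW4|CNDH2_HUMAN\t34.69\t49\t32\t0\t360\t214\t258\t306\t1.0\t32.7\n"]
for row in annotate_hits(hits, parse_fasta_descriptions(fasta)):
    print(row)
print(subject_link("sp|Q6IBW4|CNDH2_HUMAN"))
```

For real files, the lists can be replaced by open file handles, since both functions only iterate over lines.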