SEQanswers

02-17-2014, 05:35 PM   #1
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204
Which BLAST output format can give full taxonomy information

Hi, I was wondering which BLAST output format I should use for this. I want to BLAST against the nt database, and I want the output to give me not only the genus and species names but also the phylum, order, and family names.

Is it possible?

02-17-2014, 06:41 PM   #2
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

You can use the GI number to look up the tax ID and then get the complete taxonomic lineage information from taxdump.tar.Z: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/

02-17-2014, 07:09 PM   #3
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Is it possible to do the mapping during the BLAST run, or do I need to get the BLAST results first and then map them to taxonomy?

Which command should I use to map the tax ID?

02-17-2014, 07:57 PM   #4
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

BLAST output formats 6, 7, and 10 can additionally be configured to produce a custom set of columns, which can include the Subject Taxonomy ID (the staxids specifier),
for example:
Quote:
blastx -query Human_kinase_rna-100.fasta -db ../ccds/CCDS_protein.20130430 -out Human_kinase-rna-blastx-m7.tbl -evalue 1 -outfmt "7 qseqid qlen slen qcovhsp sseqid staxids bitscore score evalue pident qstart qend sstart send" -num_alignments 10 -num_threads 8
I use a Perl script to fetch the complete taxonomic lineage information for the staxids that BLAST reports.

02-18-2014, 10:34 AM   #5
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Quote:
Originally Posted by jameslz
BLAST output formats 6, 7, and 10 can additionally be configured to produce a custom set of columns, which can include the Subject Taxonomy ID (the staxids specifier),
for example:

I use a Perl script to fetch the complete taxonomic lineage information for the staxids that BLAST reports.

Do I need to put the complete taxonomic lineage information file (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/) in the same folder as my database file? In your example, I didn't see your command refer to any taxonomic lineage information file.

02-18-2014, 04:00 PM   #6
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Hi, I used format 7 with your command settings.

I still don't have the taxonomic lineage, and I don't know how to use the taxonomic lineage file.

Here is my output:

query id, query length, subject length, % hsp coverage, subject id, subject tax ids, bit score, score, evalue, % identity, q. start, q. end, s. start, s. end
denovo0, 266, 297, 85, gi|14718977|gb|AF352544.1|, 163259, 351, 190, 1.00E-93, 95.15, 39, 265, 1, 219
denovo1, 400, 1178, 100, gi|117572484|gb|DQ979290.1|, 1211, 706, 382, 0, 98.5, 1, 400, 670, 1069

02-18-2014, 08:37 PM   #7
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

Usage:
Quote:
perl tax_trace.pl nodes.dmp names.dmp taxids.txt taxids_export.txt
Input file: taxids.txt
Format: seqId taxId
Quote:
gl00001 192
gl00002 2020
nodes.dmp and names.dmp can be downloaded as part of taxdump.tar.Z (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).

Output file: taxids_export.txt
Quote:
gl00001 Azospirillum brasilense root|cellular organisms|Bacteria|Proteobacteria|Alphaproteobacteria|Rhodospirillales|Rhodospirillaceae|Azospirillum|Azospirillum brasilense no rank|no rank|superkingdom|phylum|class|order|family|genus|species
gl00002 Thermomonospora curvata root|cellular organisms|Bacteria|Actinobacteria|Actinobacteria|Actinobacteridae|Actinomycetales|Streptosporangineae|Thermomonosporaceae|Thermomonospora|Thermomonospora curvata no rank|no rank|superkingdom|phylum|class|subclass|order|suborder|family|genus|species
Attached Files
File Type: pl tax_trace.pl (1.6 KB, 248 views)
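
For readers who cannot open the attachment, the kind of lookup such a script has to do can be sketched in Python. This is not the attached tax_trace.pl (its source is not shown in this thread); it is a minimal sketch that assumes the standard taxdump layout, where fields are separated by tab, pipe, tab, where nodes.dmp starts with tax ID, parent tax ID, and rank, and where names.dmp holds tax ID, name, unique name, and name class. The function names and the four-node toy data are made up for illustration, and the toy parent links skip the intermediate ranks.

```python
import io


def parse_dmp(handle):
    """Yield the list of fields for each row of a taxdump .dmp file."""
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        if line:
            yield line.split("\t|\t")


def load_taxonomy(nodes_handle, names_handle):
    """Build parent, rank, and scientific-name lookups keyed by tax ID."""
    parent, rank, name = {}, {}, {}
    for fields in parse_dmp(nodes_handle):
        parent[fields[0]] = fields[1]
        rank[fields[0]] = fields[2]
    for fields in parse_dmp(names_handle):
        if fields[3] == "scientific name":  # ignore synonyms, common names
            name[fields[0]] = fields[1]
    return parent, rank, name


def lineage(tax_id, parent, rank, name):
    """Return [(name, rank), ...] from the root down to tax_id."""
    path, seen = [], set()
    while tax_id in parent and tax_id not in seen:
        seen.add(tax_id)
        path.append((name.get(tax_id, "unknown"), rank[tax_id]))
        if parent[tax_id] == tax_id:  # the root node is its own parent
            break
        tax_id = parent[tax_id]
    path.reverse()
    return path


def row(*fields):
    """Format one toy row the way the .dmp files are laid out."""
    return "\t|\t".join(fields) + "\t|\n"


# Toy data only: real files have many more nodes and columns.
NODES = row("1", "1", "no rank") + row("2", "1", "superkingdom") \
    + row("191", "2", "genus") + row("192", "191", "species")
NAMES = row("1", "root", "", "scientific name") \
    + row("2", "Bacteria", "", "scientific name") \
    + row("2", "eubacteria", "", "genbank common name") \
    + row("191", "Azospirillum", "", "scientific name") \
    + row("192", "Azospirillum brasilense", "", "scientific name")

parent, rank, name = load_taxonomy(io.StringIO(NODES), io.StringIO(NAMES))
path = lineage("192", parent, rank, name)
print("|".join(n for n, _ in path))
print("|".join(r for _, r in path))
```

With the real files, the two StringIO objects would be replaced by open("nodes.dmp") and open("names.dmp"), and lineage() would be called once per tax ID in taxids.txt.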

02-19-2014, 06:44 AM   #8
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Quote:
Originally Posted by jameslz
Usage:

Input file: taxids.txt
Format: seqId taxId

nodes.dmp and names.dmp can be downloaded as part of taxdump.tar.Z (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).

Output file: taxids_export.txt

Thank you. Can you also post your "taxids.txt" and "taxids_export.txt" files? I need to make sure that my input file format is right and that the output file is what I want. My BLAST output and taxids file are attached. It didn't work.
Thank you.
Attached Files
File Type: txt blastout.txt (11.5 KB, 82 views)
File Type: txt taxids.txt (4.5 KB, 67 views)

Last edited by SDPA_Pet; 02-19-2014 at 07:32 AM.

02-19-2014, 04:35 PM   #9
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

You can parse the BLAST output file using:
Quote:
cut -f1,6 blastout.txt >taxids.txt
and then:
Quote:
perl tax_trace.pl nodes.dmp names.dmp taxids.txt taxids_export.txt
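
One caveat with the cut step: format 7 output contains comment lines beginning with "#", and cut passes lines without a tab through unchanged, so they would end up in taxids.txt. A minimal Python sketch of the same extraction that skips those lines is shown here. The function name and default column positions are assumptions (query id in column 1, tax ids in column 6, as in the output posted earlier in the thread), and the two data rows are copied from that post.

```python
def extract_taxids(lines, query_col=0, taxid_col=5):
    """Yield (query id, tax id) pairs from tab-separated BLAST output.

    Column positions are zero-based and must match the -outfmt string used.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and format 7 comment lines
        fields = line.split("\t")
        # staxids may hold several IDs separated by ';'
        for taxid in fields[taxid_col].split(";"):
            yield fields[query_col], taxid


sample = [
    "# a comment line, as written by format 7",
    "denovo0\t266\t297\t85\tgi|14718977|gb|AF352544.1|\t163259\t351\t190\t1.00E-93\t95.15\t39\t265\t1\t219",
    "denovo1\t400\t1178\t100\tgi|117572484|gb|DQ979290.1|\t1211\t706\t382\t0\t98.5\t1\t400\t670\t1069",
]

for query, taxid in extract_taxids(sample):
    print(query, taxid, sep="\t")
```

To run it on a real file, pass an open file handle instead of the sample list and write the pairs to taxids.txt. If the tax IDs sit in a different column, change taxid_col accordingly.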

02-19-2014, 04:47 PM   #10
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Hi, what does this mean? "cut -f1,6 blastout.txt >taxids.txt"

02-19-2014, 05:19 PM   #11
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

Quote:
Originally Posted by SDPA_Pet
Hi, what does this mean? "cut -f1,6 blastout.txt >taxids.txt"

It's a Linux command. It gets the first and the sixth columns (the query id and the taxids).

02-19-2014, 05:22 PM   #12
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 204

Oh, my taxon ID is in the 5th column. I have added another attachment to my post. Is that correct? I can't get taxon information from that file.

02-19-2014, 05:27 PM   #13
jameslz
Member
 
Location: ShangHai

Join Date: Nov 2009
Posts: 20

Just fetch the "query id" and "tax ids" columns.

03-03-2014, 07:09 AM   #14
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,540

Which BLAST output format can give full taxonomy information?

Not the XML output (for now), but the optional taxonomy columns in the tabular/comma separated output get you close:
http://blastedbio.blogspot.com/2014/...love-from.html

03-03-2014, 07:30 AM   #15
yzzhang
Member
 
Location: florida

Join Date: Jan 2013
Posts: 66

I think BLAST+ has an option to include taxonomy information: the "sscinames" field, which gives the unique Subject Scientific Name(s).

03-03-2014, 07:36 AM   #16
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,540

Quote:
Originally Posted by yzzhang
I think BLAST+ has an option to include taxonomy information: the "sscinames" field, which gives the unique Subject Scientific Name(s).

Yes, but only in the tabular or comma-separated output:
  • staxids means Subject Taxonomy ID(s), separated by a ';'
  • sscinames means Subject Scientific Name(s), separated by a ';'
  • scomnames means Subject Common Name(s), separated by a ';'
  • sblastnames means Subject Blast Name(s), separated by a ';' (in alphabetical order)
  • sskingdoms means Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
  • stitle means Subject Title
  • salltitles means All Subject Title(s), separated by a '<>'

None of these give the full taxonomy lineage, which is what I think was being asked for here (see the earlier comments).

See also http://blastedbio.blogspot.co.uk/201...criptions.html
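
Because the column order depends entirely on the -outfmt string, it can be easier to address columns by specifier name than by position. The following Python sketch builds the -outfmt value from a field list and turns each tabular row into a dictionary, splitting the ';'-separated taxonomy columns listed above. The field selection, the helper name, and the sample row are all made up for illustration; only the specifier names and the ';' separator come from this thread.

```python
# Fields requested from BLAST, in output order (an illustrative selection).
FIELDS = ["qseqid", "sseqid", "staxids", "sscinames", "pident", "evalue"]

# Value to pass to -outfmt for tab-separated output without comment lines.
OUTFMT = "6 " + " ".join(FIELDS)

# Taxonomy columns that may hold several values separated by ';'.
MULTI_VALUED = {"staxids", "sscinames", "scomnames", "sblastnames", "sskingdoms"}


def parse_row(line, fields=FIELDS):
    """Map one tab-separated result line onto the requested field names."""
    values = line.rstrip("\n").split("\t")
    row = dict(zip(fields, values))
    for key in MULTI_VALUED.intersection(row):
        row[key] = row[key].split(";")
    return row


# Invented example row, not real BLAST output.
example = parse_row("query1\tsubject1\t9606;10090\tHomo sapiens;Mus musculus\t98.5\t1e-50\n")
print(OUTFMT)
print(example["staxids"], example["sscinames"])
```

As noted above, these columns still give only IDs and names, so the full lineage has to be looked up separately from the tax IDs.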

11-05-2015, 10:21 AM   #17
lilicano
Junior Member
 
Location: Raleigh, NC

Join Date: Feb 2015
Posts: 4
Thanks for your tax_trace.pl script solution!

Thanks so much, jameslz, your script works great. I don't know why, but the sskingdoms option has never worked for me.

You made my day with this post!

Best,

Lili

Quote:
Originally Posted by jameslz
Usage:

Input file: taxids.txt
Format: seqId taxId

nodes.dmp and names.dmp can be downloaded as part of taxdump.tar.Z (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).

Output file: taxids_export.txt

01-27-2017, 08:49 AM   #18
sme.bug
Junior Member
 
Location: USA

Join Date: Apr 2016
Posts: 3

Thanks for sharing! This was very helpful to me.

01-27-2017, 11:54 AM   #19
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,540

I believe the fairly new BLAST XML v2 output also includes taxonomy information, if available.

01-27-2017, 04:32 PM   #20
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Just thought I'd mention JGI's new Taxonomy server. You can look up organism names, gi numbers, or accession numbers to get either the NCBI taxid or complete lineage. These are all in the nt sequence headers.

All times are GMT -8.