  • arkilis
    Senior Member
    • Jul 2013
    • 119

    Could I get the species name list from a blast database?

    Is there any way to get the list of species names from a BLAST database? As far as I can see, the database files are binary.

    I found this:

    blastdbcmd -db [db name] -info, but blastdbcmd is not installed on our server.

    Any thoughts? Cheers,

    ======================================================
    Here is what I got from blastdbcmd -info; no species names are listed.

    Database: Kog
    4,825 sequences; 2,336,026 total residues

    Date: Jun 2, 2009 10:24 AM Longest sequence: 5,019 residues

    Volumes:
    /usr/local/agrf/data/ncbi-blast/linked_databases/Kog
    Last edited by arkilis; 05-06-2014, 06:32 PM. Reason: more details
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    BLAST databases can record the species (via an NCBI taxid) for each sequence, so there can be many species in a single database. You can use the -out and -outfmt options of blastdbcmd to get this information:

    Code:
     -outfmt <String>
       Output format, where the available format specifiers are:
       		%f means sequence in FASTA format
       		%s means sequence data (without defline)
       		%a means accession
       		%g means gi
       		%o means ordinal id (OID)
       		%i means sequence id
       		%t means sequence title
       		%l means sequence length
       		%h means sequence hash value
       		%T means taxid
       		%e means membership integer
       		%L means common taxonomic name
       		%S means scientific name
       		%K means taxonomic super kingdom
       		%P means PIG
       		%m means sequence masking data.
       		   Masking data will be displayed as a series of 'N-M' values
       		   separated by ';' or the word 'none' if none are available.
       	If '%f' is specified, all other format specifiers are ignored.
       	For every format except '%f', each line of output will correspond
       	to a sequence.
    e.g.

    Code:
    $ blastdbcmd -db pdbaa -outfmt "%a %T %K %S %L" -entry all | head
    1VS5_R 562 Bacteria Escherichia coli Escherichia coli
    1VS7_R 562 Bacteria Escherichia coli Escherichia coli
    3I1M_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3I1O_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3I1Q_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3I1S_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3I1Z_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3I21_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3KC4_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    3OR9_R 83333 Bacteria Escherichia coli K-12 Escherichia coli K-12
    For this to work, you need the taxonomy information in addition to the BLAST database itself. Download ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz and extract it (it is a gzipped tar archive) next to your other BLAST databases.
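
The example above prints one line per sequence, so the species names repeat. A minimal sketch of collapsing that output into a unique species list with counts is given here. The helper name species_counts and the "|" separator in the format string are my own choices for illustration, not part of blastdbcmd; the sketch assumes no scientific name contains a "|".

```shell
#!/bin/sh
# species_counts (hypothetical helper): reads lines of the form
# "taxid|scientific name" on standard input, one line per sequence,
# and prints each distinct line once, prefixed with how often it
# occurred, most frequent first.
species_counts() {
    sort | uniq -c | sort -rn | sed 's/^[[:space:]]*//'
}

# Intended use, assuming blastdbcmd is installed and taxdb has been
# extracted next to the database (pdbaa is the database from the
# example above):
#
#   blastdbcmd -db pdbaa -entry all -outfmt "%T|%S" | species_counts
```

Each output line is the sequence count followed by taxid|scientific name. If the database was built without taxids, or taxdb cannot be found, the names will not be informative, so it is worth checking a few lines of the raw blastdbcmd output first.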
    Last edited by maubp; 05-07-2014, 02:41 AM. Reason: Adding example
