#1
Junior Member
Location: Oslo
Join Date: Jan 2018
Posts: 1
Dear all,
I would like to be able to create my own custom local BLAST database, as this may be useful in many different situations in bioinformatics. In this case, I want to build a database containing the latest versions of all the bacterial genomes in RefSeq. To start, I downloaded the bacterial genome assemblies from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria, using the information in "assembly_summary.txt" to fetch only the latest version of each genome. As a result, I now have almost 104,000 files (one per bacterial genome), each containing one or more contigs. So far, so good.

Each contig within a genome has a header that starts with its NCBI accession number, followed by further description, e.g.:

Genome (file) 1:
Code:
>NZ_NMDP01000102.1 Escherichia coli strain MOD1-EC6062
>NZ_NMDP01000103.1 Escherichia coli strain MOD1-EC6062

Genome (file) 2:
Code:
>NZ_NOBY01000102.1 Escherichia coli strain MOD1-EC5816
>NZ_NOBY01000115.1 Escherichia coli strain MOD1-EC5816

I now want to associate every genome with a taxonomy (a taxid?), as I understand this is important in many applications. For example, when I BLAST against my local database, I want to be able to quickly determine which bacterium my query sequence originates from.

My questions are therefore:
1. How do I find the taxon ID for each of these bacterial genomes? (Note: these are genomes from ../genomes/refseq/bacteria, not ..refseq/release/bacteria.)
2. How do I incorporate that information into my genome files and/or the final local database?

I suspect I first have to link the NCBI accession number in each header to a taxon ID somehow, but I'm not sure how to do that, or what format the result should be in.

All answers are highly appreciated!

Kind regards,
Even Sannes Riiser, PhD candidate
University of Oslo, Norway
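One possible way to cover both questions is sketched below in Python. It is not an established recipe and rests on several assumptions. First, each downloaded file is assumed to keep NCBI's naming, so the file name begins with the assembly accession (GCF_/GCA_). Second, assembly_summary.txt is assumed to list its column names on a comment line starting with `# assembly_accession`. Third, the relevant columns are assumed to be `taxid` and `version_status`. The sketch maps each "latest" assembly to its taxid, then writes one `<seqid> <taxid>` line per contig header. The helper names (`read_latest_taxids`, `write_taxid_map`, etc.) are hypothetical and not part of any NCBI tool.

```python
import gzip
import re
from pathlib import Path
from typing import Dict, Iterator, Optional, TextIO

# Assumption: files keep NCBI's naming, e.g. GCF_000005845.2_ASM584v2_genomic.fna.gz,
# so the file name starts with the assembly accession.
ASSEMBLY_RE = re.compile(r"^(GC[AF]_\d+\.\d+)")


def read_latest_taxids(summary: TextIO) -> Dict[str, str]:
    """Map assembly accession -> taxid for rows whose version_status is 'latest'.

    Assumes the column names sit on the comment line starting with
    '# assembly_accession', as in assembly_summary.txt.
    """
    header = None
    taxids: Dict[str, str] = {}
    for line in summary:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            fields = line.lstrip("#").strip().split("\t")
            if fields[0] == "assembly_accession":
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("version_status") == "latest":
            # 'taxid' may be strain-level; 'species_taxid' is the species-level alternative.
            taxids[row["assembly_accession"]] = row["taxid"]
    return taxids


def assembly_of(path: Path) -> Optional[str]:
    """Return the GCF_/GCA_ accession at the start of a file name, if any."""
    match = ASSEMBLY_RE.match(path.name)
    return match.group(1) if match else None


def open_text(path: Path) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path)


def fasta_seqids(handle: TextIO) -> Iterator[str]:
    """Yield the first word of each FASTA header, e.g. 'NZ_NMDP01000102.1'."""
    for line in handle:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                yield words[0]


def write_taxid_map(genome_dir: Path, taxids: Dict[str, str], out: TextIO) -> int:
    """Write one '<seqid> <taxid>' line per contig; return the number of lines written."""
    written = 0
    for path in sorted(genome_dir.iterdir()):
        if not path.is_file():
            continue
        accession = assembly_of(path)
        if accession is None or accession not in taxids:
            continue  # not a genome file, or not a 'latest' assembly
        with open_text(path) as handle:
            for seqid in fasta_seqids(handle):
                out.write(f"{seqid} {taxids[accession]}\n")
                written += 1
    return written
```

The resulting map could then, hypothetically, be passed to BLAST+ when building the database, with something like `makeblastdb -in all_genomes.fna -dbtype nucl -parse_seqids -taxid_map taxid_map.txt -out refseq_bacteria`. Here `all_genomes.fna` stands for the concatenated genome files, and all file names are placeholders. The sequence IDs in the map have to match the IDs makeblastdb parses from the headers, so it is worth checking `makeblastdb -help` for the installed version. A search could then report the taxid of each hit through the `staxids` field of a tabular output format, e.g. `-outfmt "6 qseqid sseqid staxids"`.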
Tags |
blast, local, refseq, taxid, taxonomy |