SEQanswers

06-10-2020, 08:21 AM   #1
NotGraysonFord
Junior Member
 
Location: Somewhere in UK

Join Date: Jun 2020
Posts: 3
Download all bacteria genomes in .gbk format

Hello all,

Sorry if this has been posted before, but after searching the internet I seem to be at a dead end.

I currently would like to download all bacterial genomes in .gbk or .embl format to use with a program called MultiGeneBlast.

The GUI in the program for creating your own database doesn't work.
I have also tried the tool at https://github.com/kblin/ncbi-genome...ster/README.md, but it does not work either and crashes most of the time.

I've manually downloaded the gbgbct.seq.gz files individually from https://ftp.ncbi.nlm.nih.gov/genbank/ and changed their extensions to .gbk, but the database creation tool within MultiGeneBlast still doesn't work with them.

If I go to https://www.ncbi.nlm.nih.gov/assembl...ll%5Bfilter%5D and choose complete bacterial genomes, it only offers the .gbff format, not .gbk, and there is no EMBL option.

Maybe it is a lot easier and simpler than I'm finding it, but I am not the most computer proficient.

If anybody could offer some help or advice, it would be much appreciated. All the best!
06-10-2020, 08:28 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

Kai Blin's NCBI genome downloader has no GUI, so I am not sure what is crashing for you. It is a command-line tool: install it and run it from the command line. There are thousands of bacterial genomes, so be careful with the downloads.
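As a rough sketch of what running the tool from the command line could look like, the snippet below only builds and prints a command. It does not download anything. The flag names (`--formats`, `--assembly-levels`, `--dry-run`) are assumed from recent releases of ncbi-genome-download and may differ in older versions, so check `ncbi-genome-download --help` first. The helper name `build_download_command` is made up for illustration.

```python
import shlex
import shutil


def build_download_command(group="bacteria", formats="genbank",
                           assembly_levels="complete", dry_run=True):
    """Build (but do not run) an ncbi-genome-download command line.

    Flag names are assumptions based on recent releases of the tool.
    """
    cmd = ["ncbi-genome-download",
           "--formats", formats,
           "--assembly-levels", assembly_levels]
    if dry_run:
        # A dry run only lists what would be fetched; nothing is downloaded.
        cmd.append("--dry-run")
    cmd.append(group)
    return cmd


cmd = build_download_command()
print(shlex.join(cmd))
# Reports whether the tool is installed and visible on PATH.
print("tool found on PATH:", shutil.which(cmd[0]) is not None)
```

Once the printed command looks right, it could be run with `subprocess.run(cmd, check=True)`. Starting with a dry run restricted to complete genomes is one way to see how large the download would be before committing to it.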

The gbff (GenBank flat file) format is actually GenBank format, just under a different extension.
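Since the downloads are gzip-compressed, renaming the extension alone leaves the file compressed. A minimal sketch, assuming the program wants an uncompressed file ending in .gbk (whether MultiGeneBlast then accepts it is not confirmed here), would decompress each file and write it out under the new name. The function name `gbff_gz_to_gbk` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gbff_gz_to_gbk(src, dest_dir):
    """Decompress one gzip-compressed GenBank file and save it as .gbk."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # "X_genomic.gbff.gz" -> "X_genomic.gbff" -> "X_genomic.gbk"
    name = src.name
    if name.endswith(".gz"):
        name = name[:-3]
    dest = dest_dir / (Path(name).stem + ".gbk")

    # Stream the decompressed bytes so large genomes do not fill memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For a folder of downloads, this could be called once per file, for example on every path matching `*.gbff.gz`. The content is not changed, only decompressed and renamed.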

NCBI has assembly summary report files available. Here is the file that you can parse for bacteria. You can look at the relevant field (ftp_path) to get the FTP download path for a particular genome. The GenBank format files will be inside that folder.

Code:
#   See ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt for a description of the columns in this file.
# assembly_accession    bioproject      biosample       wgs_master      refseq_category taxid   species_taxid   organism_name    infraspecific_name      isolate version_status  assembly_level  release_type    genome_rep      seq_rel_date    asm_name submitter       gbrs_paired_asm paired_asm_comp ftp_path        excluded_from_refseq    relation_to_type_material
GCA_003023565.1 PRJNA351262     SAMN06020791    PXSA00000000.1  na      1919191 1919191 Halobacteriales archaeon QS_9_68_17              QS_9_68_17      latest  Scaffold        Major   Full    2018/03/27      ASM302356v1     None    na      na       ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/023/565/GCA_003023565.1_ASM302356v1      derived from metagenome
GCA_003023575.1 PRJNA351262     SAMN06020787    PXRW00000000.1  na      1919185 1919185 Halobacteriales archaeon QS_7_69_60              QS_7_69_60      latest  Scaffold        Major   Full    2018/03/27      ASM302357v1     None    na      na       ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/023/575/GCA_003023575.1_ASM302357v1      derived from metagenome
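A minimal sketch of parsing such a summary file, assuming it is tab-separated with the column names on a `#` line as shown above. It relies on the NCBI convention that the GenBank file inside each folder is named after the folder plus `_genomic.gbff.gz`. The function name `genbank_urls` is made up, the sample is trimmed to four columns for brevity, and the second sample row is invented to show an entry whose ftp_path is `na`.

```python
def genbank_urls(summary_text, assembly_level=None):
    """Yield (accession, organism, gbff_url) for rows with a usable ftp_path."""
    header = None
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # The comment line that contains "ftp_path" holds the column names.
            names = line.lstrip("#").strip().split("\t")
            if "ftp_path" in names:
                header = names
            continue
        if header is None or not line.strip():
            continue
        row = dict(zip(header, line.split("\t")))
        if assembly_level and row.get("assembly_level") != assembly_level:
            continue
        ftp_path = row.get("ftp_path", "na")
        if ftp_path == "na":
            continue  # no download folder recorded for this assembly
        folder = ftp_path.rstrip("/").rsplit("/", 1)[-1]
        yield (row["assembly_accession"], row["organism_name"],
               f"{ftp_path}/{folder}_genomic.gbff.gz")


# Trimmed sample; the second row is invented to show the "na" case.
sample = (
    "# assembly_accession\torganism_name\tassembly_level\tftp_path\n"
    "GCA_003023565.1\tHalobacteriales archaeon QS_9_68_17\tScaffold\t"
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/023/565/"
    "GCA_003023565.1_ASM302356v1\n"
    "GCA_000000000.0\tplaceholder row\tScaffold\tna\n"
)

for accession, organism, url in genbank_urls(sample):
    print(accession, organism, url, sep="\n  ")
```

Passing `assembly_level="Complete Genome"` would restrict the output to finished genomes, which keeps the list far shorter than the full set of bacterial assemblies.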

Last edited by GenoMax; 06-10-2020 at 08:33 AM.
06-10-2020, 08:46 AM   #3
NotGraysonFord
Junior Member
 
Location: Somewhere in UK

Join Date: Jun 2020
Posts: 3

Thank you for replying so fast.

This is the error I got when I used Kai's ncbi genome downloader, and I didn't understand it, to say the least :P

Traceback (most recent call last):
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\ncbi_genome_download\core.py", line 385, in downloadjob_creator_caller
    return create_downloadjob(*args)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\ncbi_genome_download\core.py", line 397, in create_downloadjob
    checksums = grab_checksums_file(entry)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\ncbi_genome_download\core.py", line 465, in grab_checksums_file
    req = requests.get(full_url)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 449, in prepare_request
    p.prepare(
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 314, in prepare
    self.prepare_url(url, params)
  File "c:\users\g\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 388, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'na/md5checksums.txt': No schema supplied. Perhaps you meant http://na/md5checksums.txt?

I will also try downloading from the NCBI website in gbff format and see whether MultiGeneBlast accepts it.
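For anyone reading the traceback later: its last line shows the URL being requested was 'na/md5checksums.txt', which suggests that at least one entry had the placeholder value "na" where a download path was expected. The sketch below is not the tool's own code. It is a hypothetical guard (the name `checksums_url` is made up) showing how such an entry could be detected and skipped using only the standard library.

```python
from urllib.parse import urlparse


def checksums_url(ftp_path):
    """Return the md5checksums.txt URL, or None when ftp_path is unusable."""
    if not urlparse(ftp_path).scheme:
        # For example the literal string "na": there is nothing to download.
        return None
    return ftp_path.rstrip("/") + "/md5checksums.txt"


paths = (
    "na",
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/023/565/"
    "GCA_003023565.1_ASM302356v1",
)
for path in paths:
    print(repr(path), "->", checksums_url(path))
```

A path without a scheme such as ftp:// or https:// is exactly what the MissingSchema error complains about, so checking for one before building the URL avoids the crash for that entry.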
06-10-2020, 09:07 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

It looks like you are trying to do this on Windows, which is a different can of worms. Programs like Kai Blin's are written to work on Linux/macOS. Some may work on Windows, but there is no guarantee that they will.

If you can, use WSL2 on Windows 10, which will allow you to run Linux on your machine and use most of these programs seamlessly.
06-10-2020, 09:35 AM   #5
NotGraysonFord
Junior Member
 
Location: Somewhere in UK

Join Date: Jun 2020
Posts: 3

Okay, thank you, I'll give that a go.

I appreciate you responding so fast. Thanks again.
Tags
file, format, genbank, multigeneblast, refseq
