SEQanswers

07-03-2014, 07:16 AM   #1
miguelangel
Member
 
Location: Madrid

Join Date: Jun 2012
Posts: 16
How can I format RDP database to be used in a BLAST search?

Hello there and thank you for your welcome!

I have to format the RDP 16S bacterial database, in FASTA format (downloaded from here: http://rdp.cme.msu.edu/misc/resources.jsp), for use in a BLAST search carried out with the QIIME command 'assign_taxonomy.py'.

I need to create an index from the FASTA file, and I have read that this can be done with 'formatdb' from the standalone BLAST program, but whenever I try I get a message like this one:

[formatdb 2.2.22] ERROR: RDP_11_2_index.txt.nhrOutput
Blast-def-line-set.E.<title>
Invalid value(s) [9] in VisibleString [uncultured bacterium; DolOr_72351#Lineage=Root;rootrank;Bacteria;domain;unclassified_Bacteria; ...]

I do get a .nhr file, but its contents look like this:

S000655540ó─0─0─ć─┼łuncultured actinobacterium; GASP-KA1W3_B01#Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus░─0─ć─░─


The unknown characters mean I cannot use it with the BLAST option or in any other BLAST search.

Could anyone help me with this issue? I am really stuck at this step...


Thanks a lot

MA
07-03-2014, 08:23 AM   #2
miguelangel
Member
 
Location: Madrid

Join Date: Jun 2012
Posts: 16

I have tried to do the same using 'makeblastdb', and now I get a different error:

Error: (803.7) Blast-def-line-set.E.title
Bad char [0x9] in string at byte 38
uncultured bacterium; L2Sp-13 Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus

I also get a .nhr file almost identical to the one generated with 'formatdb'.

I am sure that the problem is in the format of the original FASTA file, whose entries look like this:

>S000655540 uncultured bacterium; L2Sp-13 Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus
ggaatcttgcgcaatgggcgaaagcctgacgcagcaacgccgcgtgcgggatgaaggccttcgggctgtaaaccgctttc
agcaggaacgaaaatgacggtacctgcagaagaaggagcggccaactacgtgccagcagccgcggtgacacgtaggctcc
aagcgttgtccggatttattgggcgtaaagagctcgtaggcggttgagtaagtcgggtgtgaaaactctgggcttaaccc
ggagacgccatccgatactgctctgactagagttcaggaggggagtggggaattcctagtgtagcggtgaaatgcgcaga
tattaggaggaacaccggtggcgaaggcgccactctggactgaaactgacgctgaggagcgaaagcatgggtatcaaaca
ggattagataccctggtactccatgccgtaaacggtgggcactaggtgtgggttccaactaacgggatccgcgccgtcgc
taacgcattaagtgccccgcctggggagtacggtcgcaagactaaaactcaaatgaattgacgg


Any idea how I can change this format so that it works with the formatdb/makeblastdb commands?

Thanks again
07-03-2014, 08:58 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

The problem is likely a tab character (between the S* ID and the rest of the header?), based on the 0x9 code in your error. Also, the ID header line is probably wrapping onto a second line (unless your copy/paste did that). You will likely need to reformat the headers.
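
One way to check the tab diagnosis is sketched below, assuming a POSIX shell with awk and tr available. The function names and the filenames in the usage comments are hypothetical, not something from this thread. The first function counts header lines that contain a tab (0x09); the second writes a copy of the file with tabs replaced by spaces.

```shell
# Count FASTA header lines (those starting with '>') that contain a tab (0x09).
count_tab_headers() {
    awk '/^>/ && /\t/ { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the FASTA file with every tab replaced by a single space.
# Sequence lines normally contain no tabs, so in practice only headers change.
tabs_to_spaces() {
    tr '\t' ' ' < "$1" > "$2"
}

# Example usage (hypothetical filenames):
#   count_tab_headers rdp.fa
#   tabs_to_spaces rdp.fa rdp_no_tabs.fa
```

If the count is zero, the tab is not the cause and the headers would need a closer look for other control characters.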
07-03-2014, 11:47 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Using the "release11_2_Bacteria_unaligned.fa" file downloaded from the link you posted, I was able to create the indexes using makeblastdb (v. 2.2.29+) without the errors you saw. I ran:

Code:
$ makeblastdb -dbtype nucl -in release11_2_Bacteria_unaligned.fa
I got a certain number of errors (below), which may or may not indicate a real problem: http://www.acgt.me/blog/2014/5/15/fu...rom-ncbi-blast

Quote:
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 45% ambiguous nucleotides (shouldn't be over 40%)
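
To see which records might trigger this warning, a rough check is sketched below. It assumes that "ambiguous" means any character other than a, c, g or t (either case) in the first sequence line of a record; the thread does not say exactly how the NCBI FASTA reader counts, so treat the numbers as approximate. The function name is hypothetical.

```shell
# For each record, look only at the FIRST sequence line and print
# "<ID><tab><percent>" when the share of non-ACGT characters exceeds a limit.
# Assumption: "ambiguous" = anything other than A/C/G/T, case-insensitive.
# $1 = FASTA file, $2 = limit in percent (default 40).
report_ambiguous_first_lines() {
    awk -v limit="${2:-40}" '
        /^>/ { id = substr($1, 2); first = 1; next }
        first && length($0) > 0 {
            first = 0
            line = $0
            total = length(line)
            # gsub returns how many unambiguous bases it removed
            unambiguous = gsub(/[ACGTacgt]/, "", line)
            pct = 100 * (total - unambiguous) / total
            if (pct > limit) printf "%s\t%.0f\n", id, pct
        }
    ' "$1"
}

# Example usage:
#   report_ambiguous_first_lines release11_2_Bacteria_unaligned.fa 40
```

Records listed here start with a stretch of mostly non-ACGT characters, which may be worth inspecting by hand before deciding whether the warning matters.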
07-04-2014, 12:27 AM   #5
miguelangel
Member
 
Location: Madrid

Join Date: Jun 2012
Posts: 16

Thanks a lot

I got exactly the same errors, so I will check whether these new files are properly formatted to run BLAST.
07-04-2014, 12:17 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

I tried a test blast with a few sequences from the RDP fasta file. Worked without any problems.
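
A test like this could be set up roughly as follows. This is a sketch, not the exact commands used: the function name and the query filename are hypothetical, and the database name in the usage comment assumes makeblastdb was run on release11_2_Bacteria_unaligned.fa without an -out option, so the database takes the input file's name.

```shell
# Print the first N records of a FASTA file to stdout.
# $1 = FASTA file, $2 = number of records (default 5).
first_fasta_records() {
    awk -v n="${2:-5}" '
        /^>/ { if (++count > n) exit }   # stop at the header of record N+1
        { print }
    ' "$1"
}

# Example usage (hypothetical query filename):
#   first_fasta_records release11_2_Bacteria_unaligned.fa 5 > test_query.fa
#   blastn -query test_query.fa -db release11_2_Bacteria_unaligned.fa -outfmt 6 | head
```

Since the query sequences come from the database itself, each one should hit its own entry, which makes it easy to tell whether the index is usable.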

If you do not need all the extra information in the FASTA header, you could remove most of it using the following command (leaving only the S* IDs):

Code:
$ sed -e 's/>* .*$//' release11_2_Bacteria_unaligned.fa > release11_2_Bacteria_unaligned_truncated_header.fa
Then build the indexes from the new file.
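
A slightly more defensive variant of the same idea is sketched below, in case it is useful. It only touches header lines and cuts at the first space or tab, then lists any IDs that end up duplicated after truncation. The function names and the output filename are hypothetical.

```shell
# Keep only the ID (text up to the first space or tab) on each header line.
# Sequence lines are left untouched. $1 = input FASTA, $2 = output FASTA.
truncate_headers() {
    sed -e '/^>/ s/[[:space:]].*$//' "$1" > "$2"
}

# Print any header that occurs more than once; no output means all IDs are unique.
duplicate_headers() {
    grep '^>' "$1" | sort | uniq -d
}

# Example usage (hypothetical output filename):
#   truncate_headers release11_2_Bacteria_unaligned.fa truncated.fa
#   duplicate_headers truncated.fa
#   makeblastdb -dbtype nucl -in truncated.fa
```

Checking for duplicates before building the indexes is a precaution, since truncated headers are only useful if each ID still identifies one sequence.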

Last edited by GenoMax; 07-04-2014 at 12:21 PM.
Tags
blast, formatdb, qiime, rdp

All times are GMT -8.