  • How can I format RDP database to be used in a BLAST search?

    Hello there and thank you for your welcome!

    I have to format the RDP 16S bacterial database, which is in FASTA format (downloaded from here: http://rdp.cme.msu.edu/misc/resources.jsp), so that it can be used in a BLAST search carried out with the QIIME command 'assign_taxonomy.py'.

    I need to create an index from the FASTA file, and I have read that this can be done with 'formatdb' from the standalone BLAST package, but whenever I try I get a message like this one:

    [formatdb 2.2.22] ERROR: RDP_11_2_index.txt.nhrOutput
    Blast-def-line-set.E.<title>
    Invalid value(s) [9] in VisibleString [uncultured bacterium; DolOr_72351#Lineage=Root;rootrank;Bacteria;domain;unclassified_Bacteria; ...]

    I still get a .nhr file, but its contents look like this:

    S000655540¢Ä0Ä0ĆÄňuncultured actinobacterium; GASP-KA1W3_B01#Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus°Ä0ĆİÄ


    The unknown characters prevent it from being used with the BLAST option or in any other BLAST search.

    Could anyone help me with this issue? I am really stuck at this step...


    Thanks a lot

    MA

  • #2
    I have tried to do the same but using 'makeblastdb' and now I get this different error:

    Error: (803.7) Blast-def-line-set.E.title
    Bad char [0x9] in string at byte 38
    uncultured bacterium; L2Sp-13 Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus

    I also get a .nhr file almost identical to the one generated with 'formatdb'.

    I am sure the problem lies in the format of the original FASTA file, whose entries look like this:

    >S000655540 uncultured bacterium; L2Sp-13 Lineage=Root;rootrank;Bacteria;domain;"Actinobacteria";phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;"Acidimicrobineae";suborder;Acidimicrobiaceae;family;Ilumatobacter;genus
    ggaatcttgcgcaatgggcgaaagcctgacgcagcaacgccgcgtgcgggatgaaggccttcgggctgtaaaccgctttc
    agcaggaacgaaaatgacggtacctgcagaagaaggagcggccaactacgtgccagcagccgcggtgacacgtaggctcc
    aagcgttgtccggatttattgggcgtaaagagctcgtaggcggttgagtaagtcgggtgtgaaaactctgggcttaaccc
    ggagacgccatccgatactgctctgactagagttcaggaggggagtggggaattcctagtgtagcggtgaaatgcgcaga
    tattaggaggaacaccggtggcgaaggcgccactctggactgaaactgacgctgaggagcgaaagcatgggtatcaaaca
    ggattagataccctggtactccatgccgtaaacggtgggcactaggtgtgggttccaactaacgggatccgcgccgtcgc
    taacgcattaagtgccccgcctggggagtacggtcgcaagactaaaactcaaatgaattgacgg


    Any idea how I can change this format to work with the formatdb/makeblastdb commands?

    Thanks again



    • #3
      The problem is likely a tab character (between the S* ID and the rest of the header?), based on the 0x9 code in your error. The ID header line is also probably wrapping onto a second line (unless your copy/paste did that). You will likely need to reformat the headers.
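
      A minimal shell sketch of that check, assuming the tab is the only offending character (the 0x9 in the makeblastdb message and the [9] in the formatdb message are both the ASCII code for a tab). The function names are hypothetical and not from the thread. The first function counts header lines that contain a tab; the second prints the file with tabs in header lines replaced by spaces, leaving sequence lines untouched:

```shell
#!/bin/sh

# Print how many FASTA header lines (lines starting with '>') contain a tab.
count_tab_headers() {
    awk '/^>/ && /\t/ { n++ } END { print n + 0 }' "$1"
}

# Print the file with every tab in a header line replaced by one space.
# Sequence lines are passed through unchanged.
replace_header_tabs() {
    awk '/^>/ { gsub(/\t/, " ") } { print }' "$1"
}

# Usage (output file name is an assumption, pick any name you like):
#   count_tab_headers release11_2_Bacteria_unaligned.fa
#   replace_header_tabs release11_2_Bacteria_unaligned.fa > release11_2_Bacteria_notabs.fa
```

      Replacing the tab rather than cutting the header keeps the lineage text, which may matter if a later step reads the taxonomy from the header.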



      • #4
        Using the "release11_2_Bacteria_unaligned.fa" file downloaded from the link you posted I was able to create the indexes using makeblastdb (v. 2.2.29+) without the errors you saw. I did

        Code:
        $ makeblastdb -dbtype nucl -in release11_2_Bacteria_unaligned.fa
        I got a number of errors (below), which may or may not indicate a real problem: http://www.acgt.me/blog/2014/5/15/fu...rom-ncbi-blast

        Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 45% ambiguous nucleotides (shouldn't be over 40%)
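
        To see which records trigger this kind of warning, a rough sketch like the following may help. It assumes that "ambiguous" means any character other than a, c, g or t (case-insensitive); the exact rule the FASTA reader applies is not stated in this thread, so the percentages may not match the warning text exactly. The function name is hypothetical. The 40% threshold is the one quoted in the warning:

```shell
#!/bin/sh

# For each record, look only at the first sequence line after the header.
# Print "ID<TAB>percent" when more than 40% of that line is not a/c/g/t.
# ASSUMPTION: every non-ACGT character counts as ambiguous.
list_ambiguous_first_lines() {
    awk '
        /^>/ { id = substr($1, 2); first = 1; next }
        first {
            first = 0
            total = length($0)
            line = $0
            # gsub returns the number of a/c/g/t characters it removed
            unambiguous = gsub(/[ACGTacgt]/, "", line)
            if (total > 0 && (total - unambiguous) / total > 0.40)
                printf "%s\t%.0f%%\n", id, 100 * (total - unambiguous) / total
        }
    ' "$1"
}

# Usage:
#   list_ambiguous_first_lines release11_2_Bacteria_unaligned.fa
```

        An empty result would suggest the warnings come from a stricter or different definition than the one assumed here.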



        • #5
          Thanks a lot

          I got exactly the same errors, so I will check whether these new files are properly formatted to run BLAST.

          Comment


          • #6
            I tried a test blast with a few sequences from the RDP fasta file. Worked without any problems.

            If you do not need all the extra information in the FASTA header, you could remove most of it with the following command (leaving only the S* IDs):

            Code:
            $ sed -e 's/>* .*$//' release11_2_Bacteria_unaligned.fa > release11_2_Bacteria_unaligned_truncated_header.fa
            Then build the indexes from the new file.
            Last edited by GenoMax; 07-04-2014, 12:21 PM.
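
            After truncating the headers, a quick sanity check may be worthwhile, since the ID is then the only thing distinguishing one record from another. The sketch below uses hypothetical function names; it counts the records in a file and lists any ID that appears more than once. The record count should be the same before and after truncation, and the duplicate list should be empty:

```shell
#!/bin/sh

# Print the number of records (header lines) in a FASTA file.
count_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Print every sequence ID that occurs more than once.
# The ID is taken as the text between '>' and the first whitespace.
duplicate_ids() {
    awk '/^>/ { print substr($1, 2) }' "$1" | sort | uniq -d
}

# Usage:
#   count_records release11_2_Bacteria_unaligned.fa
#   count_records release11_2_Bacteria_unaligned_truncated_header.fa
#   duplicate_ids release11_2_Bacteria_unaligned_truncated_header.fa
```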
