  • Problems with blastdbcmd when entry IDs contain spaces

    I am still a rookie in this area, and this is the first task I was given: extract a list of 100%-matched reads from a self-generated database. However, the read names are not formatted in the standard way, and I assume that is the problem I am running into now.

    Below is a list of my read names; this is part of my -entry_batch input file, ID.txt:

    'M00344:4:000000000-A5RU9:1:2119:17016:21751 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:6591:19854 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:11445:14212 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:22676:7504 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:13009:4084 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2119:14454:4004 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:11021:19828 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:14025:16724 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:25864:15172 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:13018:13673 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2118:5760:11441 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:24461:19844 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:17300:18233 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:4137:17412 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:2789:15268 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:25164:15029 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:16039:7681 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2117:8713:5016 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2116:13795:20195 2:N:0:10'
    'M00344:4:000000000-A5RU9:1:2116:6977:17108 2:N:0:10'

    I used the command below:
    $ blastdbcmd -db seqs.fasta -dbtype nucl -entry_batch ID.txt -out miseq.read.fasta

    Error messages:

    Error: 'M00344:4:000000000-A5RU9:1:1104:13049:19775: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:13044:19758: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:13062:19751: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:11099:18531: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:11118:18521: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17175:17791: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17452:17720: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:16737:13751: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:16726:13733: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:19339:9296: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:17187:8943: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:14936:7801: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:21379:6845: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:23493:5643: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:26299:4746: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:23691:4053: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1104:15699:3766: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1103:18377:16637: OID not found
    Error: 'M00344:4:000000000-A5RU9:1:1103:16030:10176: OID not found


    I tried replacing the whitespace with \s and adding ' before and after each ID, but neither helped. blastdbcmd treats only the part before the space as the ID. Does anyone have an idea how to do this? Or am I heading in the wrong direction entirely?
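A minimal sketch of cleaning the -entry_batch file, assuming (as the error messages suggest) that blastdbcmd looks up the literal text up to the first whitespace, including the leading ' character. The hypothetical helper clean_entry_ids strips surrounding quotes and keeps only the first whitespace-delimited word of each line. This only fixes the ID list; whether blastdbcmd can then find the entries also depends on how the database was built (ID-based retrieval generally needs makeblastdb -parse_seqids), which is worth checking separately.

```python
def clean_entry_ids(lines):
    """Turn lines like "'M00344:...:21751 2:N:0:10'" into bare IDs.

    Hypothetical helper: strips surrounding whitespace and quotes, skips
    blank lines, and keeps only the first whitespace-delimited word, which
    is the part treated as the ID.
    """
    ids = []
    for line in lines:
        words = line.strip().strip("'\"").split()
        if words:  # skip blank or quote-only lines
            ids.append(words[0])
    return ids


if __name__ == "__main__":
    raw = [
        "'M00344:4:000000000-A5RU9:1:2119:17016:21751 2:N:0:10'\n",
        "'M00344:4:000000000-A5RU9:1:2119:6591:19854 2:N:0:10'\n",
    ]
    for seq_id in clean_entry_ids(raw):
        print(seq_id)
```

Writing the returned IDs to a new file, one per line, gives a batch file without the quotes and the trailing "2:N:0:10" part.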

    Eddi

  • #2
    Here's what I run to generate a BLAST database out of a FASTA file:
    Code:
    makeblastdb -in <input>.fasta -title 'Something Stringy' -taxid <org_taxid> -dbtype nucl -out <dbname_ID>
    It looks like you might be trying to query a database that doesn't exist (or hasn't been generated yet).

    However, if you have NGS quantities of reads, it's probably better to use something other than BLAST for sequence matching. I'd recommend Bowtie2, but BWA also seems to be commonly used here.

    Here's the command I'd run to generate a Bowtie2 index:
    Code:
    bowtie2-build <input>.fasta <dbname_ID>
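Since the original goal is to find reads that match 100%, here is a minimal sketch of filtering an aligner's SAM output for perfect end-to-end matches. It assumes plain-text SAM (for example from bowtie2 -x <dbname_ID> -U reads.fastq -S out.sam, where reads.fastq and out.sam are placeholder names) and that aligned records carry the NM:i edit-distance tag, which Bowtie2 and BWA both emit. A read is kept when it is a primary, mapped alignment whose CIGAR is a single M block (no indels or clipping) and whose NM is 0. The function name perfect_match_names is hypothetical.

```python
import re

# A single alignment-match block, e.g. "150M": no insertions, deletions or clipping.
PERFECT_CIGAR = re.compile(r"^\d+M$")

# FLAG bits to skip: unmapped (0x4), secondary (0x100), supplementary (0x800).
SKIP_FLAGS = 0x4 | 0x100 | 0x800


def perfect_match_names(sam_lines):
    """Return names of primary reads aligned end to end with zero edits.

    Assumes SAM text in which aligned records carry an NM:i:<edits> tag.
    """
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # too few columns to be a SAM alignment record
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & SKIP_FLAGS:
            continue
        # M counts matches AND mismatches, so the NM tag must also be 0.
        if PERFECT_CIGAR.match(cigar) and "NM:i:0" in fields[11:]:
            names.append(qname)
    return names
```

Checking NM as well as the CIGAR is the important design choice here: a CIGAR such as 150M only says every base is aligned, not that every base is identical.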



    • #3
      Originally posted by yingeddi2008
      I tried replacing the whitespace with \s and adding ' before and after each ID, but neither helped. blastdbcmd treats only the part before the space as the ID. Does anyone have an idea how to do this? Or am I heading in the wrong direction entirely?
      Hi Eddi. This is partly by design: most tools consider everything up to the first space as the ID.
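To illustrate that convention with a header from this thread: the ID is the text after > up to the first whitespace, and the rest is a free-text description. A minimal sketch, where split_fasta_header is a hypothetical name; Biopython's FASTA parser likewise sets record.id to the first word of the title line.

```python
def split_fasta_header(header):
    """Split a FASTA title line into (id, description).

    Follows the common convention: the ID is everything up to the first
    whitespace; the remainder (possibly empty) is the description.
    """
    title = header.rstrip("\n")
    if title.startswith(">"):
        title = title[1:]
    parts = title.split(None, 1)  # split once, on the first run of whitespace
    if not parts:
        return "", ""
    seq_id = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


if __name__ == "__main__":
    print(split_fasta_header(">M00344:4:000000000-A5RU9:1:1101:17539:1069 1:N:0:14"))
```

Under this convention an ID list holding the full "ID 2:N:0:10" strings cannot match, because only the part before the space is treated as the ID.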

      However, there are also some issues with blastdbcmd, and the exact version of BLAST+ matters; see my blog post:
      The blastdbcmd tool in the BLAST+ suite (replacing fastacmd in the C 'legacy' BLAST suite) lets you do a lot of clever things with a BLAST d...



      • #4
        blastdbcmd sucks

        Hi maubp,

        Thank you very much. I read through your blog, and I think that's exactly the problem I have now. Then there is no way I can extract sequences from my own custom database?!

        For example, my database contains:

        Code:
        >M00344:4:000000000-A5RU9:1:1101:17539:1069 1:N:0:14
        AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACAAAGGCGACGATGCGTAGCCGACCTGAGAGGGTGATCGGCG
        >M00344:4:000000000-A5RU9:1:1101:17556:1074 1:N:0:14
        AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATGCGTAGCCGAACTGAGAGGGGGATCGGC
        But when I run

        Code:
        $ blastdbcmd -db seq.fasta -entry all -outfmt "OID: %o     TITLE: %t"
        I got nothing back. I don't know whether there is an internal error or whether it simply won't recognize IDs that are not in NCBI format. That is unfortunate.


        Eddi

        Originally posted by maubp
        Hi Eddi. This is partly by design: most tools consider everything up to the first space as the ID.

        However, there are also some issues with blastdbcmd, and the exact version of BLAST+ matters; see my blog post:
        http://blastedbio.blogspot.co.uk/201...cbi-blast.html



        • #5
          Hi gringer,

          Thank you for your advice. I will try Bowtie2 or BWA; I have Illumina MiSeq data here, so maybe I should try something else.

          Eddi



          • #6
            Originally posted by yingeddi2008
            Thank you very much. I read through your blog, and I think that's exactly the problem I have now.
            Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
            Originally posted by yingeddi2008
            Then there is no way I can extract sequences from my own custom database?!
            As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc).
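As a minimal sketch of that FASTA-side approach, here is a standard-library-only extractor. It assumes the ID file holds one bare ID per line (no quotes, nothing after the first space) and that the FASTA file is the one the database was built from. All function names are hypothetical; Biopython's SeqIO would handle the parsing just as well, this version simply avoids the dependency. A record matches when the first whitespace-delimited word of its header is in the ID set, so the trailing "1:N:0:14"-style comment is ignored.

```python
def record_id(header):
    """Return the ID of a FASTA header: text after '>' up to the first whitespace."""
    words = header[1:].split()
    return words[0] if words else ""


def extract_fasta_records(fasta_lines, wanted_ids):
    """Yield (header, sequence) for each record whose ID is in wanted_ids.

    Multi-line sequences are joined into one string; records are yielded
    in file order.
    """
    wanted = set(wanted_ids)  # O(1) membership test per record
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None and record_id(header) in wanted:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the final record, which has no following header.
    if header is not None and record_id(header) in wanted:
        yield header, "".join(chunks)


def write_fasta_subset(fasta_path, ids_path, out_path):
    """Copy matching records from fasta_path to out_path (one ID per line in ids_path)."""
    with open(ids_path) as fh:
        ids = [line.strip() for line in fh if line.strip()]
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for header, seq in extract_fasta_records(src, ids):
            dst.write(f"{header}\n{seq}\n")
```

For example, write_fasta_subset("reads.fasta", "ID_clean.txt", "miseq.read.fasta") would play the role of the blastdbcmd -entry_batch command from the question; the file names are placeholders. The FASTA file is streamed once, so memory use is dominated by the ID set rather than by the sequences.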



            • #7
              Thank you.

              Hi maubp,

              Originally posted by maubp
              Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
              Who should I email? The NCBI team?

              Originally posted by maubp
              As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc).
              I will try those then. Thank you.

              Eddi



              • #8
                Originally posted by maubp
                Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
                blast-help at ncbi.nlm.nih.gov as listed here:
