  • StandAlone Blast+ weird response

    Dear SEQanswerers,

    I have installed the NCBI BLAST+ package to run local BLAST searches, and I have set the PATH and BLASTDB environment variables so that the programs can find the default database I am going to search against. I am facing a weird situation, however: the program doesn't find the pattern I am testing it with, even though this pattern was picked right out of the standalone database that I created via

    Code:
    makeblastdb.exe -in "sequences.fasta" -dbtype nucl -title "testing" -out testingDB
    .

    This command produced three files with different extensions. I then ran blastn.exe with each of these files in turn passed to the -db option, along with the query file (in FASTA format) and the output file and format. I also ran it against testingDB as is, and yet in all these cases the output is "no hits found".

    I don't know what I am doing wrong. Reading through the documentation and looking for standalone BLAST+ examples to compare against has not helped, so can anyone pitch in with an explanation or a quick primer/example on how to get this up and running? The NCBI page on BLAST+ itself is not much help...

  • #2
    When you say that you're getting the output as "no hits found", does it mean that you're getting a results file but that you're getting "No hits found" within the file for each sequence that you're trying to search against the database?
    If so, I would check the contents of your original sequences.fasta file (are you searching the sequences.fasta file against a database of itself?).


    • #3
      Exactly, I am getting the results file but with "No hits found" for each sequence. My database consists of nucleotide sequences for 8 species, and my query file holds an orphan query sequence that is actually copied from testDB.fasta itself! This is why I feel the output is weird!

      I'd really appreciate it if someone who has used the new standalone BLAST+ from NCBI could give me an actual example of how they are doing it...

      In my case, I have prepared the environment variables as I mentioned and created my database like this:

      Code:
      makeblastdb -in testDB.fasta -dbtype nucl -title myDatabase -out myTestDatabase
      
      #Here is the log from the above operation
      
      Building a new DB, current time: 08/04/2010 21:23:03
      New DB name:   myTestDatabase
      New DB title:  myDatabase
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep MBits: T
      Maximum file size: 1073741824B
      Adding sequences from FASTA; added 1 sequences in 0.000745346 seconds.
      This creates myTestDatabase.nhr, myTestDatabase.nin and myTestDatabase.nsq.
      Then, using the database name and the sequence in the query.fasta file (which is taken from myTestDatabase, btw):
      Code:
      blastn -db myTestDatabase -query query.fasta -out result.txt -outfmt 2
      The file result.txt would have the following
      Code:
      BLASTN 2.2.23+
      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.
      Database: myDatabase
                 1 sequences; 12,621 total letters
      Query=  query
      Length=25
      ***** No hits found *****
      Lambda     K      H
          1.33    0.621     1.12 
      
      Gapped
      Lambda     K      H
          1.28    0.460    0.850 
      
      Effective search space used: 176540
      
      Query=  query2
      Length=25
      ***** No hits found *****
      Lambda     K      H
          1.33    0.621     1.12 
      
      Gapped
      Lambda     K      H
          1.28    0.460    0.850 
      Effective search space used: 176540
        Database: myDatabase
          Posted date:  Aug 4, 2010  9:23 PM
        Number of letters in database: 12,621
        Number of sequences in database:  1
      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 0
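
One thing the steps above raise is what to pass to -db: it takes the base name given to -out in makeblastdb (myTestDatabase here), not any of the individual .nhr/.nin/.nsq files. The sketch below is a minimal sanity check under some assumptions: the helper name `db_files_present` is made up for illustration, the database is a small single-volume nucleotide database, and its files sit in the current directory rather than in a BLASTDB folder.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the three nucleotide index files
# written by makeblastdb exist for the given database base name.
db_files_present() {
    for ext in nhr nin nsq; do
        [ -f "$1.$ext" ] || return 1
    done
    return 0
}

# Base name from post #3 (assumed to live in the current directory).
db=myTestDatabase

if db_files_present "$db"; then
    echo "$db: index files found; use '-db $db' with no extension"
    # blastdbcmd ships with BLAST+; -info prints the title and sequence count.
    if command -v blastdbcmd >/dev/null 2>&1; then
        blastdbcmd -db "$db" -info
    fi
else
    echo "$db: .nhr/.nin/.nsq files not all present in $(pwd)"
fi
```

If the summary printed by blastdbcmd matches the makeblastdb log (1 sequence, 12,621 letters in this thread), the database itself is probably fine and the problem lies in the search settings.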


      • #4
        Ahh, here you go. This had me a few times too before I realised what was wrong.

        With blast+, if any of your query sequences are likely to be <~50bp, you'll need to add the following to your parameters (otherwise you get exactly what you've described):

        Code:
        -task blastn-short
        I almost guarantee that if you add that, you'll get your hits.
        Last edited by rglover; 08-06-2010, 02:46 AM.
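
For context, the likely reason is that blastn in BLAST+ defaults to the megablast task, whose default word size (28) is longer than a 25-base query, while blastn-short uses a much smaller word size (7). Here is a minimal sketch of how the task could be chosen from the query file before searching. The helper names `pick_task` and `shortest_len` are made up for illustration, the 50-base cutoff is just the rule of thumb from this post, and the file and database names are the ones from post #3.

```shell
#!/bin/sh
# Hypothetical helper: choose a blastn task from the shortest query length.
# The 50-base cutoff is the rule of thumb from this thread, not an official limit.
pick_task() {
    if [ "$1" -lt 50 ]; then
        echo "blastn-short"
    else
        echo "megablast"
    fi
}

# Hypothetical helper: print the length of the shortest sequence in a FASTA file
# (sequence lines may be wrapped; whitespace is ignored).
shortest_len() {
    awk '
        function record() { if (have && (min < 0 || len < min)) min = len }
        BEGIN { min = -1 }
        /^>/ { record(); have = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { record(); print (min < 0 ? 0 : min) }
    ' "$1"
}

# Demo on a made-up 25-base query written to a temporary file.
demo_query=$(mktemp)
printf '>query\nACGTACGTACGTACGTACGTACGTA\n' > "$demo_query"
len=$(shortest_len "$demo_query")
task=$(pick_task "$len")
echo "shortest query: $len bases, using -task $task"
rm -f "$demo_query"

# Run the real search only if BLAST+ and the files from post #3 are present.
if command -v blastn >/dev/null 2>&1 && [ -f query.fasta ] && [ -f myTestDatabase.nsq ]; then
    blastn -task "$(pick_task "$(shortest_len query.fasta)")" \
        -db myTestDatabase -query query.fasta -out result.txt -outfmt 2
fi
```

Adding `-task blastn-short` by hand is just as good; the helpers only make the decision explicit when a query file mixes short and long sequences.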


        • #5
          This is an AHA moment in here... Thank you so much Rachel, you've made my day..


          I really blame the terse documentation. I was thinking all along that blastn had a built-in step to detect short queries, in much the same way as BLAST on the NCBI website does by optimizing the search for short sequences...

          I thought I had read it all, and I was on the verge of writing my own program to do substring matching and return the hit/target positions...


          • #6
            No problem - I'm sure I've seen most Blast+ errors in the last couple of years. The manual isn't exactly the most user-friendly of documents. You really have to dig around for the information you need to get going!


            • #7
              BLAST is a big, complex program by nature, so having only a few pages of documentation for a version of it can make it extremely user-unfriendly.

              I have discovered the use of
              Code:
              blastn -subject someSeqFile -out ... etc
              in the blastn options to search against a sequence file without having to explicitly create a database. That's a nice thing...
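
To make the -subject idea concrete, here is a minimal sketch that builds two throwaway FASTA files and searches one against the other with no makeblastdb step. The sequences, file names and temporary directory are all made up for illustration; `-task blastn-short` is included because the example query is only 25 bases long, and `-outfmt 6` asks for tabular output.

```shell
#!/bin/sh
# Work in a temporary directory and clean it up when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

subject="$workdir/subject.fasta"
query="$workdir/query.fasta"

# Made-up 60-base subject sequence (illustrative only).
seq=ATGGCGTACGTTAGCCTAGGATCCGATTACGCGTTAACCGGTATCGATCGGCTAAGTCCA
printf '>subject1\n%s\n' "$seq" > "$subject"

# The query is bases 11-35 of the subject, i.e. a 25-base exact substring.
printf '>query1\n%s\n' "$(printf '%s' "$seq" | cut -c11-35)" > "$query"

if command -v blastn >/dev/null 2>&1; then
    # -subject takes a FASTA file directly, so no database has to be built.
    blastn -task blastn-short -subject "$subject" -query "$query" -outfmt 6
else
    echo "blastn not found on PATH; skipping the search" >&2
fi
```

This is handy for quick one-off comparisons; for repeated searches against the same sequences, a formatted database is probably still the better choice.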

              I am looking forward to more discussions with you and other SeqAnswerers, and I'd like to frequent this lovely forum more often...


              • #8
                Originally posted by BioSlayer
                I am facing a weird situation however, the program doesn't find the pattern that I am testing it with even though this pattern is actually picked right out of the standalone database that I have created
                What is the pattern? If it is low-complexity sequence (e.g. AAAATATAAAAAAA) then blastn.exe may filter it out automatically (mask it off due to too many hits).


                • #9
                  Do you think blastn.exe actually masks low-complexity regions without the feature being requested when the program is run? By now I know that there are options to apply DUST or soft masking to the search, but I suspect that blastn.exe will not mask low-complexity regions automatically, because it did not adapt the search in any way when I sent it a short query like the one I had trouble with.

                  My query was less than 50 characters long, and until rglover confidently suggested I try
                  Code:
                  -task blastn-short
                  I would have gone on chasing my tail, which means that blastn did not detect the query length and act accordingly, the way BLAST on the NCBI website does...
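
On the masking question: as far as I know, the blastn help text does list DUST query filtering as switched on by default, and `-dust no` turns it off, so testing both ways is cheap. The sketch below is only an illustration. The helper `top_base_percent` is made up and is a crude composition check, not the DUST algorithm; the output file name result_nodust.txt is likewise hypothetical, and the database and query names are the ones from post #3.

```shell
#!/bin/sh
# Hypothetical helper, NOT the DUST algorithm: print the share (integer percent)
# of the most frequent base in a sequence as a rough low-complexity hint.
top_base_percent() {
    printf '%s\n' "$1" | awk '{
        s = toupper($0); n = length(s); max = 0
        split("A C G T", bases, " ")
        for (i = 1; i <= 4; i++) {
            t = s
            c = gsub(bases[i], "", t)   # gsub returns the number of matches
            if (c > max) max = c
        }
        print (n ? int(100 * max / n) : 0)
    }'
}

# The example pattern from post #8.
pattern=AAAATATAAAAAAA
echo "$pattern: $(top_base_percent "$pattern")% of bases are the same letter"

# If BLAST+ and the files from post #3 are present, repeat the search with DUST off.
if command -v blastn >/dev/null 2>&1 && [ -f query.fasta ] && [ -f myTestDatabase.nsq ]; then
    blastn -task blastn-short -dust no \
        -db myTestDatabase -query query.fasta -out result_nodust.txt
fi
```

If hits appear only with `-dust no`, masking was the cause; if they appear either way once `-task blastn-short` is set, the query length was the real problem.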
