  • Creating a local BLAST+ database for mouse build 37

    I am trying to create a local database so I can BLAST against the MGSCv37 database. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

    When I try to create the database for an individual chromosome, I end up with one very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?
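    If the downloaded file holds a single chromosome-level record, one long sequence is the expected outcome rather than a formatting problem: makeblastdb adds one database sequence per '>' header line. A minimal Python sketch (fasta_summary is a hypothetical helper, not part of BLAST+) that counts records and bases so they can be compared with what makeblastdb reports:

```python
def fasta_summary(lines):
    """Count FASTA records and residues in an iterable of text lines.

    Every '>' header starts a new record; all lines up to the next header
    are sequence, so a chromosome wrapped over many lines is one record.
    """
    records = 0
    letters = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records += 1
        else:
            letters += len(line)
    return records, letters
```

    For example, passing an open file handle for the chromosome file (with open(...) as fh: fasta_summary(fh)) returns (1, N) when the file contains exactly one record of N bases.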

  • #2
    Hi Npatel,
    I don't understand what 'long' refers to in this context. If you mean long as in length, that shouldn't be a surprise, because chromosomes are generally long anyway. Which of the files did you download?



    • #3
      I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.

      I then ran:

      makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_chr18.db

      which gave me:
      Building a new DB, current time: 02/05/2013 02:02:57
      New DB name: ref_char18.db
      New DB title: ref_chr18.fa
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep Mbits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

      So I assume my database is made at this point.
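      One way to confirm the database was written is to check for the files makeblastdb creates next to the name it reports as "New DB name"; for a single-volume nucleotide database these include the .nhr, .nin and .nsq files. The BLAST+ tool blastdbcmd, run as blastdbcmd -db NAME -info, reports on the database more thoroughly. A minimal Python sketch of the file check, assuming a database small enough not to be split into numbered volumes (missing_db_files is a hypothetical helper):

```python
import os

# Core files makeblastdb writes for a single-volume nucleotide database:
# headers (.nhr), index (.nin) and packed sequence data (.nsq).
NUCL_DB_EXTENSIONS = (".nhr", ".nin", ".nsq")


def missing_db_files(db_name, exists=os.path.exists):
    """Return the expected database files that do not exist for db_name.

    db_name is the value later passed to blastn -db. Databases split into
    volumes (with an alias file) are not handled by this sketch.
    """
    return [db_name + ext for ext in NUCL_DB_EXTENSIONS
            if not exists(db_name + ext)]
```

      An empty list means all three files are present; the exists parameter only exists so the check can be exercised without touching the file system.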

      From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

      To do that, I ran:
      blastn -query sequences.txt -db ref_char18.db -out output.txt

      The sequences.txt file is a Notepad .txt file containing only CCGAGGGTGTGTGTCCCGCAAAGCC.

      That gives me an output of:
      BLASTN 2.2.27+


      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.



      Database: ref_chr18.fa
      1 sequences; 90,772,031 total letters



      Query=
      Length=25


      ***** No hits found *****



      Lambda K H
      1.33 0.621 1.12

      Gapped
      Lambda K H
      1.28 0.460 0.850

      Effective search space used: 453860055


      Database: ref_chr18.fa
      Posted date: Feb 5, 2013 2:02 AM
      Number of letters in database: 90,772,031
      Number of sequences in database: 1



      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 2.5

      That's what I've got so far. I'm not sure where I've gone wrong. I hope this additional information helps you help me. Thanks for replying!
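      blastn accepts a bare sequence like the one in sequences.txt (the report shows it was read as Length=25), so the missing header is not why there are no hits. Giving the query a FASTA header does, however, fill in the otherwise empty Query= line. A small Python sketch for formatting such a record; the function to_fasta and the name chr18_probe are illustrative only:

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with a '>' header line.

    Whitespace inside the sequence is removed, bases are upper-cased, and
    the sequence is wrapped at `width` characters per line.
    """
    seq = "".join(sequence.split()).upper()
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

      Writing the result of to_fasta("chr18_probe", "CCGAGGGTGTGTGTCCCGCAAAGCC") to a file such as query.fa and passing that file to -query gives the query a name in the report.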



      • #4
        Hi Npatel,
        I replicated your experiment on a Linux machine (which is what I have access to at the moment) with the following commands:
        ../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
        ../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

        And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this to show up. I haven't worked out which ones to change. Just to confirm that the string exists as a substring of chr18, I used BLAT like so:
        ~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

        Eureka! It shows up
        BLASTN 2.2.11 [blat]
        Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        Query= string
        (25 letters)
        Database: mm_ref_chr18.fa
        4 sequences; 87,601,031 total letters
        Searching.done
        Score E
        Sequences producing significant alignments: (bits) Value
        gi|149269870|ref|NT_039674.7|Mm18_39714_37 50 3e-06
        >gi|149269870|ref|NT_039674.7|Mm18_39714_37
        Length = 73639148
        Score = 50 bits (128), Expect = 3e-06
        Identities = 25/25 (100%)
        Strand = Plus / Plus
        Query: 1 ccgagggtgtgtgtcccgcaaagcc 25
        |||||||||||||||||||||||||
        Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
        Database: mm_ref_chr18.fa

        I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome, and that the same experiment performed elsewhere gives the same result. So I think your setup is fine; only the parameters need to be adjusted to get the result you want.

        HTH
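        The same litmus test can be done without BLAT: an exact 25-base match can be found by plain string search over the chromosome sequence, checking both strands. A minimal Python sketch (read_fasta, find_exact and reverse_complement are hypothetical helpers) that reports 1-based, inclusive coordinates in the style of the Sbjct line above; it only finds identical matches and is no substitute for an alignment tool:

```python
# Complement table for DNA; characters other than ACGT (e.g. N) are kept as-is.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (record_id, sequence) pairs; record_id is the first header word."""
    rec_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(chunks)
            words = line[1:].split()
            rec_id, chunks = (words[0] if words else ""), []
        elif line:
            chunks.append(line)
    if rec_id is not None:
        yield rec_id, "".join(chunks)


def find_exact(records, query):
    """Yield (record_id, strand, start, end) for every exact match.

    Coordinates are 1-based and inclusive on the forward strand.
    Matching is case-insensitive, so soft-masked (lowercase) bases match.
    """
    q = query.upper()
    patterns = {"+": q, "-": reverse_complement(q)}
    for rec_id, seq in records:
        s = seq.upper()
        for strand, pattern in patterns.items():
            pos = s.find(pattern)
            while pos != -1:
                yield rec_id, strand, pos + 1, pos + len(pattern)
                pos = s.find(pattern, pos + 1)
```

        Used on an open chromosome file, list(find_exact(read_fasta(fh), "CCGAGGGTGTGTGTCCCGCAAAGCC")) lists every exact occurrence. Each record is held in memory as one string, which is fine for a single chromosome but wasteful for a whole genome.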



        • #5
          Hi Apexy,

          You were right, it was a matter of changing the settings. I imported a search strategy saved from NCBI web BLAST using the -import_search_strategy option, and I am now getting the result I need. Thanks for your help!



          • #6
            Hi Npatel,
            I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved strategy to your local machine? I'm not familiar with this. Kindly clarify.

            Thanks



            • #7
              Sorry for the delay. I ran one search on the web with the settings I wanted, then saved that search strategy to a local file. I then imported those settings into the standalone BLAST+ command line using the -import_search_strategy option. Hope that helps!
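              As a sketch of the command this describes, assuming the strategy was downloaded from the web BLAST results page to a local file (strategy.asn is a hypothetical name) and that a -db given on the command line takes precedence over the NCBI database recorded in the strategy (worth confirming with blastn -help for your version), the blastn call can be assembled in Python like this:

```python
import shlex


def saved_strategy_cmd(strategy_file, db, out):
    """Build a blastn argument list that replays a saved search strategy.

    Assumption: -db on the command line overrides the database stored in
    the strategy file, so the search runs against the local database.
    """
    return ["blastn",
            "-import_search_strategy", strategy_file,
            "-db", db,
            "-out", out]


# Hypothetical file names; print the command so it can be pasted into a shell.
print(shlex.join(saved_strategy_cmd("strategy.asn", "ref_char18.db", "output.txt")))
```

              The same list can be passed to subprocess.run(cmd, check=True) to run the search from Python instead of a shell.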



              • #8
                You might have more luck changing the -task flag to blastn-short. The default is megablast, which isn't optimised for finding matches that short.

                Run "blastn -help" to see all the command-line options.
