SEQanswers

Old 02-04-2013, 08:12 PM   #1
npatel
Junior Member
 
Location: Toronto, ON

Join Date: Feb 2013
Posts: 4
Creating local BLAST+ database for mouse build 37

I am trying to build a local BLAST database for the mouse MGSCv37 assembly. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

When I try to create the database for an individual chromosome I end up with 1 very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?
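makeblastdb adds one database sequence per ">" header line, so a chromosome file with a single header is expected to produce a single, very long entry. A minimal sketch for checking how many records a FASTA file holds and how long each one is; the function name is made up for illustration and the path is whatever file you downloaded:

```python
import gzip

def fasta_record_lengths(path):
    """Return a list of (header, sequence_length) pairs, one per '>' record."""
    # Read .gz downloads directly; anything else is treated as plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    records = []
    header, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, length))
                header, length = line[1:], 0
            elif header is not None:
                length += len(line)
    if header is not None:
        records.append((header, length))
    return records
```

If this returns one pair for a chromosome file, the file is not malformed: it simply holds one record.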
Old 02-04-2013, 09:51 PM   #2
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62

Hi npatel,
I'm not sure what 'long' refers to here, but if you mean sequence length, that shouldn't be a surprise: chromosomes are generally long. Which of the files did you download?
Old 02-04-2013, 10:12 PM   #3
npatel
Junior Member
 
Location: Toronto, ON

Join Date: Feb 2013
Posts: 4

I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.

I then ran:

makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_char18.db

which gave me:
Building a new DB, current time: 02052013 02:02:57
New DB name: ref_char18.db
New DB title: ref_chr18.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep Mbits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

So I assume my database has been built at this point.

From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

To do that I input:
blastn -query sequences.txt -db ref_char18.db -out output.txt

Here sequences.txt is a plain-text file (created in Notepad) containing only CCGAGGGTGTGTGTCCCGCAAAGCC.
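A bare sequence is accepted as a query, but a FASTA definition line gives the report a name to show for the query. A small sketch for writing the query as FASTA; the record name "probe1" and the file name "query.fa" are invented for illustration:

```python
def write_fasta_query(path, name, sequence, width=60):
    """Write one sequence to `path` as a FASTA record and return the path."""
    # Drop any whitespace pasted along with the sequence and normalise case.
    seq = "".join(sequence.split()).upper()
    with open(path, "w", newline="\n") as handle:
        handle.write(f">{name}\n")
        # Wrap long sequences at `width` characters per line.
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
    return path
```

For example, write_fasta_query("query.fa", "probe1", "CCGAGGGTGTGTGTCCCGCAAAGCC") produces a two-line file that can be passed to -query.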

That gives me an output of:
BLASTN 2.2.27+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: ref_chr18.fa
1 sequences; 90,772,031 total letters



Query=
Length=25


***** No hits found *****



Lambda K H
1.33 0.621 1.12

Gapped
Lambda K H
1.28 0.460 0.850

Effective search space used: 453860055


Database: ref_chr18.fa
Posted date: Feb 5, 2013 2:02 AM
Number of letters in database: 90,772,031
Number of sequences in database: 1



Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5

That's what I've got so far. I'm not sure where I've gone wrong. I hope this additional information helps you help me. Thanks for replying!
Old 02-04-2013, 11:26 PM   #4
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62

Hi npatel,
I replicated your experiment on a Linux machine, which is what I have access to at the moment, with the following commands:
../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this to show up as a hit; I have not yet worked out which ones. Just to confirm that the string exists as a substring of chr18, I used BLAT like so:
~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

Eureka! It shows up:
BLASTN 2.2.11 [blat]
Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
Query= string
(25 letters)
Database: mm_ref_chr18.fa
4 sequences; 87,601,031 total letters
Searching.done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|149269870|ref|NT_039674.7|Mm18_39714_37                          50   3e-06
>gi|149269870|ref|NT_039674.7|Mm18_39714_37
Length = 73639148
Score = 50 bits (128), Expect = 3e-06
Identities = 25/25 (100%)
Strand = Plus / Plus
Query: 1      ccgagggtgtgtgtcccgcaaagcc 25
              |||||||||||||||||||||||||
Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
Database: mm_ref_chr18.fa

I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome, and that the same experiment performed elsewhere gave the same result. So I think your setup is fine, and only the parameters need to be adjusted to get the result you want.
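The same litmus test can be done without any aligner, since the question is only whether the string occurs exactly. A rough sketch that scans every FASTA record on both strands; it is not what BLAT does internally, the function names are made up, and it reads each record fully into memory (fine for a single chromosome):

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]

def read_fasta(path):
    """Yield (header, sequence) for each record in a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def find_exact(path, query):
    """Return (header, strand, start) for every exact occurrence of `query`.

    `start` is the 1-based leftmost position on the record as written in
    the file, also for minus-strand matches. Case is ignored.
    """
    query = query.upper()
    hits = []
    for header, seq in read_fasta(path):
        seq = seq.upper()
        for strand, pattern in (("+", query), ("-", reverse_complement(query))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((header, strand, start + 1))
                start = seq.find(pattern, start + 1)
    return hits
```

Calling find_exact("mm_ref_chr18.fa", "CCGAGGGTGTGTGTCCCGCAAAGCC") returns an empty list if the string is absent. A query that is its own reverse complement is reported once per strand.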

HTH
Old 02-05-2013, 07:44 AM   #5
npatel
Junior Member
 
Location: Toronto, ON

Join Date: Feb 2013
Posts: 4

Hi Apexy,

You were right, it was a matter of changing the settings. I imported a saved search strategy from the NCBI web BLAST using the -import_search_strategy option, and I am now getting the result I need. Thanks for your help!
Old 02-07-2013, 07:01 AM   #6
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62

Hi npatel,
I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved search strategy on your local machine? I'm not familiar with this feature. Kindly clarify.

Thanks
Old 02-28-2013, 07:16 PM   #7
npatel
Junior Member
 
Location: Toronto, ON

Join Date: Feb 2013
Posts: 4

Sorry for the delay. I ran one instance on the web, saved the search strategy with the settings I wanted, and downloaded it. I then imported those settings into the standalone BLAST command line using the -import_search_strategy option. Hope that helps!
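For anyone repeating this, the local step might look roughly like the sketch below. In BLAST+ the option is spelled -import_search_strategy (it is listed by "blastn -help"). The file name strategy.asn is hypothetical, and the assumption here is that -query and -db given on the command line take precedence over whatever the strategy file recorded:

```python
import shlex

def import_strategy_cmd(strategy_file, query, db, out):
    """Build a blastn command line that reuses a saved search strategy."""
    return [
        "blastn",
        "-import_search_strategy", strategy_file,  # settings saved from web BLAST
        "-query", query,
        "-db", db,
        "-out", out,
    ]

# strategy.asn is a made-up name; the other file names come from the thread.
cmd = import_strategy_cmd("strategy.asn", "sequences.txt", "ref_char18.db", "output.txt")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run on a machine where BLAST+ is installed.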
Old 03-04-2013, 08:02 AM   #8
A.N.Other
Member
 
Location: London, UK

Join Date: Feb 2012
Posts: 25

You might have more luck changing the -task flag to blastn-short. The default is megablast, which isn't optimised for finding things that small.

Run "blastn -help" to get the command line options.
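A likely reason the default fails here: megablast seeds alignments from exact matches of 28 bases by default, and the query is only 25 bases long, so it cannot contain a single seed. That is consistent with the "No hits found" report earlier in the thread. A short sketch of that arithmetic and of the suggested command line; the word sizes are the documented defaults for each task, and the helper names are made up:

```python
import shlex

# Default seed (word) sizes for three blastn tasks in BLAST+.
DEFAULT_WORD_SIZE = {"megablast": 28, "blastn": 11, "blastn-short": 7}

def can_seed(query_length, task):
    """True if the query is long enough to contain one full seed word."""
    return query_length >= DEFAULT_WORD_SIZE[task]

def blastn_short_cmd(query, db, out):
    """Build a blastn command line using the blastn-short task."""
    return ["blastn", "-task", "blastn-short",
            "-query", query, "-db", db, "-out", out]

query = "CCGAGGGTGTGTGTCCCGCAAAGCC"
for task in DEFAULT_WORD_SIZE:
    print(task, can_seed(len(query), task))
print(shlex.join(blastn_short_cmd("sequences.txt", "ref_char18.db", "output.txt")))
```

Passing an explicit -word_size is another way to lower the seed length, but switching the task also changes the scoring defaults to ones meant for short queries.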
All times are GMT -8.