SEQanswers

Old 07-26-2010, 06:15 AM   #1
BioTalk
Member
 
Location: Kansas

Join Date: Feb 2010
Posts: 43
Default BLAST Help

Hello all,

I am trying to compare two FASTA files, each containing many sequences, using BLAST. For some reason BLAST is considering only the first sequence from the FASTA files. I am not sure which parameter I should use (for standalone BLAST) to compare all the sequences in both files.

Please let me know if anyone knows how to do it.

Thank you!
Old 07-26-2010, 06:46 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542
Default

It should "just work".

Which flavour and version of standalone BLAST do you have (the NCBI "legacy" version in C, the new NCBI BLAST+ written in C++, or one of the 3rd party BLAST implementations)?

What BLAST command line are you using?

Have you checked that your FASTA files are using the right newline characters for your OS? The commands unix2dos and dos2unix can help here. Also try this to count the entries:

grep -c "^>" your_file.fasta
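As a small sketch of the check above, the function below counts the header lines in a FASTA file so that the query and database files can be compared side by side. The file names `query.fa` and `db.fa` and the function name `count_fasta_records` are placeholders for illustration, not names from this thread.

```shell
#!/bin/sh
# Count FASTA records by counting header lines (lines that start with ">").
count_fasta_records() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so the exit status is deliberately ignored here.
    grep -c '^>' "$1" || true
}

# Report the record count for each file that exists (placeholder names).
for f in query.fa db.fa; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fasta_records "$f")"
    fi
done
```

If the count for the query file is greater than 1 but the BLAST report contains only one "Query=" line, the problem is more likely in the BLAST run than in the input file.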
Old 07-26-2010, 06:53 AM   #3
BioTalk

Thank you for your prompt response!

I am using Blast for Linux 64 bit downloaded from:
http://blast.ncbi.nlm.nih.gov/Blast...._TYPE=Download

The command line I am using is: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>
Old 07-26-2010, 06:57 AM   #4
BioTalk

I am not sure about this question: Have you checked your FASTA files are using the right new line characters for your OS?
Old 07-26-2010, 07:07 AM   #5
maubp

Quote:
Originally Posted by BioTalk View Post
I am not sure about this question: Have you checked your FASTA files are using the right new line characters for your OS?
Unix/Linux/Mac OS X etc all use a LF character for a new line, while DOS/Windows uses the CR LF characters. This incompatibility is a common problem when dealing with files created on another OS.
http://en.wikipedia.org/wiki/Newline
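One way to check for the CR LF problem described above without extra tools is sketched here. The function names `has_cr` and `strip_cr` are made up for this example; `tr -d '\r'` is used as a stand-in for dos2unix and simply removes every carriage return.

```shell
#!/bin/sh
# Succeed if the file contains at least one carriage return (CR) character.
has_cr() {
    # Build a literal CR with printf so this works in plain POSIX sh.
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Write a copy of file $1 to $2 with all CR characters removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example usage (placeholder file names):
#   if has_cr query.fa; then strip_cr query.fa query.unix.fa; fi
```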
Old 07-26-2010, 07:09 AM   #6
maubp

Quote:
Originally Posted by BioTalk View Post
Thank you for your prompt response!

I am using Blast for Linux 64 bit downloaded from:
http://blast.ncbi.nlm.nih.gov/Blast...._TYPE=Download

The command line I am using is: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>
If you are using "blastall" then you are using the old legacy BLAST executables based on the NCBI C Toolkit.

If you are using the new BLAST+ suite written in C++ then the command here would be "blastn" instead (and all the options have been renamed).

Last edited by maubp; 07-26-2010 at 07:28 AM. Reason: blastn vs blastp typo
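To illustrate the renamed options, the sketch below prints a BLAST+ command that roughly corresponds to the legacy command line quoted above. The function name `legacy_to_plus` and the output file name are placeholders. `-task blastn` is included on the assumption that the classic blastn algorithm is wanted, since BLAST+ blastn otherwise defaults to megablast.

```shell
#!/bin/sh
# Print a BLAST+ command roughly equivalent to:
#   blastall -p blastn -F F -W 16 -i QUERY -d DB -o OUT
legacy_to_plus() {
    query=$1; db=$2; out=$3
    # -task blastn    : classic blastn instead of the BLAST+ default (megablast)
    # -dust no        : low-complexity filtering off (legacy -F F)
    # -word_size 16   : legacy -W 16
    # -query/-db/-out : legacy -i / -d / -o
    printf 'blastn -task blastn -dust no -word_size 16 -query %s -db %s -out %s\n' \
        "$query" "$db" "$out"
}

legacy_to_plus inputfile.fa knownsequences.fa outputfile.txt
# prints: blastn -task blastn -dust no -word_size 16 -query inputfile.fa -db knownsequences.fa -out outputfile.txt
```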
Old 07-26-2010, 07:13 AM   #7
BioTalk

Quote:
Originally Posted by maubp View Post
Unix/Linux/Mac OS X etc all use a LF character for a new line, while DOS/Windows uses the CR LF characters. This incompatibility is a common problem when dealing with files created on another OS.
http://en.wikipedia.org/wiki/Newline
Oh okay, I think this is not a problem with the FASTA files, as they were all created in Linux and are being used in Linux. Also, I tried opening the files and they both look like normal FASTA files.

Please let me know if you know how I should deal with the "Blank output file" problem!
Old 07-26-2010, 07:17 AM   #8
maubp

Quote:
Originally Posted by BioTalk View Post
Please let me know if you know how I should deal with the "Blank output file" problem!
You never mentioned a "blank output file" until now. I thought you said you were having trouble getting BLAST to use multiple input query sequences.


Does blastall give any error messages?

Did you remember to create a BLAST database first using formatdb?

Last edited by maubp; 07-26-2010 at 07:21 AM.
Old 07-26-2010, 07:25 AM   #9
BioTalk

I am sorry for the confusion! It is producing an almost blank output file that contains only the following details instead of alignment results.

BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= Cluster_573384 1
(23 letters)

So, I thought BLAST is not using multiple input query sequences.
Old 07-26-2010, 07:30 AM   #10
BioTalk

Quote:
Originally Posted by maubp View Post
If you are using "blastall" then you are using the old legacy BLAST executables based on the NCBI C Toolkit.

If you are using the new BLAST+ suite written in C++ then the command here would be "blastn" instead (and all the options have been renamed).
I tried the command for blastn as you have suggested and for that I got some error for indexing.
@biocomp:~/Desktop/Blast/bin$ ./blastn -word_size 16 -query <inputseq.fa> -db <sequencetocompare.fa> -out <outputfile.fa>
BLAST Database error: No alias or index file found for nucleotide database [/home/Desktop/sequencetocompare.fa] in search path [/home/Desktop/Blast/bin::]
Old 07-26-2010, 07:30 AM   #11
maubp

You should have one line starting "Query=" for each query sequence.

If that is the full file, it looks like BLAST is crashing or failing to finish.

If I recall correctly, the next output would have been information about the BLAST database. Are you sure it is set up correctly using formatdb? For example, can you run single queries against this database?
Old 07-26-2010, 07:36 AM   #12
rglover
rg
 
Location: uk

Join Date: Dec 2008
Posts: 51
Default

It could be that when you've formatted the blast database you didn't set it for nucleotide sequences - formatdb defaults to protein if it finds no command to specify nucleotide.
Try "formatdb -i <yourfasta.fasta> -p F"
The -p F turns protein off and nucleotide on
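A quick way to see whether formatdb produced a nucleotide database is to look for the .nhr, .nin and .nsq files next to the FASTA file (a protein database gets .phr, .pin and .psq instead). The sketch below does that check. The function name `has_nucl_db` and the file name are placeholders, and the check assumes a small single-volume database, so it does not handle multi-volume databases or alias files.

```shell
#!/bin/sh
# Succeed only if the three core nucleotide index files exist
# for the database name given as $1 (for example knownsequences.fa).
has_nucl_db() {
    for ext in nhr nin nsq; do
        [ -f "$1.$ext" ] || return 1
    done
}

if has_nucl_db knownsequences.fa; then
    echo "Nucleotide database files found for knownsequences.fa"
else
    echo "No nucleotide database found; try: formatdb -i knownsequences.fa -p F"
fi
```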
Old 07-26-2010, 07:37 AM   #13
BioTalk

I just tried inputting one query sequence with the command: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>

and I got almost the same output:

BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 1-72342
(20 letters)
Do you think there is an installation problem, or is the command I am using incorrect?
Old 07-26-2010, 07:40 AM   #14
BioTalk

Quote:
Originally Posted by rglover View Post
It could be that when you've formatted the blast database you didn't set it for nucleotide sequences - formatdb defaults to protein if it finds no command to specify nucleotide.
Try "formatdb -i <yourfasta.fasta> -p F"
The -p F turns protein off and nucleotide on
I tried "formatdb -i <yourfasta.fasta> -p F" and then my previous command but it gave me the same output as before
Old 07-26-2010, 07:41 AM   #15
rglover

What are the names of the database files that formatdb is creating? Could you list them here? You could also try putting "-o T" on the end of your formatdb. Other than that I'm not really sure!
Old 07-26-2010, 07:46 AM   #16
BioTalk

Quote:
Originally Posted by rglover View Post
What are the names of the database files that formatdb is creating? Could you list them here? You could also try putting "-o T" on the end of your formatdb. Other than that I'm not really sure!
The files created by formatdb are:
.nhr, .nin, .nsq, .nsd, .nsi
Old 07-26-2010, 08:10 AM   #17
maubp

Quote:
Originally Posted by BioTalk View Post
The files created by formatdb are:
.nhr, .nin, .nsq, .nsd, .nsi
What are the full names? This matters for what you tell BLAST: if the files are example.nin etc., the database name is example, but if they are example.fas.nin etc., the database name is example.fas instead.
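The naming rule above amounts to dropping only the last extension of an index file. A minimal sketch, with the hypothetical helper name `db_name_from_index`:

```shell
#!/bin/sh
# Derive the BLAST database name from one of its index files
# by stripping only the final extension (.nin, .nhr, .nsq, ...).
db_name_from_index() {
    printf '%s\n' "${1%.*}"
}

db_name_from_index example.nin        # prints: example
db_name_from_index example.fas.nin    # prints: example.fas
```

The printed name is what goes after -d (legacy blastall) or -db (BLAST+).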
Old 07-26-2010, 08:14 AM   #18
rglover

Hiya. Looking at the link you posted for where you downloaded your executables, you're definitely using BLAST+, so blastall (probably) won't work well.
If you try formatting your database with:

makeblastdb -in yourfasta.fasta -dbtype nucl

that should format your database for use with the Blast+ executables. I've managed to use formatdb/makeblastdb interchangeably between blast/blast+ in the past, but on Windows so you never know, it might error on Linux.
If you then use the "blastn -db <yourfasta.fasta> -word_size" etc convention for starting your blast it might work.
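Put together, the two BLAST+ steps suggested above might look like the sketch below. The `run` and `blast_plus_search` helpers, the `DRY_RUN` variable and the output file name are assumptions made for this example; the input file names are the placeholders used earlier in the thread. With `DRY_RUN=1` the commands are only printed, so the script can be tried on a machine without BLAST+ installed.

```shell
#!/bin/sh
# Echo a command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Build a nucleotide database from the subject FASTA, then search it.
blast_plus_search() {
    query=$1; subject=$2; out=$3
    run makeblastdb -in "$subject" -dbtype nucl || return 1
    run blastn -word_size 16 -query "$query" -db "$subject" -out "$out"
}

# Set DRY_RUN=0 (or leave it unset) to actually run the commands.
DRY_RUN=1
blast_plus_search inputseq.fa sequencetocompare.fa results.txt
```

Because makeblastdb is given no -out option here, the database name is the FASTA file name itself, which is why the same name is passed to -db.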
Old 07-26-2010, 08:25 AM   #19
BioTalk

Thank you all very much! Now I am able to generate the following type of BLAST output file.
It is a huge file because the same information is repeated for every query. Does anyone know how to get the output in another format?

BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 1-72342
(20 letters)

Database:/home//Desktop/mma.faa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******


BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 2-55421
(19 letters)

Database: /home//Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******


BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 3-46574
(21 letters)

Database: /home/Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******


BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 4-38013
(17 letters)

Database: /home//Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******
Old 07-26-2010, 08:30 AM   #20
rglover

If you have a look in the BLAST+ manual, there are some formatting guidelines for tabulating the output. Glad you got it working!
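For reference, tabular output is selected with `-outfmt 6` in BLAST+ and with `-m 8` in legacy blastall. It writes one tab-separated line per hit, with the query ID in the first column, and nothing at all for queries without hits, so the repeated headers disappear. The sketch below summarises such a file; the function name `hits_per_query` and the file name `hits.tsv` are placeholders.

```shell
#!/bin/sh
# Count hits per query in tabular BLAST output (column 1 is the query ID).
hits_per_query() {
    awk -F '\t' '{ n[$1]++ } END { for (q in n) print q "\t" n[q] }' "$1" | sort
}

# Example usage (placeholder file names):
#   blastn -word_size 16 -query inputseq.fa -db sequencetocompare.fa -outfmt 6 -out hits.tsv
#   hits_per_query hits.tsv
```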