SEQanswers

Old 10-18-2011, 03:24 AM   #1
Symphysodon
Junior Member
 
Location: Australia

Join Date: Mar 2011
Posts: 5
BLAST+ vs BLASTALL (legacy BLAST)

Hi all,

I have done a comparison of blastn as implemented in BLAST+ 2.2.25 (latest version) and BLASTALL (legacy BLAST) and observed non-trivial discrepancies in the results. In summary, BLAST+ gives more hits of the query to the subject/database, with lower (better) E-values. BLAST+ also often generates best hits (often with better E-values) to different sequences in the subject/database, compared to BLASTALL.

Also of concern: the default and .tsv outputs from BLASTALL blastn differ in the number of hits they report. No such discrepancy was seen in the BLAST+ results, regardless of which output format was specified.

If anyone has had similar observations or feedback on my observations/analyses, they would be much appreciated.





Below are examples of equivalent command lines I used for the different BLAST versions.

The command lines used for BLAST+, blastn were:

blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -out output.blastn (default output)

blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -outfmt "10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score" -out output.blastn.csv (csv output)


The command lines used for BLASTALL, blastn were:

blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -o output.blastn -a 8 (default output)

blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -m 8 -o output.blastn.tsv -a 2 (tsv output)


Thanks!
Old 10-18-2011, 10:20 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

What version numbers? That can make a difference.

I don't use -num_descriptions and -num_alignments having found them behaving oddly (something at least partially addressed in a recent BLAST+ release). Have you tried with -max_target_seqs instead?
Old 10-18-2011, 10:30 AM   #3
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

I don't think you can look at the E-values and say they are lower and therefore better, or that one program returns more or fewer hits. The statistics aren't comparable without calibrating the Karlin-Altschul parameters. I am suspicious of BLAST+: it is so fast that I suspect they tweaked the hash word size parameters in favor of speed rather than accuracy. You might want to compare the actual parameters that are used. For example, look at what parameters blastall runs blastn with, then compare them with those of the BLAST+ equivalent of blastn. There is a way to get them to print the actual parameters, not just the parameters of the wrapper. My understanding is that there isn't much difference between the two, and that where there was a difference it was mostly down to the parametrization the wrappers used.
Old 10-19-2011, 04:30 AM   #4
Symphysodon
Junior Member
 
Location: Australia

Join Date: Mar 2011
Posts: 5

Hi,

Thanks maubp and rskr for your feedback.

I am using blastn from blastall 2.2.23 and blastn from BLAST+ 2.2.25.

Perhaps if I specify for both to use the same hash word size, that might be a more equivalent comparison. Note that I have specified for both to have the dust filters turned OFF.
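
One way to make the comparison more like-for-like is to set the search parameters explicitly on both command lines instead of relying on each program's defaults. The Python sketch below only builds the two argument lists and does not run anything. The helper names (blastall_cmd, blastplus_cmd, PARAMS) and the parameter values are assumptions chosen for illustration, so check them against the help output of your own installs before using them.

```python
# Illustrative values only (an assumption, not either program's documented defaults).
PARAMS = {"word_size": 11, "reward": 1, "penalty": -3, "gapopen": 5, "gapextend": 2}


def blastall_cmd(query, db, out, p=PARAMS):
    """Legacy blastall/blastn argv with search parameters set explicitly."""
    return [
        "blastall", "-p", "blastn", "-i", query, "-d", db,
        "-e", "0.00001", "-F", "F",
        "-W", str(p["word_size"]),                           # word size
        "-r", str(p["reward"]), "-q", str(p["penalty"]),     # match / mismatch scores
        "-G", str(p["gapopen"]), "-E", str(p["gapextend"]),  # gap open / extend costs
        "-o", out,
    ]


def blastplus_cmd(query, db, out, p=PARAMS):
    """BLAST+ blastn argv with the same search parameters set explicitly."""
    return [
        "blastn", "-task", "blastn", "-query", query, "-db", db,
        "-evalue", "0.00001", "-dust", "no",
        "-word_size", str(p["word_size"]),
        "-reward", str(p["reward"]), "-penalty", str(p["penalty"]),
        "-gapopen", str(p["gapopen"]), "-gapextend", str(p["gapextend"]),
        "-out", out,
    ]


if __name__ == "__main__":
    # Print the commands for inspection; nothing is executed here.
    print(" ".join(blastall_cmd("query.fa", "database", "legacy.blastn")))
    print(" ".join(blastplus_cmd("query.fa", "database", "plus.blastn")))
```

Keeping all the values in one dictionary means both commands always get the same word size, scores and gap costs, so any remaining difference in hits is less likely to come from mismatched defaults.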

I'll try -max_target_seqs in BLAST+. Do you know what the equivalent parameter in BLASTALL is?

I specified -num_descriptions and -num_alignments for BLAST+ blastn as the legacy_blast.pl returned them as the equivalent of -b and -v in BLASTALL blastn.

If anyone can let me know how to get both applications to print out all the actual default parameters they used, that'd be great.


Cheers!
Old 10-25-2011, 03:52 PM   #5
mdimon
Member
 
Location: San Francisco

Join Date: Jan 2010
Posts: 10

The difference in the number of hits between the default and tabular formats arises because the -b and -v parameters are only honored for the default format. In the tabular (-m 8) format, the -b and -v parameters are ignored.

In BLAST+ this was remedied by the introduction of the -max_target_seqs parameter. The documentation suggests that for the default format, the -num_descriptions and -num_alignments options should be used, but for XML and tabular output, the -max_target_seqs option should be used instead.
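
As a rough illustration of that rule, the Python sketch below picks the hit-limiting flags from the -outfmt code when assembling a BLAST+ blastn command. The helper names (hit_limit_args, blastn_cmd) are hypothetical, and the cut-off at format 4 is an assumption based on the BLAST+ help text, so verify it against the version you have installed.

```python
def hit_limit_args(outfmt, n=1):
    """Return hit-limiting flags suited to a BLAST+ -outfmt value.

    Assumption: formats 0-4 are the report-style outputs that use
    -num_descriptions/-num_alignments; higher codes (XML, tabular, CSV)
    use -max_target_seqs instead.
    """
    code = int(str(outfmt).split()[0])  # "10 qseqid sseqid" -> 10
    if code <= 4:
        return ["-num_descriptions", str(n), "-num_alignments", str(n)]
    return ["-max_target_seqs", str(n)]


def blastn_cmd(query, db, out, outfmt="0", n=1):
    """Assemble a blastn argv; the whole outfmt string is passed as one argument."""
    cmd = [
        "blastn", "-task", "blastn", "-query", query, "-db", db,
        "-evalue", "0.00001", "-dust", "no",
        "-outfmt", str(outfmt), "-out", out,
    ]
    return cmd + hit_limit_args(outfmt, n)


if __name__ == "__main__":
    # Print the commands for inspection; nothing is executed here.
    print(blastn_cmd("query.fa", "database", "output.blastn"))
    print(blastn_cmd("query.fa", "database", "output.csv", outfmt="10 qseqid sseqid evalue"))
```

Building the command as a list also sidesteps shell quoting: a multi-word -outfmt value stays a single argument without needing quotes.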

As far as seeing different results between old and new BLASTs, have you figured out what type of sequences lead to different results?
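
To narrow that down, one option is to list the queries whose best hit differs between the two runs. The Python sketch below assumes both programs were rerun with the standard 12-column tab-separated output (blastall -m 8 and blastn -outfmt 6), in which column 1 is the query ID, column 2 the subject ID, column 11 the E-value and column 12 the bit score. The function names and file names are hypothetical.

```python
import csv


def best_hits(path):
    """Map each query ID to (subject ID, E-value text, bit score) of its top-scoring row."""
    best = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 12 or row[0].startswith("#"):
                continue  # skip blank, short or comment lines
            query, subject, evalue = row[0], row[1], row[10]
            bitscore = float(row[11])
            # Keep the first row seen on ties, replace only on a strictly higher score.
            if query not in best or bitscore > best[query][2]:
                best[query] = (subject, evalue, bitscore)
    return best


def differing_queries(legacy_path, plus_path):
    """Return queries whose best subject differs, or that have hits in only one run."""
    legacy, plus = best_hits(legacy_path), best_hits(plus_path)
    diffs = {}
    for query in sorted(set(legacy) | set(plus)):
        a, b = legacy.get(query), plus.get(query)
        if a is None or b is None or a[0] != b[0]:
            diffs[query] = (a, b)
    return diffs


if __name__ == "__main__":
    import sys
    # Usage: python compare_hits.py legacy.tsv plus.tsv
    for query, (a, b) in differing_queries(sys.argv[1], sys.argv[2]).items():
        print(query, a, b, sep="\t")
```

The resulting list of query IDs can then be used to pull out the affected sequences and look for a common feature, such as length or low-complexity content.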