SEQanswers


Symphysodon 10-18-2011 02:24 AM

BLAST+ vs BLASTALL (legacy BLAST)
 
Hi all,

I compared blastn as implemented in BLAST+ 2.2.25 (the latest version) with blastn in BLASTALL (legacy BLAST) and observed non-trivial discrepancies in the results. In summary, BLAST+ reports more hits of the query to the subject/database, with lower (better) E-values. Compared with BLASTALL, BLAST+ also often reports best hits to different sequences in the subject/database, often with better E-values.

Also of concern is that the default and .tsv outputs obtained with BLASTALL blastn differ in the number of hits. No such discrepancy was seen in the BLAST+ results, regardless of which output format was specified.

If anyone has made similar observations or has feedback on my observations/analyses, I would much appreciate it.





Below are examples of equivalent command lines I used for the different BLAST versions.

The command lines used for BLAST+ blastn were:

blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -out output.blastn (default output)

blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -outfmt "10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score" -out output.blastn.csv (csv output)


The command lines used for BLASTALL blastn were:

blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -o output.blastn -a 8 (default output)

blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -m 8 -o output.blastn.tsv -a 2 (tsv output)


Thanks!

maubp 10-18-2011 09:20 AM

What version numbers? That can make a difference.

I don't use -num_descriptions and -num_alignments, having found that they behave oddly (something at least partially addressed in a recent BLAST+ release). Have you tried -max_target_seqs instead?

rskr 10-18-2011 09:30 AM

I don't think you can look at the E-values and say they are lower and therefore better, or that one version returns more or fewer hits. The statistics aren't comparable without calibrating the Karlin-Altschul parameters. I am suspicious of BLAST+: it is so fast that I suspect they tweaked the hash word size parameters in favor of speed rather than accuracy. You might want to compare the actual parameters that are used: for example, look at what parameters blastall runs blastn with, then compare them with those of the equivalent blastn in BLAST+. There is a way to get them to print the actual parameters, not just the parameters of the wrapper. My understanding is that there isn't much difference between the two, and that where there was a difference it was mostly down to the parametrization the wrappers used.
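
To make the point about Karlin-Altschul parameters concrete: under the standard formula E = K * m * n * exp(-lambda * S), the same raw score maps to different E-values when lambda and K differ. Below is a minimal sketch. The lambda and K values are illustrative placeholders, not the values either BLAST version actually uses, and real BLAST substitutes effective lengths for the plain query and database lengths used here.

```python
import math

def expected_value(raw_score, lam, k, query_len, db_len):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Hypothetical parameter sets, chosen only for illustration.
PARAMS_A = {"lam": 1.37, "k": 0.71}
PARAMS_B = {"lam": 0.63, "k": 0.41}

raw, m, n = 40, 500, 1_000_000  # raw score, query length, database length
e_a = expected_value(raw, query_len=m, db_len=n, **PARAMS_A)
e_b = expected_value(raw, query_len=m, db_len=n, **PARAMS_B)

print(f"set A: E = {e_a:.3g}, bits = {bit_score(raw, **PARAMS_A):.1f}")
print(f"set B: E = {e_b:.3g}, bits = {bit_score(raw, **PARAMS_B):.1f}")
```

The same alignment therefore gets a very different E-value under each parameter set, which is why E-values from runs with different scoring systems should not be ranked against each other directly.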

Symphysodon 10-19-2011 03:30 AM

Hi,

Thanks maubp and rskr for your feedback.

I am using blastn from blastall 2.2.23 and blastn from BLAST+ 2.2.25.

Perhaps specifying the same hash word size for both would give a more equivalent comparison. Note that I have turned the DUST filter OFF for both.

I'll try -max_target_seqs in BLAST+. Do you know what the equivalent parameter in BLASTALL is?

I specified -num_descriptions and -num_alignments for BLAST+ blastn because legacy_blast.pl returned them as the equivalents of -v and -b in BLASTALL blastn.

If anyone can let me know how to get both applications to print out all the actual default parameters they used, that'd be great.


Cheers!
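
One way to sidestep differing defaults, rather than trying to print them, is to set the scoring parameters explicitly in both programs. The sketch below only builds and prints the two command lines; it does not run BLAST. The file names and the shared values are assumptions for illustration, and the chosen reward/penalty/gap combination would need to be one that both programs accept.

```python
import shlex

# Illustrative shared settings (assumed values, not either program's defaults).
SHARED = {"word_size": 11, "reward": 1, "penalty": -3, "gapopen": 5, "gapextend": 2}

def blastplus_cmd(query, db, out, s=SHARED):
    """BLAST+ blastn with every scoring parameter spelled out."""
    return ["blastn", "-task", "blastn", "-query", query, "-db", db,
            "-evalue", "1e-5", "-dust", "no",
            "-word_size", str(s["word_size"]),
            "-reward", str(s["reward"]), "-penalty", str(s["penalty"]),
            "-gapopen", str(s["gapopen"]), "-gapextend", str(s["gapextend"]),
            "-outfmt", "6", "-out", out]

def blastall_cmd(query, db, out, s=SHARED):
    """Legacy blastall blastn with the same scoring parameters."""
    return ["blastall", "-p", "blastn", "-i", query, "-d", db,
            "-e", "1e-5", "-F", "F",
            "-W", str(s["word_size"]),
            "-r", str(s["reward"]), "-q", str(s["penalty"]),
            "-G", str(s["gapopen"]), "-E", str(s["gapextend"]),
            "-m", "8", "-o", out]

print(shlex.join(blastplus_cmd("query.fa", "database", "plus.tsv")))
print(shlex.join(blastall_cmd("query.fa", "database", "legacy.tsv")))
```

With both runs pinned to the same word size, match/mismatch scores, and gap costs, any remaining differences are less likely to come from the wrappers' parametrization.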

mdimon 10-25-2011 02:52 PM

The number of hits differs between the default and csv formats because the -b and -v parameters are only honored for the default format. In the csv format, the -b and -v parameters are ignored.

In BLAST+ this was remedied by the introduction of the -max_target_seqs parameter. The documentation suggests that for the default format, the -num_descriptions and -num_alignments options should be used, but for XML and tabular output, the -max_target_seqs option should be used instead.
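
A quick way to check whether a hit limit was actually applied to a tabular file is to count the distinct subject sequences reported per query. Below is a minimal sketch, assuming a tab-separated layout in which the first two columns are the query and subject IDs (as in blastall -m 8 or BLAST+ -outfmt 6). The sample rows are made up.

```python
import io
from collections import defaultdict

def subjects_per_query(handle):
    """Count distinct subject IDs per query in tab-separated BLAST output."""
    seen = defaultdict(set)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        seen[fields[0]].add(fields[1])
    return {query: len(subjects) for query, subjects in seen.items()}

# Made-up rows: q1 hits s1 twice (two HSPs) and s2 once; q2 hits s3 once.
SAMPLE = (
    "q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-45\t180\n"
    "q1\ts1\t95.0\t40\t2\t0\t150\t189\t300\t339\t1e-10\t60.2\n"
    "q1\ts2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "q2\ts3\t97.0\t80\t2\t0\t1\t80\t5\t84\t1e-35\t140\n"
)

counts = subjects_per_query(io.StringIO(SAMPLE))
for query, n_subjects in sorted(counts.items()):
    print(query, n_subjects)
```

Counting subjects rather than rows matters because tabular output has one row per alignment (HSP), so a single subject can appear on several lines.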

As far as seeing different results between old and new BLASTs, have you figured out what type of sequences lead to different results?
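
To find which queries behave differently, the two tabular outputs can be compared query by query. Below is a minimal sketch, assuming both runs were written in the standard 12-column tabular format (blastall -m 8 and BLAST+ -outfmt 6, not the custom column list used earlier in the thread), with the bit score in the last column. The function names and sample rows are made up for illustration.

```python
import io

def best_hits(handle):
    """Map each query ID to (subject ID, bit score) of its top-scoring row."""
    best = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best

def discordant_queries(legacy, plus):
    """Report queries whose best hit is missing or different between runs."""
    report = {}
    for query in sorted(set(legacy) | set(plus)):
        old, new = legacy.get(query), plus.get(query)
        if old is None:
            report[query] = "hit only in BLAST+"
        elif new is None:
            report[query] = "hit only in BLASTALL"
        elif old[0] != new[0]:
            report[query] = f"best hit {old[0]} vs {new[0]}"
    return report

# Made-up example rows standing in for the two output files.
LEGACY = (
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t170\n"
    "q2\ts5\t92.0\t60\t5\t0\t1\t60\t1\t60\t1e-15\t90\n"
)
PLUS = (
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-45\t180\n"
    "q2\ts7\t94.0\t80\t5\t0\t1\t80\t1\t80\t1e-25\t120\n"
    "q2\ts5\t92.0\t60\t5\t0\t1\t60\t1\t60\t1e-17\t95\n"
    "q3\ts9\t90.0\t40\t4\t0\t1\t40\t1\t40\t1e-8\t60\n"
)

result = discordant_queries(best_hits(io.StringIO(LEGACY)),
                            best_hits(io.StringIO(PLUS)))
for query, note in result.items():
    print(query, note)
```

The queries this flags are the ones worth inspecting by hand, for example for low complexity, short length, or many near-identical database sequences.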


All times are GMT -8.