  • BLAST+ vs BLASTALL (legacy BLAST)

    Hi all,

    I compared blastn as implemented in BLAST+ 2.2.25 (the latest version) with blastn in BLASTALL (legacy BLAST) and observed non-trivial discrepancies in the results. In summary, BLAST+ reports more hits of the query to the subject/database, with lower (better) E-values. For a given query, the best hit from BLAST+ is also often a different subject/database sequence than the best hit from BLASTALL, frequently with a better E-value.

    Also of concern: with BLASTALL blastn, the default output and the tabular (.tsv) output differ in the number of hits. No such discrepancy was seen in the BLAST+ results, regardless of which output format was specified.

    If anyone has made similar observations, or has feedback on my observations/analyses, I would much appreciate hearing it.





    Below are examples of equivalent command lines I used for the different BLAST versions.

    The command lines used for BLAST+ blastn were:

    blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -out output.blastn (default output)

    blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -outfmt "10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score" -out output.blastn.csv (csv output)


    The command lines used for BLASTALL blastn were:

    blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -o output.blastn -a 8 (default output)

    blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -m 8 -o output.blastn.tsv -a 2 (tsv output)
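A minimal sketch of how the "different best hit" observation could be quantified. It assumes both programs were run to produce the standard 12-column tab-separated output (`-m 8` for blastall, `-outfmt 6` for BLAST+), which is not the custom CSV layout used above. The function names and the tiny inline data set are made up for illustration.

```python
import csv
import io


def best_hits(handle):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the highest bit score per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        # Columns 11 and 12 of the standard tabular layout are E-value and bit score.
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def compare_best_hits(old, new):
    """Sort queries into: same best subject, different best subject, or present in one run only."""
    report = {"same": [], "different": [], "only_old": [], "only_new": []}
    for query in sorted(set(old) | set(new)):
        if query not in new:
            report["only_old"].append(query)
        elif query not in old:
            report["only_new"].append(query)
        elif old[query][0] == new[query][0]:
            report["same"].append(query)
        else:
            report["different"].append(query)
    return report


# Made-up rows standing in for real output files.
legacy = io.StringIO(
    "q1\tsA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "q2\tsB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150\n"
)
plus = io.StringIO(
    "q1\tsA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-42\t185\n"
    "q2\tsC\t96.0\t100\t4\t0\t1\t100\t1\t100\t1e-33\t160\n"
    "q3\tsD\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-10\t70\n"
)
report = compare_best_hits(best_hits(legacy), best_hits(plus))
print(report)
```

With real data, the two `io.StringIO` objects would be replaced by `open("output.blastn.tsv")` and the corresponding BLAST+ file. The queries listed under "different" are the ones worth inspecting by hand.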


    Thanks!

  • #2
    What version numbers? That can make a difference.

    I don't use -num_descriptions and -num_alignments, having found that they behave oddly (something at least partially addressed in a recent BLAST+ release). Have you tried -max_target_seqs instead?



    • #3
      I don't think you can look at the E-values and say that they are lower and therefore better, or that one program returns more or fewer hits. The statistics aren't comparable without calibrating the Karlin-Altschul parameters. I am suspicious of BLAST+: it is so fast that I suspect they tweaked the hash word size parameters in favor of speed rather than accuracy. You might want to compare the actual parameters that are used. For example, look at which parameters blastall runs blastn with, then compare them with those of the equivalent BLAST+ blastn run. There is a way to get them to print the actual parameters, not just the parameters of the wrapper. My understanding is that there isn't much difference between the two, and that where there was a difference it mostly came from the parametrization that the wrappers used.

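For reference, the Karlin-Altschul relation behind this point is E = K * m * n * exp(-lambda * S), where S is the raw alignment score and m and n are the query and database lengths. The sketch below shows that one and the same raw score gives different E-values under different parameter sets. The lambda and K values are illustrative assumptions, not the values used by BLASTALL or BLAST+.

```python
import math


def evalue(raw_score, lam, k, query_len, db_len):
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative numbers only; these lam/k pairs are NOT taken from either program.
# Real BLAST also uses length-adjusted (effective) values of m and n.
s, m, n = 60, 500, 1_000_000
e_a = evalue(s, lam=1.37, k=0.71, query_len=m, db_len=n)
e_b = evalue(s, lam=1.28, k=0.46, query_len=m, db_len=n)
print(f"parameter set A: E = {e_a:.3g}, bit score = {bit_score(s, 1.37, 0.71):.1f}")
print(f"parameter set B: E = {e_b:.3g}, bit score = {bit_score(s, 1.28, 0.46):.1f}")
```

Bit scores already fold lambda and K in, so they are usually a safer quantity to compare across programs than raw scores, provided the search space is also the same.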


      • #4
        Hi,

        Thanks maubp and rskr for your feedback.

        I am using blastn from blastall 2.2.23 and blastn from BLAST+ 2.2.25.

        Perhaps specifying the same hash word size for both would make for a more equivalent comparison. Note that I have turned the DUST filter OFF for both.

        I'll try -max_target_seqs in BLAST+. Do you know what the equivalent parameter in BLASTALL is?

        I specified -num_descriptions and -num_alignments for BLAST+ blastn because legacy_blast.pl returned them as the equivalents of -v and -b, respectively, in BLASTALL blastn.

        If anyone can let me know how to get both applications to print out all the actual default parameters they used, that'd be great.


        Cheers!



        • #5
          The difference in the number of hits between the default and tabular formats arises because the -b and -v parameters are only honoured for the default format. In the tabular format, the -b and -v parameters are ignored.
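Because the hit limits are not applied to the tabular output, one option is to trim it afterwards. A minimal sketch, assuming the standard tab-separated layout with the query ID in column 1 and the subject ID in column 2, and assuming the hits for each query are listed best first. The helper name and the sample rows are hypothetical.

```python
def top_subjects(lines, n=1):
    """Yield tabular lines belonging to the first n distinct subjects seen for each query."""
    seen = {}  # query id -> subject ids, in order of first appearance
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = line.split("\t")[:2]
        subjects = seen.setdefault(query, [])
        if subject not in subjects:
            if len(subjects) >= n:
                continue  # this query already has its n subjects
            subjects.append(subject)
        # Every HSP line of a retained subject is kept.
        yield line


# Made-up rows (only the first columns matter here).
rows = [
    "q1\tsA\t98.0\t100",
    "q1\tsA\t91.0\t40",   # second HSP against the same subject
    "q1\tsB\t95.0\t100",
    "q2\tsC\t99.0\t100",
]
kept = list(top_subjects(rows, n=1))
print("\n".join(kept))
```

This keeps all HSP lines of the retained subject, which is closer to limiting database sequences than to limiting output lines.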

          In BLAST+ this was remedied by the introduction of the -max_target_seqs parameter. The documentation suggests that the -num_descriptions and -num_alignments options should be used for the default format, whereas the -max_target_seqs option should be used for XML and tabular output.
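A minimal sketch of that rule as a helper that builds the blastn argument list. The function name and its defaults are hypothetical, only flags already mentioned in this thread are used, and the cut-off between report-style formats (0 to 4) and XML/tabular/CSV formats (5 and above) is an assumption based on the BLAST+ help text.

```python
def blastn_args(query, db, out, outfmt=0, max_hits=1, evalue=1e-5):
    """Build a blastn argument list, choosing the hit-limit flag that suits the output format."""
    args = [
        "blastn", "-task", "blastn",
        "-db", db, "-query", query,
        "-evalue", str(evalue), "-dust", "no",
        "-outfmt", str(outfmt), "-out", out,
    ]
    if outfmt <= 4:
        # Report-style formats: descriptions and alignments are limited separately.
        args += ["-num_descriptions", str(max_hits), "-num_alignments", str(max_hits)]
    else:
        # XML, tabular, CSV and similar: limit the number of subject sequences kept.
        args += ["-max_target_seqs", str(max_hits)]
    return args


print(" ".join(blastn_args("query.fa", "database", "output.blastn", outfmt=0)))
print(" ".join(blastn_args("query.fa", "database", "output.tsv", outfmt=6)))
```

The returned list can be passed directly to `subprocess.run` on a machine where BLAST+ is installed.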

          As far as seeing different results between old and new BLASTs, have you figured out what type of sequences lead to different results?
