09-09-2012, 04:09 AM   #5
Location: japan

Join Date: Sep 2012
Posts: 24

Originally Posted by NormSci
Hi Tsuyoshi,

After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:

blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &

Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.

Perhaps anyone who has more experience with blast+ can provide further insight.
Dear NormSci,
Thank you for your kind reply!
Your reply taught me several useful blast+ options, such as setting the minimum identity percentage with -perc_identity and especially -max_target_seqs. It sounds like a convenient way to output only the top match. I would like to try it as soon as possible.

Alternatively, I managed to extract the top hit for each query from an Excel file using a macro like this:
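Sub tophitonly()
    ' Keep only the first row for each query ID in column A.
    ' Long rather than Integer (%), so sheets with more than 32,767 rows do not overflow.
    Dim i As Long
    ' Walk upward from the last used row so deletions do not shift unvisited rows.
    For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
        ' Delete the row while its query ID still occurs more than once.
        If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
            Cells(i, 1).EntireRow.Delete
        End If
    Next i
End Sub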
Since the hits for each query in result.txt were already listed best first (by bit score and e-value), the macro reliably kept the top hit for each query. Meanwhile, I could count how many matches were obtained from one BLAST run.
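For anyone who would rather skip Excel, the same first-row-per-query filter can probably be done in the shell directly on the tabular output. This is only a rough sketch under some assumptions: the file is tab-separated (-outfmt 6 or 7), the query ID is in column 1, and the hits for each query are already listed best first. The function names top_hits and hit_counts are made up for illustration and are not part of blast+.

```shell
#!/bin/sh
# Sketch only. Assumes tab-separated BLAST output (-outfmt 6 or 7) with
# the query ID in column 1 and hits for each query listed best first.
# top_hits and hit_counts are hypothetical helper names, not blast+ tools.

# Print only the first row seen for each query ID, like the macro does.
top_hits() {
    # Skip "#" comment lines and blank lines; a row is printed only
    # the first time its column-1 value appears.
    awk -F'\t' '!/^#/ && NF > 0 && !seen[$1]++' "$@"
}

# Print "query<TAB>number of hit rows" for each query, sorted by query ID.
hit_counts() {
    awk -F'\t' '!/^#/ && NF > 0 { n[$1]++ }
                END { for (q in n) print q "\t" n[q] }' "$@" | sort
}

# Example usage (the output file name is an assumption):
#   top_hits result.txt > result.tophit.txt
#   hit_counts result.txt
```

Unlike the macro, this does not touch the original file, so the full hit list stays available for counting afterwards.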
Thank you.
Tsuyoshi