SEQanswers

Old 09-07-2012, 10:38 PM   #1
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
How to output only one best-hit result using standalone BLAST

Hello,
I am running standalone BLAST to blastp a protein data set against a genome data set.

I have set the best-hits filtering algorithm using -best_hit_overhang with a value of 0.25 (0.1 to 0.25 was recommended at http://www.ncbi.nlm.nih.gov/books/NB...stHits_filteri).

Compared with the results without -best_hit_overhang, the number of matches for each query was indeed reduced a lot. However, what I want is to output just one best hit per query (according to the bit score and e-value).

For example, the output looked like this:
***************************************************************
gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
gi|270118727|emb|CAT18807.1| gi|323451569|gb|EGB07446.1| 28.77 73 46 2 226 298 14 80 0.42 31.2
gi|270118727|emb|CAT18807.1| gi|323445796|gb|EGB02232.1| 30.65 62 43 0 349 410 111 172 1.8 29.3
gi|270118727|emb|CAT18807.1| gi|323449782|gb|EGB05667.1| 29.27 41 29 0 132 172 105 145 2.1 29.6
gi|270118727|emb|CAT18807.1| gi|323448614|gb|EGB04510.1| 26.92 52 38 0 189 240 15 66 6.5 27.7
***************************************************************

And I just want the top one:
***************************************************************
gi|270118727|emb|CAT18807.1| gi|323452346|gb|EGB08220.1| 26.15 65 48 0 31 95 835 899 0.05 35.4
***************************************************************

Could anyone please tell me how to do that?
Old 09-08-2012, 05:05 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 310
Default

Hi, I'm not sure how (or if) it can be done by BLAST itself. However, you can extract the top hit from the BLAST output (blastout.txt) with the code below:

Code:
sort -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -k1,1 --merge
The first sort orders the BLAST output by query name, then by the 12th column in descending order (the bit score in the default tabular format), then by the 11th column in ascending order (the e-value).
The second sort keeps the first line for each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.

Hope it helps (make sure it does what you want)...

Dario
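
For the case where the first sort can be skipped, a small awk filter is a possible alternative to the second sort. This is only a sketch: the file names and the three sample rows are made up for illustration, and it assumes tab-separated output in which the hits for each query are already listed best-first.

```shell
# Hypothetical sample: tab-separated BLAST hits, best hit first within each query.
printf 'q1\ts1\t26.15\t65\t48\t0\t31\t95\t835\t899\t0.05\t35.4\n'  > blastout_sorted.txt
printf 'q1\ts2\t28.77\t73\t46\t2\t226\t298\t14\t80\t0.42\t31.2\n' >> blastout_sorted.txt
printf 'q2\ts3\t30.65\t62\t43\t0\t349\t410\t111\t172\t1.8\t29.3\n' >> blastout_sorted.txt

# Print a line only the first time its query ID (column 1) is seen.
awk -F'\t' '!seen[$1]++' blastout_sorted.txt > top_hits.txt
cat top_hits.txt
```

Unlike the sort-based version, this keeps the queries in their original order, but it gives the wrong answer if the best hit is not the first line for a query.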
Old 09-08-2012, 03:56 PM   #3
NormSci
Member
 
Location: Canada

Join Date: Apr 2012
Posts: 11
Default

Hi Tsuyoshi,

After fooling around with blast+ today, I think I've managed to achieve your objective. The command I'm using is:

blastn -query transcripts.fa -out transcripts.blast.txt -task megablast -db refseq_rna -num_threads 12 -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -outfmt 7 -perc_identity 50 -max_target_seqs 1 &

Adding the "-max_target_seqs" flag and setting it to "1" yields what appears to be the best hit in terms of e-value and bit score. I haven't done extensive comparisons, but it appears where multiple matches yield the same e-value (e.g., 0), the match with the highest score is retained.

Perhaps someone with more experience with blast+ can provide further insight.
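
Since the comparison above was not extensive, a quick sanity check on the output may be worthwhile. The sketch below uses a made-up, shortened sample in the style of -outfmt 7 (comment lines start with "#", hit lines are tab-separated); the file names are hypothetical. As far as I understand, -max_target_seqs limits the number of subject sequences per query, not the number of alignment lines, so one query can still have several lines against the same subject.

```shell
# Hypothetical, shortened -outfmt 7 style sample (real output has more columns).
{
  printf '# Query: t1\n# 2 hits found\n'
  printf 't1\tr1\t99.0\t100\n'
  printf 't1\tr1\t95.0\t40\n'
  printf '# Query: t2\n# 1 hits found\n'
  printf 't2\tr7\t98.0\t80\n'
} > sample.outfmt7.txt

# Hit lines only, reduced to unique query/subject pairs.
grep -v '^#' sample.outfmt7.txt | cut -f1,2 | sort -u > pairs.txt

# Queries that still have more than one distinct subject (ideally none).
cut -f1 pairs.txt | uniq -d > multi_subject_queries.txt

# Queries with more than one hit line (several alignments to the same subject).
grep -v '^#' sample.outfmt7.txt | cut -f1 | sort | uniq -d > multi_line_queries.txt
cat multi_line_queries.txt
```

If the second list is not empty, the output still needs one of the "first line per query" filters from this thread.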
Old 09-09-2012, 04:55 AM   #4
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Default

Dear Dario,
Thank you very much for answering this question, and for helping me a lot with my Python problems several days ago.

To extract the best hit from results.txt, I tried an alternative approach using an Excel macro. The script looks like this:
****************************************************************
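Sub tophitonly()
' Keep only the first (top) row for each query ID in column A.
' Long instead of Integer (%), so sheets with more than 32767 rows do not overflow.
Dim i As Long
' Walk upwards from the last used row so deletions do not shift unchecked rows.
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
' A lower row whose query ID also occurs elsewhere is a duplicate: delete it.
If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
End Sub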
****************************************************************
Anyway, thank you very much. I would like to try your method in the next few days.
Old 09-09-2012, 05:09 AM   #5
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Default

Dear NormSci,
Thank you for your kind reply!
I learned more useful blast+ options from your reply, such as setting the minimum identity percentage and especially -max_target_seqs. It sounds like a convenient way to output only the top match. I would like to try it ASAP.

Alternatively, I managed to extract the top hit from the Excel file using a macro like this:
****************************************************************
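Sub tophitonly()
' Keep only the first (top) row for each query ID in column A.
' Long instead of Integer (%), so sheets with more than 32767 rows do not overflow.
Dim i As Long
' Walk upwards from the last used row so deletions do not shift unchecked rows.
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
' A lower row whose query ID also occurs elsewhere is a duplicate: delete it.
If Application.CountIf(Range("a:a"), Cells(i, 1)) > 1 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
End Sub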
****************************************************************
Since the hits for each query in result.txt were already ordered by bit score and e-value with the best hit first, the macro worked well to pick the top hit for each query. Meanwhile, I could also count how many matches were obtained from one BLAST run.
Thank you.
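
The counting step can also be done outside Excel. A minimal sketch, assuming tab-separated output with the query ID in column 1; the file names and sample rows are invented for illustration.

```shell
# Hypothetical sample: query ID and subject ID only (real output has more columns).
printf 'q1\ts1\nq1\ts2\nq1\ts3\nq2\ts4\n' > result_sample.txt

# Count the hit lines per query and print "query<TAB>count".
cut -f1 result_sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}' > hits_per_query.txt
cat hits_per_query.txt
```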
Old 05-21-2013, 07:04 PM   #6
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

@Dariober, if the BLAST output is a comma-separated file, one also needs to add the -t, parameter to your code snippet, since by default sort splits fields at the transition from non-blank to blank characters.

Code:
sort -t, -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -t, -k1,1 --merge
-Carmen
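
A small worked example of the comma-separated case, as a sketch only: the rows and file names are made up, and GNU sort is assumed. It uses -g (general numeric) instead of -n for the score columns so that e-values written in scientific notation are compared by value; that choice is an addition here, not part of the snippet above.

```shell
export LC_ALL=C   # "." as decimal separator, byte-wise ordering of query names

# Hypothetical 12-column CSV rows: column 11 = e-value, column 12 = bit score.
printf 'q1,s1,28.77,73,46,2,226,298,14,80,0.42,31.2\n'       > blastout.csv
printf 'q1,s2,26.15,65,48,0,31,95,835,899,0.05,35.4\n'      >> blastout.csv
printf 'q2,s3,30.65,62,43,0,349,410,111,172,1e-50,180\n'    >> blastout.csv
printf 'q2,s4,29.27,41,29,0,132,172,105,145,2e-100,350\n'   >> blastout.csv

# Sort by query, then bit score (descending), then e-value (ascending);
# the second sort keeps the first line of each query.
sort -t, -k1,1 -k12,12gr -k11,11g blastout.csv | sort -u -t, -k1,1 --merge > top_hits.csv
cat top_hits.csv
```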

Last edited by carmeyeii; 05-21-2013 at 07:21 PM.
Old 07-30-2013, 04:08 AM   #7
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372
Default

Quote:
Originally Posted by dariober
Hi- I'm not sure how/if it can be done by Blast. However you can extract the top hit from the blast output (blastout.txt) with the code below:

Code:
sort -k1,1 -k12,12nr -k11,11n  blastout.txt | sort -u -k1,1 --merge
The first sort orders the blast output by query name then by the 12th column in descending order (bit score - I think), then by 11th column ascending (evalue I think).
The second sort picks the first line from each query. Obviously you can skip the first sort if the output is already sorted in the 'correct' order.

Hope it helps (make sure it does what you want)...

Dario
I've lost count of how many times I've used this sort. However, I just realized that it doesn't work as intended. sort -n only reads the leading plain number and ignores the exponent, so for example 1e-10 (read as 1) would sort as smaller than 2e-100 (read as 2). I believe the command below is 'correct':

Code:
sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits
The bit-score sort probably works fine with -k12,12nr too (which might make it somewhat faster).

Note, make sure that your locale settings recognize "." as a decimal separator, e.g.

Code:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
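
The difference between -n and -g is easy to see on two e-values. A quick sketch, assuming a sort that supports -g (GNU sort does); the output file names are arbitrary.

```shell
export LC_ALL=C   # make sure "." is treated as the decimal separator

# -n stops at the "e", so it compares 1 with 2 and puts 1e-10 first.
printf '1e-10\n2e-100\n' | sort -n > sorted_n.txt

# -g parses the full floating-point value, so the smaller 2e-100 comes first.
printf '1e-10\n2e-100\n' | sort -g > sorted_g.txt

head -n 1 sorted_n.txt sorted_g.txt
```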

Last edited by rhinoceros; 05-17-2014 at 04:08 AM.
Old 03-02-2014, 06:34 PM   #8
karla_oliveira
Junior Member
 
Location: Brazil

Join Date: Mar 2014
Posts: 1
Default

Hello everybody!

I have a problem with my standalone blastp.
I used NormSci's parameters and they worked pretty well, but one issue remains in my case. I have 1000 proteins to compare to my query, and with these commands all of them are shown in the output (even those with the message "0 hits found"). These make up 998 of my 1000, but I just want to see the 2 that have hits.

Does anyone have a tip for me?

Thank you so much!
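
One possible way to do this after the run is to filter the -outfmt 7 file and keep a query's comment block only when it reports at least one hit. This is a sketch under assumptions: the sample below is invented and shortened, the file names are hypothetical, and the filter relies on each block ending with a "# N hits found" line.

```shell
# Hypothetical, shortened -outfmt 7 style sample: q1 has hits, q2 has none.
{
  printf '# BLASTP\n# Query: q1\n# Database: db\n# Fields: query id, subject id\n# 2 hits found\n'
  printf 'q1\ts1\n'
  printf 'q1\ts2\n'
  printf '# BLASTP\n# Query: q2\n# Database: db\n# 0 hits found\n'
  printf '# BLAST processed 2 queries\n'
} > sample7.txt

awk '
  /^#/ {
    header = header $0 "\n"            # collect the comment lines of one query block
    if ($0 ~ /hits found/) {           # last comment line of the block
      if ($2 > 0) printf "%s", header  # keep the block only if it reports a hit
      header = ""
    }
    next
  }
  { print }                            # hit lines pass through unchanged
' sample7.txt > with_hits_only.txt
cat with_hits_only.txt
```

The trailing "processed N queries" line is dropped as well, since no "hits found" line follows it.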
Old 09-27-2014, 10:25 PM   #9
HSV-1
Member
 
Location: asia

Join Date: Jul 2012
Posts: 38
Default

You guys save my ass, thanks!
Old 02-17-2015, 02:13 AM   #10
kaps
Member
 
Location: Uganda

Join Date: Jan 2015
Posts: 71
Default

Hi,

Can -max_target_seqs 1 also work for command-line BLAST if I need to output the best hits?

kaps
Old 03-04-2015, 03:24 PM   #11
rahulvrane
Junior Member
 
Location: Melbourne

Join Date: Apr 2012
Posts: 6
Default

Technically you don't need to go that far. Since the blastn/blastp hits in the latest blast+ packages are already sorted with the top hit first, you can just use the following.

Code:
blastn -query transcripts.fa -db Targets.fa [other options you like ] -outfmt 7 -out blast_output.txt

# Print the line after each "N hits found" header, then drop comment lines (queries with 0 hits).
awk '/hits found/{getline; print}' blast_output.txt | grep -v '^#' > top_hits.txt
Rahul

Last edited by rahulvrane; 03-04-2015 at 03:27 PM.