Hi there,
I am working on an expressed sequence tag (EST) database for white clover, generated on the Roche GS-FLX platform. Using blastp (NCBI BLAST+ 2.2.25), I am trying to identify a set of orthologous genes present in both the EST database and the sequenced genome of the model species Medicago. In my output there are frequently two or more hits at different ends of the same subject protein, as in this output for query P21171:
...
P21171 gi|16804543|ref|NP_466028.1| 6e-50 194 484 401 141 64.54 74.47 8
P21171 gi|16802439|ref|NP_463924.1| 6e-35 144 484 227 119 57.14 75.63 3
P21171 gi|16802439|ref|NP_463924.1| 0.006 38.5 484 227 80 36.25 47.50 3
P21171 gi|126697942|ref|YP_001086839.1| 5e-26 115 484 335 125 49.60 67.20 6
P21171 gi|126698969|ref|YP_001087866.1| 2e-25 113 484 509 105 46.67 64.76 2
...
Is it possible to print only the best hit for each sequence? I've looked through the manual but haven't found anything. If this isn't possible, any pointers on how I could use BLAST parsers to work around this problem?
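To make the goal concrete, here is a rough sketch of the kind of post-filtering I have in mind, not a tested solution. It assumes the output is whitespace-separated with the query ID in column 1, the subject ID in column 2, and the bit score in column 4, as in the rows above; `best_hits` and `sample` are names made up for illustration.

```python
def best_hits(lines, per="subject"):
    """Keep the highest-bit-score row per (query, subject) pair, or per query.

    Assumed column layout: query ID, subject ID, e-value, bit score, ...
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or elided lines such as "..."
        query, subject, bitscore = fields[0], fields[1], float(fields[3])
        # Group by query+subject to collapse multiple hits on one protein,
        # or by query alone to keep a single top hit per query.
        key = (query, subject) if per == "subject" else query
        if key not in best or bitscore > best[key][0]:
            best[key] = (bitscore, line.rstrip("\n"))
    # Dicts preserve insertion order, so rows come back in first-seen order.
    return [row for _, row in best.values()]


sample = [
    "P21171 gi|16804543|ref|NP_466028.1| 6e-50 194 484 401 141 64.54 74.47 8",
    "P21171 gi|16802439|ref|NP_463924.1| 6e-35 144 484 227 119 57.14 75.63 3",
    "P21171 gi|16802439|ref|NP_463924.1| 0.006 38.5 484 227 80 36.25 47.50 3",
    "P21171 gi|126697942|ref|YP_001086839.1| 5e-26 115 484 335 125 49.60 67.20 6",
    "P21171 gi|126698969|ref|YP_001087866.1| 2e-25 113 484 509 105 46.67 64.76 2",
]

for row in best_hits(sample):
    print(row)
```

With the default `per="subject"`, the weaker second hit on NP_463924.1 is dropped and the other rows are kept; `per="query"` would keep only the single top-scoring row for P21171.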