Hi everyone,
I'm novice doing reciprocal blast and I have spent a lot of time trying it. I don't know how to do a reciprocal blast and how to parse the best hits and compare & find matches. So, an advice would be a big help to me.
I need to do a reciprocal Blast in Python (Jupyter) to identify same/ortholougs proteins bewteen two strains.
- For that, first I make a db with the respective FASTA with the coding sequences.
- Then, I run blastp for both files:
Up to here everything works. **Now, I have to parse the blast results to collect best hits. I have tried it,** **but not even print "hello??"**
If I pass as argument the file "ab.txt", the function "collect_best_hits" returns me an empty dictionary. And if I pass the handle "blastp_ab" as argument, "collect_best_hits" functions return me and advice of "blastall" and the empty dictionary:
> no BLAST output. Check that blastall is in your PATH
>
> { }
(Options)
The function contains:
What I'm doing wrong?
All help is welcome!
I'm novice doing reciprocal blast and I have spent a lot of time trying it. I don't know how to do a reciprocal blast and how to parse the best hits and compare & find matches. So, an advice would be a big help to me.
I need to do a reciprocal Blast in Python (Jupyter) to identify same/ortholougs proteins bewteen two strains.
- For that, first I make a db with the respective FASTA with the coding sequences.
Code:
!makeblastdb -in $fastap_1 -dbtype 'prot' !makeblastdb -in $fastap_2 -dbtype 'prot'
Code:
blastp_ab = !blastp -query $fastap_1 -db $fastap_2 -out ab.txt blastp_ba = !blastp -query $fastap_2 -db $fastap_1 -out ba.txt
If I pass as argument the file "ab.txt", the function "collect_best_hits" returns me an empty dictionary. And if I pass the handle "blastp_ab" as argument, "collect_best_hits" functions return me and advice of "blastall" and the empty dictionary:
> no BLAST output. Check that blastall is in your PATH
>
> { }
(Options)
Code:
#filename = "ab.txt" #filename = blastp_ab filename = open("ab.txt", "r") collect_best_hits(filename)
The function contains:
Code:
import os import sys import csv import blastparser def collect_best_hits(filename): d = {} try: for n, record in enumerate(blastparser.parse_fp(filename)): print "hola" if n % 100 == 0: print >>sys.stderr, 'loading 1 ...', n print "!-", n, record best_score = None for hit in record.hits: print "!- ", hit, len(record.hits) for match in hit.matches: print "!- ", match, len(hit.matches) query = record.query_name if query.startswith('gi'): query = query.split('|', 2)[2] subject = hit.subject_name score = match.score print "!- ", score # only keep the best set of scores for any query if best_score and best_score > score: continue best_score = score x = d.get(query, []) x.append((subject, score)) d[query] = x if best_score and best_score != score: break except Exception as e: print(e) return d
What I'm doing wrong?
All help is welcome!