SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Reciprocal Blast & Parse and Collect best hits (http://seqanswers.com/forums/showthread.php?t=79968)

Izal 01-05-2018 03:47 AM

Reciprocal Blast & Parse and Collect best hits
 
Hi everyone,

I'm novice doing reciprocal blast and I have spent a lot of time trying it. I don't know how to do a reciprocal blast and how to parse the best hits and compare & find matches. So, an advice would be a big help to me.

I need to do a reciprocal Blast in Python (Jupyter) to identify same/ortholougs proteins bewteen two strains.

- For that, first I make a db with the respective FASTA with the coding sequences.
Code:

!makeblastdb -in $fastap_1 -dbtype 'prot'
!makeblastdb -in $fastap_2 -dbtype 'prot'

- Then, I run blastp for both files:
Code:

blastp_ab = !blastp -query $fastap_1 -db $fastap_2 -out ab.txt
blastp_ba = !blastp -query $fastap_2 -db $fastap_1 -out ba.txt

Up to here everything works. **Now, I have to parse the blast results to collect best hits. I have tried it,** **but not even print "hello??"**

If I pass as argument the file "ab.txt", the function "collect_best_hits" returns me an empty dictionary. And if I pass the handle "blastp_ab" as argument, "collect_best_hits" functions return me and advice of "blastall" and the empty dictionary:

> no BLAST output. Check that blastall is in your PATH
>
> { }


(Options)
Code:

#filename = "ab.txt"
#filename = blastp_ab
filename = open("ab.txt", "r")

collect_best_hits(filename)


The function contains:

Code:

import os
import sys
import csv
import blastparser


def collect_best_hits(filename):
      d = {}
      try:
            for n, record in enumerate(blastparser.parse_fp(filename)):
                print "hola"
                if n % 100 == 0:
                      print >>sys.stderr, 'loading 1 ...', n
             
                print "!-", n, record
   
                best_score = None
                for hit in record.hits:
                    print "!- ", hit, len(record.hits)
                    for match in hit.matches:
                        print "!- ", match, len(hit.matches)
                        query = record.query_name
                        if query.startswith('gi'):
                            query = query.split('|', 2)[2]
                        subject = hit.subject_name
   
                        score = match.score
                        print "!- ", score
   
                        # only keep the best set of scores for any query
                        if best_score and best_score > score:
                            continue
                        best_score = score
   
                        x = d.get(query, [])
                        x.append((subject, score))
                        d[query] = x
   
                    if best_score and best_score != score:
                        break
        except Exception as e:
            print(e)
        return d


What I'm doing wrong?

All help is welcome!


All times are GMT -8. The time now is 07:11 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.