Reciprocal Blast & Parse and Collect best hits

Izal

Junior Member

Join Date: Jan 2018
Posts: 5

01-05-2018, 04:47 AM

Hi everyone,

I'm new to reciprocal BLAST and have spent a lot of time trying to get it working. I don't know how to run a reciprocal BLAST, parse the best hits, and compare them to find matches, so any advice would be a big help.

I need to run a reciprocal BLAST in Python (Jupyter) to identify identical/orthologous proteins between two strains.

- For that, I first build a database from each FASTA file containing the coding sequences.

Code:

!makeblastdb -in $fastap_1 -dbtype 'prot'
!makeblastdb -in $fastap_2 -dbtype 'prot'

- Then, I run blastp for both files:

Code:

blastp_ab = !blastp -query $fastap_1 -db $fastap_2 -out ab.txt
blastp_ba = !blastp -query $fastap_2 -db $fastap_1 -out ba.txt
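The database and search steps above can also be driven from plain Python. The following is only a sketch under stated assumptions: the helper names `build_reciprocal_blast_commands` and `run_commands` and the output names `ab.tsv`/`ba.tsv` are hypothetical, and `-outfmt 6` (tab-separated output) is a swap from the default pairwise report used in the post, chosen because it is simpler to parse. Note also that with `-out`, the results go to the file, so the list captured by `!` (here `blastp_ab`) is most likely empty.

```python
import subprocess


def build_reciprocal_blast_commands(fasta_a, fasta_b,
                                    out_ab="ab.tsv", out_ba="ba.tsv"):
    """Return argv lists: two makeblastdb calls, then blastp in both directions.

    The output file names are hypothetical defaults, not taken from the post.
    """
    commands = []
    # One protein database per FASTA file.
    for fasta in (fasta_a, fasta_b):
        commands.append(["makeblastdb", "-in", fasta, "-dbtype", "prot"])
    # A against B, then B against A, both as tabular output (assumption).
    for query, db, out in ((fasta_a, fasta_b, out_ab),
                           (fasta_b, fasta_a, out_ba)):
        commands.append(["blastp", "-query", query, "-db", db,
                         "-out", out, "-outfmt", "6"])
    return commands


def run_commands(commands):
    """Run each command in order; raises CalledProcessError on failure."""
    for argv in commands:
        subprocess.check_call(argv)


commands = build_reciprocal_blast_commands("strain1.faa", "strain2.faa")
# run_commands(commands)  # uncomment where BLAST+ is installed and on PATH
```

Building the argument lists separately from running them makes it easy to inspect exactly what will be executed before BLAST+ is called.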

Up to here everything works. **Now I have to parse the BLAST results to collect the best hits. I have tried, but the function does not even print "hola".**

If I pass the file name "ab.txt" as the argument, the function "collect_best_hits" returns an empty dictionary. If I pass the handle "blastp_ab" instead, "collect_best_hits" prints a warning about "blastall" and returns the same empty dictionary:

> no BLAST output. Check that blastall is in your PATH
>
> { }

(Options I tried)

Code:

#filename = "ab.txt"
#filename = blastp_ab
filename = open("ab.txt", "r")

collect_best_hits(filename)

The function contains:

Code:

from __future__ import print_function  # print() then works on Python 2 and 3

import os
import sys
import csv
import blastparser


def collect_best_hits(filename):
    # Despite its name, "filename" is handed directly to blastparser.parse_fp,
    # so the caller passes an already opened file object.
    d = {}
    try:
        for n, record in enumerate(blastparser.parse_fp(filename)):
            print("hola")
            if n % 100 == 0:
                print('loading 1 ...', n, file=sys.stderr)

            print("!-", n, record)

            best_score = None
            for hit in record.hits:
                print("!- ", hit, len(record.hits))
                for match in hit.matches:
                    print("!- ", match, len(hit.matches))
                    query = record.query_name
                    if query.startswith('gi'):
                        # keep only the part after "gi|<number>|"
                        query = query.split('|', 2)[2]
                    subject = hit.subject_name

                    score = match.score
                    print("!- ", score)

                    # only keep the best set of scores for any query
                    if best_score and best_score > score:
                        continue
                    best_score = score

                    d.setdefault(query, []).append((subject, score))

                # stop at the first hit whose score falls below the best one
                if best_score and best_score != score:
                    break
    except Exception as e:
        # any parsing error is only printed, so d may come back empty
        print(e)
    return d
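For comparison, here is a minimal sketch of the same "collect best hits" idea that does not rely on `blastparser`. It assumes the searches were rerun with `-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The function name `collect_best_hits_tabular` and the sample rows are made up for illustration, and unlike the function above it does not strip a leading `gi|...|` prefix from query names.

```python
import csv
import io


def collect_best_hits_tabular(handle):
    """Map each query ID to the (subject, bitscore) pairs sharing its top bit score.

    Assumes default -outfmt 6 columns, so the bit score is the 12th field.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        hits = best.get(query)
        if hits is None or bitscore > hits[0][1]:
            # first hit for this query, or a strictly better one
            best[query] = [(subject, bitscore)]
        elif bitscore == hits[0][1] and (subject, bitscore) not in hits:
            # tie with the current best score: keep both subjects
            hits.append((subject, bitscore))
    return best


# Made-up rows standing in for the contents of a tabular result file.
sample = (
    "protA1\tprotB7\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n"
    "protA1\tprotB9\t45.0\t180\t90\t4\t5\t184\t10\t189\t2e-30\t120\n"
    "protA2\tprotB3\t77.0\t150\t30\t1\t1\t150\t1\t150\t5e-80\t250\n"
)
best_ab = collect_best_hits_tabular(io.StringIO(sample))
```

With a real result file, the handle would come from `open(...)` on the tabular output instead of `io.StringIO`.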

What am I doing wrong?

All help is welcome!
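For the final "compare and find matches" step, a minimal sketch follows. It assumes two dictionaries shaped like the one `collect_best_hits` returns (each query mapped to a list of (subject, score) tuples), one per search direction. The function name `reciprocal_best_hits` and the example IDs are hypothetical.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return sorted (a, b) pairs where b is a best hit of a and a is a best hit of b."""
    pairs = set()
    for a, hits in best_ab.items():
        for b, _score_ab in hits:
            # look up what b hits best in the opposite direction
            for back, _score_ba in best_ba.get(b, []):
                if back == a:
                    pairs.add((a, b))
    return sorted(pairs)


# Made-up best-hit dictionaries for the two directions.
example_ab = {"protA1": [("protB7", 410.0)], "protA2": [("protB3", 250.0)]}
example_ba = {"protB7": [("protA1", 408.0)], "protB3": [("protA9", 300.0)]}
pairs = reciprocal_best_hits(example_ab, example_ba)
```

Here only protA1 and protB7 point at each other, so they form the single reciprocal pair; protA2's best hit protB3 prefers a different protein in the reverse search.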

Tags: best hits, blast, parse, python, reciprocal blast hits
