SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



View Poll Results: Do you use stand alone blast?
Always 2 66.67%
Sometimes 1 33.33%
Never 0 0%
Voters: 3. You may not vote on this poll

Similar Threads
Thread Thread Starter Forum Replies Last Post
Perl or Python ETHANol Bioinformatics 19 08-29-2019 04:30 AM
BLAST+ creating custom blast database and using blast+ filtering features deniz Bioinformatics 3 07-07-2019 08:04 AM
BLAST database error - when changing to new BLAST+ local program biobio Bioinformatics 4 06-15-2011 05:20 AM
python dror Bioinformatics 0 11-29-2010 03:22 AM
python in ABySS dror RNA Sequencing 0 11-28-2010 05:19 AM

Reply
 
Thread Tools
Old 09-07-2012, 03:47 AM   #1
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Question Reciproca blast using Python

HI
I am running local Blast (version 2.2.6) on a mac machine, and I want to find the homologs between two protein database(named A and B) by reciprocal blast methods under python command lines.
I have succeeded in blast A against B, and B against A, which resulted in two csv files which contained the best hits from each blast. Next I would like to use these two files to do Reciprocal blast, namely to find the homologs between A and B, according to the blast e-value and bit scores.

The codes for reciprocal blast was downloaded from http://ged.msu.edu/angus/tutorials/r...cal-blast.html
and it is like this
******************************************************************************************************

#! /usr/bin/env python
import sys, csv

# number below which to discard results
E_CUTOFF = 1e-3

def load_csv_to_dict(filename):
"""
Load the CSV file into a dictionary, tying query sequences to subject
sequences.
"""
d = {}

for (query_name, subject_name, score,expect) in csv.reader(open(filename)):
# fix the names that start with a 'gi|XXXXXXX|'
query_name = demangle_name(query_name)
subject_name = demangle_name(subject_name)

# convert the e-value into a floating point number
expect = float(expect)
if query_name not in d and expect < E_CUTOFF:
# if we don't already have an entry for this, put it in.
d[query_name] = subject_name

return d

def demangle_name(name):
"""
This functions strips off the 'gi|XXXXX|' name encoding that NCBI uses.

Note that NCBI does this automatically for subject sequences.
"""
if name.startswith('gi|'):
name = name.split('|')
name = name[2:]
name = "|".join(name)

return name

###

# This is the code that's run when you actually type 'find-reciprocal.py'
# at the command line; the above are function definitions, that define
# reusable blocks or chunks of code.

in_file_1 = sys.argv[1]
in_file_2 = sys.argv[2]

d1 = load_csv_to_dict(in_file_1)
d2 = load_csv_to_dict(in_file_2)

output = csv.writer(sys.stdout)

for seqname in d1:
seqmatch1 = d1[seqname]
seqmatch2 = d2.get(seqmatch1)

if seqmatch2 == seqname:
output.writerow([seqname, seqmatch1])

*******************************************************************************************************
However, when I used this command, an error came up like

*******************************************************************************************************

Traceback (most recent call last):
File "ngs-course/blast/find-reciprocal.py", line 49, in <module>
d1 = load_csv_to_dict(in_file_1)
File "ngs-course/blast/find-reciprocal.py", line 14, in load_csv_to_dict
for (query_name, subject_name, score,expect) in csv.reader(open(filename)):
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

*******************************************************************************************************

Would anyone please help me to modify the command lines to analyse the data? It would be a great appreciation for discussion here. Thanks!

Last edited by Tsuyoshi; 09-07-2012 at 03:53 AM.
Tsuyoshi is offline   Reply With Quote
Old 09-07-2012, 04:50 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311
Default

Hello,

I don't have a file to test this but looking at the Pyhton documentation for open() (http://docs.python.org/library/functions.html#open) and at this post found by Google (http://selfsolved.com/problems/pytho...l-newline-mode), you might fix your problem by replacing csv.reader(open(filename)) with csv.reader(open(filename, "rU")) in function load_csv_to_dict(filename). That is, the new function would be:

Code:
def load_csv_to_dict(filename):
    """
    Load the CSV file into a dictionary, tying query sequences to subject
    sequences.
    """
    d = {}

    for (query_name, subject_name, score,expect) in csv.reader(open(filename, "rU")):
        # fix the names that start with a 'gi|XXXXXXX|'
        query_name = demangle_name(query_name)
        subject_name = demangle_name(subject_name)

        # convert the e-value into a floating point number
        expect = float(expect)
        if query_name not in d and expect < E_CUTOFF:
            # if we don't already have an entry for this, put it in.
            d[query_name] = subject_name

    return d
Post a sample file to reproduce the error would help if you can.

Best
Dario
dariober is offline   Reply With Quote
Old 09-07-2012, 06:38 AM   #3
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Default

Dear Dario,
Thank you for your kind answer!
Now it works well!
Thanks!
Tsuyoshi is offline   Reply With Quote
Old 09-07-2012, 06:40 AM   #4
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Default

Dear Dario,
Thank you so much!
It works!
Tsuyoshi is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:18 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO