SEQanswers

07-15-2014, 08:38 PM   #1
Shorash
Member
Location: Brisbane
Join Date: Sep 2012
Posts: 18
Reciprocal blast help

I have two datasets (Dataset A and Dataset B) that I have searched reciprocally against one another with BLASTx. However, I am having difficulty identifying the contigs that hit one another in both datasets. I have been able to export the BLAST results into sequence tables, but I'm not sure how to identify the reciprocal top hits for a large number of contigs (160,000). Here is an example of how it looks in an Excel spreadsheet:

Dataset A             Dataset B
Contig123=Contig789_1 Contig789=111Contig123_1
Contig456=72Contig221 Contig221=Contig456_3
Contig777=43Contig954 Contig954=3Contig1561_1


In the example above, you can see that the hits from each file have extra characters at the beginning, and sometimes at the end, of each contig name, which makes them hard to compare using Excel formulas. In the example, the first two rows are the ones I'm interested in extracting, as the contigs hit each other in both datasets, unlike row 3, where they do not match.

Any help would be greatly appreciated!
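
For anyone with the same spreadsheet layout, here is a minimal Python sketch of one way to make the comparison. It is only an illustration under assumptions not stated in the post: every contig name is taken to be "Contig" followed by digits, each cell is taken to be "query=hit", and the extra characters are whatever surrounds that name. The function names are made up for this example.

```python
import re

# Assumption: a contig name is "Contig" followed by one or more digits.
CONTIG = re.compile(r"Contig\d+")

def parse_cell(cell):
    """Split a 'query=hit' cell and return (query, hit) as bare contig names."""
    query_part, hit_part = cell.split("=", 1)
    query = CONTIG.search(query_part)
    hit = CONTIG.search(hit_part)
    if query is None or hit is None:
        raise ValueError(f"no contig name found in {cell!r}")
    return query.group(0), hit.group(0)

def reciprocal_pairs(cells_a, cells_b):
    """Return (A contig, B contig) pairs that are each other's hit."""
    hits_a = dict(parse_cell(c) for c in cells_a)  # A query -> B hit
    hits_b = dict(parse_cell(c) for c in cells_b)  # B query -> A hit
    # Keep a pair only if the B contig's hit points back to the A contig.
    return [(q, h) for q, h in hits_a.items() if hits_b.get(h) == q]

# The three example rows from the post.
dataset_a = ["Contig123=Contig789_1", "Contig456=72Contig221", "Contig777=43Contig954"]
dataset_b = ["Contig789=111Contig123_1", "Contig221=Contig456_3", "Contig954=3Contig1561_1"]

print(reciprocal_pairs(dataset_a, dataset_b))
# [('Contig123', 'Contig789'), ('Contig456', 'Contig221')]
```

Because the lookup goes through dictionaries, the rows of the two columns do not need to be lined up, and 160,000 contigs should not be a problem for this approach.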

Last edited by Shorash; 07-15-2014 at 08:40 PM.
07-16-2014, 08:02 AM   #2
bio_boris
Member
Location: Urbana IL
Join Date: Mar 2013
Posts: 14

What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

Some resources
http://ged.msu.edu/angus/tutorials/
http://ged.msu.edu/angus/tutorials/r...cal-blast.html
http://seqanswers.com/forums/showthread.php?t=20652
07-20-2014, 10:25 AM   #3
someperson
Junior Member
Location: Germany
Join Date: Jul 2013
Posts: 9

Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
I mostly use the tool proteinortho (current version: proteinortho5) for reciprocal BLAST analyses.
A strong advantage of this tool is that it not only detects direct orthologs via reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
Links:
https://www.bioinf.uni-leipzig.de/So.../proteinortho/
http://www.biomedcentral.com/1471-2105/12/124
07-21-2014, 06:51 PM   #4
Shorash
Member
Location: Brisbane
Join Date: Sep 2012
Posts: 18

Quote:
Originally Posted by bio_boris
What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

Some resources
http://ged.msu.edu/angus/tutorials/
http://ged.msu.edu/angus/tutorials/r...cal-blast.html
http://seqanswers.com/forums/showthread.php?t=20652
I haven't managed to create any scripts or code. I've been manually looking at specific genes of interest, but it would be great to be able to do all of them at once.
07-21-2014, 06:57 PM   #5
Shorash
Member
Location: Brisbane
Join Date: Sep 2012
Posts: 18

Quote:
Originally Posted by someperson
Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
I mostly use the tool proteinortho (current version: proteinortho5) for reciprocal BLAST analyses.
A strong advantage of this tool is that it not only detects direct orthologs via reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
Links:
https://www.bioinf.uni-leipzig.de/So.../proteinortho/
http://www.biomedcentral.com/1471-2105/12/124
Great, thanks for that. I'll give this a try.
07-21-2014, 10:39 PM   #6
rhinoceros
Senior Member
Location: sub-surface moon base
Join Date: Apr 2013
Posts: 372

Quote:
Originally Posted by Shorash
how I can identify the reciprocal top hits of one another for a large number of contigs (160,000).
I would go for tabular BLAST output and first sort for best hits. Then, depending on how you did your BLAST searches, you could have, for example, two best-hit sorted output files with the query in the first column and the subject in the second. One option would be to cut columns 1-2, switch the column order in one file, and then cat it with the other file. Then you'd sort based on column 1 and only output the lines where uniq -c gives a count of 2, since a reciprocal pair appears once from each file. I'm sure there's an awk one-liner for this too.
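
The same idea can be sketched in Python rather than with cut, sort and uniq. This is a rough illustration, not a tested pipeline. It assumes the default 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, bit score in column 12), takes the highest bit score as the "best hit", and assumes the contig IDs are written identically in both result files. The function names and the small in-memory example are invented for this sketch.

```python
import csv

def best_hits(lines):
    """Return {query: subject}, keeping the highest bit score hit per query."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (A, B) pairs that are each other's best hit."""
    forward = set(best_hits(a_vs_b).items())
    # Swap the columns of the second result set so both are (A, B).
    backward = {(s, q) for q, s in best_hits(b_vs_a).items()}
    return sorted(forward & backward)

def row(query, subject, bitscore):
    """Build one made-up 12-column tabular line for the demo below."""
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])

a_vs_b = [row("Contig123", "Contig789", 350),
          row("Contig123", "Contig555", 150),
          row("Contig777", "Contig954", 200)]
b_vs_a = [row("Contig789", "Contig123", 340),
          row("Contig954", "Contig1561", 180)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
# [('Contig123', 'Contig789')]
```

With real data, open file handles for the two result files can be passed in place of the lists, since the functions only need something that yields lines. The set intersection plays the role of the "uniq -c is 2" step.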
__________________
savetherhino.org