James88 10-28-2014 12:48 PM

Reciprocal Blast Guidance...

I'm working on defining a core set of orthologous genes across ~40 C. difficile strains, in order to build a binary tree at the end.

I was told to use reciprocal BLAST searches. I understand the approach and have scripts both to run the searches and to handle the output.

My issue is this: when running reciprocal BLAST in a pairwise fashion, how do I decide which strains to compare with each other? For example, should I compare strain1 with all other strains, then strain2 with all others, and so on, or should I compare each one with a sensible reference genome (for example, the one against which these genomes were originally assembled)?

Obviously the first method would involve a very large number of runs. The second would need far fewer, but which is more sensible and will give meaningful results?

I'm a master's student, by the way, so whilst I think I know everything, I possibly don't...


LeightonP 10-29-2014 12:11 AM

Generally, you need to compare all strains with all other strains. You only have reciprocal BLAST hits between two strains A and B if the BLAST has been performed reciprocally: A vs B, and B vs A.

That's 40 x 40 - 40 = 1560 comparisons, for 40 strains.
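
For anyone who wants to check that count or generate the run list, here is a minimal sketch. The strain labels are made up for illustration; each ordered pair (query, subject) stands for one BLAST run, and self-comparisons are left out.

```python
from itertools import permutations

# Hypothetical strain labels; substitute your own genome names.
strains = [f"strain{i}" for i in range(1, 41)]

# Ordered pairs (query, subject): A vs B and B vs A are separate runs,
# and a strain is never paired with itself.
runs = list(permutations(strains, 2))

print(len(runs))  # 40 * 40 - 40 = 1560
```

Each tuple in `runs` could then be handed to whatever script launches the individual searches.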

I can imagine circumstances where, if the only differences between a pair of isolates were SNPs, you could make assumptions and avoid the comparison. I'd typically just run the comparison, anyway.

James88 10-29-2014 08:43 AM

Thank you for the reply/advice. I agree... That certainly felt like the most sensible thing to do. I had better get writing a little script!

LeightonP 10-29-2014 12:19 PM

It's a common task, so you might find that someone has already written a tool or script that suits your purposes, in a language that fits your workflow. There's also a tool in the Galaxy Tool Shed ('blast_rbh').

James88 11-05-2014 02:45 AM

I had already... acquired one from the Kostas lab, but I've since been told that I apparently don't need it: we're using orthAgogue now, which allegedly handles the reciprocal BLAST step itself, taking as input an all-vs-all BLAST output of all my protein sequences.
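
For the record, the general idea of pulling reciprocal best hits out of a single all-vs-all table can be sketched as below. This is not how orthAgogue works internally, just an illustration under assumptions: the input is standard 12-column tabular BLAST output with bitscore in the last column, sequence IDs look like "strain|protein" (a made-up convention), and `strain_of`, `best_hits` and `reciprocal_best_hits` are hypothetical helper names. Ties, coverage and identity cut-offs are ignored.

```python
import csv
import io


def strain_of(seq_id):
    # Assumption: IDs are formatted "strain|protein".
    return seq_id.split("|", 1)[0]


def best_hits(rows):
    """Map (query, subject strain) -> (subject, bitscore), keeping the top bitscore."""
    best = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if strain_of(query) == strain_of(subject):
            continue  # ignore hits within the same strain
        key = (query, strain_of(subject))
        if key not in best or bitscore > best[key][1]:
            best[key] = (subject, bitscore)
    return best


def reciprocal_best_hits(rows):
    """Return the set of sequence pairs that are each other's best hit."""
    best = best_hits(rows)
    rbh = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, strain_of(query)))
        if back is not None and back[0] == query:
            rbh.add(tuple(sorted((query, subject))))
    return rbh


# Tiny invented example: A|p1 and B|q1 are each other's best hit,
# while B|q2 hits A|p1 but is not A|p1's best hit in strain B.
demo = (
    "A|p1\tB|q1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "A|p1\tB|q2\t60.0\t300\t120\t0\t1\t300\t1\t300\t1e-40\t200\n"
    "B|q1\tA|p1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "B|q2\tA|p1\t60.0\t300\t120\t0\t1\t300\t1\t300\t1e-40\t200\n"
)
rows = list(csv.reader(io.StringIO(demo), delimiter="\t"))
pairs = reciprocal_best_hits(rows)
print(pairs)
```

With real data you would read the rows from the all-vs-all output file instead of the inline string.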

The joys of bioinformatics!

Again, thank you for the help, it's appreciated.

