aprice67 11-30-2015 10:48 AM

Identifying orthologous genes between two bacterial strains

I am currently looking at two different strains of the same bacterial species. I have RNA-seq data and reference genomes from NCBI. What I want to find is the following:
  1. Genes that exist in A, but not in B.
  2. Genes that exist in both A and B (including start/end positions).
  3. Genes that exist in B, but not in A.

Very Venn diagram sort of stuff here.

I tried making a BLAST database from A and B, blasting them against each other, and selecting the set of genes that met criteria like high identity and significant length (over 200). This way, though, I'm only able to identify 100-200 genes, when there are around 5000 in these genomes.

Can anyone offer some advice or point me toward a tool that will help me accomplish this? Thanks very much in advance.

GenoMax 11-30-2015 11:35 AM

Were you using protein sequences for the BLAST search? Probably not, if you only got 100-200 genes.

Since you have a reference, you could just align the reads to it, extract a consensus for each gene, and then translate and compare the resulting proteins.
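
To illustrate why comparing proteins tends to find more matches than comparing nucleotides, here is a minimal Python sketch. The two short sequences are invented for the example and are not from any real genome, and the function names (translate, percent_identity) are hypothetical helpers rather than part of any tool. It uses the standard codon assignments, which bacterial coding sequences share, and does not special-case alternative start codons.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a, b):
    """Percent identical positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Invented toy genes: they differ at several nucleotide positions, but the
# differences are synonymous, so the encoded proteins are identical.
gene_a = "ATGGCTCTGAAAGGTTAA"
gene_b = "ATGGCCTTAAAGGGCTAA"

nucleotide_identity = percent_identity(gene_a, gene_b)
protein_identity = percent_identity(translate(gene_a), translate(gene_b))
print(f"nucleotide identity: {nucleotide_identity:.1f}%")
print(f"protein identity:    {protein_identity:.1f}%")
```

A strict nucleotide identity cutoff could reject a pair like this, while the protein comparison would keep it.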

aprice67 11-30-2015 11:48 AM

You're right. I've just been working on this and tried protein sequences and am having much more promising results! I can't believe I didn't make the connection. Thanks. :)
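
For anyone finding this thread later, here is a rough Python sketch of how the three sets from the original question could be derived from protein BLAST results. It is one possible approach, not necessarily what was done here: it assumes BLAST was run in both directions with tabular output (-outfmt 6, default columns) and calls a pair "shared" only when each gene is the other's best hit (reciprocal best hits). The identity and length cutoffs, the function names, and the toy hit lines are all made up for illustration.

```python
def best_hits(lines, min_identity=70.0, min_length=50):
    """Map each query to its best-scoring subject in BLAST tabular output.

    Expects the default -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    The thresholds are placeholder values, not recommendations.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        bitscore = float(fields[11])
        if pident < min_identity or length < min_length:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def classify(genes_a, genes_b, a_vs_b, b_vs_a):
    """Split genes into A-only, shared (reciprocal best hits), and B-only."""
    shared = {(a, b) for a, b in a_vs_b.items() if b_vs_a.get(b) == a}
    only_a = set(genes_a) - {a for a, _ in shared}
    only_b = set(genes_b) - {b for _, b in shared}
    return only_a, shared, only_b


# Invented toy data: three genes per strain, hits in both directions.
genes_a = ["A1", "A2", "A3"]
genes_b = ["B1", "B2", "B3"]
hits_a_vs_b = [
    "A1\tB1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "A2\tB3\t45.0\t120\t66\t0\t1\t120\t1\t120\t1e-10\t60",  # below identity cutoff
    "A3\tB2\t88.0\t250\t30\t0\t1\t250\t1\t250\t1e-120\t400",
]
hits_b_vs_a = [
    "B1\tA1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "B2\tA3\t88.0\t250\t30\t0\t1\t250\t1\t250\t1e-120\t400",
]

only_a, shared, only_b = classify(
    genes_a, genes_b, best_hits(hits_a_vs_b), best_hits(hits_b_vs_a)
)
print("A only:", sorted(only_a))
print("shared:", sorted(shared))
print("B only:", sorted(only_b))
```

Start/end positions for the shared genes are not in this output; they would have to be looked up by gene ID in each genome's annotation.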

