Salmon 01-13-2011 08:27 AM

Why is ML/NJ phylogeny different from SNP tree?

I am working on a bacterial genome. We have determined its complete sequence and want to study the evolutionary relationship between this genome and its VERY CLOSE relatives (within the same species).

I need to construct a tree for these genomes.

First, I generated a tree based on the number of SNPs detected in the conserved genomic region (~80% of the whole genome).

I also constructed trees from the conserved genomic sequence using ML and NJ methods. The ML and NJ trees have the same topology.

When I compared the ML/NJ trees with the SNP tree, they showed different phylogenetic relationships for some genomes.

When I checked the published papers, it seems most studies prefer to infer trees from the number of SNPs, especially for very closely related genomes. I think the ML/NJ trees based on the sequence information should be much better than a tree based on the number of SNPs.
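
To make the two kinds of input concrete, here is a minimal sketch (an illustration only, not the pipeline used for either tree) that computes a raw SNP count and a Jukes-Cantor (JC69) corrected distance for one pair of aligned sequences. The function names and the toy sequences are hypothetical, and JC69 is chosen only because it is the simplest substitution model; the actual ML/NJ runs may have used a different one.

```python
import math

VALID = set("ACGT")

def compared_sites(a, b):
    """Return aligned base pairs, skipping gaps and ambiguous characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x in VALID and y in VALID]

def snp_count(a, b):
    """Raw number of differing sites (what a SNP-count tree is built from)."""
    return sum(x != y for x, y in compared_sites(a, b))

def jc69_distance(a, b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), p = fraction of differing sites."""
    sites = compared_sites(a, b)
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy aligned sequences (hypothetical data): one difference in eight sites.
seq_a = "ACGTACGT"
seq_b = "ACGTACGA"
snps = snp_count(seq_a, seq_b)
dist = jc69_distance(seq_a, seq_b)
```

When p is very small, as it is for genomes within one species, the corrected distance is almost equal to p, so the raw count and the corrected distance are nearly proportional for any single pair. How gaps, ambiguous sites and recombinant regions are handled can differ a lot between pipelines, so that is worth checking too.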

So we have two trees with different topologies, and I can't find a reason for the difference. Does anybody have any comments? Which tree should I use? How can I explain the difference?

mm.perrineau 01-17-2013 11:55 AM

How did you build your SNP tree?

A_Morozov 01-21-2013 10:38 PM

Well, you use a bunch of methods and you get a bunch of different trees; that is usual. If you add, for instance, an analysis of genomic rearrangements, you will get one more tree, some phenotypic traits will support yet another one, and so on. So, please do the following:
1) Give us some more details on your analysis protocol, especially on how you applied ML to 80% of the genome (either I misunderstood you or it was incredibly computationally expensive) and on how exactly you built the SNP tree.
2) You don't mention whether you take horizontal gene transfer into account. If you do not, you really should, because otherwise it may mislead your inference.
3) If everything else fails, consider using a consensus tree or, better, a consensus network to represent the results.
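
Before building a consensus, it can help to see exactly which groupings the trees disagree on. The following is a minimal sketch, assuming both trees are unrooted, have the same set of genomes, and are written as nested tuples of leaf names; the genome names and both toy topologies are hypothetical, and a real analysis would read Newick files with a phylogenetics library instead.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Return the non-trivial bipartitions of an unrooted tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (a single leaf versus the rest, or nothing).
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found

def compare(tree_a, tree_b):
    """Return (shared splits, splits only in A, splits only in B)."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same genomes")
    sa, sb = splits(tree_a), splits(tree_b)
    return sa & sb, sa - sb, sb - sa

# Hypothetical topologies standing in for the ML/NJ tree and the SNP tree.
ml_tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
snp_tree = ((("g1", "g3"), "g2"), ("g4", "g5"))

shared, only_ml, only_snp = compare(ml_tree, snp_tree)
rf_distance = len(only_ml) + len(only_snp)  # Robinson-Foulds distance
```

The shared splits are what a strict consensus tree would keep, and the splits found in only one tree are the conflicts that a consensus network would draw as alternative groupings. In this toy case the trees agree on grouping g4 with g5 and disagree only on whether g1 pairs with g2 or with g3.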

