Hello,
My goal is a phylogeny of multiple isolates, showing me which isolate is closer to which.
I got an organism from which I did population genomics from a few distant geographic locations. The genome size is about 7-10mb.
I did denovo assemblies using MIRA, for all of my isolates. I picked the best assembly, concatenated all the contigs, and mapped the reads of the other isolate on top of it to generate a new consensus for each of the other isolates.
Now, because the species is heterozygous, I picked a cutoff value of 85% when calling basepairs for the consensus. This should get heterozygous loci to be called as an ambiguity. I now took the consensus of all isolates, and aligned it using MAUVE. I trimmed out all sites that had ambiguities, thus removing heterozygous sites.
I am left with a very long alignment, still about 7-10mb, and only a few thousand sites having any variability whatsoever, spaced out pretty consistently.
Now for the phylogeny, i picked a simple F model, 100 BS, estimated I and G, phyml.
Any thoughts on this? It would be really helpful for some advice, what might I have omitted? Is PHYML he best for this kind of analysis, or should I try bayesian, and if so, mr bayes, phylobayes or even beagle? Are there any alternatives to MAUVE?
Thank you for your help,
Adrian
My goal is a phylogeny of multiple isolates, showing me which isolate is closer to which.
I got an organism from which I did population genomics from a few distant geographic locations. The genome size is about 7-10mb.
I did denovo assemblies using MIRA, for all of my isolates. I picked the best assembly, concatenated all the contigs, and mapped the reads of the other isolate on top of it to generate a new consensus for each of the other isolates.
Now, because the species is heterozygous, I picked a cutoff value of 85% when calling basepairs for the consensus. This should get heterozygous loci to be called as an ambiguity. I now took the consensus of all isolates, and aligned it using MAUVE. I trimmed out all sites that had ambiguities, thus removing heterozygous sites.
I am left with a very long alignment, still about 7-10mb, and only a few thousand sites having any variability whatsoever, spaced out pretty consistently.
Now for the phylogeny, i picked a simple F model, 100 BS, estimated I and G, phyml.
Any thoughts on this? It would be really helpful for some advice, what might I have omitted? Is PHYML he best for this kind of analysis, or should I try bayesian, and if so, mr bayes, phylobayes or even beagle? Are there any alternatives to MAUVE?
Thank you for your help,
Adrian