Hello, I am trying to create a phylogeny with 102 different strains of Neisseria. Normally I would just create a SNP table, derive a multi-fasta from the SNP table (i.e. create a SNP profile fasta for all SNPs within any given strain), align those SNP profiles, and use a tree program to draw a phylogeny.
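To make the SNP-table-to-multi-fasta step concrete, here is a minimal Python sketch. The table layout (a dict of reference bases keyed by position, plus per-strain dicts of alternate calls), the strain names, and the function name `snp_table_to_fasta` are all assumptions for illustration, not necessarily the actual pipeline used here.

```python
def snp_table_to_fasta(ref_bases, strain_snps, strains):
    """Build one SNP-profile sequence per strain (hypothetical helper).

    ref_bases:   dict mapping position -> reference base, one entry for
                 every site that is variable in at least one strain
    strain_snps: dict mapping strain -> {position: alternate base}
    strains:     list of strain names, in output order
    """
    positions = sorted(ref_bases)  # fixed column order shared by all strains
    records = []
    for strain in strains:
        calls = strain_snps.get(strain, {})
        # Use the strain's own call where it has a SNP, else the reference base.
        seq = "".join(calls.get(pos, ref_bases[pos]) for pos in positions)
        records.append(f">{strain}\n{seq}")
    return "\n".join(records) + "\n"


# Toy input: three variable sites, two strains (made-up data).
ref_bases = {105: "A", 2310: "G", 48012: "T"}
strain_snps = {
    "strainA": {105: "C"},
    "strainB": {2310: "T", 48012: "C"},
}
fasta = snp_table_to_fasta(ref_bases, strain_snps, ["strainA", "strainB"])
print(fasta)
```

In this layout every profile has exactly one character per variable reference position, so all sequences come out the same length.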
The issue here is that I have a lot of genomes and also that the genomes are pretty divergent (SNP counts ranging from 2k to ~30k) so the total list of characters in each fasta comes out to about 180k. Apparently aligning 102 fasta files that each contain 180,000 bases is too much for any of the tools that I am currently using.
So can anyone suggest an alternate method? From looking at other posts, it looks like I could try aligning the genomes with Mugsy and use that alignment to create a core genome, which I would then use as the reference to call SNPs from (the theory here being that I would have a lot fewer SNPs if I was calling from the core genome as opposed to the reference that I'm currently using). My only hesitation is that I'm not entirely sure how much this will cut down on the SNPs. Any other ideas? Perhaps it would make more sense to just try to filter out SNPs from the current list, but I couldn't think of a good methodology for this.
Thanks.