In my lab we do loads of metagenome sequencing and assembly to recover complete genomes from environmental samples. To bin the genomes we use abundance information from multiple related samples, and in general it is very easy to extract near-complete genomes IF the assembly is decent.
The big problem is micro-diversity, e.g. closely related species, which prevents nice assemblies. The problematic similarity between species seems to be approximately 94-99% average nucleotide identity (depending on k-mer choice etc.). Anything outside that range we can get decent assemblies from, as long as we have enough coverage.
Some metagenome assemblers try to deal with this during assembly (e.g. IDBA-UD), but I haven't yet tried anything that performs decently.
Does anyone know of de novo assemblers that can use the abundance information from multiple related metagenome samples directly in the assembly stage?
Or, alternatively, something like khmer (https://khmer.readthedocs.org/en/latest/) that would allow read splitting prior to assembly, BUT using k-mer abundance information from multiple samples?
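To make that second idea concrete, here is a toy sketch of what I mean by giving each read a k-mer abundance profile across samples and splitting on it. This is not how khmer works and not an existing tool. The function names (`count_kmers`, `read_profile`, `split_reads`) are hypothetical, k-mers are counted on the forward strand only, everything is held in memory, and grouping reads by the sample where they are most abundant is just an assumed, very crude splitting rule.

```python
from collections import Counter, defaultdict
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences over all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def read_profile(read, sample_counts, k):
    """Median k-mer abundance of one read in each sample.

    Returns one value per sample; reads shorter than k get all zeros.
    """
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return [0] * len(sample_counts)
    # Counter returns 0 for k-mers that were never seen in a sample.
    return [median(counts[km] for km in read_kmers) for counts in sample_counts]


def split_reads(reads, sample_counts, k):
    """Group reads by the index of the sample where they are most abundant.

    Assumed toy rule: a real splitter would compare whole profiles.
    """
    groups = defaultdict(list)
    for read in reads:
        profile = read_profile(read, sample_counts, k)
        groups[profile.index(max(profile))].append(read)
    return dict(groups)


if __name__ == "__main__":
    sample_a = ["ACGTACGT", "ACGTACGT"]
    sample_b = ["TTTTGGGG"]
    tables = [count_kmers(sample_a, 4), count_kmers(sample_b, 4)]
    print(read_profile("ACGTACGT", tables, 4))
    print(split_reads(sample_a + sample_b, tables, 4))
```

Each group of reads could then be assembled separately. For real data the exact counting would need to be replaced by something memory-efficient, which is the part I am hoping an existing tool already handles.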
rgds
Mads Albertsen