
**seb567** · 10-23-2013, 12:46 PM

Re: assemble multiple related metagenome samples

Originally posted by MadsAlbertsen

In my lab we do loads of metagenome sequencing and assembly to recover complete genomes from environmental samples. To bin the genomes we use abundance information from multiple related samples and in general it is very easy to extract near complete genomes IF the assembly is decent.

The big problem is micro-diversity, e.g. closely related species, which prevents nice assemblies. The problematic similarity between species seems to be approximately 94-99% average nucleotide identity (depending on kmer choice, etc.). For anything else we can get decent assemblies if we just have enough coverage.

Some metagenome assemblers try to deal with this during assembly (e.g. IDBA-UD), but I haven't really tried anything that has decent performance yet.

Does anyone know of de novo assemblers that would be able to use the abundance information from multiple related metagenome samples directly in the assembly stage?

Hello,

Why do you want to assemble multiple related metagenome samples together?

If you look at Cortex ( http://www.nature.com/ng/journal/v44...l/ng.1028.html ),
each vertex in their de Bruijn subgraph has many coverage depth channels.
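
To make the "coverage depth channels" idea concrete, here is a toy sketch in Python. It is not Cortex's implementation or data structure; the function names, the use of canonical kmers, and the in-memory dictionary are all assumptions made for illustration. It only shows what it means for each kmer (vertex) to carry one count per sample.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = "".join(COMPLEMENT[base] for base in reversed(kmer))
    return min(kmer, rc)


def count_kmers_by_sample(samples, k):
    """Count kmers with one coverage channel per sample.

    samples: dict mapping a (hypothetical) sample name to a list of read strings.
    Returns (names, channels), where channels maps each canonical kmer to a
    list of counts ordered like names.
    """
    names = sorted(samples)
    channels = defaultdict(lambda: [0] * len(names))
    for i, name in enumerate(names):
        for read in samples[name]:
            for j in range(len(read) - k + 1):
                kmer = read[j:j + k]
                # Skip kmers containing ambiguous bases such as N.
                if all(base in COMPLEMENT for base in kmer):
                    channels[canonical(kmer)][i] += 1
    return names, dict(channels)


if __name__ == "__main__":
    toy = {"A": ["ACGTA"], "B": ["ACGTA", "ACGTA"]}
    names, channels = count_kmers_by_sample(toy, k=3)
    for kmer, depths in sorted(channels.items()):
        print(kmer, dict(zip(names, depths)))
```

A real multi-sample graph would also store the edges between kmers and would not fit in a plain dictionary for realistic data sizes.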

For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).

If you want to assemble (and possibly profile) many samples individually,
you should try Ray (the "Ray Meta" workflow) for de novo genome assembly of metagenomes.

To download Ray (v2.2.0): http://denovoassembler.sourceforge.net/
See the paper also: http://genomebiology.com/2012/13/12/R122

Regarding multi-sample metagenomic analyses, we have a project in progress called
"Ray Surveyor", which basically builds a distributed de Bruijn subgraph for many samples,
like Cortex, but the Ray engine is distributed (message passing) and "Ray Surveyor" uses
the actor model (this is new too!).

Best of luck with your research!

Originally posted by MadsAlbertsen

Or alternatively something like khmer (https://khmer.readthedocs.org/en/latest/) that would allow read splitting prior to assembly - BUT using kmer abundance information from multiple samples?

rgds
Mads Albertsen

**MadsAlbertsen** · 10-24-2013, 02:07 AM

Thanks for the suggestions. I have been trying some of your Ray-related projects. Cortex seems to be in the direction I was thinking of.

Originally posted by seb567

Why do you want to assemble multiple related metagenome samples together?

For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).

I want to use multiple related metagenome samples to untangle individual species from metagenomes. I'm not interested in the metagenome as is, only in extracting complete genomes.

It's already common to use abundance profiles for assembled scaffolds to do binning (scaffolds with similar abundance patterns originate from the same species) and also in clustering genes with similar expression profiles.
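
As a rough illustration of that binning idea, here is a minimal Python sketch. It assumes each scaffold already has a coverage value per sample, and it groups scaffolds greedily by Pearson correlation against the first member of each bin. The function names, the greedy strategy, and the 0.95 threshold are assumptions for illustration, not the method of any particular binning tool.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / math.sqrt(var_x * var_y)


def bin_scaffolds(profiles, min_r=0.95):
    """Greedily group scaffolds whose profiles correlate with a bin's first member.

    profiles: dict mapping scaffold name to a list of per-sample coverage values.
    Returns a list of bins, each a list of scaffold names.
    """
    bins = []
    for name, profile in profiles.items():
        for members in bins:
            if pearson(profile, profiles[members[0]]) >= min_r:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


if __name__ == "__main__":
    # Hypothetical coverage of four scaffolds across three samples.
    toy = {
        "s1": [10, 20, 5],
        "s2": [21, 39, 11],
        "s3": [30, 5, 40],
        "s4": [61, 9, 79],
    }
    print(bin_scaffolds(toy))
```

With only a few samples, unrelated scaffolds can correlate by chance, so in practice more samples and additional signals are needed.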

Hence, if the abundance information could be used in the assembly process directly it might lead to decent assembly of species with many closely related strains, which currently is impossible.

rgds
Mads

**seb567** · 10-25-2013, 07:59 AM

Originally posted by MadsAlbertsen

Thanks for the suggestions. I have been trying some of your Ray-related projects. Cortex seems to be in the direction I was thinking of.

I want to use multiple related metagenome samples to untangle individual species from metagenomes. I'm not interested in the metagenome as is, only in extracting complete genomes.

It's already common to use abundance profiles for assembled scaffolds to do binning (scaffolds with similar abundance patterns originate from the same species) and also in clustering genes with similar expression profiles.

Hence, if the abundance information could be used in the assembly process directly it might lead to decent assembly of species with many closely related strains, which currently is impossible.

rgds
Mads

That's a good idea, but the coverage depth of each sample's kmers needs to be normalized by the number of reads, since sample A may have twice as many reads as sample B.

Coverage depth would then be measured in X per read, for example.
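
A minimal Python sketch of that normalization, assuming the raw kmer depths and the read counts per sample are already known. The function name, the dictionary layout, and the per-million-reads scale are assumptions for illustration.

```python
def normalize_depths(kmer_depths, read_counts, scale=1_000_000):
    """Convert raw kmer depths to depth per `scale` reads in each sample.

    kmer_depths: dict mapping kmer to a dict of {sample name: raw depth}.
    read_counts: dict mapping sample name to the number of reads in that sample.
    """
    return {
        kmer: {
            sample: depth * scale / read_counts[sample]
            for sample, depth in per_sample.items()
        }
        for kmer, per_sample in kmer_depths.items()
    }


if __name__ == "__main__":
    # Sample A has twice as many reads as sample B, so a raw depth of 40 in A
    # corresponds to the same relative abundance as a raw depth of 20 in B.
    raw = {"ACG": {"A": 40, "B": 20}}
    reads = {"A": 2_000_000, "B": 1_000_000}
    print(normalize_depths(raw, reads))
```

Dividing by the number of reads ignores differences in read length between samples; dividing by total sequenced bases would be an alternative under that assumption.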


De novo assembly utilizing abundance information from multiple samples
