SEQanswers

10-23-2013, 02:08 AM #1
MadsAlbertsen
Member
Location: Denmark
Join Date: Aug 2010
Posts: 26
De novo assembly utilizing abundance information from multiple samples

In my lab we do loads of metagenome sequencing and assembly to recover complete genomes from environmental samples. To bin the genomes we use abundance information from multiple related samples and in general it is very easy to extract near complete genomes IF the assembly is decent.
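As a rough illustration of abundance-based binning across samples, the sketch below groups contigs whose coverage profiles (one value per sample) are strongly correlated. The contig names, coverage values and the 0.95 correlation cut-off are invented for the example, and the greedy seed-based grouping is a simplification; it is not the binning method used in the lab described in this post.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as unrelated
    return cov / (sx * sy)


def greedy_bin(profiles, min_r=0.95):
    """Put each contig into the first bin whose seed contig correlates with it
    at >= min_r; otherwise start a new bin with this contig as its seed."""
    bins = []
    for name, prof in profiles.items():
        for members in bins:
            if pearson(profiles[members[0]], prof) >= min_r:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


# Hypothetical coverage of four contigs in four related samples.
profiles = {
    "contig_1": [10.0, 50.0, 5.0, 80.0],
    "contig_2": [12.0, 55.0, 6.0, 85.0],
    "contig_3": [40.0, 8.0, 30.0, 2.0],
    "contig_4": [38.0, 10.0, 33.0, 3.0],
}
print(greedy_bin(profiles))  # [['contig_1', 'contig_2'], ['contig_3', 'contig_4']]
```

Contigs 1 and 2 rise and fall together across the samples, as do contigs 3 and 4, so each pair ends up in its own bin.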

The big problem is micro-diversity, e.g. closely related species, which prevents nice assemblies. The problematic similarity between species seems to be approximately 94-99% average nucleotide identity (depending on k-mer choice etc.). For anything else, we can get decent assemblies as long as we have enough coverage.
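One way to see why the problematic identity range shifts with k-mer choice: if two strains differ only by substitutions spread independently along the genome, a k-mer is shared only when all k of its positions match, so the expected fraction of shared k-mers is roughly ANI^k. This is a simplifying assumption (no indels, no clustering of differences), sketched here for a few illustrative values of k.

```python
def expected_shared_kmer_fraction(ani, k):
    """Expected fraction of k-mers identical between two genomes.

    Assumes substitutions only (no indels) and that each position matches
    independently with probability `ani`, so a k-mer is shared only when
    all k positions match: ani ** k.
    """
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ani must be a fraction between 0 and 1")
    return ani ** k


for ani in (0.94, 0.97, 0.99):
    row = ", ".join(
        f"k={k}: {expected_shared_kmer_fraction(ani, k):.2f}" for k in (21, 31, 61)
    )
    print(f"ANI {ani:.0%} -> {row}")
```

Under this approximation, strains in the 94-99% range share anywhere from a few percent to most of their k-mers depending on k, which is one plausible reason the problematic range moves with k-mer choice.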

Some metagenome assemblers try to deal with this during assembly (e.g. IDBA-UD), but nothing I have tried so far has performed decently.

Does anyone know of de novo assemblers that can use the abundance information from multiple related metagenome samples directly in the assembly stage?

Or, alternatively, something like khmer (https://khmer.readthedocs.org/en/latest/) that would allow read splitting prior to assembly, BUT using k-mer abundance information from multiple samples?
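A minimal sketch of the multi-sample k-mer abundance idea: count k-mers separately in each sample, then describe every read by the median abundance of its k-mers in each sample. Reads from the same organism should then have similar profiles, which could be used to split reads before assembly. This is plain Python and does not use khmer's API; it ignores reverse complements and keeps all counts in memory, so it only shows the idea. Taking the median per read is in the same spirit as khmer's digital normalization, which also relies on the median k-mer abundance of a read. The reads and the value k=5 are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All k-mers of a read (forward strand only, to keep the sketch short)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_per_sample(samples, k):
    """One Counter of k-mer occurrences per sample (each sample is a list of reads)."""
    return [Counter(km for read in reads for km in kmers(read, k)) for reads in samples]


def read_profile(read, counts, k):
    """Median abundance of the read's k-mers in each sample."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return [0] * len(counts)
    return [median(c[km] for km in read_kmers) for c in counts]


# Two hypothetical samples in which two "organisms" have opposite abundance.
sample_a = ["ACGTTGCAAC", "ACGTTGCAAC", "TTTTGGGGCC"]
sample_b = ["ACGTTGCAAC", "TTTTGGGGCC", "TTTTGGGGCC", "TTTTGGGGCC"]
counts = count_per_sample([sample_a, sample_b], k=5)
print(read_profile("ACGTTGCAAC", counts, k=5))  # [2.0, 1.0]
print(read_profile("TTTTGGGGCC", counts, k=5))  # [1.0, 3.0]
```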

rgds
Mads Albertsen
10-23-2013, 12:46 PM #2
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260
Re: assemble multiple related metagenome samples

Quote:
Originally Posted by MadsAlbertsen
Does anyone know of de novo assemblers that can use the abundance information from multiple related metagenome samples directly in the assembly stage?
Hello,


Why do you want to assemble multiple related metagenome samples together?

If you look at Cortex (http://www.nature.com/ng/journal/v44...l/ng.1028.html), each vertex in its de Bruijn subgraph has many coverage depth channels, one per colour (e.g. one per sample).
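To make the idea of per-vertex coverage channels concrete, here is a toy de Bruijn graph in which every k-mer vertex stores one coverage counter per sample. It is a simplified illustration of the concept, not Cortex's actual data structure or API; reverse complements and error handling are left out, and the class name `ColouredDBG` is hypothetical.

```python
from collections import defaultdict


class ColouredDBG:
    """Toy de Bruijn graph: each k-mer vertex keeps one coverage count per sample."""

    def __init__(self, k, n_samples):
        self.k = k
        self.coverage = defaultdict(lambda: [0] * n_samples)

    def add_read(self, read, sample):
        """Add every k-mer of `read` to the graph, counting it in channel `sample`."""
        for i in range(len(read) - self.k + 1):
            self.coverage[read[i:i + self.k]][sample] += 1

    def successors(self, kmer):
        """Vertices that extend `kmer` by one base (overlap of k - 1)."""
        # Membership tests on a defaultdict do not create new entries.
        return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in self.coverage]


g = ColouredDBG(k=3, n_samples=2)
g.add_read("ACGTA", sample=0)
g.add_read("ACGTC", sample=1)
print(g.successors("CGT"))  # ['GTA', 'GTC']
print(g.coverage["CGT"], g.coverage["GTA"], g.coverage["GTC"])  # [1, 1] [1, 0] [0, 1]
```

The two reads differ only in their last base, so the graph forks after `CGT`, and the per-sample coverage shows which branch comes from which sample.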

For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).


If you want to assemble (and possibly profile) many samples individually, you should try Ray (the "Ray Meta" workflow) for de novo genome assembly of metagenomes.

To download Ray (v2.2.0): http://denovoassembler.sourceforge.net/
See also the paper: http://genomebiology.com/2012/13/12/R122


Regarding multi-sample metagenomic analyses, we have a project in progress called "Ray Surveyor", which basically builds a distributed de Bruijn subgraph for many samples, like Cortex does. However, the Ray engine is distributed (message passing), and Ray Surveyor uses the actor model (this is new too!).


Best of luck with your research!

10-24-2013, 02:07 AM #3
MadsAlbertsen
Member
Location: Denmark
Join Date: Aug 2010
Posts: 26

Thanks for the suggestions. I have been trying some of your Ray-related projects. Cortex seems to be in the direction I was thinking of.

Quote:
Originally Posted by seb567
Why do you want to assemble multiple related metagenome samples together?

For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).

I want to use multiple related metagenome samples to untangle individual species from metagenomes. I'm not interested in the metagenome as is, only in extracting complete genomes.

It's already common to use abundance profiles of assembled scaffolds for binning (scaffolds with similar abundance patterns originate from the same species), and the same principle is used when clustering genes with similar expression profiles.

Hence, if the abundance information could be used directly in the assembly process, it might lead to decent assemblies of species with many closely related strains, which is currently impossible.
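A hypothetical sketch of how abundance could enter the assembly itself: at a fork in the graph, prefer the successor whose coverage profile across samples most resembles the profile of the path assembled so far. Cosine similarity is an arbitrary choice here, the coverage values are assumed to be already normalized per sample, and `pick_branch` is an illustrative helper rather than a feature of any existing assembler. A real implementation would also have to cope with sequencing errors, shared regions and repeats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two per-sample coverage vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_branch(path_profile, candidates):
    """Hypothetical rule: at a fork, follow the successor whose coverage
    profile across samples is most similar to that of the current path.

    `candidates` maps successor k-mer -> per-sample coverage vector.
    Returns None if there is no successor.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda km: cosine(path_profile, candidates[km]))


# Made-up coverage in three samples: the path so far and two competing branches.
path_profile = [20, 2, 15]
candidates = {"GTA": [18, 1, 14], "GTC": [3, 25, 2]}
print(pick_branch(path_profile, candidates))  # GTA
```

`GTA` is chosen because its coverage tracks the path in all three samples, while `GTC` follows a different pattern and most likely belongs to another strain.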

rgds
Mads

Last edited by MadsAlbertsen; 10-24-2013 at 03:18 AM.
10-25-2013, 07:59 AM #4
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by MadsAlbertsen
Hence, if the abundance information could be used directly in the assembly process, it might lead to decent assemblies of species with many closely related strains, which is currently impossible.

That's a good idea, but the coverage depth of each sample's k-mers needs to be normalized by the number of reads, since sample A may have twice as many reads as sample B.

Coverage depth would then be measured in X per read, for example.
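A minimal sketch of that normalization, assuming the total read count of each sample is known. With `per=1` the result is coverage per read, as suggested in the post; scaling to per million reads is just a more readable alternative. It also assumes read lengths are comparable between samples; otherwise normalizing by total bases would be more appropriate. The counts below are hypothetical.

```python
def normalize(kmer_counts, reads_per_sample, per=1):
    """Scale one k-mer's raw count in each sample by that sample's read total.

    per=1 gives coverage per read; per=1_000_000 gives coverage per million reads.
    """
    if len(kmer_counts) != len(reads_per_sample):
        raise ValueError("need exactly one read total per sample")
    return [c * per / n for c, n in zip(kmer_counts, reads_per_sample)]


# Hypothetical: sample A has twice as many reads as sample B.
raw = [40, 20]
reads = [20_000_000, 10_000_000]
print(normalize(raw, reads, per=1_000_000))  # [2.0, 2.0]
```

After normalization the k-mer is equally abundant in both samples, even though its raw count in sample A is twice that in sample B.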
All times are GMT -8.