  • De novo assembly utilizing abundance information from multiple samples

    In my lab we do loads of metagenome sequencing and assembly to recover complete genomes from environmental samples. To bin the genomes we use abundance information from multiple related samples, and in general it is very easy to extract near-complete genomes IF the assembly is decent.

    The big problem is micro-diversity, e.g. closely related species, which prevents good assemblies. The problematic similarity between species seems to be approximately 94-99% average nucleotide identity (depending on kmer choice, etc.). Outside that range we can get decent assemblies as long as we have enough coverage.

    Some metagenome assemblers try to deal with this during assembly (e.g. IDBA-UD), but I haven't yet tried anything that performs decently.

    Does anyone know of de novo assemblers that would be able to use the abundance information from multiple related metagenome samples directly in the assembly stage?

    Or alternatively something like khmer (https://khmer.readthedocs.org/en/latest/) that would allow read splitting prior to assembly - BUT using kmer abundance information from multiple samples?

    rgds
    Mads Albertsen

  • #2
    Re: assemble multiple related metagenome samples

    Hello,


    Why do you want to assemble multiple related metagenome samples together?

    If you look at Cortex (http://www.nature.com/ng/journal/v44...l/ng.1028.html),
    each vertex in their de Bruijn subgraph has many coverage depth channels.
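
To make the idea of per-vertex coverage channels concrete, here is a minimal sketch, not Cortex's actual data structure. It assumes each sample is just a list of read strings, counts forward-strand kmers only (no reverse complements), and uses hypothetical names such as `build_colored_kmer_table` and toy reads:

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every kmer of seq (forward strand only, for brevity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmer_table(samples, k):
    """Map each kmer to a list of counts, one channel per sample."""
    names = sorted(samples)
    table = defaultdict(lambda: [0] * len(names))
    for channel, name in enumerate(names):
        for read in samples[name]:
            for kmer in kmers(read, k):
                table[kmer][channel] += 1
    return names, dict(table)


# Hypothetical toy reads; real input would come from FASTQ files.
samples = {
    "sampleA": ["ACGTAC", "CGTACG"],
    "sampleB": ["ACGTAC"],
}
names, table = build_colored_kmer_table(samples, k=4)
```

Each kmer (a vertex of the de Bruijn graph) then carries one count per sample, so a kmer seen only in sampleA ends up with a zero in the sampleB channel.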

    For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).


    If you want to assemble (and possibly profile) many samples individually,
    you should try Ray (the "Ray Meta" workflow) for de novo genome assembly of metagenomes.

    To download Ray (v2.2.0): http://denovoassembler.sourceforge.net/
    See the paper also: http://genomebiology.com/2012/13/12/R122


    Regarding multi-sample metagenomic analyses, we have a project in progress called
    "Ray Surveyor", which basically builds a distributed de Bruijn subgraph for many samples,
    like Cortex. The difference is that the Ray engine is distributed (message passing), and
    "Ray Surveyor" uses the actor model (this is new too!).


    Best of luck with your research!

    • #3
      Thanks for the suggestions. I have been trying some of your Ray-related projects. Cortex seems to be in the direction I was thinking of.

      Originally posted by seb567
      Why do you want to assemble multiple related metagenome samples together?

      For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).

      I want to use multiple related metagenome samples to untangle individual species from metagenomes. I'm not interested in the metagenome as such, only in extracting complete genomes.

      It's already common to use abundance profiles of assembled scaffolds for binning (scaffolds with similar abundance patterns are assumed to originate from the same species), and likewise to cluster genes with similar expression profiles.

      Hence, if the abundance information could be used directly in the assembly process, it might lead to decent assemblies of species with many closely related strains, which is currently impossible.
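
As a rough illustration of abundance-based binning, the sketch below groups scaffolds whose coverage profiles across samples are strongly correlated. It is only one possible approach: the Pearson correlation, the greedy grouping, the `min_r` threshold, and the toy coverage values are all assumptions made for the example, not a description of any particular binning tool.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sx * sy)


def bin_by_abundance(profiles, min_r=0.95):
    """Greedy binning: join the first bin whose seed profile correlates at >= min_r."""
    bins = []  # list of (seed_profile, [scaffold names])
    for name, profile in profiles.items():
        for seed, members in bins:
            if pearson(seed, profile) >= min_r:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]


# Hypothetical mean coverage of four scaffolds in three related samples.
profiles = {
    "scaffold_1": [10.0, 40.0, 5.0],
    "scaffold_2": [11.0, 42.0, 6.0],
    "scaffold_3": [30.0, 3.0, 20.0],
    "scaffold_4": [28.0, 4.0, 22.0],
}
bins = bin_by_abundance(profiles)
```

Scaffolds 1 and 2 rise and fall together across the samples, as do scaffolds 3 and 4, so they land in two separate bins.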

      rgds
      Mads
      Last edited by MadsAlbertsen; 10-24-2013, 03:18 AM.

      • #4
        That's a good idea, but the coverage depth of each sample's kmers needs to be normalized by the number of reads in that sample, since sample A may have twice as many reads as sample B.

        Coverage depth would then be measured in X per read, for example.
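
The normalization could look something like the sketch below. The per-million-reads scale, the function name `normalize_coverage`, and the toy counts are assumptions for illustration; the post only asks that depth be expressed relative to the number of reads in each sample.

```python
def normalize_coverage(kmer_counts, reads_per_sample, scale=1_000_000):
    """Convert raw per-sample kmer counts to counts per `scale` reads."""
    return {
        kmer: [count * scale / reads_per_sample[i] for i, count in enumerate(counts)]
        for kmer, counts in kmer_counts.items()
    }


# Hypothetical raw counts for two kmers in samples A and B.
raw = {"ACGT": [20, 40], "CGTA": [5, 40]}
# Sample B has twice as many reads as sample A.
reads = [1_000_000, 2_000_000]
normalized = normalize_coverage(raw, reads)
```

After normalization, "ACGT" has the same depth in both samples, so its raw twofold difference was purely a sequencing-depth effect, whereas "CGTA" remains four times more abundant in sample B.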
