  • De novo assembly utilizing abundance information from multiple samples

    In my lab we do loads of metagenome sequencing and assembly to recover complete genomes from environmental samples. To bin the genomes we use abundance information from multiple related samples and in general it is very easy to extract near complete genomes IF the assembly is decent.

    The big problem is micro-diversity, e.g. closely related species, which prevents nice assemblies. The problematic similarity between species seems to be approximately 94-99% average nucleotide identity (dependent on kmer choice etc.). Outside that range we can get decent assemblies as long as we have enough coverage.

    Some metagenome assemblers try to deal with this during assembly (e.g. IDBA-UD), but I haven't yet tried anything that performs decently.

    Does anyone know of de novo assemblers that would be able to use the abundance information from multiple related metagenome samples directly in the assembly stage?

    Or alternatively something like khmer (https://khmer.readthedocs.org/en/latest/) that would allow read splitting prior to assembly - BUT using kmer abundance information from multiple samples?

    rgds
    Mads Albertsen

  • #2
    Re: assemble multiple related metagenome samples

    Hello,


    Why do you want to assemble multiple related metagenome samples together?

    If you look at Cortex ( http://www.nature.com/ng/journal/v44...l/ng.1028.html ),
    each vertex in their de Bruijn subgraph has many coverage depth channels.
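    To make the "coverage depth channels" idea concrete, here is a minimal sketch of a k-mer table that keeps one count per sample. It is only an illustration, not Cortex's actual data structure or API: the function names and the toy reads are assumptions, and it ignores reverse complements (real tools typically store canonical k-mers).

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_channel_table(samples, k):
    """Map each k-mer to a list of counts, one channel per sample.

    samples: list of samples, each a list of read strings (hypothetical input).
    """
    n_samples = len(samples)
    table = defaultdict(lambda: [0] * n_samples)
    for channel, reads in enumerate(samples):
        for read in reads:
            for kmer in kmers(read, k):
                table[kmer][channel] += 1
    return dict(table)


# Toy data, made up for illustration only.
samples = [
    ["ACGTAC", "CGTACG"],  # hypothetical sample A
    ["ACGTAC"],            # hypothetical sample B
]
table = build_channel_table(samples, k=4)
```

    Each vertex (k-mer) then carries a small vector of depths, so a k-mer seen only in sample A is immediately distinguishable from one shared by both samples.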

    For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).


    If you want to assemble (and possibly profile) many samples individually,
    you should try Ray (the "Ray Meta" workflow) for de novo genome assembly of metagenomes.

    To download Ray (v2.2.0): http://denovoassembler.sourceforge.net/
    See the paper also: http://genomebiology.com/2012/13/12/R122


    Regarding multi-sample metagenomic analyses, we have a project in progress called
    "Ray Surveyor", which basically builds a distributed de Bruijn subgraph for many samples,
    like Cortex. The difference is that the Ray engine is distributed (message passing), and
    "Ray Surveyor" uses the actor model (this is new too!).


    Best of luck with your research!



    • #3
      Thanks for the suggestions. I have been trying some of your Ray-related projects. Cortex seems to be in the direction I was thinking of.

      Originally posted by seb567
      Why do you want to assemble multiple related metagenome samples together?

      For metagenomics samples, one sample already contains a mix of different genomes in various abundances. Mixing these samples would lead, I believe, to a meta-metagenome (whatever that means).
      I want to use multiple related metagenome samples to untangle individual species from metagenomes. I'm not interested in the metagenome as is. Only in extracting complete genomes.

      It's already common to bin assembled scaffolds by their abundance profiles (scaffolds with similar abundance patterns originate from the same species), and the same idea is used to cluster genes with similar expression profiles.

      Hence, if the abundance information could be used directly in the assembly process, it might lead to a decent assembly of species with many closely related strains, which is currently impossible.
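      The binning idea mentioned above (scaffolds with similar abundance patterns across samples belong together) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the method of any particular binning tool: it uses Pearson correlation with a greedy grouping, and the scaffold names, coverage values, and the 0.95 threshold are all made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / (sd_x * sd_y)


def bin_by_profile(profiles, threshold=0.95):
    """Greedily group scaffolds whose profile correlates with a bin's first member."""
    bins = []
    for name, profile in profiles.items():
        for members in bins:
            representative = profiles[members[0]]
            if pearson(profile, representative) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])  # no existing bin matched: start a new one
    return bins


# Hypothetical mean coverage of each scaffold in three related samples.
profiles = {
    "scaffold_1": [10.0, 40.0, 5.0],
    "scaffold_2": [12.0, 44.0, 6.0],
    "scaffold_3": [50.0, 8.0, 30.0],
    "scaffold_4": [48.0, 9.0, 33.0],
}
bins = bin_by_profile(profiles)
```

      With these toy numbers, scaffolds 1 and 2 rise and fall together across the samples and end up in one bin, while scaffolds 3 and 4 follow a different pattern and form a second bin.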

      rgds
      Mads
      Last edited by MadsAlbertsen; 10-24-2013, 03:18 AM.



      • #4
        That's a good idea, but the k-mer coverage depth for each sample needs to be normalized by the number of reads, since sample A may have twice as many reads as sample B.

        Coverage depth would then be expressed in X per read, for example.
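        As a rough sketch of that normalization, assuming per-sample k-mer counts are already available: each raw depth is divided by the number of reads in its sample. The function name, the toy counts, and the "per million reads" scale factor are assumptions chosen for readability, not something any specific tool prescribes.

```python
def normalize_depth(kmer_counts, read_counts, scale=1_000_000):
    """Convert raw per-sample k-mer depths to depth per `scale` reads.

    kmer_counts: {kmer: [raw count in each sample]}
    read_counts: [number of reads in each sample], same order as the counts
    """
    if any(n <= 0 for n in read_counts):
        raise ValueError("every sample needs a positive read count")
    return {
        kmer: [count * scale / n for count, n in zip(counts, read_counts)]
        for kmer, counts in kmer_counts.items()
    }


# Toy example: sample B has twice as many reads as sample A.
kmer_counts = {"ACGT": [20, 40], "CGTA": [5, 40]}
read_counts = [1_000_000, 2_000_000]
normalized = normalize_depth(kmer_counts, read_counts)
```

        In the toy data, "ACGT" looks twice as abundant in sample B before normalization but comes out equal in both samples afterwards, whereas "CGTA" remains genuinely more abundant in sample B.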
