  • Assembled contigs vs short reads

    I have recently finished assembling some metagenome sequences, and after assigning functions to my contigs I see that most genes belong to three specific types of microorganisms. I also submitted the unassembled short reads to MG-RAST to get an overview of functional genes. However, when I look through the MG-RAST results, the most abundant genes are not necessarily the same ones that dominate the assembled contigs. Why wouldn't the two types of information match?

  • #2
    The number of reads is based on the abundance of specific community members, while the number of contigs is based on the overall diversity of the community. If 99% of the organisms are one species of bacteria with gene X, then gene X might be the most abundant gene based on read mapping. But if there are 1000 other species in the community making up the other 1% of the population, and none of them have gene X but all of them have gene Y, then you might get 1000 different versions of gene Y contigs.

    Also, sometimes the dominant organism does not assemble very well because it may have lots of different strains, which confuse the assembler.
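A toy calculation can make the read-count versus contig-count distinction above concrete. This is only a sketch under strong assumptions: every species assembles, each species contributes exactly one contig per gene, and read share equals relative abundance. The function name `gene_counts` and the community layout are made up for illustration.

```python
def gene_counts(community):
    """community: list of (relative_abundance, set_of_genes) tuples, one per species.

    Returns two dicts keyed by gene:
      read_share   - summed abundance of the species carrying the gene
                     (a stand-in for the fraction of reads hitting that gene)
      contig_count - number of species carrying the gene
                     (a stand-in for the number of distinct contigs with that gene)
    """
    read_share = {}
    contig_count = {}
    for abundance, genes in community:
        for gene in genes:
            read_share[gene] = read_share.get(gene, 0.0) + abundance
            contig_count[gene] = contig_count.get(gene, 0) + 1
    return read_share, contig_count


# One dominant species (99%) with gene X, plus 1000 rare species sharing the
# remaining 1%, each carrying gene Y instead.
community = [(0.99, {"X"})] + [(0.01 / 1000, {"Y"})] * 1000
read_share, contig_count = gene_counts(community)
```

Here gene X dominates by read share while gene Y dominates by contig count, which is the mismatch described in the post.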

    • #3
      Thank you for your response. In this case I know from the short read information that the sample is very diverse, with many genes (x, y, z, etc.) with different functional roles, and one set of genes with a specific functional role shows up as highly abundant. When I look at the assembled results, most of the genes with functional roles are from three microorganisms. Why is there a difference in the number of functional genes with different roles between the short read analysis and the assembled one?

      • #4
        Well... another possibility is that most of the metagenome simply didn't assemble due to low depth. Sometimes it can be useful to normalize the data prior to assembly, or use an iterative approach where you subsample, assemble, map to the assembly, then assemble the unmapped reads. Or use a different assembler. How did you do the assembly?
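One common way to normalize reads before assembly is digital normalization by median k-mer count. The post does not say which normalization method is meant, so the following is only a minimal sketch of that general idea, not any particular tool's implementation. It uses exact counting in memory, ignores reverse complements, and the names `kmers` and `normalize_by_median` and the default values are chosen for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=5, cutoff=2):
    """Keep a read only if the median count of its k-mers, among reads
    kept so far, is still below cutoff. Reads from already well-covered
    regions are discarded; reads from low-depth regions are retained."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Ten copies of an abundant read plus one rare read.
abundant = "ACGTTGCAAG"
rare = "GGATCCTAGA"
kept = normalize_by_median([abundant] * 10 + [rare], k=5, cutoff=2)
```

With these toy inputs the abundant read is capped at two copies while the rare read is kept, so the depth seen by the assembler becomes more even. Real data would need a much larger k and a memory-efficient counter.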

        • #5
          I used Velvet and MetaVelvet to do the assemblies and KmerGenie to find the coverage cutoff to use in the assembly. About 12 million reads went into the assembly, with read lengths between 70 and 110 bp. I ended up using a coverage cutoff of 6, a k-mer length of 33, and an insert length cutoff of 400, and I opted for scaffolding. This combination gave me the longest contigs and an N50 of 350 bp, which is the best I could get.
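For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are illustrative and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


example_n50 = n50([100, 200, 300, 400, 500])
```

An N50 of 350 bp therefore means that half of the assembled bases sit in contigs of 350 bp or shorter, only a few read lengths long.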

          • #6
            Ahh... that's a very low-coverage metagenome; I'm not surprised only the most abundant organisms assembled. Metagenomes (especially complex ones) are much harder to assemble than isolates, and thus place greater demands on the data: high depth, long reads, and low error rates. You should probably try to get more data, or else try a different metagenome assembler such as MEGAHIT.
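A back-of-the-envelope depth estimate shows why this amount of data is thin. The sketch below takes the 12 million reads from the thread and assumes a mean read length of 90 bp (the midpoint of the stated 70-110 bp range). The 5 Mbp genome size and the 1% share of reads are purely hypothetical, and `expected_depth` is an illustrative name.

```python
def expected_depth(n_reads, mean_read_len, read_fraction, genome_size):
    """Expected average depth for one genome in a metagenome.

    read_fraction is the share of all reads that come from that genome;
    genome_size is in bp.
    """
    return n_reads * mean_read_len * read_fraction / genome_size


# 12 million reads x 90 bp is about 1.08 Gbp in total. A hypothetical
# 5 Mbp genome receiving 1% of the reads gets roughly 2x depth.
depth = expected_depth(12_000_000, 90, 0.01, 5_000_000)
```

Under these assumptions the organism sits at about 2x, well below the coverage cutoff of 6 used in the assembly, so only organisms taking a much larger share of the reads would be expected to assemble.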
