
  • Assembled contigs vs short reads

    I have recently finished assembling some metagenome sequences, and after assigning functions to my contigs I see that most genes belong to three specific types of microorganisms. I also submitted the unassembled short reads to MG-RAST to get an overview of the functional genes. However, when I look through the MG-RAST results, the most abundant genes are not necessarily the same ones that dominate the assembled contigs. Why wouldn't the two sets of results match?

  • #2
    The number of reads is based on the abundance of specific community members, while the number of contigs is based on the overall diversity of the community. If 99% of the organisms are one species of bacteria with gene X, then gene X might be the most abundant gene based on read mapping. But if there are 1000 other species in the community making up the other 1% of the population, and none of them have gene X but all of them have gene Y, then you might get 1000 different versions of gene Y contigs.

    Also, sometimes the dominant organism does not assemble very well because it may have lots of different strains, which confuse the assembler.
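To make the read-count versus contig-count distinction concrete, here is a minimal sketch of the toy community described above. The abundances, the gene names and the two helper functions are illustrative assumptions, and the model assumes equal genome sizes with one gene copy per genome.

```python
# Toy community: one dominant species carrying gene X (99% of cells) and
# 1000 rare species carrying gene Y (1% of cells in total).
# All numbers are illustrative, not taken from any real sample.
community = [(0.99, {"X"})] + [(0.01 / 1000, {"Y"}) for _ in range(1000)]


def read_share(community, gene):
    """Fraction of reads expected from organisms that carry `gene`."""
    return sum(abundance for abundance, genes in community if gene in genes)


def contig_versions(community, gene):
    """Number of species-level versions of `gene`, i.e. how many distinct
    contigs could carry it if every species assembled."""
    return sum(1 for _, genes in community if gene in genes)


for gene in ("X", "Y"):
    print(gene, round(read_share(community, gene), 4), contig_versions(community, gene))
```

In this toy case gene X dominates by read share while gene Y dominates by number of distinct contigs, so the two rankings disagree even though both are computed from the same community.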



    • #3
      Thank you for your response. In this case I know from the short-read information that the sample is very diverse, with many genes (x, y, z, etc.) that have different functional roles, and that one set of genes with a specific functional role shows up as highly abundant. When I look at the assembled results, most of the genes with functional roles are from three microorganisms. Why is there a difference in the number of functional genes with different roles between the short-read analysis and the assembled one?



      • #4
        Well... another possibility is that most of the metagenome simply didn't assemble due to low depth. Sometimes it can be useful to normalize the data prior to assembly, or to use an iterative approach where you subsample, assemble, map the reads to the assembly, then assemble the unmapped reads. Or use a different assembler. How did you do the assembly?
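As a rough illustration of what normalizing before assembly means, the sketch below implements a toy version of median k-mer depth normalization (the idea behind digital normalization). The function names, the tiny `k` and the `cutoff` value are assumptions chosen for readability; this is not a substitute for a real normalization tool, which would also handle reverse complements, sequencing errors and memory limits.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in `seq`."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, cutoff=3):
    """Keep a read only while the median count of its k-mers, among the
    reads kept so far, is still below `cutoff`."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Ten copies of an "abundant" read plus one "rare" read (made-up sequences).
reads = ["ACGTTGCAAT"] * 10 + ["GGATCCTAGA"]
kept = normalize(reads)
print(len(kept), kept.count("ACGTTGCAAT"), kept.count("GGATCCTAGA"))
```

Reads from very abundant organisms are thinned out once their k-mers are well represented, while reads from rare organisms are retained, which evens out the depth the assembler sees.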



        • #5
          I used Velvet and MetaVelvet to do the assemblies and KmerGenie to find the coverage cutoff to use in the assembly. About 12 million reads went into the assembly, with read lengths between 70 and 110 bp. I ended up using a coverage cutoff of 6, a k-mer length of 33 and an insert length cutoff of 400, and I opted for scaffolding. This combination gave me the longest contigs and an N50 of 350 bp, which is the best I could get.
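For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name and the example lengths are made up for illustration; assemblers and QC tools may apply extra filters such as a minimum contig length.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L together
    contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in bp.
print(n50([100, 200, 300, 400, 500]))
```

An N50 of 350 bp therefore means that half of the assembled bases sit in contigs of 350 bp or shorter, which is only a few read lengths.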



          • #6
            Ahh... that's a very low-coverage metagenome; I'm not surprised only the most abundant organisms assembled. Metagenomes (especially complex ones) are much harder to assemble than isolates, and thus place greater demands on the data: high depth, long reads and low error rates. You should probably try to get more data, or else try a different metagenome assembler such as MEGAHIT.
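A back-of-the-envelope calculation shows why only the most abundant organisms would assemble. The 12 million reads come from the thread; the 90 bp mean read length (midpoint of 70 to 110 bp), the 5 Mbp genome size and the read shares below are assumptions for illustration only.

```python
def expected_depth(n_reads, mean_read_len, read_share, genome_size):
    """Expected average sequencing depth of one genome, given the fraction
    of all reads (`read_share`) that derive from it."""
    return n_reads * mean_read_len * read_share / genome_size


N_READS = 12_000_000      # from the thread
MEAN_READ_LEN = 90        # assumed midpoint of 70-110 bp
GENOME_SIZE = 5_000_000   # assumed bacterial genome size in bp

for share in (0.2, 0.01, 0.001):
    depth = expected_depth(N_READS, MEAN_READ_LEN, share, GENOME_SIZE)
    print(f"read share {share:>5}: about {depth:.2f}x depth")
```

Under these assumptions an organism contributing 1% of the reads is covered only about twice over, below the coverage cutoff of 6 mentioned earlier in the thread, so its reads would mostly remain unassembled.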
