I have recently finished assembling some metagenome sequences, and after assigning functions to my contigs I see that most genes belong to three specific types of microorganisms. I also submitted the unassembled short reads to MG-RAST to get an overview of functional genes. However, when I look through the MG-RAST results, the genes that are most abundant are not necessarily the same ones that dominate the assembled contigs. Why wouldn't the two types of information match?
-
The number of reads is based on the abundance of specific community members, while the number of contigs is based on the overall diversity of the community. If 99% of the organisms are one species of bacteria with gene X, then gene X might be the most abundant gene based on read mapping. But if there are 1000 other species in the community making up the other 1% of the population, and none of them have gene X but all of them have gene Y, then you might get 1000 different versions of gene Y contigs.
Also, sometimes the dominant organism does not assemble very well because it may have lots of different strains, which confuse the assembler.
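To make the reads-versus-contigs point concrete, here is a toy calculation in Python. It is only a sketch: the abundances are made up to match the 99% / 1000-species example above, read counts are assumed to be proportional to abundance, and each species' gene variant is assumed to assemble into exactly one contig (which rare organisms often will not manage in practice).

```python
def summarize(community):
    """community: list of (abundance, gene) pairs, one pair per species.

    Returns two dicts keyed by gene: a read-count proxy and a contig count.
    """
    reads = {}
    contigs = {}
    for abundance, gene in community:
        # Reads carrying a gene scale with the abundance of its host species.
        reads[gene] = reads.get(gene, 0) + abundance
        # Assumption: each species contributes one distinct contig for its gene.
        contigs[gene] = contigs.get(gene, 0) + 1
    return reads, contigs


# One dominant species with gene X (99% of cells), 1000 rare species with gene Y.
community = [(99_000, "X")] + [(1, "Y")] * 1000
reads, contigs = summarize(community)
print(reads)    # gene X dominates by read count
print(contigs)  # gene Y dominates by contig count
```

Ranking genes by `reads` and ranking them by `contigs` gives opposite answers from the same community, which is the kind of mismatch described in the question.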
-
Thank you for your response. In this case I know from the short read information that the sample is very diverse, with many genes (x, y, z, etc.) with different functional roles, and one set of genes with a specific functional role shows up as highly abundant. When I look at the assembled results, most of the genes with functional roles are from three microorganisms. Why is there a difference between the number of functional genes with different roles in the short read analysis and in the assembled one?
-
Well... another possibility is that most of the metagenome simply didn't assemble due to low depth. Sometimes it can be useful to normalize the data prior to assembly, or use an iterative approach where you subsample, assemble, map to the assembly, then assemble the unmapped reads. Or use a different assembler. How did you do the assembly?
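For anyone unfamiliar with depth normalization, the idea can be sketched in a few lines of Python. This is a toy version of the k-mer-median approach (keep a read only while the median count of its k-mers is still below a target), not the implementation of any particular tool; the function names, `k=5`, and `target=2` are illustrative assumptions, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, target=2):
    """Discard reads whose k-mers have already been seen about `target` times."""
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read is shorter than k, nothing to count
        # Median k-mer count approximates the depth already kept for this region.
        if median(counts[x] for x in ks) < target:
            kept.append(read)
            counts.update(ks)
    return kept


# Ten copies of a read from an abundant organism, one read from a rare one.
reads = ["ACGTTGCAAG"] * 10 + ["GGATCCTTAA"]
kept = normalize(reads)
print(len(kept), kept)
```

The abundant read is capped at the target depth while the rare read is kept, so the assembler sees a flatter coverage distribution. Normalization cannot create depth that is not there, though, so it helps most when the problem is uneven coverage rather than too little data overall.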
-
I used Velvet and MetaVelvet to do the assemblies and KmerGenie to find the coverage cut-off to use in the assembly. About 12 million reads went into the assembly, with lengths between 70 and 110 bp. I ended up using a coverage cut-off of 6, a k-mer length of 33 and an insert length of 400, and I opted for scaffolding. This combination gave me the longest contigs and an N50 of 350 bp, which is the best that I could get.
-
Ahh... that's a very low coverage metagenome; I'm not surprised only the most abundant organisms assembled. Metagenomes (especially complex ones) are much harder to assemble than isolates, and thus have greater demands on data: high depth, long reads, low error rates. You should probably try to get more data, or else try different metagenome assemblers such as MEGAHIT.
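A back-of-the-envelope depth estimate shows why. The sketch below uses the 12 million reads and k = 33 from the post above; the 90 bp mean read length (midpoint of 70 to 110), the 5 Mbp genome size, and the relative abundances are assumptions for illustration only. The k-mer coverage conversion Ck = C * (L - k + 1) / L is the one given in the Velvet manual.

```python
def expected_depth(n_reads, mean_read_len, rel_abundance, genome_size):
    """Expected base-level depth for one community member.

    rel_abundance is the fraction of sequenced DNA coming from that organism.
    """
    total_bases = n_reads * mean_read_len
    return total_bases * rel_abundance / genome_size


def kmer_coverage(depth, read_len, k):
    """Convert base-level depth to k-mer coverage: Ck = C * (L - k + 1) / L."""
    return depth * (read_len - k + 1) / read_len


N_READS = 12_000_000     # from the post
READ_LEN = 90            # assumed mean of the 70-110 bp range
GENOME_SIZE = 5_000_000  # assumed 5 Mbp bacterial genome
K = 33                   # from the post

for abundance in (0.30, 0.05, 0.01):
    depth = expected_depth(N_READS, READ_LEN, abundance, GENOME_SIZE)
    ck = kmer_coverage(depth, READ_LEN, K)
    print(f"abundance {abundance:.0%}: depth ~{depth:.1f}x, k-mer coverage ~{ck:.1f}x")
```

Under these assumptions an organism making up 1% of the sample gets only about 2x depth, so only members at tens of percent abundance reach the depth an assembler can work with, which fits the observation that three organisms dominate the contigs.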