SEQanswers

01-30-2015, 04:03 AM   #1
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 42
Assembled contigs vs short reads

I recently finished assembling some metagenome sequences. After assigning functions to my contigs, I see that most genes belong to three specific types of microorganisms. I also submitted the unassembled short reads to MG-RAST to get an overview of the functional genes. However, when I look through the MG-RAST results, the most abundant genes are not necessarily the same ones that dominate the assembled contigs. Why wouldn't the two types of information match?
01-30-2015, 08:49 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The number of reads is based on the abundance of specific community members, while the number of contigs is based on the overall diversity of the community. If 99% of the organisms are one species of bacteria with gene X, then gene X might be the most abundant gene based on read mapping. But if there are 1000 other species in the community making up the other 1% of the population, and none of them have gene X but all of them have gene Y, then you might get 1000 different versions of gene Y contigs.

Also, sometimes the dominant organism does not assemble very well because it may have lots of different strains, which confuse the assembler.
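
The 99% / 1% example above can be put into numbers with a small toy model. This is only a sketch under loud assumptions: equal genome sizes, one gene copy per genome, enough depth for every species to assemble, and read counts for a gene proportional to the abundance of the organisms carrying it. The function name and the total of one million reads are made up for illustration.

```python
# Toy community: one dominant species carrying gene X, plus many rare
# species that each carry their own version of gene Y.
# All numbers are illustrative assumptions, not measurements.

def community_counts(total_reads=1_000_000, dominant_fraction=0.99, rare_species=1000):
    """Return (reads per gene, distinct contigs per gene) for the toy community."""
    dominant_reads = round(total_reads * dominant_fraction)
    # Reads scale with organism abundance.
    reads = {"X": dominant_reads, "Y": total_reads - dominant_reads}
    # Contigs scale with the number of distinct sequences, not their abundance:
    # one version of X, but one version of Y per rare species.
    contigs = {"X": 1, "Y": rare_species}
    return reads, contigs


reads, contigs = community_counts()
print("reads per gene:  ", reads)
print("contigs per gene:", contigs)
```

Gene X wins by read count while gene Y wins by contig count, which is the kind of mismatch described in the question.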
02-03-2015, 03:57 AM   #3
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 42

Thank you for your response. In this case, I know from the short-read data that the sample is very diverse, with many genes (x, y, z, etc.) with different functional roles, and that one set of genes with a specific functional role shows up as highly abundant. When I look at the assembled results, most of the genes with functional roles are from three microorganisms. Why is there a difference between the number of functional genes with different roles in the short-read analysis and in the assembled one?
02-03-2015, 09:51 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Well... another possibility is that most of the metagenome simply didn't assemble due to low depth. Sometimes it can be useful to normalize the data prior to assembly, or to use an iterative approach where you subsample, assemble, map to the assembly, then assemble the unmapped reads. Or use a different assembler. How did you do the assembly?
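
To give a feel for what normalizing prior to assembly means, here is a toy sketch of median k-mer-count normalization (often called digital normalization). It is not the implementation of any specific tool; the function names, the tiny k, and the target depth are assumptions chosen so the example runs on a few short strings. Real tools use much larger k and memory-efficient counting.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, target=2):
    """Keep a read only while the median count of its k-mers, counted over
    the reads kept so far, is still below the target depth."""
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in ks) < target:
            kept.append(read)
            counts.update(ks)
    return kept


# Hypothetical input: ten copies of an abundant read and one rare read.
abundant = "ACGTTGCAAG"
rare = "GGATCCTAGA"
kept = normalize([abundant] * 10 + [rare])
print(len(kept), "reads kept")
```

In this toy input the abundant read is cut down to the target depth while the rare read is kept, which flattens the coverage differences that make metagenomes hard for assemblers.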
02-04-2015, 09:27 AM   #5
vanillasky
Member
 
Location: Europe

Join Date: Mar 2014
Posts: 42

I used Velvet and MetaVelvet to do the assemblies and KmerGenie to find the coverage cut-off to use in the assembly. About 12 million reads went into the assembly, with read lengths between 70 and 110 bp. I ended up using a coverage cut-off of 6, a k-mer length of 33, and an insert length cut-off of 400, and I opted for scaffolding. This combination gave me the longest contigs and an N50 of 350 bp, which is the best that I could get.
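
For readers unfamiliar with N50, a minimal sketch of the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembled bases. The example lengths below are hypothetical, not taken from this assembly.

```python
def n50(lengths):
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical contig lengths in bp.
print(n50([100, 200, 300, 400, 500]))
```

An N50 of 350 bp means half of the assembled bases sit in contigs of 350 bp or shorter, which is shorter than most bacterial genes.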
02-04-2015, 01:05 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Ahh... that's a very low-coverage metagenome; I'm not surprised only the most abundant organisms assembled. Metagenomes (especially complex ones) are much harder to assemble than isolates, and thus place greater demands on the data: high depth, long reads, and low error rates. You should probably try to get more data, or else try a different metagenome assembler such as MEGAHIT.
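
A rough back-of-the-envelope calculation shows why depth is the limit here. The sketch below uses the figures from the thread (12 million reads, k = 33) plus assumptions that are not from the thread: a mean read length of 90 bp (the midpoint of 70 to 110), a 5 Mbp genome, and an organism contributing 1% of the reads. The k-mer coverage conversion is the one given in the Velvet manual, Ck = C * (L - k + 1) / L.

```python
def expected_depth(n_reads, mean_read_len, read_fraction, genome_size):
    """Expected base coverage of one genome:
    (reads * read length * fraction of reads from that genome) / genome size."""
    return n_reads * mean_read_len * read_fraction / genome_size


def kmer_coverage(base_coverage, read_len, k):
    """Convert base coverage C to k-mer coverage: Ck = C * (L - k + 1) / L."""
    return base_coverage * (read_len - k + 1) / read_len


# Assumed values (not from the thread): 90 bp mean read length,
# 5 Mbp genome, organism contributing 1% of all reads.
depth = expected_depth(12_000_000, 90, 0.01, 5_000_000)
print(f"base coverage:  {depth:.2f}x")
print(f"k-mer coverage: {kmer_coverage(depth, 90, 33):.2f}x")
```

Under these assumptions an organism at 1% of the reads gets only about 2x base coverage, well under a coverage cut-off of 6, so only the few most abundant organisms would be expected to assemble.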

Tags
assembly, contigs, metagenome assembly, metagenomes
