SEQanswers

Old 01-11-2011, 06:50 AM   #1
rwness
Junior Member
 
Location: Edinburgh, Scotland

Join Date: Mar 2010
Posts: 6
Assessing quality and accuracy of de novo genome assembly

All,
I am curious whether anyone out there has a method for assessing the quality and accuracy of de novo genome assemblies. I am currently doing in silico simulations of de novo genome assembly from a previously sequenced genome to determine the best assembly parameters (k-mer size, coverage cutoff, etc.) and the optimal dataset (mate-pair library size, coverage, etc.). The ultimate goal is to use these parameters to assemble the genome of a related species de novo.
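
For readers who want to reproduce this kind of setup, here is a minimal sketch of simulating reads from a known genome. It assumes error-free, single-end reads sampled uniformly along the sequence, which is much simpler than real sequencing data (no errors, no mate pairs, no coverage bias). The function name `simulate_reads` and its parameters are hypothetical, not taken from any tool mentioned in this thread.

```python
import random


def simulate_reads(genome, read_len, coverage, seed=0):
    """Sample error-free reads uniformly from `genome`.

    Returns a list of (start, sequence) pairs whose total length is
    approximately `coverage` times the genome length.
    Assumption: uniform sampling, forward strand only, no errors.
    """
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)  # fixed seed so a simulation can be repeated
    n_reads = (coverage * len(genome)) // read_len
    last_start = len(genome) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append((start, genome[start:start + read_len]))
    return reads
```

Keeping the start position with each read makes it possible to check later where assembled contigs should have come from.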

However, the difficulty is that after simulating the data and making a de novo assembly, I don't know of any statistics or methods to compare the assembled contigs back to the original sequence that they were simulated from. This requires two steps:
(1) align assembled contigs to reference genome
(2) assess the fit
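
One possible way to sketch these two steps, assuming step (1) was done with BLAST and saved in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The helper names and the 0.95 threshold are hypothetical choices for illustration. A contig whose best single hit covers only part of its length may be chimeric, although a contig that aligns in one piece is not thereby proven correct.

```python
def best_hit_fraction(blast_lines, contig_lengths):
    """Fraction of each contig covered by its highest-bitscore hit.

    `blast_lines` is an iterable of BLAST tabular (-outfmt 6) lines and
    `contig_lengths` maps contig id -> contig length in bases.
    Contigs with no hit at all get a fraction of 0.0.
    """
    best = {}  # contig id -> (bitscore, aligned span on the contig)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[0]
        qstart, qend = int(fields[6]), int(fields[7])
        bitscore = float(fields[11])
        span = abs(qend - qstart) + 1
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, span)
    return {
        contig: (best[contig][1] / length if contig in best else 0.0)
        for contig, length in contig_lengths.items()
    }


def flag_suspect_contigs(fractions, min_fraction=0.95):
    """Contigs whose best hit covers less than `min_fraction` of their length."""
    return sorted(c for c, frac in fractions.items() if frac < min_fraction)
```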

People often optimize N50, assembly size, contig number and other length-based measurements - but this only makes for bigger and bigger contigs, and gives little information about whether these contigs are accurate. I have been using BLAST to compare the contigs to the reference and asking how well they fit, how long the alignments are and how many misassembled contigs there are. If anyone has ideas or methods for assessing the accuracy (or overall similarity of an assembly and a genome) I would be grateful to hear about them. - Rob
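
To make the point about length-based measurements concrete, here is a small sketch of N50 using the usual definition (the length of the shortest contig in the smallest set of longest contigs that together hold at least half of the assembled bases). The function name `n50` is just illustrative. Note that it only ever sees contig lengths, so joining contigs raises it whether or not the joins are correct.

```python
def n50(lengths):
    """N50 of a collection of contig lengths (0 for an empty collection)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of all bases
            return length
    return 0


# Four 100 bp contigs versus the same bases joined into two 200 bp contigs.
# The second N50 is higher even if both joins were misassemblies.
separate = n50([100, 100, 100, 100])
joined = n50([200, 200])
```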
Old 01-11-2011, 07:34 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Good questions - I don't think there is a simple or single answer at this point.

This paper suggests a potential metric
http://genome.cshlp.org/content/20/5/675.full

You might be interested in the recently announced Assemblathon
http://assemblathon.org/
Old 01-11-2011, 08:34 AM   #3
bckirkup
Member
 
Location: Washington DC

Join Date: Jan 2011
Posts: 17
To check assemblies...

... you need an external truth. We are using 'optical mapping' from OpGen, for example. Any sort of physical map or some kinds of PCR can be used, however, depending on the size of the genomes. You can't rely on other genomes.

For some microbes, things like skew can help you get a sense of whether your assemblies seem wrong, but they aren't necessarily a solid confirmation one way or the other.
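
A rough sketch of what a skew check could look like, assuming "skew" here means GC skew, (G - C) / (G + C), computed in non-overlapping windows along a contig or scaffold. In many bacterial chromosomes the skew changes sign near the replication origin and terminus, so additional sign changes can hint at a rearranged or misjoined region, but as noted above this is only a hint. The function names and the default window size are hypothetical.

```python
def gc_skew(seq, window=1000):
    """GC skew, (G - C) / (G + C), for each full non-overlapping window."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of window skews, which is easier to inspect by eye."""
    total = 0.0
    out = []
    for s in skews:
        total += s
        out.append(total)
    return out
```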
Old 01-11-2011, 02:15 PM   #4
rwness
Junior Member
 
Location: Edinburgh, Scotland

Join Date: Mar 2010
Posts: 6

Quote:
Originally Posted by bckirkup
... you need an external truth. We are using 'optical mapping' from OpGen, for example. Any sort of physical map or some kinds of PCR can be used, however, depending on the size of the genomes. You can't rely on other genomes.

For some microbes, things like skew can help you get a sense of whether your assemblies seem wrong, but they aren't necessarily a solid confirmation one way or the other.
Sorry, I think I was unclear. What I am doing is simulating short-read data from a species with a sequenced genome, then doing a de novo assembly of that simulated data. I then want to compare my assemblies to the original genome to see how well the assembly performed. It is similar to testing the quality of short-read genome assemblers. However, my ultimate goal is to apply what I have learned in this species to the de novo assembly of another, related species with NO previously sequenced genome. I am trying to find a good method/metric to assess the quality of those simulated assemblies.
Nickloman's comments are helpful, but I haven't yet been able to read the first paper that was recommended.
Old 01-12-2011, 04:24 AM   #5
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

This paper (an interesting read otherwise) describes some metrics. I guess there will be other papers with yet more ways of getting quality metrics. Feels like we should develop a consensus...
Old 01-31-2011, 03:13 PM   #6
azroger
Junior Member
 
Location: Tucson, AZ

Join Date: Oct 2010
Posts: 7

I don't think there is any one metric for genome assembly quality. Obviously, size matters, but so do representation (how much of the genome sequence is actually covered), mismatch rate, indel rate, and misassembly rate. All these quality metrics are relatively easy to derive for a known standard genome when you use simulated reads extracted from it. Check out www.plantagora.org for more information on the whole question. That's what its focus is - simulated read assembly to evaluate a lot of different sequencing and assembly approaches. The project uses a long list of metrics for evaluation of the assemblies (so you can decide which are most important to you).
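
As an illustration of how some of these metrics could be derived once contigs have been aligned to the known genome, here is a minimal sketch. It is not the Plantagora implementation. It assumes each alignment is given as a dict with hypothetical keys `ref_start`, `ref_end` (1-based, inclusive reference coordinates), `aligned_length`, `mismatches` and `indels`, however the aligner reported them. Misassembly rate is left out because it needs a definition of what counts as a misjoin.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals, inclusive ends."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def assembly_metrics(alignments, genome_length):
    """Representation, mismatch rate and indel rate from contig alignments."""
    spans = [
        (min(a["ref_start"], a["ref_end"]), max(a["ref_start"], a["ref_end"]))
        for a in alignments
    ]
    # Count each reference base once, even if several contigs cover it.
    covered = sum(end - start + 1 for start, end in merge_intervals(spans))
    aligned = sum(a["aligned_length"] for a in alignments)
    mismatches = sum(a["mismatches"] for a in alignments)
    indels = sum(a["indels"] for a in alignments)
    return {
        "representation": covered / genome_length,
        "mismatch_rate": mismatches / aligned if aligned else 0.0,
        "indel_rate": indels / aligned if aligned else 0.0,
    }
```

Reporting the values separately, instead of folding them into one score, leaves it to the reader to decide which matter most for their project.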

Tags
assembly accuracy, assembly quality, de novo assembly, genome alignment
