SEQanswers Building a Circular de novo Assembler

03-12-2014, 02:14 PM   #1
Geneious
Registered Vendor

Location: New Zealand

Join Date: Jul 2010
Posts: 22
Building a Circular de novo Assembler

Geneious developer Matt Kearse has written a blog post about how he built the circular de novo assembler for the R7 update. Hopefully some of you will find it an interesting read.

http://blog.geneious.com/blog/bid/37...novo-Assembler

Quote:
 There are two approaches I could have taken to add circular contig support. The simple approach is to circularize, at the end of the assembly process, any contigs whose ends look sufficiently similar. The second, more complex approach is to allow contigs to circularize during the assembly process and still allow similar sequences and contigs to merge into the circular contigs later. This approach is more robust and more likely to produce correct results. For example, if we have two related species present in a data set, the ends of the temporary linear contigs may be sufficiently similar to merge into a large, incorrect linear contig. But if we circularize during the assembly process, instead of merging they'll correctly circularize first.
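To make the "simple approach" from the quote concrete, here is a minimal Python sketch of post-assembly circularization: look for a suffix of the contig that matches its prefix and, if one is found, trim the duplicated end. This is only an illustration and not the Geneious implementation. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the cap of half the contig length are all assumptions made for the example.

```python
from typing import Optional


def end_overlap(contig: str, min_overlap: int = 20,
                max_mismatch_rate: float = 0.0) -> Optional[int]:
    """Return the length of the longest suffix that matches the prefix, or None.

    Hypothetical helper: compares ends position by position, without gaps.
    """
    # Arbitrary cap for this sketch: only consider overlaps up to half the contig.
    longest = len(contig) // 2
    for length in range(longest, min_overlap - 1, -1):
        prefix = contig[:length]
        suffix = contig[-length:]
        mismatches = sum(1 for a, b in zip(prefix, suffix) if a != b)
        if mismatches <= max_mismatch_rate * length:
            return length
    return None


def circularize(contig: str, min_overlap: int = 20,
                max_mismatch_rate: float = 0.0) -> Optional[str]:
    """Return the contig with its duplicated end removed, or None if the ends do not match."""
    length = end_overlap(contig, min_overlap, max_mismatch_rate)
    if length is None:
        return None
    # Drop the trailing copy of the overlap so each base appears once in the circle.
    return contig[:-length]


if __name__ == "__main__":
    genome = "GATTACAGCCGTTGCAAGGCTTCTAGCATC"
    linear_contig = genome + genome[:8]  # assembler ran past the origin by 8 bp
    print(circularize(linear_contig, min_overlap=5))
    print(circularize(genome, min_overlap=5))
```

A check like this only runs after assembly has finished, so it cannot prevent the mis-merge of related species described in the quote. That is the motivation given there for circularizing during assembly instead.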

03-13-2014, 05:32 AM   #2
TiborNagy
Senior Member

Location: Budapest

Join Date: Mar 2010
Posts: 329

Looks interesting. But how does this algorithm handle more than one circular contig (for example, a bacterial genome and its plasmids)?

03-13-2014, 12:31 PM   #3
Matt Kearse
Member

Location: New Zealand

Join Date: Mar 2014
Posts: 20

The algorithm can produce multiple circular contigs, since each contig may circularize independently. As a quick confirmation, I downloaded a random sample of 100 viral genomes, 24 of which are circular, generated simulated data from all of them, and mixed it together. De novo assembly of this mixed data produced 106 contigs, 6 of which were tiny contigs consisting of reads with errors. The other 100 contigs matched the original genomes perfectly, apart from a 2 bp uncertainty due to read errors in 1 genome. Of these, 77 were linear and 23 were circular, in keeping with the original genomes; the 1 remaining circular genome failed to circularize due to insufficient coverage.
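For anyone wanting to try a similar experiment, the following is a rough Python sketch of how reads might be simulated from a mix of linear and circular genomes. The post does not say how the simulated data was generated, so everything here is an assumption: the function name `simulate_reads`, uniform start positions, fixed read length, and error-free reads. The one point it illustrates is that reads from a circular genome may wrap around the origin, while reads from a linear genome may not.

```python
import random
from typing import List


def simulate_reads(genome: str, circular: bool, read_length: int,
                   n_reads: int, rng: random.Random) -> List[str]:
    """Sample error-free, fixed-length reads (hypothetical helper for illustration)."""
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    reads = []
    if circular:
        # Append the start of the genome so reads can span the origin.
        extended = genome + genome[:read_length]
        for _ in range(n_reads):
            start = rng.randrange(len(genome))
            reads.append(extended[start:start + read_length])
    else:
        for _ in range(n_reads):
            # Linear genomes: the read must fit entirely before the end.
            start = rng.randrange(len(genome) - read_length + 1)
            reads.append(genome[start:start + read_length])
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    plasmid = "GATTACAGCCGTTGCAAGGCTTCTAGCATC"
    mixed = (simulate_reads(plasmid, True, 10, 50, rng)
             + simulate_reads(plasmid[::-1], False, 10, 50, rng))
    rng.shuffle(mixed)  # mix reads from all genomes together
    print(len(mixed), "reads")
```

With this kind of data, only the circular genomes yield reads that join the end of the sequence back to its start. If too few reads cover the origin, a circular genome can come out as a linear contig, which would be consistent with the single failure reported above.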

 Tags circular, denovo assembly, plasmid