Building a Circular de novo Assembler

Geneious developer Matt Kearse has written a blog post about how he built the circular de novo assembler for the R7 Update. Hopefully some of you will find it an interesting read.

There are two approaches I could have taken to add circular contig support. The simple approach is to wait until the end of the assembly process and then circularize any contigs whose ends look sufficiently similar. The second, more complex approach is to allow contigs to circularize during the assembly process while still allowing similar sequences and contigs to merge into the circular contigs later. This approach is more robust and more likely to produce correct results. For example, if a data set contains two related species, the ends of the temporary linear contigs may be similar enough to merge into one large, incorrect linear contig. But if we circularize during the assembly process, the contigs correctly circularize first instead of merging.
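To make the simple, end-of-assembly approach concrete, here is a minimal Python sketch of post-assembly circularization. It is not the Geneious implementation. The function names, the `min_overlap` and `max_mismatch_fraction` parameters, and their default values are hypothetical stand-ins for "sufficiently similar". The sketch looks for the longest suffix of a linear contig that matches the contig's own prefix within a mismatch tolerance, and if it finds one, trims the duplicated copy from the end.

```python
from typing import Optional, Tuple


def find_end_overlap(contig: str, min_overlap: int = 20,
                     max_mismatch_fraction: float = 0.05) -> Optional[int]:
    """Return the length of the longest suffix of `contig` that matches its
    prefix with at most max_mismatch_fraction mismatches, or None.

    The thresholds are hypothetical stand-ins for "sufficiently similar".
    """
    n = len(contig)
    # Try the longest overlap first; an overlap must be at least one base long
    # and shorter than the contig itself.
    for length in range(n - 1, max(min_overlap, 1) - 1, -1):
        suffix = contig[n - length:]
        prefix = contig[:length]
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        if mismatches <= max_mismatch_fraction * length:
            return length
    return None


def circularize_if_possible(contig: str, min_overlap: int = 20,
                            max_mismatch_fraction: float = 0.05) -> Tuple[str, bool]:
    """Return (sequence, is_circular), trimming the duplicated overlap if the
    contig's ends are similar enough to join."""
    overlap = find_end_overlap(contig, min_overlap, max_mismatch_fraction)
    if overlap is None:
        return contig, False
    # The end repeats the start, so drop the trailing copy of the overlap.
    return contig[:len(contig) - overlap], True
```

Scanning from the longest overlap downward prefers the largest consistent overlap. The pairwise comparison is quadratic in contig length, which is fine for illustration; a real assembler would more likely use k-mer seeding or alignment. Because this check only sees each finished contig in isolation, it cannot prevent two related contigs from having already merged earlier in the process, which is the weakness of the simple approach described above.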
