SEQanswers


Geneious 03-12-2014 02:14 PM

Building a Circular de novo Assembler
 
Geneious developer Matt Kearse has written a blog post about how he built the circular de novo assembler for the R7 update. Hopefully some of you will find it an interesting read.

http://blog.geneious.com/blog/bid/37...novo-Assembler

Quote:

There are two approaches I could have taken to add circular contig support. The simple approach is at the end of the assembly process to circularize any contigs whose ends look sufficiently similar. The second more complex approach is to allow contigs to circularize during the assembly process and still allow similar sequences and contigs to merge into the circular contigs later. This approach is more robust and more likely to produce correct results. For example if we have two related species present in a data set, the ends of the temporary linear contigs may be sufficiently similar to merge into a large incorrect linear contig. But if we circularize during the assembly process, instead of merging they'll correctly circularize first.
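To make the first, simpler approach in the quote concrete, here is a minimal Python sketch of circularizing a finished contig whose ends match. It is not the Geneious implementation: the function names `end_overlap` and `circularize`, the `min_overlap` parameter, and the use of exact string matching (instead of "sufficiently similar" ends that tolerate read errors) are all assumptions made for illustration.

```python
from typing import Optional


def end_overlap(contig: str, min_overlap: int = 20) -> int:
    """Length of the longest prefix of `contig` that also ends it.

    Hypothetical helper: uses exact matching, whereas a real assembler
    would accept ends that are merely similar. Overlaps longer than half
    the contig are ignored to avoid treating a repeat-only contig as circular.
    """
    longest_allowed = len(contig) // 2
    for k in range(longest_allowed, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig: str, min_overlap: int = 20) -> Optional[str]:
    """Return the contig with its duplicated end trimmed, or None if the
    ends do not overlap and the contig should stay linear."""
    k = end_overlap(contig, min_overlap)
    if k == 0:
        return None
    return contig[:-k]


# Toy example: a linear contig that runs 8 bases past the origin of a
# circular molecule, so its last 8 bases repeat its first 8.
genome = "ATGCCGTTCGGCTTCCGGTCTGCGTC"
linear_contig = genome + genome[:8]
circular = circularize(linear_contig, min_overlap=5)
```

In this toy case `circular` equals `genome`. The sketch also shows why the post-hoc approach is fragile: it only runs once assembly is finished, so two similar contig ends from related species may already have been merged linearly by then.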

TiborNagy 03-13-2014 05:32 AM

Looks interesting. But how does this algorithm handle more than one circular contig? (For example, a bacterial genome and its plasmids.)

Matt Kearse 03-13-2014 12:31 PM

The algorithm may produce multiple circular contigs as each contig may independently circularize.

As a quick confirmation I downloaded a random sample of 100 viral genomes, 24 of which are circular. I generated simulated data from them all and mixed it all together.

De novo assembly of this mixed data produced 106 contigs, 6 of them being tiny contigs consisting of reads with errors. The other 100 contigs matched the original genomes perfectly, apart from a 2 bp uncertainty due to read errors in 1 genome. 77 contigs were linear and 23 were circular, in keeping with the original genomes except for 1 circular genome that failed to circularize due to insufficient coverage.
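The counts in this post can be reconciled with a few lines of arithmetic. The sketch below assumes that the one genome that failed to circularize was reported as a linear contig; the post does not say so explicitly, but it is the reading that makes the totals add up. All variable names are made up for this example.

```python
# Input sample, as stated in the post.
total_genomes = 100
circular_genomes = 24
linear_genomes = total_genomes - circular_genomes

# Assembly output, as stated in the post.
total_contigs = 106
tiny_error_contigs = 6
failed_to_circularize = 1

# Contigs that correspond to real genomes.
genome_contigs = total_contigs - tiny_error_contigs

# Assumption: the circular genome that failed to circularize
# was emitted as a linear contig instead.
circular_contigs = circular_genomes - failed_to_circularize
linear_contigs = linear_genomes + failed_to_circularize

print(genome_contigs, linear_contigs, circular_contigs)
```

Under that assumption the numbers come out to 100 genome contigs, 77 linear and 23 circular, matching the reported result.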

