I am working on a project with a 120 Mb eukaryotic genome. We have tried a number of assemblers on our 454 data (MIRA, ABySS, different versions of Newbler, Celera, Arachne, and possibly some others; I haven't run all of these myself) and get drastically different N50 values depending on the assembler used (anywhere from 40 kb to 90 kb). We would like to compare these assemblies to:
1) see what it is that the assemblers have done differently
2) find possible misassemblies
For example, can we trust the assembly from the assembler that gives us the largest N50 (Celera), or are those large contigs the result of misassemblies?
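A minimal sketch of why N50 by itself cannot settle this. N50 is the contig length L such that contigs of length L or longer together cover at least half of the total assembly length, so it measures contiguity only. A misjoin that glues two correct contigs together raises N50 just as a genuine join would. The contig lengths below are hypothetical, not taken from our assemblies.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for empty input).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total length
            return length
    return 0


# Hypothetical contig lengths in bp (illustrative only).
correct = [60_000, 50_000, 40_000, 30_000, 20_000]
# Same sequence, but the 60 kb and 50 kb contigs have been wrongly joined.
misjoined = [110_000, 40_000, 30_000, 20_000]

print(n50(correct))    # 50000
print(n50(misjoined))  # 110000
```

Both sets contain exactly the same 200 kb of sequence; only the (erroneous) join differs, yet N50 more than doubles.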
This cannot be an uncommon situation, so I am wondering how you experienced bioinformaticians out there go about these comparisons or, if you like, quality assessments.
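One possible comparison, sketched under stated assumptions: align the contigs of one assembly against another (e.g. with a whole-genome aligner such as MUMmer's nucmer), then walk along each contig and flag positions where consecutive aligned blocks stop being collinear. The `Block` record below is a hypothetical, already-parsed form of such alignments (parsing any real tool's output is not shown), and the `max_gap=1_000` threshold is an arbitrary assumption. A flagged breakpoint is only a candidate: the misassembly could be in either assembly, so it needs checking against independent evidence such as read pairs.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical record: one alignment block of a contig (query) against a
    sequence (target) in another assembly. Reverse strand: t_start > t_end."""
    contig: str
    q_start: int
    q_end: int
    target: str
    t_start: int
    t_end: int


def flag_breakpoints(blocks: List[Block], max_gap: int = 1_000) -> List[Tuple[str, int]]:
    """Return (contig, query position) pairs where two consecutive blocks of a
    contig are not collinear: they hit different target sequences, switch
    strand, or are displaced by more than max_gap bp in target coordinates."""
    by_contig = defaultdict(list)
    for b in blocks:
        by_contig[b.contig].append(b)

    flags = []
    for contig, bs in by_contig.items():
        bs.sort(key=lambda b: b.q_start)  # walk along the contig
        for prev, cur in zip(bs, bs[1:]):
            prev_fwd = prev.t_end >= prev.t_start
            cur_fwd = cur.t_end >= cur.t_start
            if prev.target != cur.target or prev_fwd != cur_fwd:
                flags.append((contig, prev.q_end))
                continue
            # Where cur should start on the target if the blocks are collinear.
            q_gap = cur.q_start - prev.q_end
            expected = prev.t_end + q_gap if prev_fwd else prev.t_end - q_gap
            if abs(cur.t_start - expected) > max_gap:
                flags.append((contig, prev.q_end))
    return flags


# Hypothetical alignments of assembly A's contigs against assembly B.
blocks = [
    Block("ctg1", 1, 30_000, "scaf7", 100_001, 130_000),
    Block("ctg1", 30_001, 70_000, "scaf7", 130_001, 170_000),  # collinear
    Block("ctg2", 1, 50_000, "scaf3", 1, 50_000),
    Block("ctg2", 50_001, 90_000, "scaf9", 200_001, 240_000),  # jumps scaffold
]
print(flag_breakpoints(blocks))  # [('ctg2', 50000)]
```

Running this in both directions (A against B and B against A) and across all assembler pairs would show which breakpoints are specific to one assembler, which bears directly on whether its longer contigs are real.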
Any/all help is much appreciated, thanks.