

06-24-2015, 08:25 AM   #1
Senior Member
Location: MA

Join Date: Oct 2010
Posts: 160
After genome assembly...

...what to do?
Hi, I've been working on a fairly large genome assembly (250Mb). I have Illumina and PacBio reads and have tried different approaches. In the end, I assembled the Illumina data with SPAdes and used PBJelly to fill gaps with the PacBio subreads. The final assembly is decent.
But when I compare the assembly with some regions I have already sequenced, it looks like the assembly can still be improved. I find situations like this:

reference: ****************************************************************************************
contigs:    -------------------------------                              -----------------------------------
where there are overlapping regions of 70-75 nt with 100% identity.
Is there any post-assembly processor that can deal with situations like this?
cascoamarillo is offline   Reply With Quote
06-24-2015, 09:23 AM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Contigs with only a 70bp overlap, even at 100% identity, should not be joined in general. That's too short and would cause rampant misassemblies if you instituted it as a general policy. Besides, do you know that the reference in this case is actually correct?

I could be wrong, but I thought that recent versions of PBJelly did try to join contigs that had sufficient support.

I wrote a program called Dedupe that will find all of these overlaps: in=contigs.fa am=t ac=f fo=t mo=70 ngn=f

This will find and print all the overlaps of at least 70bp with 100% identity (annotated by the overlap length, coordinates, and number of substitutions/edits). You can allow a fixed number of edits or substitutions in the overlap region with the "e=" or "s=" flag.
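
To make "overlaps of at least 70bp with 100% identity" concrete, here is a rough Python sketch of the idea. It is not how Dedupe works internally; it is a naive all-against-all comparison, and the function names and the `contigs` dictionary are made up for illustration. It only checks the forward strand (no reverse complements) and would be far too slow for a real 250Mb assembly.

```python
def suffix_prefix_overlap(a, b, min_overlap=70):
    """Length of the longest exact match between a suffix of `a`
    and a prefix of `b`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(a), len(b))
    # Try the longest possible overlap first and shrink until one matches.
    for length in range(max_len, min_overlap - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def find_overlaps(contigs, min_overlap=70):
    """Report (left_name, right_name, overlap_length) for every ordered
    pair of distinct contigs whose ends overlap exactly."""
    hits = []
    for name_a, seq_a in contigs.items():
        for name_b, seq_b in contigs.items():
            if name_a == name_b:
                continue
            n = suffix_prefix_overlap(seq_a, seq_b, min_overlap)
            if n:
                hits.append((name_a, name_b, n))
    return hits


if __name__ == "__main__":
    # Toy data: two contigs sharing a 72 bp stretch at their ends.
    shared = "ACGT" * 18
    contigs = {"a": "T" * 50 + shared, "b": shared + "G" * 50}
    for left, right, n in find_overlaps(contigs):
        print(f"{left} -> {right}: {n} bp exact overlap")
```

A list like this only tells you where the contig ends agree; whether a given pair should actually be joined is a separate question.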

It won't merge anything, though it would be nice if someone wrote a post-processing program that used the overlap information to do merging. Still, 70bp is too short.