SEQanswers

08-05-2008, 04:30 PM   #1
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27
de novo hybrid assembly

Hi all!

I'm trying to build a pipeline for hybrid (454/sanger/solexa) de novo assemblies.

I thought of:

velvet: to assemble solexa
newbler: to assemble 454 (since newbler assembles at the flowgram level, I think it is the most accurate for 454)
cap3: to assemble sanger + newbler contigs + velvet contigs

Velvet assembles Streptococcus suis Solexa reads well, but it performed worse (N50 dropped from 5K to 4K) when I included 1.5X Sanger reads.

What do you think is the best approach for de novo hybrid assemblies?
Is there any software (besides Mira) that claims to do true hybrid assemblies?

Cheers

glacerda

08-05-2008, 07:38 PM   #2
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

I concur that velvet works well for Solexa data but does worse when long (Sanger) reads are included.

Newbler does seem best at handling 454 data. Other assemblers (phrap, cap3) tend not to handle homopolymer runs very well, or perhaps they are simply biased toward Sanger-like sequencing projects.

We have done some testing of phrap for assembling 454 CONTIGS (from Newbler) together with Sanger paired-end and primer-walk reads. The results are mixed: I had to reduce phrap's stringency via many of its rarely touched parameters to get it to do better than with the defaults.

I have yet to try cap3. I am interested because it claims to use paired-end data properly, unlike phrap, which seems to use pairs only to check consistency rather than using the insert size sensibly during assembly.
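To make the distinction concrete, here is a minimal Python sketch of what a pure consistency check on a single mate pair looks like. It assumes the read placements are already known and that the library is inward-facing (forward/reverse) with a known insert-size range; `Placement` and `pair_is_consistent` are hypothetical names for illustration, not phrap's or cap3's internals. An assembler that uses the insert size "sensibly" would instead feed these distance constraints into its decisions about which contigs to order and join.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    """Hypothetical record of where one mate landed in the assembly."""
    contig: str
    start: int     # 0-based leftmost aligned position on the contig
    end: int       # exclusive end position on the contig
    forward: bool  # True if the read aligns to the contig's forward strand


def pair_is_consistent(a: Placement, b: Placement,
                       min_insert: int, max_insert: int) -> Optional[bool]:
    """Check one mate pair against an assumed inward-facing library.

    Returns None when the mates fall on different contigs (this check
    alone cannot judge them), otherwise True/False.
    """
    if a.contig != b.contig:
        return None
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Inward-facing: the leftmost mate reads forward, the rightmost reverse.
    if not left.forward or right.forward:
        return False
    insert = right.end - left.start  # outer distance spanned by the pair
    return min_insert <= insert <= max_insert
```

For example, mates at 100-900 (forward) and 2500-3300 (reverse) on the same contig span 3200 bp, so they pass a 2000-4000 bp library check but fail a 2000-3000 bp one.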

Mira looks like a powerful solution, but with power comes complexity, which makes it difficult to master. I haven't yet done a successful hybrid assembly with it.

As an aside, beware of using N50 as your quality metric. For example, joining two contigs inappropriately will increase your N50 even though your assembly is worse! A metric that takes a reference sequence into account would be more meaningful, if you have a reference.
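The N50 caveat is easy to see with a small example. The sketch below computes N50 in the usual way (sort contigs longest first and report the length at which the running total reaches half of all assembled bases) and applies it to made-up contig lengths before and after a hypothetical misjoin of the two largest contigs.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L
    hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("n50() needs at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembly size
            return length


# Made-up contig lengths (bp), purely for illustration.
contigs = [5000, 4000, 3000, 2000, 1000]
# The same assembly after wrongly joining the two largest contigs.
misjoined = [5000 + 4000, 3000, 2000, 1000]

print(n50(contigs))    # 4000
print(n50(misjoined))  # 9000
```

The misjoin more than doubles N50 while the total assembly size is unchanged and the assembly has become less correct.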

08-05-2008, 08:19 PM   #3
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27

Thanks for sharing your experience. I have also tried using phrap to assemble 454 contigs (Newbler) + Sanger single-end reads. I tried different parameters, changed the quality scores of the 454 contigs, broke the 454 contigs into Sanger-sized pseudoreads and made many other attempts, but I couldn't produce a good assembly. Phrap produced chimeric contigs, merged repeats, broke contigs apart and left many singletons. Besides that, in many attempts the total contig length was 10% to 20% greater than the genome size.
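For anyone wanting to reproduce the pseudoread experiment, here is a minimal Python sketch of shredding a contig into overlapping, fixed-length pieces. The 800 bp length and 400 bp step are assumed Sanger-like values chosen for illustration, not the settings used in this thread, and assigning quality scores to the pseudoreads is left out.

```python
def shred(contig, read_len=800, step=400):
    """Cut a contig into overlapping pseudoreads of exactly read_len bases.

    Consecutive pseudoreads start `step` bases apart; the last one is
    anchored to the contig end so that every base is covered.
    """
    if read_len <= 0 or not 0 < step <= read_len:
        raise ValueError("need read_len > 0 and 0 < step <= read_len")
    if len(contig) <= read_len:
        return [contig]  # short contig: keep it whole
    starts = list(range(0, len(contig) - read_len + 1, step))
    if starts[-1] != len(contig) - read_len:
        starts.append(len(contig) - read_len)  # cover the contig tail
    return [contig[s:s + read_len] for s in starts]
```

Anchoring the last piece to the contig end keeps all pseudoreads the same length, at the cost of a larger overlap between the final two.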

The best software we could find for assembling Sanger + 454 data is cap3. We used the 454 contigs (Newbler) without changing their quality scores and added the Sanger reads, running cap3 with -o 100 -p 90 -t 200000. We found that it is very important to increase -t when there are many reads: the default value results in many singlets that overlap the contigs. Tweaking the gap penalties may also be a good idea, so that the Sanger data can be used to correct homopolymeric regions. The final assembly looks good, but it is clearly far from perfect.
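The cap3 run described above can be scripted roughly as follows. This is a sketch under stated assumptions: the file names are hypothetical, quality files are not handled, and it assumes cap3 is on the PATH and takes the reads file first followed by the options quoted in the post (as we understand cap3's usage text, -o is the overlap length cutoff, -p the overlap percent identity cutoff and -t the maximum number of word matches).

```python
from pathlib import Path


def merge_fasta(inputs, output):
    """Concatenate FASTA files (e.g. Newbler contigs + Sanger reads) into one file."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the last record from running into the next file
            out.write(text)
    return Path(output)


def cap3_command(fasta, overlap_len=100, identity=90, max_word_matches=200000):
    """Build the cap3 argument list with the settings quoted in the post."""
    return ["cap3", str(fasta),
            "-o", str(overlap_len),        # overlap length cutoff
            "-p", str(identity),           # overlap percent identity cutoff
            "-t", str(max_word_matches)]   # max word matches; raise it for many reads
```

A run would then look like `subprocess.run(cap3_command(merge_fasta(["newbler_contigs.fasta", "sanger_reads.fasta"], "hybrid.fasta")), check=True)`, with both input names being placeholders.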

On validating the assembly when I have a reference genome: I agree that N50 is not the only quality metric we should look at. I have been using nucmer + show-tiling (MUMmer package) to build a tiling path of the contigs on the reference genome. After that, I look for contigs that aren't 100% aligned to the genome and for those that aren't aligned at all. But this is a manual, subjective inspection, and it would be better to have good metrics.
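Part of that inspection can be automated. The Python sketch below assumes you have already extracted, per contig, its length and the contig-side coordinates of each alignment (for example from MUMmer's show-coords output; parsing is left out because the column layout depends on the options used). It merges overlapping intervals and labels each contig "full", "partial" or "unaligned". It measures alignment coverage only, not identity, so it is a triage aid rather than the metric asked for above.

```python
def covered_bases(intervals):
    """Count bases covered by 1-based inclusive (start, end) intervals.

    Intervals given with start > end (e.g. reverse-strand hits) are
    normalised first; overlapping or adjacent intervals are merged.
    """
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    total = 0
    cur_start = cur_end = None
    for s, e in spans:
        if cur_end is None or s > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # extend the current block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def classify_contigs(contig_lengths, alignments):
    """Label each contig 'full', 'partial' or 'unaligned' by alignment coverage."""
    labels = {}
    for name, length in contig_lengths.items():
        covered = covered_bases(alignments.get(name, []))
        if covered == 0:
            labels[name] = "unaligned"
        elif covered >= length:
            labels[name] = "full"
        else:
            labels[name] = "partial"
    return labels
```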

08-05-2008, 11:20 PM   #4
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

Hi,

for Sanger/454 hybrid assemblies we are successfully using the Celera Assembler (http://wgs-assembler.sourceforge.net).
For our bacterial genome projects (3-12 Mb) it does a far better job than phrap (the "old" phrap).

08-06-2008, 03:02 AM   #5
tjahns
Junior Member
 
Location: Hamburg, Germany

Join Date: Jul 2008
Posts: 9

For merging assemblies you might want to consider minimus2 from the AMOS package.

08-06-2008, 04:47 PM   #6
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

EULER-SR handles all types of reads, and we're getting a lot of questions about hybrid assembly with it. Right now the process requires a lot of manual intervention, but sometime this week I'll create a pipeline for assembling mixed read types without too much outside work.

cheers,
-mark

08-06-2008, 09:52 PM   #7
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27

Thank you Mark, I will try this new pipeline as soon as it's released!

For comparing assemblies to a reference genome, I found a great utility in the MUMmer package called dnadiff. It is a Perl script and it has many bugs (e.g., incorrect numbers of SNPs and breakpoints), but they are easy to identify and correct.

For validating without a reference, I think that amosvalidate is the best tool. However, converting assemblies to AMOS format can lead to loss of information.

All times are GMT -8.