
**severin** · 12-16-2011, 09:44 AM

Other Assembly programs

Ray can also handle the assembly of data from multiple sequencing platforms.

**Ole** · 12-17-2011, 07:08 AM

You could try MSR-CA (http://www.genome.umd.edu/SR_CA_MANUAL.htm; the source code is here: ftp://ftp.genome.umd.edu/pub/MSR-CA/) too, if you can get it up and running properly. I haven't managed to get it to run properly on my complete dataset yet; it seems to have a couple of bottlenecks or some oddly designed code. I ran into memory problems with 1.3.3, and 1.4b has some Perl scripts that are really slow (reduce_sr.pl has been running for 3-4 days now).

The premise of MSR-CA is really interesting, though: assemble Illumina reads into highly confident unitigs/contigs with a de Bruijn graph, and then combine these with other data (454, Sanger) in CA afterwards.
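For anyone unfamiliar with the de Bruijn step, here is a minimal Python sketch of the general idea (not MSR-CA's actual code): every k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and maximal non-branching paths are collapsed into unitigs. It assumes error-free reads, ignores reverse complements, and never emits isolated cycles; `build_de_bruijn` and `unitigs` are hypothetical names.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):  # sorted for a deterministic edge order
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].append(suffix)
        in_degree[suffix] += 1
    nodes = set(out_edges) | set(in_degree)
    return out_edges, in_degree, nodes


def unitigs(reads: list[str], k: int) -> list[str]:
    """Collapse maximal non-branching paths of the graph into unitig sequences."""
    out_edges, in_degree, nodes = build_de_bruijn(reads, k)

    def one_in_one_out(node: str) -> bool:
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached by walking from a path start
        for nxt in out_edges[node]:
            path = node + nxt[-1]
            # Extend while the next node has exactly one way in and one way out.
            while one_in_one_out(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            result.append(path)
    return result
```

For example, `unitigs(["ATGCA", "ATGCT"], 3)` returns `["ATGC", "GCA", "GCT"]`: the path is cut at `GC`, where the graph branches, which is exactly where a real assembler would stop a unitig and leave the ambiguity for later stages.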

**flxlex** · 12-19-2011, 12:09 AM

The big question is whether there will ever be one tool for all (these) different data types. The different assemblers out there are tailored to different sequencing platforms for good reasons. Short reads cannot be assembled using an OLC-based approach; this was solved by implementing the de Bruijn graph. Now that these short-read technologies reach 100 bases, and 150 on the MiSeq (and GAIIx, apparently), this might change, though.

So perhaps using the best assembler for each data type, and then developing a merging strategy, would be better? Getting the best contigs possible first, then merge them and scaffold them using the best scaffolder?

In this respect, the MSR-CA approach is quite interesting.

**Godevil** · 12-19-2011, 01:18 AM

Originally posted by flxlex

Getting the best contigs possible first, then merge them and scaffold them using the best scaffolder?

In that case, the scaffolds may be better, but not the contigs.

I'm performing genome assembly with SOAPdenovo. This software can assemble Illumina short reads into contigs and then generate scaffolds with some extra long reads (such as 454 and Sanger), a procedure similar to what you describe.

But in my work, the contigs from SOAPdenovo are always very short. That's why I want to find software that can generate contigs using all of these data; maybe then we can get much better contigs.

**maubp** · 12-19-2011, 04:11 AM

MIRA supports Illumina, 454, Sanger and Ion Torrent data. And I think Bastien is looking into PacBio as well.

**SLB** · 01-26-2012, 08:13 AM

Has anyone tried the recent version of the Celera Assembler (7.0), which allows up to 2 billion reads? I have it running now with 700 million reads of ~140 bp plus some 454 data, and I'm pretty eager to see how it turns out.

Also, has anyone been able to get MSR-CA running? I downloaded version 4, but it seems to stop during the super-read generation stage.

**Ole** · 01-26-2012, 09:50 AM

I've started a couple of assemblies of only 454 reads (about 45 million and 85 million, respectively) with CA 7.0, but they are still at the scaffolding step, and I reckon they will run for another week or two.

I've gotten MSR-CA 1.4 to run properly, but only on bacterial datasets (the Rhodobacter one from GAGE). I've tried it on our Illumina reads too (we have 200 million reads or so, with more coming in a few weeks), but it spent a really long time on the reduce_sr.pl step (about 2-3 weeks), and I had to stop it before it finished. So it is possible, but I think the implementation of reduce_sr.pl is a bottleneck when using MSR-CA on larger datasets. I'll get back to you when I have some experience with our new Illumina reads (in six weeks' time).

**ians** · 01-27-2012, 10:09 AM

Here at Cofactor Genomics, we've seen limited success.
We have had good results with transcript sequence. We preassembled Illumina and 454 reads separately and then brought them together with an OLC assembler. Here's a case where we didn't even cover the entire genome (2.6 Mb) until the hybrid assembly:


https://docs.google.com/open?id=0BySV4NmVGJNfMjc2M2NmNDktNThjMC00NTdkLTk3YjktYzRhZWQ2NTM2OGEz
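To illustrate the "preassemble separately, then merge by overlaps" idea, here is a rough Python sketch that greedily joins contigs on their longest exact suffix-prefix overlap. It is a simplified greedy stand-in for a real OLC assembler, not Cofactor's pipeline: there is no overlap graph, no consensus, no mismatch tolerance, no reverse complements, and no handling of contigs contained in the middle of another. `suffix_prefix_overlap`, `greedy_merge` and `min_len` are hypothetical names.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(contigs: list[str], min_len: int = 20) -> list[str]:
    """Repeatedly join the pair of contigs with the longest suffix-prefix overlap."""
    contigs = list(contigs)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlap of at least min_len remains
        # Keep contig i and append only the non-overlapping tail of contig j.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_merge(["AAAACCCCGG", "CCCCGGTTTT"], min_len=4)` joins the two contigs on their 6-base overlap into `AAAACCCCGGTTTT`. Real OLC assemblers build a graph of all overlaps instead of committing greedily, which is what lets them reason about repeats.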

We are currently working on getting the same kind of success with genomic sequence. Come see us at AGBT, where we are presenting what does and doesn't work.

@Godevil
What kind of results are you getting on the Planarian assembly? How much sequence coverage do you have on each platform? We've done this recently and had a difficult time getting results.

**ians** · 02-21-2012, 01:47 PM

AGBT Poster

I thought I'd share with everyone our AGBT poster, which outlines the success we had with consolidating multi-platform sequence to produce hybrid assemblies.
We outline our methods and conclusions for dealing with various types of genomes. Enjoy:

AGBT Poster

**vadim** · 02-22-2012, 01:33 AM

Originally posted by ians

I thought I'd share with everyone our AGBT poster, which outlines the success we had with consolidating multi-platform sequence to produce hybrid assemblies.
We outline our methods and conclusions for dealing with various types of genomes. Enjoy:

https://docs.google.com/open?id=0BySV4NmVGJNfZTA4Mjg3MDEtMTAxMi00NGM0LTljOWEtYmM2N2ZjMThiZTNh

The link is broken.

**ians** · 02-22-2012, 06:21 AM

Originally posted by vadim

The link is broken.

Oops, fixed!

**Godevil** · 03-08-2012, 05:43 PM

Originally posted by ians

@Godevil
What kind of results are you getting on the Planarian assembly? How much sequence coverage do you have on each platform? We've done this recently and had a difficult time getting results.

I cannot see your document.

Our genome assembly is poor. I think that's because of the low GC content, large genome size and high repetitiveness.
I'm now taking a training course at BGI in China. I hope I can get some useful information there.

**erhuangzi** · 03-15-2012, 03:09 AM

question

Originally posted by Ole

I've started a couple of assemblies of only 454 reads (about 45 million and 85 million, respectively) with CA 7.0, but they are still at the scaffolding step, and I reckon they will run for another week or two.

I've gotten MSR-CA 1.4 to run properly, but only on bacterial datasets (the Rhodobacter one from GAGE). I've tried it on our Illumina reads too (we have 200 million reads or so, with more coming in a few weeks), but it spent a really long time on the reduce_sr.pl step (about 2-3 weeks), and I had to stop it before it finished. So it is possible, but I think the implementation of reduce_sr.pl is a bottleneck when using MSR-CA on larger datasets. I'll get back to you when I have some experience with our new Illumina reads (in six weeks' time).

Which step uses the reduce_sr.pl script? There's no information about it in the manual for this software.

**Ole** · 03-19-2012, 01:25 AM

Originally posted by erhuangzi

Which step uses the reduce_sr.pl script? There's no information about it in the manual for this software.

The MSR-CA manual is pretty lacking, but this is the step where the program tries to find redundant super reads, and remove them. That's my guess at least.
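If that guess is right, the core operation might look something like the Python sketch below, which drops any super-read contained (on either strand) in a longer one and collapses exact duplicates. This is an assumption about what reduce_sr.pl does, not its actual logic; `reverse_complement` and `remove_redundant` are hypothetical names. Note that a naive all-pairs containment check like this one scales quadratically with the number of super-reads.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def remove_redundant(super_reads: list[str]) -> list[str]:
    """Keep only super-reads not contained, on either strand, in a longer kept one."""
    kept = []
    # Longest first (ties broken alphabetically), so each candidate only needs
    # checking against reads already kept; containment is transitive, so a read
    # inside a dropped read is also inside the kept read that absorbed it.
    for sr in sorted(set(super_reads), key=lambda s: (-len(s), s)):
        rc = reverse_complement(sr)
        if any(sr in other or rc in other for other in kept):
            continue  # redundant: fully covered by a longer super-read
        kept.append(sr)
    return kept
```

For example, `remove_redundant(["ACGTTGCA", "GTTG", "CAAC", "TTTT"])` keeps `ACGTTGCA` and `TTTT`; `CAAC` is dropped because its reverse complement `GTTG` occurs inside the longer read.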


The best genome de novo assembly software using hybrid data (Illumina, 454 & Sanger)?
