
**TiborNagy** · 05-15-2014, 03:37 AM

Minia is a quick and memory-efficient de novo assembler, but its results are not very accurate.

**sujaikumar** · 05-15-2014, 07:15 AM

CLC Assembly Cell is probably one of the quickest out there with reasonable results. It's expensive, but they have a 2-week trial version so you can see if it meets your needs.

**Brian Bushnell** · 05-15-2014, 08:48 AM

Originally posted by TiborNagy:

Minia is a quick and memory-efficient de novo assembler, but its results are not very accurate.

Minia may be memory efficient, but I found it to take orders of magnitude longer than Velvet.

For fast assembly, I suggest subsampling or normalizing the input reads first, to reduce the input volume - that speeds things up a lot with Velvet, at least. Subsampling is faster but normalizing is often better. You can do either with BBTools. I find that a target depth of around 40 works well with Velvet.

subsample:
reformat.sh in=reads.fq out=sampled.fq samplerate=0.1

normalize:
bbnorm.sh in=reads.fq out=normalized.fq target=40

If you have paired reads in 2 files, you can use the in1, in2, out1, and out2 flags, and pairs will stay together.
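A minimal sketch of how a sampling rate might be chosen to land near a target depth, assuming you have rough estimates of genome size, read count, and read length. All numbers and file names below are hypothetical placeholders, and the script only prints the paired-end reformat.sh command instead of running it:

```shell
#!/bin/sh
# Hypothetical inputs: replace with estimates for your own data.
genome_size=5000000   # estimated genome size in bp (assumption)
read_count=4000000    # total number of reads across both files (assumption)
read_length=100       # read length in bp (assumption)
target_depth=40       # depth suggested above for Velvet

# depth = total bases / genome size; rate = target / depth, capped at 1.
samplerate=$(awk -v g="$genome_size" -v n="$read_count" \
                 -v l="$read_length" -v t="$target_depth" '
  BEGIN {
    depth = n * l / g
    rate = t / depth
    if (rate > 1) rate = 1
    printf "%.3f", rate
  }')

# Print the command for review; in1/in2/out1/out2 keep pairs together.
echo "reformat.sh in1=reads_1.fq in2=reads_2.fq out1=sampled_1.fq out2=sampled_2.fq samplerate=$samplerate"
```

With these placeholder numbers the starting depth is 80, so roughly half of the pairs would be kept.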

**Retro** · 05-16-2014, 02:02 AM

normalization

Thanks for the suggestions.

As for CLC: we do have CLC Genomics Workbench. It works great but is still too slow for what we need, not much different from Velvet.

As for normalization of reads before assembly: I do not understand the methods well enough, but I was told that normalizing is not good for assembly methods based on k-mers, possibly because those methods need information about the abundance of reads containing each k-mer, and that would be lost by normalization. I am not sure if it is the same thing as normalization, but I wanted to use the Usearch program to reduce the number of reads (dereplication or UCLUST). Usearch is fast enough for our planned throughput.

**Brian Bushnell** · 05-16-2014, 08:46 AM

The effect of normalization really depends on the normalizer and the assembler.

In my testing of BBNorm, normalization universally improves the L50 with Velvet, and some other metrics (total number of errors, total size, longest contig length, total number of contigs) may go up or down slightly but generally there is a positive trend. There's also typically a positive trend with Soap. AllPathsLG appears to be much more sensitive to read abundance patterns, and normalization seems to have a negative impact just as often as a positive one.

But subsampling does not change the relative read abundance, it just scales it down by a constant factor across the whole genome, so if you are worried about the effects of normalization then subsampling is a better option. It's extremely fast. Dereplication is not a bad idea, but if you only remove identical read pairs, it won't decrease your data volume much. If you treat data as single-ended and remove all duplicate individual reads, it will reduce your data much more. However, dereplication DOES increase the error rate, since reads with errors are less likely to be duplicates. You may wish to error-correct first, which BBNorm can also do - that will cause more reads to be removed.
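A toy illustration of the difference in how the two approaches treat relative abundance. It assumes a deliberately simplified model in which normalization just caps depth at the target; this is not BBNorm's actual algorithm, and the region names and depths are made up:

```shell
#!/bin/sh
# Hypothetical per-region read depths before any reduction.
depths="repeat_region 200
typical_region 50
low_region 10"

# Columns: region, original depth, depth after subsampling, depth after capping.
table=$(printf '%s\n' "$depths" | awk -v rate=0.2 -v target=40 '
  {
    subsampled = $2 * rate                  # same factor applied everywhere
    capped = ($2 > target) ? target : $2    # toy stand-in for normalization
    printf "%s %d %.0f %d\n", $1, $2, subsampled, capped
  }')
echo "$table"
```

In the subsampled column the regions keep their original 20:5:1 ratio, while in the capped column the two deeper regions become indistinguishable. That lost ratio is the abundance information an abundance-sensitive assembler might rely on.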

**rchikhi** · 05-27-2014, 04:35 PM

Minia dev here. I regret to hear that for some of you Minia has been inaccurate or too slow.

Minia is IO-intensive, so it can be slow if you run it on a network-mounted folder (e.g. your cluster's home directory). On a regular hard drive, or even better an SSD, it will be quick; I stand by the claim that human-sized genomes are assembled in a day on a plain desktop computer.

Regarding the quality of Minia results, in my tests (using QUAST) I never noticed more misassemblies than with other assemblers. TiborNagy, could you elaborate on your comment?

To contribute to the thread: if all you have is a single machine with many CPUs, then SOAPdenovo2 or Velvet using all CPU cores are likely to be the fastest assemblers. Minia is pretty fast using just 1 thread. I recall that ABySS was able to assemble a human genome in half a day using a cluster, and it's likely that the Ray assembler will match this performance as well.

**TiborNagy** · 05-28-2014, 05:45 AM

Originally posted by rchikhi:

TiborNagy, could you elaborate on your comment?

Of course I can. I tried Minia 3 years ago, when I was trying to assemble human HLA genes with different assemblers. (Yes, it is a very hard task, I know.) I mapped the contigs back to the human reference and looked at the coverage. Minia was the fastest program, but the contigs were too small. (Sorry, I cannot remember the exact values.)

I have read the Minia article. It is a clever algorithm, but it does not fit every task.

**rchikhi** · 05-28-2014, 07:04 AM

Thanks for your comment.

There's a difference between accuracy of contigs (misassembly, mismatches) and contiguity (how long the contigs are).

Yes, it makes sense to say that Minia doesn't always produce the most contiguous results, given that it has a very simple contig construction algorithm that doesn't use read or paired-end information. However, in terms of accuracy (misassemblies, mismatches), it should perform reasonably well.
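For readers unfamiliar with contiguity metrics, here is a minimal sketch of the usual length-based summary: the contig length at which the longest contigs together cover at least half of the assembly. It is commonly called N50, and the "L50" figures quoted elsewhere in this thread appear to refer to this same length-based value. The contig lengths are hypothetical:

```shell
#!/bin/sh
# Hypothetical contig lengths in bp, one per line.
lengths="80
70
50
40
30
20
10"

# Sort longest first, then walk down until half of the total length is covered.
stats=$(printf '%s\n' "$lengths" | sort -nr | awk '
  { len[NR] = $1; total += $1 }
  END {
    for (i = 1; i <= NR; i++) {
      running += len[i]
      if (running * 2 >= total) { print len[i], i; exit }
    }
  }')

# First value: contig length at the halfway point; second: number of contigs needed.
echo "halfway length and contig count: $stats"
```

This statistic says nothing about misassemblies or mismatches, which is why accuracy is reported separately.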

**Brian Bushnell** · 05-28-2014, 07:36 AM

I should clarify that I've only tried Minia once, and it was on a metagenome of unknown size and composition (the assemblies came out at 30 to 60 Mbp). I ran Velvet, Spades, Soap, and Minia. Soap crashed; Velvet was the fastest, and Minia took a long time. None of the assemblies were any good (L50 much shorter than read length). Our disk subsystem is very unpredictable and often extremely slow, which could have been the problem.

So, that could be an anomalous result compared to running it on an isolate using local disk.

**rchikhi** · 05-28-2014, 07:52 AM

Thanks for the details. A slow disk system is the only reason why Minia can be slow, so this makes sense.

ultrafast de novo assembly?
