| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| STAR: ultrafast universal RNA-seq aligner | alexdobin | Bioinformatics | 218 | 04-02-2018 06:59 PM |
| Compare de-novo transcriptome assembly to genome reference guided assembly | IdoBar | Bioinformatics | 1 | 04-04-2014 01:28 AM |
| Inquiry: minimum length of reads for reference-based assembly or de novo assembly | sunfuhui | Bioinformatics | 1 | 10-04-2013 10:28 AM |
| de novo assembly vs. reference assembly | fadista | General | 3 | 02-16-2011 12:11 AM |
#1
Member | Location: Pennsylvania | Join Date: Apr 2011 | Posts: 27
Is there a way to run a "quick and dirty" de novo assembly of Illumina reads from a genome? All we need is contigs at least several hundred nucleotides long. Our current runs with SOAPdenovo and Velvet produce good results but are far too time-consuming for our purposes.

Thank you for any suggestions.
#3
Junior Member | Location: Edinburgh, UK | Join Date: Jul 2009 | Posts: 2
CLC Assembly Cell is probably one of the quickest out there with reasonable results. It's expensive, but they offer a two-week trial version so you can see if it meets your needs.
#4
Super Moderator | Location: Walnut Creek, CA | Join Date: Jan 2014 | Posts: 2,707
For fast assembly, I suggest subsampling or normalizing the input reads first to reduce the input volume; that speeds things up a lot with Velvet, at least. Subsampling is faster, but normalizing is often better. You can do either with BBTools. I find that a target depth of around 40 works well with Velvet.

Subsample:

```
reformat.sh in=reads.fq out=sampled.fq samplerate=0.1
```

Normalize:

```
bbnorm.sh in=reads.fq out=normalized.fq target=40
```

If you have paired reads in two files, you can use the in1, in2, out1, and out2 flags, and pairs will stay together.
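Conceptually, subsampling at a fixed rate is just a Bernoulli trial per read pair. A minimal sketch of the idea (not BBTools' actual code; `subsample_pairs` and the toy read labels are made up for illustration), deciding once per pair so mates stay together:

```python
import random

def subsample_pairs(pairs, rate, seed=0):
    """Keep each read pair with probability `rate`, making one random
    decision per pair so that both mates are kept or dropped together."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return [p for p in pairs if rng.random() < rate]

# Toy data: 10,000 read pairs labeled by index.
pairs = [(f"r{i}/1", f"r{i}/2") for i in range(10000)]
kept = subsample_pairs(pairs, rate=0.1)
print(len(kept))  # roughly 1,000 of the 10,000 pairs
```

Because each pair is an independent coin flip, relative coverage across the genome is preserved on average, which is the point made later in the thread about subsampling not changing abundance patterns.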
#5
Member | Location: Pennsylvania | Join Date: Apr 2011 | Posts: 27
Thanks for the suggestions.

As for CLC: we do have CLC Genomics Workbench. It works great but is still too slow for what we need, not much different from Velvet.

As for normalizing reads before assembly: I do not understand the methods well enough, but I was told that normalization is not good for assembly methods based on k-mers, possibly because those methods need information about the abundance of reads containing the k-mers, and that information would be lost by normalization.

I am not sure whether it is the same as normalization, but I wanted to use the Usearch program to reduce read numbers (dereplication or UCLUST). Usearch is fast enough for our planned throughput.
#6
Super Moderator | Location: Walnut Creek, CA | Join Date: Jan 2014 | Posts: 2,707
The effect of normalization really depends on the normalizer and the assembler.

In my testing of BBNorm, normalization universally improves the L50 with Velvet, and some other metrics (total number of errors, total size, longest contig length, total number of contigs) may go up or down slightly, but generally there is a positive trend. There's also typically a positive trend with Soap. AllPathsLG appears to be much more sensitive to read abundance patterns, and normalization seems to have a negative impact just as often as a positive one.

But subsampling does not change the relative read abundance; it just scales it down by a constant factor across the whole genome. So if you are worried about the effects of normalization, subsampling is a better option, and it's extremely fast.

Dereplication is not a bad idea, but if you only remove identical read pairs, it won't decrease your data volume much. If you treat the data as single-ended and remove all duplicate individual reads, it will reduce your data much more. However, dereplication DOES increase the error rate, since reads with errors are less likely to be duplicates. You may wish to error-correct first, which BBNorm can also do; that will cause more reads to be removed.
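The point about dereplication enriching for errors can be seen with a toy sketch (illustrative only; `dereplicate` is a made-up helper, not any tool's API): duplicates of the correct sequence collapse to one copy, while each error-bearing read is unique and survives.

```python
def dereplicate(reads):
    """Single-end dereplication: keep the first copy of each distinct
    sequence, dropping exact duplicates."""
    seen = set()
    unique = []
    for r in reads:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique

# A high-depth locus yields many identical correct reads; one read
# carries a sequencing error (G->C) and is therefore unique.
reads = ["ACGTACGT"] * 5 + ["ACGTACCT"]
print(dereplicate(reads))  # ['ACGTACGT', 'ACGTACCT']
```

Six reads, one in six with an error, become two reads, one in two with an error: the per-read error rate went up even though no errors were added, which is why error-correcting before dereplication helps.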
#7
Member | Location: France | Join Date: Jan 2013 | Posts: 13
Minia dev here. I'm sorry to hear that for some of you Minia has been inaccurate or too slow.

Minia is I/O-intensive, so it can be slow if you run it on a network-mounted folder (e.g. your cluster's home directory). On a regular hard drive, or better yet an SSD, it will be quick; I stand by the claim that human-sized genomes can be assembled in a day on a plain desktop computer.

Regarding the quality of Minia's results: in my tests (using QUAST) I never noticed more misassemblies than with other assemblers. TiborNagy, could you elaborate on your comment?

To contribute to the thread: if all you have is a single machine with many CPUs, then SOAPdenovo2 or Velvet using all CPU cores are likely to be the fastest assemblers. Minia is pretty fast using just one thread. I recall that ABySS was able to assemble a human genome in half a day using a cluster, and the Ray assembler will likely match this performance as well.
#8
Senior Member | Location: Budapest | Join Date: Mar 2010 | Posts: 329
Of course I can. I tried Minia 3 years ago, when I was trying to assemble human HLA genes with different assemblers. (Yes, I know it is a very hard task.) I mapped the contigs back to the human reference and inspected the coverage. Minia was the fastest program, but the contigs were too small. (Sorry, I cannot remember the exact values.)

I have read the Minia article. It is a clever algorithm, but it does not fit every task.
#9
Member | Location: France | Join Date: Jan 2013 | Posts: 13
Thanks for your comment.

There's a difference between the accuracy of contigs (misassemblies, mismatches) and their contiguity (how long the contigs are). It makes sense that Minia doesn't always produce the most contiguous results, given that it has a very simple contig construction algorithm that doesn't use read information or paired-end data. However, in terms of accuracy (misassemblies, mismatches), it should perform reasonably well.
#10
Super Moderator | Location: Walnut Creek, CA | Join Date: Jan 2014 | Posts: 2,707
I should clarify that I've only tried Minia once, and it was on a metagenome of unknown size and composition (the assemblies came out at 30 to 60 Mbp). I ran Velvet, SPAdes, Soap, and Minia. Soap crashed; Velvet was the fastest, and Minia took a long time. None of the assemblies were any good (L50 much shorter than the read length). Our disk subsystem is very unpredictable and often extremely slow, which could have been the problem.

So that could be an anomalous result compared to running it on an isolate using local disk.
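Since L50/N50 comes up several times in this thread as the contiguity yardstick, here is a minimal calculation of the usual definition: the length of the shortest contig among the largest contigs that together cover at least half the assembly. (Naming varies; some tools, including posts in this thread, call this length "L50" rather than "N50".)

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least 50%
    of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for L in sorted(lengths, reverse=True):
        running += L
        if running * 2 >= total:
            return L
    return 0  # empty assembly

print(n50([100, 80, 60, 40, 20]))  # 80: 100 + 80 = 180 >= 300 / 2
```

An "L50 much shorter than read length", as described above, means even the contigs covering half the assembly are shorter than a single read, i.e. the assembly essentially failed to join anything.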
#11
Member | Location: France | Join Date: Jan 2013 | Posts: 13
Thanks for the details. A slow disk system is the only reason Minia can be slow, so this makes sense.