I'm completely new at de novo sequencing - what are good tools to assemble from short Solexa tags?
-
oops... found another useful thread with these suggestions:
* MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
* SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
* SSAKE - Version 2.0 of SSAKE (23 Oct 2007) can now handle error-rich sequences. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
* VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
* Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
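To put the "20-25X coverage" figure into numbers, here is a back-of-the-envelope sketch using the usual average-coverage formula C = N × L / G (N reads of length L over a genome of size G), plus the k-mer coverage conversion Ck = C × (L - k + 1) / L given in the Velvet manual. The genome size, read length and k in the usage lines are made-up example values, not recommendations.

```python
def base_coverage(num_reads: int, read_len: int, genome_size: int) -> float:
    """Average per-base coverage C = N * L / G."""
    return num_reads * read_len / genome_size


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """k-mer coverage Ck = C * (L - k + 1) / L: each read contributes only L - k + 1 k-mers."""
    return base_cov * (read_len - k + 1) / read_len


def reads_needed(target_cov: int, read_len: int, genome_size: int) -> int:
    """Smallest read count N with N * L / G >= target_cov (integer ceiling division)."""
    return -(-(target_cov * genome_size) // read_len)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb genome sequenced with 36 bp Solexa reads, k = 21.
    n = reads_needed(25, 36, 5_000_000)
    c = base_coverage(n, 36, 5_000_000)
    print(n, round(c, 2), round(kmer_coverage(c, 36, 21), 2))
```

The k-mer line is the interesting one: with short reads and a large k, the coverage the assembler actually works with is well below the nominal base coverage, which is part of why low-coverage Solexa data is hard to assemble.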
Has anyone used more than one of these assemblers? I have low coverage with short Solexa tags, so I really just want to combine reads into longer reads.
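For the "just combine reads into longer reads" case, the simplest idea is greedy suffix-prefix merging: repeatedly join the pair of sequences with the longest exact overlap. The sketch below is only a toy illustration of that idea, not the algorithm of any tool listed above; it ignores reverse complements, sequencing errors and repeats, and `min_len` is an arbitrary, hypothetical threshold.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_merge(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the longest exact overlap until no overlap >= min_len remains."""
    # Drop exact duplicates (keeping first-seen order), then reads contained in another read.
    seqs = list(dict.fromkeys(reads))
    seqs = [r for r in seqs if not any(r != o and r in o for o in seqs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return seqs
        # Join the best pair, appending only the non-overlapping tail of the right sequence.
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


if __name__ == "__main__":
    print(greedy_merge(["ATGGCA", "GCATTC", "TTCGAA"]))  # ['ATGGCATTCGAA']
```

Each pass compares every pair, so this is only usable on toy inputs; a single repeated substring longer than `min_len` is enough to produce a wrong join, which is why the graph-based assemblers discussed below exist.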
-
SHARCGS, SSAKE, and VCAKE are... not the most sophisticated programs.
You want the kinds that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported... I don't know that the others you mentioned are.
There's also Euler-SR, and I think EDENA also works okay.
I haven't tried Euler yet, but I tried EDENA once, and it was way slower than Velvet.
With low-coverage Solexa data, there's not going to be much you can do.
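For readers new to the idea: a de Bruijn graph assembler breaks reads into k-mers, links each k-mer's (k-1)-base prefix to its (k-1)-base suffix, and reads contigs off the unbranched paths. The sketch below assumes error-free reads from a single strand; real assemblers such as Velvet and Euler-SR also handle reverse complements, coverage and error removal, none of which is shown here.

```python
from collections import defaultdict


def build_debruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths (isolated cycles are skipped for brevity)."""
    indeg: dict[str, int] = defaultdict(int)
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1

    def one_in_one_out(node: str) -> bool:
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for u in list(graph):
        if one_in_one_out(u):
            continue  # interior node: it is reached from the start of its path
        for v in graph[u]:
            node, path = v, u + v[-1]
            while one_in_one_out(node):
                node = next(iter(graph[node]))
                path += node[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    g = build_debruijn(["ATGGCATT", "GCATTCGAA"], 4)
    print(unitigs(g))  # ['ATGGCATTCGAA']
```

Where two reads disagree, the graph branches and the contig stops at the branch point; much of the work in a real assembler is deciding which of those branches are errors.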
-
The new Euler-SR is starting to reach the same ballpark as Velvet for runtime (or at least the same order of runtime), and hopefully in the next couple of days I'll tweak a couple of things that will speed it up further.
There is a tool in Euler-SR called assemblesec.pl (assembly sans error correction) that just builds a de Bruijn graph and hands you the result. You can parse the output to find which reads are on the same contig, or run some "light" error correction on the resulting graph.
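Euler-SR's actual output format isn't described in this thread, so the sketch below does not parse it. It assumes you already have contig sequences as plain strings and groups reads by the contig they share an exact k-mer with, which is one simple way to answer "which reads are on the same contig". The function name and the choice of k are illustrative assumptions, and reverse complements are ignored.

```python
from collections import defaultdict


def reads_by_contig(contigs: list[str], reads: list[str], k: int) -> dict[int, list[int]]:
    """Group read indices by the contig(s) they share at least one exact k-mer with."""
    # Index every k-mer of every contig; if a k-mer occurs in several contigs, keep the first.
    kmer_to_contig: dict[str, int] = {}
    for ci, contig in enumerate(contigs):
        for i in range(len(contig) - k + 1):
            kmer_to_contig.setdefault(contig[i:i + k], ci)

    groups: dict[int, list[int]] = defaultdict(list)
    for ri, read in enumerate(reads):
        hits = {
            kmer_to_contig[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in kmer_to_contig
        }
        for ci in sorted(hits):
            groups[ci].append(ri)
    return dict(groups)  # reads with no shared k-mer are simply left out


if __name__ == "__main__":
    contigs = ["ATGGCATTCGAA", "CCCTTTGGG"]
    reads = ["ATGGCA", "TTCGAA", "CTTTGG", "AAAAAA"]
    print(reads_by_contig(contigs, reads, 4))
```

A read that hits more than one contig is listed under each, which is often exactly the information you want when trying to join contigs by hand.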
However, you may want to use the error correction, since it can patch overlaps in low-coverage projects; it just takes forever. Currently Euler-SR guesses the average coverage, but this goes badly in very-high- and very-low-coverage projects. In the release I'll post later tonight, there is an option to specify the minimal coverage (most likely 2).
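A "minimal coverage" setting behaves like a multiplicity threshold: k-mers seen fewer than that many times are treated as probable sequencing errors. Below is a minimal sketch of such a filter, assuming a plain count threshold; it is not Euler-SR's error-correction code, and `min_cov=2` simply mirrors the value mentioned above.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads (one strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads: list[str], k: int, min_cov: int = 2) -> set[str]:
    """Keep k-mers seen at least `min_cov` times; rarer ones are treated as likely errors."""
    return {kmer for kmer, c in kmer_counts(reads, k).items() if c >= min_cov}


if __name__ == "__main__":
    # Hypothetical reads: the third carries a single substitution (A -> T at position 5).
    reads = ["ATGGCATT", "ATGGCATT", "ATGGCTTT"]
    print(sorted(solid_kmers(reads, 4)))  # ['ATGG', 'CATT', 'GCAT', 'GGCA', 'TGGC']
```

In low-coverage data a genuine k-mer may also occur only once, so raising the threshold removes real sequence along with the errors; that trade-off is why a user-set minimum helps when an automatic coverage guess goes wrong.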
-mark
-
Originally posted by swbarnes2: SHARCGS, SSAKE, and VCAKE are... not the most sophisticated programs. You want the kinds that use de Bruijn graphs. Velvet is generally the most commonly used one [...]
I'm using parts of NextGENe, which incorporates some de Bruijn graph methods...
-
Originally posted by doxologist: Velvet is only colorspace, right? I'm using parts of NextGENe, which incorporates some de Bruijn graph methods...
As for the Euler-SR release... there is some weird memory problem that only appears at the end of the assembly of a 37 Mb genome, so it'll be a bit longer before it is posted.
-mark
-
Originally posted by doxologist: Thanks, looking forward to the update.
-
Velvet is licensed under the GPL, so there is no need to purchase a license. IANAL, so I will not comment on the implications for source release of their components. Also, SoftGenetics didn't hide the fact that they incorporated Velvet. See the references in these two application notes:
-
Fair enough, I'll retract the previous post. However, I'll point out that this was not immediately obvious, as the previous posts indicated, and it is not noted on the page: http://www.softgenetics.com/NextGENe.html.
-
Yes, they do cite Velvet in their app notes. But don't you think that is ambiguous? Did they implement Velvet's method themselves, or do they use Velvet's code? If I were them, I would declare that I had incorporated Velvet into my software, and either distribute my software together with the Velvet code modified for Win32/64, or make that code available on my website. That is exactly what the GPL asks for.
BTW, it is very easy to compile Velvet on Win32. It took me only 3 hours to modify and compile the code in Visual Studio 2005.