SEQanswers

Old 03-26-2009, 11:04 AM   #1
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
De Novo Short Read Assembler?

I'm completely new to de novo sequencing. What are good tools for assembling short Solexa tags?
Old 03-26-2009, 11:07 AM   #2
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

oops... found another useful thread with these suggestions:

* MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
* SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
* SSAKE - Version 2.0 of SSAKE (23 Oct 2007) can now handle error-rich sequences. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
* VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
* Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).

Has anyone used more than one of these assemblers? I have low coverage with short Solexa tags, so I really just want to combine reads into longer reads.
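
For context on the 20-25X figure quoted for Velvet above: average coverage is usually estimated as (number of reads x read length) / genome size. Here is a minimal Python sketch of that arithmetic. The function names and the example numbers (36 bp reads, a 5 Mb genome) are illustrative assumptions, not values from this thread.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical project: one million 36 bp reads on a 5 Mb genome.
    print(expected_coverage(1_000_000, 36, 5_000_000))
    # Reads required to reach 20X on the same hypothetical genome.
    print(reads_needed(20, 36, 5_000_000))
```

This is only an average; real coverage is uneven, so a project sitting right at the threshold will still have thin regions.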
Old 03-26-2009, 11:36 AM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

SHARCGS, SSAKE, and VCAKE are... not the most sophisticated programs.

You want the kind that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported... I don't know that the others you mentioned are.

There's also Euler-SR, and I think EDENA also works okay.

I haven't tried Euler yet, but I tried EDENA once, and it was way slower than Velvet.

With low-coverage Solexa data, there's not going to be much you can do.
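
For anyone new to the term, here is a toy Python sketch of the de Bruijn graph idea: every k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases, and a contig is read off by following nodes that have exactly one successor. This is an illustration only, not how Velvet or Euler-SR are implemented. It ignores reverse complements, sequencing errors and in-degree checks, and the function names and example reads are made up.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Three overlapping toy reads sampled from ATGGCGTGCA.
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
    graph = build_de_bruijn(reads, k=4)
    print(walk_unambiguous(graph, "ATG"))
```

Note that the reads are never compared to each other directly; overlaps fall out of shared k-mers, which is why this approach scales to very large numbers of short reads.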
Old 03-26-2009, 05:03 PM   #4
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

The new euler-sr is finally getting into the same ballpark as Velvet for run time (or at least the same order of magnitude), and hopefully in the next couple of days I'll tweak a couple of things that will speed it up further.

There is a tool in euler called assemblesec.pl, for assembly sans error correction, which just builds a de Bruijn graph and hands you the result. You can parse the output to find which reads are on the same contig, or run some "light" error correction on the resulting graph.

However, you may want to use the error correction, since that can patch overlaps in low-coverage projects. It just takes forever. Currently euler-sr guesses the average coverage, but this goes wrong in very high and very low coverage projects. In the release I'll post later tonight, there is an option to specify the minimal coverage (most likely 2).

-mark
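
To illustrate why a minimal coverage threshold matters, here is a rough Python sketch of k-mer multiplicity filtering: k-mers seen at least a threshold number of times are treated as trusted, and rarer ones are flagged as likely errors. It assumes the "minimal coverage" option above acts as a k-mer count cutoff, which is my reading of the post rather than a description of euler-sr's code. All names and example reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_multiplicity=2):
    """K-mers seen at least `min_multiplicity` times are treated as trusted."""
    return {kmer for kmer, n in counts.items() if n >= min_multiplicity}


def weak_positions(read, k, solid):
    """Start positions in `read` whose k-mer is not in the trusted set."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] not in solid]


if __name__ == "__main__":
    # Two identical reads plus one read carrying a single substitution (G -> A).
    reads = ["AACCGGTT", "AACCGGTT", "AACCGATT"]
    solid = solid_kmers(count_kmers(reads, k=4), min_multiplicity=2)
    for read in reads:
        print(read, weak_positions(read, 4, solid))
```

With a cutoff of 2, a k-mer has to be seen twice to be trusted, so at very low coverage many correct k-mers get flagged along with the errors. That is the trade-off behind being able to set the threshold by hand.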

Quote:
Originally Posted by swbarnes2
SHARCGS, SSAKE, and VCAKE are... not the most sophisticated programs.

You want the kind that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported... I don't know that the others you mentioned are.

There's also Euler-SR, and I think EDENA also works okay.

I haven't tried Euler yet, but I tried EDENA once, and it was way slower than Velvet.

With low-coverage Solexa data, there's not going to be much you can do.
Old 03-27-2009, 07:59 AM   #5
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

Quote:
Originally Posted by swbarnes2
SHARCGS, SSAKE, and VCAKE are... not the most sophisticated programs.

You want the kind that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported... I don't know that the others you mentioned are.

There's also Euler-SR, and I think EDENA also works okay.

I haven't tried Euler yet, but I tried EDENA once, and it was way slower than Velvet.

With low-coverage Solexa data, there's not going to be much you can do.

Velvet is only for colorspace, right?
I'm using parts of NextGENe, which incorporates some de Bruijn graphs...
Old 03-27-2009, 08:00 AM   #6
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

Quote:
Originally Posted by mchaisso
The new euler-sr is finally getting into the same ballpark as Velvet for run time (or at least the same order of magnitude), and hopefully in the next couple of days I'll tweak a couple of things that will speed it up further.

There is a tool in euler called assemblesec.pl, for assembly sans error correction, which just builds a de Bruijn graph and hands you the result. You can parse the output to find which reads are on the same contig, or run some "light" error correction on the resulting graph.

However, you may want to use the error correction, since that can patch overlaps in low-coverage projects. It just takes forever. Currently euler-sr guesses the average coverage, but this goes wrong in very high and very low coverage projects. In the release I'll post later tonight, there is an option to specify the minimal coverage (most likely 2).

-mark

Thanks - looking forward to the update.
Old 03-27-2009, 08:32 AM   #7
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

Quote:
Originally Posted by doxologist
Velvet is only for colorspace, right?
I'm using parts of NextGENe, which incorporates some de Bruijn graphs...

No, Velvet was written for nucleotide space, but I believe Daniel has made changes to make it colorspace-aware. You can ask him, but he's defending right around now, so go easy on the requests.

As for the euler-sr post... there is some weird memory problem that is only appearing at the end of assembly of a 37 Mb genome, so it'll be a bit more time before it is posted.

-mark
Old 03-27-2009, 02:19 PM   #8
RudyS
Member
 
Location: new york

Join Date: May 2008
Posts: 20

For de novo assembly from single-end Solexa reads, are there programs that make use of the read quality scores during the assembly decision-making process?

RudyS
Old 03-30-2009, 11:23 AM   #9
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

Quote:
Originally Posted by doxologist
Thanks - looking forward to the update.

OK, the update is posted. Check euler-assembler.ucsd.edu/portal for updates. There is one more change that I'll make that should improve some paired-end assembly; after that it may be a while before euler-sr is updated again. Add any requests for functions now.

Last edited by mchaisso; 03-30-2009 at 11:24 AM. Reason: clarification.
Old 04-06-2009, 11:36 PM   #10
luckybase
Junior Member
 
Location: HK

Join Date: Apr 2009
Posts: 2

Quote:
Originally Posted by doxologist
Velvet is only for colorspace, right?
I'm using parts of NextGENe, which incorporates some de Bruijn graphs...

I tried NextGENe too. I guess SoftGenetics integrated Velvet into NextGENe. You can find two files in the package, "debruijng.exe" and "debruijnh.exe", which look very much like "velvetg" and "velveth". The temporary files created by NextGENe with the de Bruijn method are also very similar to those created by Velvet.
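
For comparison, stand-alone Velvet runs as the same two steps: velveth hashes the reads into a working directory, then velvetg builds the graph and writes contigs there. Below is a small Python wrapper sketch that only assembles the two command lines. The flags used (-fastq, -short, -cov_cutoff) are from the Velvet manual as I remember it, so check them against your installed version. The directory and file names are placeholders, and rejecting even hash lengths is a choice made in this sketch.

```python
import subprocess


def velvet_commands(out_dir, reads_fastq, hash_length=21, cov_cutoff=None):
    """Return the velveth and velvetg argument lists for single-end short reads."""
    if hash_length % 2 == 0:
        # Velvet works with odd k-mer lengths; this sketch refuses even values.
        raise ValueError("hash_length should be odd")
    velveth = ["velveth", out_dir, str(hash_length), "-fastq", "-short", reads_fastq]
    velvetg = ["velvetg", out_dir]
    if cov_cutoff is not None:
        velvetg += ["-cov_cutoff", str(cov_cutoff)]
    return velveth, velvetg


def run_velvet(out_dir, reads_fastq, hash_length=21, cov_cutoff=None):
    """Run both steps in order; raises CalledProcessError if either step fails."""
    for cmd in velvet_commands(out_dir, reads_fastq, hash_length, cov_cutoff):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; print the commands rather than running them.
    for cmd in velvet_commands("asm_out", "reads.fq", hash_length=21, cov_cutoff=2):
        print(" ".join(cmd))
```

If NextGENe's debruijnh/debruijng pair follows the same pattern, the intermediate files in its working directory should map onto the ones these two steps leave behind, though I have not verified that.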
Old 04-07-2009, 06:52 AM   #11
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Bingo... that's what I surmised as well: NextGENe is using Velvet for its de novo assembly.
Old 04-07-2009, 08:42 AM   #12
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

retracted.

Last edited by mchaisso; 04-07-2009 at 10:29 AM.
Old 04-07-2009, 10:02 AM   #13
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,148

Velvet is licensed under the GPL, so there is no need to purchase a license. IANAL, so I will not comment on implications for source release of their components. Also, SoftGenetics didn't hide the fact that they incorporated Velvet. See the references in these two application notes:

http://www.softgenetics.com/DenovoAs...SR_AppNote.pdf
http://www.softgenetics.com/denovoAs...od_AppNote.pdf
Old 04-07-2009, 10:28 AM   #14
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

Fair enough, I'll retract the previous post. However, I'll point out that, as the previous posts indicated, it was not immediately obvious, and it is not noted on the page: http://www.softgenetics.com/NextGENe.html.
Old 04-08-2009, 08:10 PM   #15
luckybase
Junior Member
 
Location: HK

Join Date: Apr 2009
Posts: 2

Yes, they do cite Velvet in their app notes. But don't you think it is ambiguous? Did they implement Velvet's method themselves, or do they use Velvet's code? If I were them, I would declare that I incorporated Velvet in my software, and either distribute my software packaged with the Velvet code modified for Win32/64 or post the code on my website. This is exactly what the GPL asks for.

BTW, it is very easy to compile Velvet on Win32. It took me only 3 hours to modify and compile the code in Visual Studio 2005.
Old 04-22-2009, 02:40 AM   #16
Rao
Member
 
Location: India

Join Date: Oct 2008
Posts: 36

http://www.cs.sunysb.edu/~skiena/shorty/
SHORTY
What is the status of this tool...?
Old 07-22-2009, 07:17 AM   #17
SoftGenetics
Registered Vendor
 
Location: pa

Join Date: Apr 2009
Posts: 32
short read assembler

Hi, you might wish to try the new assemblers in NextGENe, which in addition to de novo assembly has a condensation tool that removes chemistry and instrument errors... it is faster and more accurate than the ones mentioned.
Old 09-26-2009, 12:18 AM   #18
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by doxologist
oops... found another useful thread with these suggestions:

* MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
* SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
* SSAKE - Version 2.0 of SSAKE (23 Oct 2007) can now handle error-rich sequences. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
* VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
* Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).

Has anyone used more than one of these assemblers? I have low coverage with short Solexa tags, so I really just want to combine reads into longer reads.

Don't forget ABySS out of BCGSC. ABySS: A parallel assembler for short read sequence data. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. Genome Research, June 2009.

Last edited by nilshomer; 09-26-2009 at 12:19 AM. Reason: wrong author of ABySS
Old 05-21-2010, 05:55 AM   #19
michael_0214
Junior Member
 
Location: China

Join Date: Feb 2010
Posts: 5
A Question for Shorty

Hi! A question about Shorty: when installing Shorty, I got an error saying a configuration file is needed at this step: /build conf/conf-file bin/shorty-assembler. Can anyone give me a hand? Thank you!