  • doxologist
    Member
    • Jan 2009
    • 96

    De Novo Short Read Assembler?

    I'm completely new at de novo sequencing - what are good tools to assemble from short Solexa tags?
  • doxologist
    Member
    • Jan 2009
    • 96

    #2
    oops... found another useful thread with these suggestions:

    * MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
    * SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
    * SSAKE - Version 2.0 of SSAKE (23 Oct 2007) can now handle error-rich sequences. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
    * VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
    * Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).

    Has anyone used more than one of these assemblers? I have low coverage with short Solexa tags, so I really just want to combine reads into longer reads.
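
For the narrow goal of just combining overlapping reads into longer ones, the most naive approach is greedy suffix-prefix merging. The sketch below is that naive method only, not the algorithm used by SSAKE, VCAKE or any other tool listed above; it assumes error-free reads on a single strand (no reverse complements), and the function names are hypothetical.

```python
def overlap_len(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap of at least `min_overlap` bases exists.
    """
    start = 0
    while True:
        # Look for the first min_overlap bases of b somewhere in a.
        start = a.find(b[:min_overlap], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a (from start) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_merge(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are substrings of other reads; they add no sequence.
    seqs = [r for r in seqs if not any(r != o and r in o for o in seqs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap_len(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # no remaining overlaps: these are the merged "contigs"
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA"]
print(greedy_merge(reads))
```

On real Solexa data with sequencing errors and repeats, this greedy strategy misassembles easily, and the all-against-all comparison grows quadratically with the number of reads, which is part of why the graph-based assemblers discussed below are preferred.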


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      Sharcgs, ssake, and vcake are...not the most sophisticated programs.

      You want the kinds that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported...I don't know that the others you mentioned are.
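
To make "de Bruijn graph" concrete: instead of comparing reads to each other, these assemblers chop every read into overlapping k-mers and link each k-mer's (k-1)-base prefix to its (k-1)-base suffix. The following is a toy construction under simplifying assumptions (single strand, no error handling, hypothetical function name); it is not Velvet's or Euler-SR's actual data structure.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, dict[str, int]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each edge prefix -> suffix is labelled with how many times that k-mer
    occurs in the reads (its multiplicity, a rough coverage estimate).
    Reverse complements are ignored to keep the sketch short.
    """
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    graph: dict[str, dict[str, int]] = defaultdict(dict)
    for kmer, count in kmer_counts.items():
        graph[kmer[:-1]][kmer[1:]] = count
    return dict(graph)


for node, successors in sorted(de_bruijn_graph(["ACGTC", "CGTCA"], k=3).items()):
    print(node, "->", successors)
```

Overlapping reads share k-mers, so they end up on the same path in the graph without any read-versus-read comparison; contigs are then read off the unbranched paths.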

      There's also Euler-SR, and I think EDENA also works okay.

      I haven't tried Euler yet, but I tried EDENA once, and it was way slower than velvet.

      With low coverage solexa data, there's not going to be much you can do.
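
One way to quantify why low coverage limits any assembler is the classic Lander-Waterman model: with N reads of length L on a genome of size G, mean coverage is c = NL/G, and if two reads must overlap by at least T bases to be joined (θ = T/L), the expected number of contigs is N e^(-c(1-θ)). This sketch assumes uniformly random, error-free reads, which real Solexa data are not, so treat its numbers as a rough, optimistic estimate; the genome size and read counts in the example are made up.

```python
import math


def lander_waterman(num_reads: int, read_len: int, genome_size: int,
                    min_overlap: int = 0) -> tuple[float, float, float]:
    """Idealized Lander-Waterman statistics for random, error-free reads.

    Returns (mean coverage c, expected number of contigs, expected
    fraction of the genome covered by at least one read).
    """
    c = num_reads * read_len / genome_size      # mean coverage, c = N*L/G
    theta = min_overlap / read_len              # required overlap as a fraction of L
    expected_contigs = num_reads * math.exp(-c * (1 - theta))
    fraction_covered = 1 - math.exp(-c)         # Poisson: P(base covered) = 1 - e^(-c)
    return c, expected_contigs, fraction_covered


# Hypothetical example: 36 bp reads on a 5 Mb genome at roughly 2x and 25x coverage.
for n in (280_000, 3_500_000):
    c, contigs, covered = lander_waterman(n, 36, 5_000_000, min_overlap=20)
    print(f"c = {c:.1f}x, ~{contigs:,.0f} contigs, {covered:.1%} of genome covered")
```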


      • mchaisso
        Member
        • Apr 2008
        • 84

        #4
        The new euler-sr is starting to reach the ballpark of velvet's runtime, or at least the same order of magnitude, and hopefully in the next couple of days I'll tweak a couple of things that will speed it up further.

        There is a tool in euler called assemblesec.pl, for assembly sans error correction, which just builds a de Bruijn graph, and hands you the result. You can parse the output to find which reads are on the same contig, or run some "light" error correction on the resulting graph.
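
The "build a de Bruijn graph, then see which reads fall on the same contig" workflow described above can be sketched as follows. This is a hypothetical illustration of the idea only: it does not read or reproduce assemblesec.pl's actual output format, it ignores reverse complements and sequencing errors, and all names are made up.

```python
from collections import defaultdict


def build_graph(reads: list[str], k: int):
    """De Bruijn graph over (k-1)-mers from the distinct k-mers in the reads."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    out_edges: dict[str, set[str]] = defaultdict(set)
    in_deg: dict[str, int] = defaultdict(int)
    for km in kmers:
        out_edges[km[:-1]].add(km[1:])
        in_deg[km[1:]] += 1
    return out_edges, in_deg


def compact_contigs(out_edges, in_deg) -> list[str]:
    """Spell out maximal non-branching paths (unitigs) as contig strings.

    Isolated cycles, in which every node has one in- and one out-edge,
    are skipped to keep the sketch short.
    """
    nodes = sorted(set(out_edges) | set(in_deg))

    def is_simple(n: str) -> bool:  # exactly one edge in and one edge out
        return in_deg[n] == 1 and len(out_edges[n]) == 1

    contigs = []
    for start in nodes:
        if is_simple(start):
            continue  # interior node; it is reached from its path's start
        for nxt in sorted(out_edges[start]):
            path, node = start + nxt[-1], nxt
            while is_simple(node):
                node = next(iter(out_edges[node]))
                path += node[-1]
            contigs.append(path)
    return contigs


def contigs_per_read(reads: list[str], contigs: list[str], k: int) -> list[list[int]]:
    """For each read, the indices of the contigs that share a k-mer with it."""
    kmer_to_contig = {}
    for cid, contig in enumerate(contigs):
        for i in range(len(contig) - k + 1):
            kmer_to_contig[contig[i:i + k]] = cid
    return [sorted({kmer_to_contig[r[i:i + k]]
                    for i in range(len(r) - k + 1) if r[i:i + k] in kmer_to_contig})
            for r in reads]


reads = ["ACGTA", "ACGTC"]  # the reads diverge after ACGT, so the graph branches
out_edges, in_deg = build_graph(reads, k=3)
contigs = compact_contigs(out_edges, in_deg)
print(contigs)
print(contigs_per_read(reads, contigs, k=3))
```

A read listed under two contigs spans a branch point; with some "light" error correction, short branches caused by sequencing errors would typically be pruned before this step.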

        However, you may want to use the error correction, since that can patch overlaps in low-coverage projects. It just takes forever. Currently euler-sr guesses the average coverage, but this breaks down in very high and very low coverage projects. In the release I'll post later tonight, there is an option to specify the minimal coverage (most likely 2).

        -mark
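
To illustrate why a minimal-coverage setting matters: a common idea in k-mer based error correction is to count every k-mer and treat those seen fewer than some minimum number of times as probably containing a sequencing error. The sketch below shows only that thresholding step, with a hypothetical `min_cov` parameter defaulting to 2 to match the value mentioned above; it is not euler-sr's error-correction code and does not attempt the actual correction.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads (the k-mer spectrum)."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))


def solid_kmers(reads: list[str], k: int, min_cov: int = 2) -> set[str]:
    """k-mers seen at least `min_cov` times are treated as trustworthy."""
    return {km for km, n in kmer_counts(reads, k).items() if n >= min_cov}


def weak_positions(read: str, solid: set[str], k: int) -> list[int]:
    """Start positions of k-mers in `read` that are not solid (likely errors)."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] not in solid]


reads = ["ACGTA", "ACGTA", "ACCTA"]  # third read has a G->C error at position 2
solid = solid_kmers(reads, k=3, min_cov=2)
for r in reads:
    print(r, weak_positions(r, solid, k=3))
```

At low coverage, many correct k-mers are also seen only once, so a threshold guessed too high flags real sequence as erroneous; presumably that is why an explicit minimal-coverage option helps in low-coverage projects.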


        • doxologist
          Member
          • Jan 2009
          • 96

          #5
          Originally posted by swbarnes2
          You want the kinds that use de Bruijn graphs. Velvet is generally the most commonly used one, and it's constantly being updated and supported...

          Velvet is only colorspace, right?
          I'm using parts of NextGENe, which incorporates some de Bruijn graphs...


          • doxologist
            Member
            • Jan 2009
            • 96

            #6
            Originally posted by mchaisso
            In the release I'll post later tonight, there is an option to specify the minimal coverage (most likely 2).

            thanks - looking forward to the update.


            • mchaisso
              Member
              • Apr 2008
              • 84

              #7
              Originally posted by doxologist
              Velvet is only colorspace, right?
              I'm using parts of NextGENe, which incorporates some de Bruijn graphs...

              No, Velvet was written for nucleotide space, but I believe Daniel has made changes to make it colorspace-aware. You can ask him, but he's defending his thesis right around now, so go easy on the requests.
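
For context on "colorspace": SOLiD reads are reported as colors, each describing a pair of adjacent bases rather than a single base, so an assembler has to be made aware of the encoding. Below is a minimal sketch of the standard two-base encoding, in which each color is the XOR of 2-bit base codes A=0, C=1, G=2, T=3; the function names are hypothetical and this says nothing about how Velvet handles colorspace internally.

```python
# 2-bit base codes used by the SOLiD two-base encoding.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def to_colorspace(seq: str) -> list[int]:
    """Each color (0-3) encodes a pair of adjacent bases as the XOR of their codes."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def from_colorspace(first_base: str, colors: list[int]) -> str:
    """Decode colors back to bases; this requires knowing the first base."""
    bases = [first_base]
    code = BASE_TO_BITS[first_base]
    for color in colors:
        code ^= color
        bases.append(BITS_TO_BASE[code])
    return "".join(bases)


colors = to_colorspace("ACGT")
print(colors)                          # [1, 3, 1]
print(from_colorspace("A", colors))    # ACGT
# One miscalled color changes every base decoded after it:
print(from_colorspace("A", [1, 0, 1])) # ACCA instead of ACGT
```

Because a single color error corrupts everything downstream once decoded, colorspace reads are usually assembled or aligned in color space and only converted to bases at the end.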

              As for posting the new euler-sr: there is a weird memory problem that only appears at the end of assembling a 37 Mb genome, so it'll be a bit longer before it is posted.

              -mark


              • RudyS
                Member
                • May 2008
                • 20

                #8
                For de novo assembly from single-end Solexa reads, are there programs that make use of the reads' quality scores during the assembly decision-making process?

                RudyS
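
One way quality scores can enter the assembly decisions is to weight each k-mer occurrence by the probability that all of its bases are correct, so a k-mer seen once in a low-quality read tail counts for much less than one seen in clean sequence. This is a hypothetical sketch of that idea (the names and the Phred+33 default are assumptions), not a description of any assembler discussed in this thread. Offsets vary by pipeline: Sanger-style FASTQ uses 33, Illumina pipeline versions 1.3 to 1.7 used 64, and early Solexa scores used a different, odds-based scale.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quals(qual: str, offset: int = 33) -> list[int]:
    """ASCII quality string -> Phred scores. The offset depends on the pipeline."""
    return [ord(ch) - offset for ch in qual]


def quality_weighted_kmer_counts(reads: list[tuple[str, str]], k: int,
                                 offset: int = 33) -> dict[str, float]:
    """Count k-mers, weighting each occurrence by P(all k bases are correct)."""
    counts: dict[str, float] = defaultdict(float)
    for seq, qual in reads:
        p_correct = [1 - phred_to_error_prob(q) for q in decode_quals(qual, offset)]
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += math.prod(p_correct[i:i + k])
    return dict(counts)


reads = [("ACGT", "IIII"),  # Q40 at every base
         ("ACGA", "III#")]  # last base is Q2, very likely an error
for kmer, weight in sorted(quality_weighted_kmer_counts(reads, k=3).items()):
    print(kmer, round(weight, 3))
```

A threshold on these weighted counts, rather than on raw counts, then discounts k-mers supported only by low-quality bases.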


                • mchaisso
                  Member
                  • Apr 2008
                  • 84

                  #9
                  Originally posted by doxologist
                  thanks - looking forward to the update.

                  Ok, the update is posted. Check euler-assembler.ucsd.edu/portal for updates. There is one more change I'll make that should improve some paired-end assemblies; after that, it may be a while before euler-sr is updated again. Add any requests for features now.
                  Last edited by mchaisso; 03-30-2009, 11:24 AM. Reason: clarification.


                  • luckybase
                    Junior Member
                    • Apr 2009
                    • 2

                    #10
                    Originally posted by doxologist
                    Velvet is only colorspace, right?
                    I'm using parts of NextGENe, which incorporates some de Bruijn graphs...

                    I tried NextGENe too. I guess Softgenetics integrated Velvet into NextGENe. You can find two files in the package, "debruijng.exe" and "debruijnh.exe", which look very much like "velvetg" and "velveth". The temporary files created by NextGENe with the de Bruijn method are also very similar to those created by Velvet.


                    • bioinfosm
                      Senior Member
                      • Jan 2008
                      • 483

                      #11
                      Bingo... that's what I surmised as well: NextGENe is using Velvet for its de novo assembly.
                      --
                      bioinfosm


                      • mchaisso
                        Member
                        • Apr 2008
                        • 84

                        #12
                        retracted.
                        Last edited by mchaisso; 04-07-2009, 10:29 AM.


                        • kmcarr
                          Senior Member
                          • May 2008
                          • 1181

                          #13
                          Velvet is licensed under the GPL, so there's no need to purchase a license. IANAL, so I will not comment on the implications for source release of their components. Also, Softgenetics didn't hide the fact that they incorporated Velvet. See the references in these two application notes:



                          • mchaisso
                            Member
                            • Apr 2008
                            • 84

                            #14
                            Fair enough, I'll retract the previous post. However, I'll point out that this was not immediately obvious, as the previous posts indicate, and it is not noted on the page: http://www.softgenetics.com/NextGENe.html.


                            • luckybase
                              Junior Member
                              • Apr 2009
                              • 2

                              #15
                              Yes, they do cite Velvet in their app notes. But don't you think it is ambiguous? Did they implement Velvet's method themselves, or did they use Velvet's code? If I were them, I would declare that I had incorporated Velvet into my software, and either distribute my software packaged with the Velvet code modified for Win32/64 or make that code available on my website. This is exactly what the GPL asks.

                              BTW, it is very easy to compile Velvet on Win32. It took me only 3 hours to modify and compile the code in Visual Studio 2005.

