
  • Assemblathon: Collaborative Assembler Comparison!

    The folks at the UC Davis Genome Center are organizing a collaborative effort to evaluate and improve genome assemblies. This looks like it will be very informative in determining which assemblers perform well on which data types.

    Find the Assemblathon here: http://assemblathon.org/

    Thanks to Nickloman for bringing this to my attention.

  • #2
    I just found this link (Draft 1): http://korflab.ucdavis.edu/Datasets/...n_analysis.pdf.



    • #3
      The full results from the Assemblathon can be found at:



      • #4
        Linked to Genome10K Project

        Hi all,

        This was actually a collaborative effort between David Haussler's group at UCSC, Ian Korf's lab here at UC Davis, and the UC Davis Genome Center's Bioinformatics Core. David Haussler initiated the collaboration to complement the recent Genome10K Project meeting this past March, and we discussed the results at the Genome Assembly Workshop attached to that meeting. A paper discussing the results in great detail is in preparation now. Finally, the Assemblathon "competition" was meant to be the first of many; Assemblathon 2 is slated to start later in the summer and wrap up sometime in the fall. As far as I understand, the Broad Institute and BGI are contributing novel sequence data from previously unsequenced organisms, to be used in Assemblathon 2.



        • #5
          Assemblathon 2 data will be released June 1 (a fish, bird, and snake). Groups will then have until September 1 to assemble the genomes. The results will be announced at CSHL Genome Informatics in November. These are the plans, and I hope we don't fall behind schedule. Please check out the website and join the mailing list if you're interested.



          • #6
            Originally posted by iankorf
            Assemblathon 2 data will be released June 1 (a fish, bird, and snake). Groups will then have until September 1 to assemble the genomes. The results will be announced at CSHL Genome Informatics in November. These are the plans, and I hope we don't fall behind schedule. Please check out the website and join the mailing list if you're interested.
            How about adding some smaller genomes? Like one or two bacteria and one or two small eukaryotes (yeasts, fungi).

            There is a definite bias from the organizers of both the Assemblathon and dnGASP to "think big", whereas having a look at smaller things, which are supposedly easy, may also be very ... interesting.

            B.



            • #7
              The first (and second) Assemblathon were born out of the needs of the G10K project. We aren't thinking big as much as we are thinking vertebrate. But you're absolutely correct: there are small assembly problems that are also important. We'll get there soon.



              • #8
                I'd also say depth is important.

                Some assemblers basically take the approach that sheer depth alone is enough to ensure that any sequence with an error becomes irrelevant, as there's probably another sequence spanning the same region that is error-free. This technique does indeed work, but it's very costly to implement. So some assemblies of lower-depth data sets would be nice too.

                Then there are issues of library sizes, singular size or mix, etc. It's a large field to survey basically. Anyway more variety could be interesting. I suspect no one assembler will "win", but rather some will have their own particular niche.
                Last edited by jkbonfield; 05-19-2011, 05:48 AM. Reason: Minor grammar
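As a rough illustration of the depth argument above, here is a minimal sketch, not taken from any assembler. It assumes reads land uniformly at random (so per-base depth is roughly Poisson) and that base errors are independent with a fixed rate; both are simplifications, and the function names are hypothetical.

```python
import math


def expected_depth(num_reads, read_length, genome_size):
    """Average per-base depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def prob_error_free_cover(depth, per_base_error):
    """Chance that a base is covered by at least one read that is correct there.

    Assumption: depth at a base is Poisson(depth), and each covering read is
    wrong at that base independently with probability per_base_error, so the
    number of correct covering reads is Poisson(depth * (1 - per_base_error)).
    """
    return 1.0 - math.exp(-depth * (1.0 - per_base_error))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend with depth.
    for depth in (2, 5, 10, 30):
        p = prob_error_free_cover(depth, per_base_error=0.01)
        print(f"depth {depth:>2}x: P(at least one correct read) = {p:.5f}")
```

Under these assumptions the probability climbs quickly with depth, which is why the "just sequence deeper" strategy works, and also why low-depth data sets would be a harder and more informative test.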



                • #9
                  Library type / depth issues

                  Some of the parameters of the data (library insert sizes, depths) are determined more by the parties who are willing to donate novel data "to the cause," rather than pure ab initio considerations of what data people would like to see (based on their own focus, or what kind of data is usually available to them). This is a little unfortunate, as it constrains the input to what a sub-population of the larger assembling community would prefer.

                  In addition, we hesitate to include too many options / sub-problems in the competition, as this increases the workload of the evaluators (who may or may not be funded for their Assemblathon-related efforts).

                  But, as Ian said, we'll probably get there in future Assemblathons, because the issues you mention are definitely interesting to many people, and may also have relevance for the Genome10K Project (metagenomic assemblies of microbes and vertebrate host?, mitochondrial assemblies?).

                  ~Joe



                  • #10
                    Originally posted by jnfass
                    ... (metagenomic assemblies of microbes and vertebrate host?, mitochondrial assemblies?).
                    Oh. My. God. Noooo! No mitochondria or chloroplasts.

                    Include mitochondrial and chloroplast data only if you feel sadistic and want to see assembly programs (and then evaluators) sweat: host contamination which was not filtered away; very high, but uneven coverage (maybe due to GC content); genetic variations in sequenced samples (like ploidy, but worse); repeats; etc.pp

                    B.

                    PS: let's see whether reverse psychology works
                    PPS: I still think that small and "easy" well-known bacterial or fungal genomes should be part of any evaluation ... simply because it also gives the evaluators, and then the readers of the results, a warm and fuzzy feeling about how well the evaluation process actually works. I'll wait for Assemblathon 3 then.



                    • #11
                      I'd add to Joe and Ian's comments by saying that it's great the genome assembly community has a thirst for tackling lots of different areas of genome assembly. We'd like to address all areas of sequence assembly, but we had to start somewhere. Indeed, part of the goal of Assemblathon 1 was just to see whether it was even possible to get a group of people to all work on the same problem at once.

                      Going forward, people should feel free to approach the Assemblathon organizers with ideas and suggestions, though ideally we'd like to hear from people who have – or will have – short read data that can be used in future Assemblathons.

                      Finally, I'd ask that if people want to be kept in the loop on Assemblathon discussions then they should join the Assemblathon mailing list: http://assemblathon.org/pages/mailing-list

                      I also write the occasional short blog post on the Assemblathon website, which can be subscribed to as an RSS feed, and there is also the Assemblathon Twitter account.



                      • #12
                        assemblathon 2

                        Data is now posted for Assemblathon 2; the submission deadline is September 1st.



                        • #13
                          Assemblathon 1: A competitive assessment of de novo short read assembly methods

                          I don't think I'm the first one to spot this in the press but thought it may be relevant to the thread.

                          An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms



                          • #14
                            Hi All

                            I'm trying to reproduce some of the Assemblathon 1 results, and so far the metrics (N50, NG50) I'm getting for SOAPdenovo are far from what has been reported. The UC Davis people told me they don't have the parameters that the assemblers were run with. I emailed BGI but did not get a reply. Any suggestions on parameter settings (k-mer size, which libraries to use for contig and scaffold creation, and so on) for the Assemblathon 1 data?
                            Thanks in advance.
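For anyone comparing numbers, it helps to be sure the metrics themselves are computed the same way. Below is a minimal sketch of the usual N50 and NG50 definitions, assuming a plain list of contig or scaffold lengths; the helper names and the example lengths are hypothetical, and this is not the Assemblathon evaluation code, which may differ in details such as minimum contig length or how gaps are handled.

```python
def _length_at_threshold(lengths, threshold):
    """Walk lengths from longest to shortest; return the length at which
    the running total first reaches the threshold (0 if it never does)."""
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length
    return 0  # assembly is too small to reach the threshold


def n50(lengths):
    """N50: threshold is half of the total assembly size."""
    return _length_at_threshold(lengths, sum(lengths) / 2)


def ng50(lengths, genome_size):
    """NG50: threshold is half of the (known or estimated) genome size."""
    return _length_at_threshold(lengths, genome_size / 2)


if __name__ == "__main__":
    contig_lengths = [80, 70, 50, 40, 30, 20]  # made-up lengths
    print("N50 :", n50(contig_lengths))
    print("NG50:", ng50(contig_lengths, genome_size=400))
```

Since NG50 depends on the genome size you plug in, and both metrics depend on any length cutoff applied beforehand, a mismatch in either can shift the reported values even when the assembly itself is identical.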
