SEQanswers

01-11-2011, 08:50 AM #1
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358
Assemblathon: Collaborative Assembler Comparison!

The folks at the UC Davis Genome Center are organizing a collaborative effort to evaluate and improve genome assemblies. This looks like it will be very informative in determining which assemblers perform well on which data types.

Find the Assemblathon here: http://assemblathon.org/

Thanks to Nickloman for bringing this to my attention.
03-15-2011, 08:43 PM #2
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

I just found this link (Draft 1): http://korflab.ucdavis.edu/Datasets/...n_analysis.pdf.
03-21-2011, 04:51 AM #3
Graham Etherington
Member
 
Location: Norwich, England.

Join Date: Apr 2010
Posts: 22

The full results from the Assemblathon can be found at:
http://assemblathon.org/
05-18-2011, 11:02 AM #4
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Linked to Genome10K Project

Hi all,

This was actually a collaborative effort between David Haussler's group at UCSC, Ian Korf's lab here at UC Davis, and the UC Davis Genome Center's Bioinformatics Core. David Haussler initiated the collaboration to complement the recent Genome10K Project meeting this past March, and we discussed the results at the Genome Assembly Workshop attached to that meeting. There will be a paper discussing the results in great detail; it's in preparation now. Finally, the Assemblathon "competition" was meant to be the first of many: Assemblathon 2 is slated to start later in the summer and wrap up sometime in the fall. As far as I understand, the Broad Institute and BGI are contributing novel sequence data from previously unsequenced organisms for use in Assemblathon 2.
05-18-2011, 11:17 AM #5
iankorf
Junior Member
 
Location: California

Join Date: Mar 2011
Posts: 2

Assemblathon 2 data will be released June 1 (a fish, bird, and snake). Groups will then have until September 1 to assemble the genomes. The results will be announced at CSHL Genome Informatics in November. These are the plans, and I hope we don't fall behind schedule. Please check out the website and join the mailing list if you're interested.
05-19-2011, 01:46 AM #6
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by iankorf
Assemblathon 2 data will be released June 1 (a fish, bird, and snake). Groups will then have until September 1 to assemble the genomes. The results will be announced at CSHL Genome Informatics in November. These are the plans, and I hope we don't fall behind schedule. Please check out the website and join the mailing list if you're interested.
How about adding some smaller genomes? Like one or two bacteria and one or two small eukaryotes (yeasts, fungi).

There is a definite bias among the organizers of both the Assemblathon and dnGASP to "think big", whereas looking at smaller things, which are supposedly easy, may also be very ... interesting.

B.
05-19-2011, 06:14 AM #7
iankorf
Junior Member
 
Location: California

Join Date: Mar 2011
Posts: 2

The first (and second) Assemblathon were born out of the needs of the G10K project. We aren't thinking big as much as we are thinking vertebrate. But you're absolutely correct: there are small assembly problems that are also important. We'll get there soon.
05-19-2011, 06:47 AM #8
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

I'd also say depth is important.

Some assemblers basically take the approach that sheer depth alone is enough to make any read containing an error irrelevant, as there's probably another read spanning the same region that is error-free. This technique does indeed work, but it's very costly to implement. So some assemblies of lower-depth data sets would be nice too.
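A back-of-the-envelope sketch of why the depth argument works, and why it is expensive. The model is an assumption for illustration, not something stated in this thread: reads land uniformly at random, so the number of reads covering a given base is roughly Poisson with mean equal to the depth c, and sequencing errors are independent with a fixed per-base rate e. A read of length L is then error-free with probability (1 - e)^L, and the chance that at least one error-free read covers a given base is 1 - exp(-c(1 - e)^L). The function names and parameter values are hypothetical.

```python
import math


def expected_error_free_cover(depth: float, read_len: int, error_rate: float) -> float:
    """Expected number of error-free reads covering a given base.

    Hypothetical model: reads covering a base ~ Poisson(depth), and each read
    is error-free independently with probability (1 - error_rate) ** read_len.
    """
    return depth * (1.0 - error_rate) ** read_len


def prob_at_least_one_error_free(depth: float, read_len: int, error_rate: float) -> float:
    # Thinning a Poisson count keeps it Poisson, so the error-free covering
    # reads are Poisson with the mean above and P(at least one) = 1 - exp(-mean).
    lam = expected_error_free_cover(depth, read_len, error_rate)
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    for depth in (10, 30, 100):
        p = prob_at_least_one_error_free(depth, read_len=100, error_rate=0.01)
        print(f"depth {depth}x: P(>=1 error-free read over a base) = {p:.3f}")
```

With a 1% error rate and 100 bp reads, only about 37% of reads are error-free under this model, so it is the extra depth that makes the "just ignore the bad reads" strategy viable, and also what makes it costly.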

Then there are issues of library sizes, a single size or a mix, etc. Basically, it's a large field to survey. Anyway, more variety could be interesting. I suspect no one assembler will "win", but rather some will have their own particular niche.

Last edited by jkbonfield; 05-19-2011 at 06:48 AM. Reason: Minor grammar
05-19-2011, 11:03 AM #9
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Library type / depth issues

Some of the parameters of the data (library insert sizes, depths) are determined more by the parties who are willing to donate novel data "to the cause" than by pure ab initio considerations of what data people would like to see (based on their own focus, or on what kind of data is usually available to them). This is a little unfortunate, as it constrains the input to what a sub-population of the larger assembly community would prefer.

In addition, we hesitate to include too many options / sub-problems in the competition, as this increases the workload of the evaluators (who may or may not be funded for their Assemblathon-related efforts).

But, as Ian said, we'll probably get there in future Assemblathons, because the issues you mention are definitely interesting to many people, and may also have relevance for the Genome10K Project (metagenomic assemblies of microbes and vertebrate host?, mitochondrial assemblies?).

~Joe
05-20-2011, 02:25 AM #10
BaCh
Member
 
Location: Germany

Join Date: May 2008
Posts: 79

Quote:
Originally Posted by jnfass
... (metagenomic assemblies of microbes and vertebrate host?, mitochondrial assemblies?).
Oh. My. God. Noooo! No mitochondria or chloroplasts.

Include mitochondrial and chloroplast data only if you feel sadistic and want to see assembly programs (and then evaluators) sweat: host contamination that was not filtered out; very high but uneven coverage (maybe due to GC content); genetic variation within the sequenced samples (like ploidy, but worse); repeats; etc.

B.

PS: let's see whether reverse psychology works
PPS: I still think that small, "easy", well-known bacterial or fungal genomes should be part of any evaluation ... simply because they also give the evaluators, and then the readers of the results, a warm and fuzzy feeling about how well the evaluation process actually works. I'll wait for Assemblathon 3 then.
05-20-2011, 09:51 AM #11
kbradnam
Member
 
Location: Davis, CA

Join Date: May 2011
Posts: 53

I'd add to Joe and Ian's comments by saying that it's great that the genome assembly community has a thirst for tackling lots of different areas of genome assembly. We'd like to address all areas of sequence assembly, but we had to start somewhere. Indeed, part of the goal of Assemblathon 1 was just to see whether it was even possible to get a group of people to all work on the same problem at once.

Going forward, people should feel free to approach the Assemblathon organizers with ideas and suggestions, though ideally we'd like to hear from people who have, or will have, short-read data that can be used in future Assemblathons.

Finally, I'd ask that if people want to be kept in the loop on Assemblathon discussions then they should join the Assemblathon mailing list: http://assemblathon.org/pages/mailing-list

I also write the occasional short blog post on the Assemblathon website, which can be subscribed to as an RSS feed, and there is also the Assemblathon Twitter account.
06-01-2011, 11:35 PM #12
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35
Assemblathon 2

Data is now posted for Assemblathon 2; the submission deadline is September 1st.

http://assemblathon.org/assemblathon-2-begins-today
09-21-2011, 02:09 AM #13
mjp
Member
 
Location: USA

Join Date: Mar 2011
Posts: 25
Assemblathon 1: A competitive assessment of de novo short read assembly methods

I don't think I'm the first one to spot this in the press, but I thought it may be relevant to the thread.

http://genome.cshlp.org/content/earl...9.111.abstract
12-19-2011, 09:14 PM #14
Mahtab
Member
 
Location: University of Melbourne

Join Date: Aug 2011
Posts: 10

Hi All

I'm trying to reproduce some of the Assemblathon 1 results, and so far the metrics (N50, NG50) I'm getting for SOAPdenovo are far from what has been reported. The UC Davis people told me they don't have the parameters the assemblers were run with. I emailed BGI but did not get a reply. Any suggestions on parameter settings (k-mer size, which libraries to use for contig and scaffold construction, etc.) for the Assemblathon 1 data?
Thanks in advance.
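When comparing your own numbers against published tables, it helps to be sure both sides compute N50 and NG50 the same way. Below is a minimal sketch of the common definitions: sort lengths longest first and report the length at which the running total first reaches half of a reference size. For N50 that reference is the total assembly length; for NG50 it is a fixed (estimated) genome size, which makes assemblies of different total sizes comparable. The function names and example lengths are hypothetical, and the Assemblathon evaluation scripts may differ in details such as minimum length cutoffs or whether scaffolds are split at gaps before computing contig statistics.

```python
from typing import Optional, Sequence


def nx50(lengths: Sequence[int], reference_size: int) -> Optional[int]:
    """Length of the sequence at which the running sum of lengths
    (sorted longest first) first reaches half of reference_size.
    Returns None if the assembly never reaches that threshold."""
    half = reference_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def n50(lengths: Sequence[int]) -> Optional[int]:
    # N50: threshold is half of the total assembly length.
    return nx50(lengths, sum(lengths))


def ng50(lengths: Sequence[int], genome_size: int) -> Optional[int]:
    # NG50: threshold is half of the (estimated) genome size instead.
    return nx50(lengths, genome_size)


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths
    print(n50(contigs))
    print(ng50(contigs, 500))
```

For the toy lengths above, N50 is 80 (running sum 100, then 180 against a threshold of 150) and NG50 with a 500 bp genome is 40 (threshold 250), illustrating how NG50 penalizes an assembly that is smaller than the genome.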

All times are GMT -8.