SEQanswers


khoulahan 01-27-2014 05:15 AM

ICGC-TCGA DREAM Somatic Mutation Calling Challenge
 
Introducing the ICGC-TCGA DREAM Somatic Mutation Calling Challenge

We are very excited to announce an international effort to benchmark methods for identifying somatic mutations in cancer genomes from whole-genome sequencing. This message outlines the competition. We encourage all groups to download a set of standard datasets and submit their results. This will be useful both to algorithm developers, who can benchmark their newest techniques, and to data analysts, who can verify that their current pipelines are internationally competitive! Details are below.

What problem are we trying to solve?

Cancer is a family of diseases caused by somatic genetic mutations. Fundamental questions remain about the causes of these mutations and their roles in shaping cellular phenotypes. The particular variations in tumour genomes can influence which treatments best suit patients. The genomics revolution is now systematically characterizing every somatic variation in every tumour for large cohorts (>300 patients). The bottleneck has now become the informatics analysis of these data. Accurately identifying these variants remains an open problem in the field. The major factor influencing the poor performance of today’s mutation callers is the heterogeneity of tumour biopsies. Cancer samples are a complex mixture of normal cells of different types and multiple tumour sub-clones, mixed together in ways that vary spatially within individual tumours. These sources of noise have profound effects on mutation callers. Benchmark studies conducted by TCGA and ICGC have discovered that different mutation calling software run on the same data have limited intersection between the resulting lists of mutations (overlaps of only ~20% are typical). Thus, a great debate has ensued about which software should be run to yield a unified set of calls for major cancer genomics efforts.
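To make the overlap figure above concrete, here is a minimal sketch of how the agreement between two callers could be measured. It assumes each call is identified by chromosome, position, reference allele and alternate allele, and it assumes "overlap" means the Jaccard index (shared calls divided by all distinct calls); the post does not say how the benchmark studies defined it. The function name and the toy call sets are hypothetical.

```python
from typing import Set, Tuple

# Assumed representation of a call: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def call_set_overlap(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Return the Jaccard index: calls reported by both callers / all distinct calls."""
    union = calls_a | calls_b
    if not union:
        return 0.0  # two empty call sets: define the overlap as zero
    return len(calls_a & calls_b) / len(union)


# Hypothetical toy call sets from two callers run on the same tumour-normal pair
caller_a = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"), ("chr2", 75, "C", "T")}
caller_b = {("chr1", 100, "A", "T"), ("chr2", 300, "T", "G"), ("chr3", 40, "G", "A")}

overlap = call_set_overlap(caller_a, caller_b)
print(f"{overlap:.0%}")  # one shared call out of five distinct calls
```

With these toy sets only one of the five distinct calls is reported by both callers, which is the kind of limited agreement described above.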

How are we trying to solve it?

In response, we have launched the ICGC-TCGA DREAM Somatic Mutation Calling (SMC) Challenge, a community-based collaborative competition of researchers from across the world, to find the most accurate SNV calling and break-point detection algorithms. This Challenge will create a “living benchmark” for mutation-detection pipelines, continually evaluating the best methods and accelerating the adoption of standards. It will create a general platform that can be extended to other key problems in cancer genome analysis, such as reconstructing tumour phylogeny, detecting fusion transcripts from RNA sequencing data, and distinguishing driver from passenger mutations.

How will the Challenge be run?
The Challenge will include two components. First, to help bring in researchers from other fields, a series of synthetic tumours of increasing difficulty will be simulated and made available to any team in the world, with a live leaderboard showing top results. Second, a set of 10 tumour-normal pairs from actual patients will be made available to any team, after approval of data-access by the ICGC Data Access Compliance Office. Importantly, methods will be evaluated in the real tumours by experimental verification on the same patient DNA used for the original sequencing. Validation will be conducted for thousands of predictions (5,000-10,000) via deep-sequencing using an independent technology, with the entire Challenge completed in about a year. Both somatic single-nucleotide and structural variation prediction accuracy will be benchmarked on both synthetic and patient-derived data, providing a global picture of mutation-detection accuracy.
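As a rough illustration of what benchmarking prediction accuracy can look like, the sketch below scores a set of predicted calls against a set of known true mutations, as would be available for a synthetic tumour. The use of precision, recall and F1 is an assumption; this post does not state the Challenge's actual scoring metric. The function name and toy data are hypothetical, and for the patient-derived data only a validated subset of predictions would be available, so this is a simplification.

```python
from typing import Dict, Set, Tuple

# Assumed representation of a call: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def score_calls(predicted: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Compute precision, recall and F1 of predicted calls against a truth set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical truth set for a synthetic tumour and one team's predictions
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 75, "C", "T"),
    ("chr3", 40, "G", "A"),
}
predicted = {
    ("chr1", 100, "A", "T"),  # true positive
    ("chr2", 75, "C", "T"),   # true positive
    ("chr5", 900, "T", "C"),  # false positive
}

scores = score_calls(predicted, truth)
print({name: round(value, 3) for name, value in scores.items()})
```

Here two of three predictions are correct (precision 2/3) and two of four true mutations are found (recall 1/2).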

The best-performing methods will be applied retrospectively to over ten thousand cancer genomes, and the results distributed publicly to the research community via CGHub. Moreover, the top-scoring methods will be made available as open-source tools, allowing users around the world to process their own data using the same pipelines validated and used by the ICGC and TCGA. Challenge-assisted peer review and early editorial feedback will help identify publishable themes that cut across multiple approaches. The involvement of major journals introduces the possibility of reaching a broad audience and raises the impact and exposure of contestant contributions, which in turn increases incentives and overall morale. Nature Publishing Group has stepped up to coordinate publication models stemming from the SMC Challenge.

What resources are available to Challenge participants?
The Challenge is run on the Synapse (https://www.synapse.org/) open computational platform. Synapse serves not just as a data repository but also as a set of tools for conducting collaborative analysis and sharing and documenting data, models and analysis methods. Synapse enables researchers to seamlessly and transparently conduct, track and share their ongoing work – building up living research projects in real-time.

The GeneTorrent client, open-source software developed by Annai Systems, is available for local data download. A comprehensive description of GeneTorrent features and operation is available on the CGHub website: https://cghub.ucsc.edu/docs/user/index.html

Google is offering Google Cloud Platform credits of $2,000 to approved DREAM contest participants, including free access to contest data in Google Cloud Storage. These credits can be used for Compute Engine VMs and other Cloud Platform services. Access to Challenge data is provided via a Google Cloud Storage bucket, so all computation and submissions can be performed on the Google Cloud Platform.

Who is running the Challenge?
  • Paul C. Boutros, Ontario Institute for Cancer Research
  • Lincoln D. Stein, Ontario Institute for Cancer Research
  • Josh Stuart, University of California, Santa Cruz
  • Gustavo Stolovitzky, IBM, DREAM
  • Stephen Friend, Sage Bionetworks
  • Adam Margolin, Sage Bionetworks
  • Thea Norman, Sage Bionetworks

The organizers include leaders of prominent national and international initiatives related to cancer-genome science. Leaders of the ICGC (Stein, Boutros) and TCGA (Stuart) cancer genomics projects will ensure broad exposure in the cancer genomics community and will help ensure that the results set the standard for sequence analysis performed by the ICGC and TCGA. Challenge organizers also include leaders of DREAM Challenges (Stolovitzky, Friend, Margolin and Norman).

Where can I ask more questions?
We encourage all questions to be posted on the ICGC-TCGA DREAM Mutation Calling Challenge Forum: http://support.sagebase.org/sagebase...ling_challenge

ECO 01-30-2014 10:49 AM

Stuck and front paged!

Richard Finney 02-01-2014 07:22 AM

Remove the period from the last URL : http://support.sagebase.org/sagebase...ling_challenge.

should be
http://support.sagebase.org/sagebase...ling_challenge

(edit, read the post carefully)
"The Challenge will include two components. First, to help bring in researchers from other fields, a series of synthetic tumours of increasing difficulty will be simulated and made available to any team in the world, with a live leaderboard showing top results. Second, a set of 10 tumour-normal pairs from actual patients will be made available to any team, after approval of data-access by the ICGC Data Access Compliance Office."

I sure wish they'd just use the real data and make it open, not relying on "Compliance Office".

Can't we make a human Tumor/Normal cancer data set available in the public domain?

khoulahan 02-03-2014 12:57 PM

Dear Richard,
Thank you for your comment. As you know, there are significant ethics and data-privacy issues surrounding the use of whole-genome sequencing data. The patients in this study were consented in such a way that access to their raw data is permissible only through a DACO. Perhaps in the future, additional Challenges will be able to use public domain tumour/normal cancer data pairs. In the meantime, we have tried to facilitate the data-access approval process by providing template answers (https://www.synapse.org/#!Synapse:syn312572/wiki/60702) and of course have created the synthetic data for those who may not wish to seek DACO approval. Thanks for your thoughts and suggestions!
Sincerely,
Katie on behalf of the ICGC-TCGA DREAM SMC Challenge Team

Michael.James.Clark 02-18-2014 01:37 PM

Let me just say that I think this is a very important project and I'm excited that it's getting off the ground. Somatic calling needs to undergo substantial advancement, so this project will be vital to that I think.

Is it going to be possible to get access to the DNA so that we can actually sequence it ourselves or will you only be sharing sequencing data that your groups produce?

khoulahan 02-19-2014 09:11 AM

Hi Michael,

We are encouraging all interested to post their questions on the DREAM forum located here: http://support.sagebase.org/sagebase...ling_challenge

Thank you,
Katie on behalf of the ICGC-TCGA DREAM SMC Challenge Team

Brian Bushnell 02-24-2014 05:28 PM

I have written the highest sensitivity and specificity short-read aligner, BBMap, and would love to apply it to this project, but I don't have time. It's ideal for cross-species and cancer mapping, as it will correctly map horribly mutated reads and is extremely tolerant of indels (even very long ones, such as missing genes). If anyone is interested in collaborating - meaning, me contributing nothing except supporting all questions/issues regarding BBMap - please PM me.

karencmartin 05-27-2014 09:47 PM

intro
 
hello guys

locday123 01-15-2016 11:34 PM

Stuck and front paged!

