View Full Version : Planning a cancer exome sequencing project

05-04-2011, 10:55 AM
Hello everyone,
Hope you are having a fine day!

I have recently become interested in NGS and its applications for studying cancer genomes. My group is interested in studying the somatic mutation landscape of a particular tumor. After going through some literature and considering my current funding, I have decided to pursue exome sequencing.

The question I want to address with exome sequencing is whether there is any correlation between the mutations harbored in the cancer exome and cancer progression.

Do you think 5 tumor samples will be a good number to start with? We will be using an Illumina Genome Analyzer IIx, PE 58 bp...

Simon Anders
05-04-2011, 11:16 PM
No. You are hopelessly underpowered.

See this thread (http://seqanswers.com/forums/showthread.php?t=10813) and maybe this News&Views article (http://www.nature.com/news/2011/110402/full/news.2011.203.html).

05-09-2011, 10:00 AM
Thanks Simon for your response and sharing the relevant links.

Have you looked into these studies?




For whole exome sequencing, the first study sequenced 14 tumor and matched normal DNA samples and the second study sequenced 7 tumor and matched normal DNA samples. Considering that both reports were published in highly reputable journals, why do you think 5 or 6 tumor DNA samples with their matched normal DNA will be 'hopelessly underpowered'?

Simon Anders
05-09-2011, 10:55 AM
These two studies set out to find mutations that are frequent in a given tumour type. You have asked about correlating mutation with disease progression. This is quite a difference!

Varela et al. and Wei et al. set out to find mutations which are frequent in a given cancer type, as opposed to mutations at random places. Given the large size of the human genome, it is in fact unlikely to see the same mutation twice unless this is a hotspot. (Nevertheless, both studies amplified the loci they saw mutated more than once in more than a hundred additional samples to make sure.)

Finding such mutations is one thing. Figuring out which of these mutations are markers for good or bad prognosis is much more challenging. Imagine you have 5 samples, two from patients who died soon afterwards and three from patients who survived long. Now, you find a mutation in three of your five samples, namely in both of the bad cases and in only one of the three good cases. Of course, you cannot claim that this indicates that the mutation marks a bad prognosis without checking the mutation in samples from very many more patients.
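To put a number on the five-sample example above, here is a minimal sketch of a one-sided Fisher exact test using only the Python standard library. The function name `fisher_one_sided` and its argument names are made up for illustration, and the choice of Fisher's test is an assumption; the post itself does not name a test.

```python
from math import comb

def fisher_one_sided(mut_bad, n_bad, mut_good, n_good):
    """One-sided Fisher exact p-value: probability of seeing at least
    `mut_bad` mutated samples in the bad-prognosis group if mutation
    status were unrelated to prognosis (hypergeometric null)."""
    total = n_bad + n_good
    mutated = mut_bad + mut_good
    denom = comb(total, n_bad)
    upper = min(n_bad, mutated)
    tail = sum(
        comb(mutated, k) * comb(total - mutated, n_bad - k)
        for k in range(mut_bad, upper + 1)
    )
    return tail / denom

# The example from the post: mutation in 2 of 2 bad cases, 1 of 3 good cases.
p_example = fisher_one_sided(2, 2, 1, 3)

# Hypothetical best case for 5 samples: 2 of 2 bad cases, 0 of 3 good cases.
p_best = fisher_one_sided(2, 2, 0, 3)

print(f"example split: p = {p_example:.2f}")
print(f"perfect split: p = {p_best:.2f}")
```

Under these assumptions, even a mutation that separates the two groups perfectly cannot reach a conventional significance threshold with 2 + 3 samples, and that is before any correction for the many mutations tested per exome.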

05-09-2011, 10:56 AM

Your first post makes it seem like you are interested in cancer progression, as opposed to simple association of mutations with cancer. For the latter there are only two distinct types of samples, those with cancer and those without. For the former there are many more distinct types of samples: those without cancer, those with initial signs of cancer, those with more developed cancer, etc. Or, probably more appropriately, you would need samples from cancer patients whose cancer has progressed to a severe stage, and you would compare mutations in different patients with different rates of progression. I'm guessing that would make it much more underpowered.

05-09-2011, 11:23 AM
Thank you Simon and Heisman.

To be more precise, what I meant by cancer progression was cancer metastasis. My hypothesis is that primary tumors (tumors from the original site) must harbor the genomic alterations that would indicate that the tumor has metastasized. In that case, primary tumors from patients with no lymph node involvement and no distant metastasis should differ, at least, from primary tumors from patients with positive lymph nodes.

To test this hypothesis, and considering the limited funding, do you think searching for mutations through WES in two distinct sets totaling 6 primary tumors (n = 3 + 3), followed by screening of hotspots in a bigger pool of samples (n = 100 or more), will be a viable idea?

Please don't mind as I have just started to learn about NGS and would really like your responses. Thank you once again.

05-09-2011, 08:38 PM
The challenge you will face with most tumors is that there are likely to be huge numbers of mutations in the tumors, but it depends on the tumor type. Also, with only 6 primaries you'll only be likely to find very frequent hotspots (roughly those present at frequency 1/6, but of course it's more complicated than that -- and too late at night for me to even attempt to botch the calculation!). The number of mutations present at 15% or more is relatively limited compared to the many biologically important mutations which are not. The odds that you'll find a shared mutation specific to the distant metastasis samples with that kind of frequency are quite poor.
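The calculation skipped above can be roughly sketched with a binomial model. This is only an illustration under strong assumptions that are not in the post: samples are independent, a mutation is always detected when present, and "found" means seen in at least 2 samples. The function name `prob_seen_at_least` is hypothetical.

```python
from math import comb

def prob_seen_at_least(n_samples, freq, min_hits):
    """Binomial probability that a mutation carried by a fraction `freq`
    of tumors shows up in at least `min_hits` of `n_samples` tumors.
    Assumes independent samples and perfect detection sensitivity."""
    return sum(
        comb(n_samples, j) * freq**j * (1 - freq) ** (n_samples - j)
        for j in range(min_hits, n_samples + 1)
    )

# A mutation present in 15% of tumors, recurring in at least 2 of 6 primaries.
p_six = prob_seen_at_least(6, 0.15, 2)

# The same mutation recurring in at least 2 of the 3 samples in one group.
p_three = prob_seen_at_least(3, 0.15, 2)

print(f"seen in >= 2 of 6 samples: {p_six:.3f}")
print(f"seen in >= 2 of 3 samples: {p_three:.3f}")
```

With these simplifications, a 15% mutation recurs in 6 samples less than a quarter of the time, and far less often within a single group of 3. Imperfect coverage or subclonal mutations would lower the odds further.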

Also, it may well be that the distant metastasis arises from a relatively rare clone within the primary, which means it will be challenging to detect.

For the number of samples you can afford, I would think a project more likely to yield something would be to compare matched primary & distant metastasis samples à la Ding (http://www.ncbi.nlm.nih.gov/pubmed/20393555) or Jones (http://www.ncbi.nlm.nih.gov/pubmed?term=20696054).

You should look very carefully at what your cost internally is to run this vs. going outside; a number of commercial providers will generate human exomes for about $2K each (50X coverage). They are getting efficiencies from using the HiSeq & buying the capture reagents in large quantities. I'm not guaranteeing it will be cheaper, but it could well be since you are proposing to use a GAIIx. The one thing to be careful of is what coverage the provider will guarantee vs. what you want, but you need to watch that with an internal project as well.

Finally, I would (of course! :-) recommend you get a copy of Brief Bioinform. 2010 Sep;11(5):524-34 (http://bib.oxfordjournals.org/content/11/5/524.abstract); it won't cover what's been published since February of 2010 & it doesn't really cover the experimental design issue you are getting into, but it does give an overview of pretty much everything in cancer genomics with NGS up to then. I don't know of a more up-to-date review.