
Posted: June 12, 2014

CASE STUDY: Filling in the Gaps in the Catalog of Cancer Genes

Jean Hazel Mendoza

Image: Dr. Gad Getz, Ph.D.

Scientists have been hunting down cancer-causing genes since the first mutated gene in human cancer was discovered in 1982 [1]. While the catalog of cancer genes has greatly expanded, a recent study finds it remains far from complete. Analyzing nearly 5,000 patient samples from 21 tumor types, researchers identified 33 new cancer genes and determined that many more genes remain to be discovered.

“Are we done yet? Have we found all the cancer genes or are there more to be found?” are the questions posed in the study published January 23, 2014, in Nature, says Gad Getz, Ph.D., Director of Cancer Genome Computational Analysis at the Broad Institute of MIT and Harvard and Director of Bioinformatics in the Massachusetts General Hospital Cancer Center and Department of Pathology. He led the study along with Eric S. Lander, Ph.D., President and Founding Director of the Broad. “I think people didn’t know,” says Dr. Getz. “No one had a clue how much more is out there.”

Researchers have come to recognize that the cancer genome landscape is vast, but they did not know how much of it remains uncharted. Some cancer genes are mutated at high frequencies (above 20 percent); that is, they are mutated in more than one in every five patients. For example, TP53 is the most commonly mutated gene in human cancer, affecting about half of all patients [2]. Most of these frequently mutated genes have already been discovered. However, most cancer genes are mutated at what the authors define as intermediate frequencies (two to 20 percent) or lower, appearing in a much smaller percentage of patients. Most patients have mutations in cancer genes that fall in the intermediate-frequency range. While frequently mutated genes seem to be the “low-hanging fruit” of cancer genomics, infrequently mutated genes have long eluded researchers.

To illustrate how scientists detect cancer genes, Dr. Getz compares the genome to a dart board. During a person’s lifetime, many mutations accumulate in the genome, like darts thrown onto the dart board. Most of the darts are what are known as “passengers,” mutations that do not contribute to cancer, while only a few are “drivers,” mutations that give cells the competitive advantage to grow into a tumor. The main challenge is that “you see all these darts in the genome, but you don’t know which ones are actually the ones that caused the cancer,” says Dr. Getz. “So in this single patient, you have no clue which ones are the drivers.”

To solve this problem, Dr. Getz and his team examined the genomes of many patients—nearly 5,000 tumor and matching normal-tissue samples spanning 21 types of cancer, many from The Cancer Genome Atlas (TCGA). Armed with a larger sample size, they could better find which genes, or areas on the dart boards, contain more mutations than expected by chance. A dense concentration of mutations in one gene could signal the presence of driver mutations. As Dr. Getz says, “If the darts are thrown randomly on this dart board, the chance of hitting the same spot many times … is very low—and that’s when we get excited.”

Dr. Getz and his colleagues analyzed the samples using an algorithm called MutSig that distinguishes driver mutations from passengers. MutSig takes into account the number of mutations, clusters of mutations or “hotspots” within a gene, and mutations in functional sites of a gene that have been evolutionarily conserved. Together, these characteristics indicate whether there is a strong chance that a mutation is not merely a random passenger, but a critical driver.
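The dart-board reasoning can be illustrated with a small calculation. The sketch below is not MutSig; it covers only the first of the signals described above (the number of mutations) and assumes, for simplicity, a single uniform background rate. The function names and all numbers (500 patients, a 0.5 percent background rate, 15 mutated patients) are hypothetical and chosen only for illustration.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def chance_of_at_least(n, k, p):
    """P(X >= k): probability that k or more of n patients carry a mutation
    in the gene if every hit were a random passenger (requires 0 < p < 1)."""
    if k <= 0:
        return 1.0
    tail = sum(math.exp(log_binom_pmf(n, i, p)) for i in range(k, n + 1))
    return min(1.0, tail)


# Hypothetical numbers, not taken from the study.
patients = 500            # tumors sequenced
background_rate = 0.005   # assumed chance, per patient, that a passenger hits this gene
observed = 15             # patients whose tumors carry a mutation in the gene

expected = patients * background_rate
p_value = chance_of_at_least(patients, observed, background_rate)
print(f"expected by chance: {expected:.1f} patients, observed: {observed}")
print(f"probability of this many hits by chance: {p_value:.2e}")
```

When the observed count sits far above the expected count, the tail probability becomes very small, which is the situation Dr. Getz describes as many darts hitting the same spot. A real analysis would also have to correct for testing thousands of genes at once and for background rates that differ between genes and tumor types.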

Using MutSig, the researchers identified nearly all known cancer genes in the 21 tumor types as well as 33 novel ones. They were able to detect the new genes by combining samples from the different cancer types, demonstrating the benefits of cross-tumor analysis, as was seen in the TCGA Pan-Cancer project. Closer inspection revealed that the novel genes are involved in cell proliferation, apoptosis, genome stability, and other functions strongly related to cancer.

Furthermore, the group found that the discovery of cancer genes is still far from saturation. While the number of newly discovered genes mutated at high frequencies is plateauing, the number mutated at intermediate frequencies is still climbing rapidly. “Ones that are between 10 percent and 20 percent [frequencies] are still rising. So as we are adding more samples, we are finding more and more cancer genes,” says Dr. Getz. “That means that we need many more samples in order to find them.”

Exactly how many samples? Using mathematical power analysis, the researchers calculated that detecting mutated cancer genes down to the two percent frequency would require an average of 2,000 tumors for each type of cancer. Consequently, characterizing the 50 most common cancers, which are likely to have “the [greatest] impact on human life,” says Dr. Getz, would mean acquiring samples from about 100,000 patients. “There are millions of patients—many millions of patients—with cancer living in the world right now,” he says. “So a hundred thousand is not a huge number.”
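To make the idea of a power analysis concrete, the sketch below estimates how often a gene mutated in two percent of patients would stand out from chance at different sample sizes, and then repeats the article's arithmetic (2,000 tumors for each of 50 cancer types). It is a simplified illustration under stated assumptions, not the study's calculation: the background rate, the significance cutoff (0.05 divided by an assumed 20,000 genes), and the function names are all hypothetical, so the output should not be expected to reproduce the published figure.

```python
import math


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space (requires 0 < p < 1)."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_choose + i * math.log(p) + (n - i) * math.log1p(-p))
    return min(1.0, total)


def detection_threshold(n, background, alpha):
    """Smallest count of mutated patients that chance alone produces with
    probability <= alpha. Returns n + 1 if no achievable count is rare enough."""
    for k in range(n + 2):
        if upper_tail(n, k, background) <= alpha:
            return k


def power(n, driver_freq, background, alpha):
    """Probability that a gene mutated in driver_freq of patients clears the
    detection threshold with n tumors. The gene's total mutation rate is
    approximated as background + driver_freq."""
    k = detection_threshold(n, background, alpha)
    return upper_tail(n, k, background + driver_freq)


# Hypothetical parameters, chosen only for illustration.
background = 0.002        # assumed per-patient chance of a passenger hit in the gene
alpha = 0.05 / 20000      # assumed cutoff after correcting for about 20,000 genes
driver_freq = 0.02        # the two percent frequency discussed in the article

for n in (100, 500, 1000, 2000):
    print(f"{n:>5} tumors: power = {power(n, driver_freq, background, alpha):.3f}")

# The article's arithmetic: tumors per cancer type times number of cancer types.
tumors_per_type = 2000
cancer_types = 50
total_patients = tumors_per_type * cancer_types
print(f"total patients needed: {total_patients:,}")
```

Under these assumed values, power rises sharply as tumors are added, which is why rarely mutated genes only become detectable in large cohorts. The required sample size depends strongly on the background mutation rate, which this sketch holds fixed.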

A complete catalog of cancer genes would serve as a foundation for precision medicine, a term for the diagnosis and treatment of disease that is tailored to the individual’s genome. “We want to cure cancer,” says Dr. Getz. A comprehensive understanding of cancer genes and the pathways and mechanisms they are involved in, he says, is “one of the building blocks toward precision medicine … [and] toward the curing of cancer.”

When Dr. Getz considers the unknown cancer genes that have yet to populate the catalog, he sees the periodic table of elements at a time when scientists were still trying to fill in gaps in the table. “It’s something you do once in history—to figure out what are the key genes in cancer—and I think [we have] the technology to do it today with a reasonable amount of effort,” says Dr. Getz. “I think we should do it.”


Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A., Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (2014) Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 505(7484):495-501.

Selected References

[1] Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N., and Stratton, M.R. (2004) A census of human cancer genes. Nat Rev Cancer. 4(3):177-83.

[2] Genetics Home Reference. (2014) TP53. U.S. National Library of Medicine. http://ghr.nlm.nih.gov/gene/TP53