LEADERSHIP UPDATE

Posted: April 1, 2014

TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence

Carolyn Hutter, Ph.D.
Program Director of the Division of Genomic Medicine at the National Human Genome Research Institute (NHGRI)

In 2013, TCGA’s ‘Pan-Cancer’ analysis of more than 5,000 cases from 12 tumor projects (see figure) was featured in Nature Genetics, with a complementary focus website that presented over 15 papers and 5 thematic threads. The threads highlight key findings for mutational drivers, network models, exposures and pathogens, data discovery and future directions.

TCGA is currently expanding efforts to characterize commonalities, differences, and emergent themes across cancer types in collaboration with the International Cancer Genome Consortium (ICGC) through the Pan-Cancer Analysis of Whole Genomes (PAWG) project. The goal is to analyze the genomes, including genome-wide sequence data, of approximately 2,000 pairs of tumor and normal samples, and to integrate those results with clinical and other molecular data on the same cases. The genomic sequence data will be available to the research community through the TCGA Data Portal, CGHub, and the ICGC Data Repository. Investigators around the globe will lead analyses in several scientific areas, including integration of transcriptome and genome analyses, patterns of structural variations, novel somatic mutation-calling methods, evolution and heterogeneity, and germline cancer genome variation.

Figure 1: Integrated data set for comparing and contrasting multiple tumor types. The Cancer Genome Atlas Research Network, Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J.M. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. doi:10.1038/ng.2764.

The TCGA/ICGC PAWG will capitalize on existing TCGA data and infrastructure, and will incorporate information from other NIH-funded projects, such as the Encyclopedia of DNA Elements (ENCODE), the Genotype-Tissue Expression (GTEx) Program and the Roadmap Epigenomics Program. As with other TCGA Pan-Cancer efforts to date, this work represents a significant effort and underscores the importance of team science. Using integrative approaches, investigators will be better able to distinguish the signal from the noise and focus on functionally relevant genomic alterations, pathways and mechanisms. However, whole genome analysis also poses a number of key challenges and research needs, such as improved approaches for computing on petabytes of data, more robust standards for cross-project mutation calling, and more effective methods for analyzing and interpreting non-coding variation.

Overall, combining whole genome sequence analysis and comprehensive genomic characterization in this coordinated cross-cancer analysis will enhance our knowledge of cancer genomics and biology. Such work will move TCGA closer to its goal of improving our ability to diagnose, treat and prevent cancer. Furthermore, the advances in this project will extend beyond cancer research, as the improved capabilities in whole genome sequence analysis and interpretation will be applicable to studies of other diseases and of biology in general.