Posted: March 28, 2012

TCGA Data and Cancer Systems Biology: Ex pluribus unum

Dr. Andrea Califano

The Cancer Genome Atlas (TCGA) is providing an extraordinarily comprehensive set of complementary resources to help researchers elucidate the complete repertoire of mechanisms contributing to tumor initiation and progression. Yet, perhaps surprisingly, no research field has benefited more from TCGA’s comprehensive profiling efforts within specific tumor types than cancer systems biology. Indeed, the success of this discipline is fundamentally rooted in the ability to generate accurate and informative models of cell regulation, including the transcriptional, post-transcriptional, and post-translational processes that determine normal cell physiology and whose dysregulation may lead to tumorigenesis. The data needed to reverse engineer and interrogate these distinct, yet complementary, regulatory layers in an integrative fashion were essentially non-existent in the pre-TCGA world.

Pre-TCGA, efforts to understand global regulation had been stifled by the complexity of capturing multiple data modalities and by the difficulties of studying even simple relationships such as those between microRNA, gene copy number alteration, promoter methylation, and gene expression. Indeed, with few exceptions in non-cancer-related fields,1,2 large-scale expression, microRNA, or copy number alteration profiles of human malignancies collected by individual labs or by a few consortia had been mostly studied in isolation, thus providing a useful, yet highly fragmentary, picture of the underlying processes.3,4

The TCGA program has dramatically altered this situation, allowing, for the first time, the integration of these complementary and highly interdependent layers. This has helped produce a better, more comprehensive picture of the dysregulated processes that contribute to oncogenesis, and even to discover entirely new layers of regulation that could not have been glimpsed without these data. For instance, the availability of microRNA and gene expression profiles for a large number of matched samples in glioblastoma has allowed the inference of hundreds of thousands of mRNA-mRNA regulatory interactions that are mediated by microRNAs and yet do not depend on microRNA variability.5 Experimental validation of these interactions has helped address some of the missing variability associated with oncogenesis. For example, 13 genes were identified whose deletions in glioma contribute to PTEN inactivation through microRNA-mediated interactions.5 Similarly, coordinated study of glioblastoma samples across gene expression, copy number alteration, methylation, and mutational data from TCGA has allowed the characterization of several glioblastoma subtypes, as well as a repertoire of genetic and epigenetic alterations that contribute to some of them,6,7 including specific temporal event patterns leading to gliomagenesis.8 Finally, analysis of TCGA data in combination with other datasets has helped dissect genome-wide regulatory mechanisms that can be interrogated to identify synergistic regulation of tumor subtypes, candidate biomarkers for aggressive tumors, and potential therapeutic targets.9

We are far from being done, as the TCGA data modalities available to the research community have been heavily biased towards those that can be most effectively and economically profiled via microarray and sequencing technologies. This, unfortunately, does not encompass proteomics and, in particular, the recently established ability to profile the phospho-proteome.10,11 Large-scale profiling of other omics layers, such as glycomics, metabolomics, and lipidomics, is even further on the horizon. Yet TCGA has created an extensible and highly scalable model to pursue the complete molecular characterization of large-scale repositories of high-quality, clinically annotated tumor samples. Thus, we have no doubt that the collection of omics data under the banner of the TCGA program is destined only to grow, and that these more esoteric data modalities will eventually become available to the research community.

Taken together, these contributions have had a tremendous impact in establishing the ability of cancer systems biology to elucidate critical mechanisms of oncogenesis and tumor progression that would have escaped more traditional approaches, thus helping this new discipline transition from theoretical promise to tangible value in the study of human malignancies.

1. Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G., et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297-302.

2. Schadt, E.E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D., Sieberts, S.K., Monks, S., Reitman, M., Zhang, C., et al. (2005) An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet 37: 710-717.

3. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511.

4. Basso, K., Margolin, A.A., Stolovitzky, G., Klein, U., Dalla-Favera, R. and Califano, A. (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37: 382-390.

5. Sumazin, P., Yang, X., Chiu, H.S., Chung, W.J., Iyer, A., Llobet-Navas, D., Rajbhandari, P., Bansal, M., Guarnieri, P., Silva, J., et al. (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147(2): 307.

6. Verhaak, R.G., Hoadley, K.A., Purdom, E., Wang, V., Qi, Y., Wilkerson, M.D., Miller, C.R., Ding, L., Golub, T., Mesirov, J.P., et al. (2010) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17(1): 98-110.

7. Masica, D.L. and Karchin, R. (2011) Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer Res 71: 4550-4561.

8. Cheng, Y.K., Beroukhim, R., Levine, R.L., Mellinghoff, I.K., Holland, E.C. and Michor, F. (2012) A mathematical methodology for determining the temporal order of pathway alterations arising during gliomagenesis. PLoS Comput Biol 8: e1002337.

9. Carro, M.S., Lim, W.K., Alvarez, M.J., Bollo, R.J., Zhao, X., Snyder, E.Y., Sulman, E.P., Anne, S.L., Doetsch, F., Colman, H., et al. (2010) The transcriptional network for mesenchymal transformation of brain tumours. Nature 463: 318-325.

10. Rikova, K., Guo, A., Zeng, Q., Possemato, A., Yu, J., Haack, H., Nardone, J., Lee, K., Reeves, C., Li, Y., et al. (2007) Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 131(6): 1190-1203.

11. Bodenmiller, B., Mueller, L.N., Mueller, M., Domon, B. and Aebersold, R. (2007) Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat Methods 4: 231-237.