Quick Facts


The Cancer Genome Atlas (TCGA) is a comprehensive, collaborative effort led by the National Institutes of Health (NIH) to map the genomic changes associated with specific types of tumors to improve the prevention, diagnosis and treatment of cancer. Its mission is to accelerate the understanding of the molecular basis of cancer through the application of genome analysis and characterization technologies.

A pilot project initiated in 2006 established the scientific infrastructure and demonstrated the "proof of concept" needed to mount a large-scale cancer genome mapping project. Based on this success, TCGA announced in September 2009 that it would map the genomes of at least 20 cancers over the next five years.

In December 2013, TCGA finalized tissue sample collection with 11,000 cases collected for genomic characterization. Today, 33 cancer types and subtypes have been analyzed, including 10 rare cancer types.

TCGA has made its data swiftly available to the worldwide research community, a model that has inspired the creation of international programs like the International Cancer Genome Consortium, in which TCGA participates.

To learn more about TCGA, go to http://cancergenome.nih.gov/

TCGA Cancer Types and Data

Types of Cancer

The pilot effort achieved comprehensive genomic characterization of brain (glioblastoma) and ovarian (serous cystadenocarcinoma) cancers and has since expanded to characterize the genomes of 33 cancer types and subtypes, including 10 rare cancers. For each common cancer type, at least 500 samples of tumor and matched normal tissue were characterized; fewer cases were characterized for rare cancers. To learn more about the cancers chosen for study in TCGA, go to http://cancergenome.nih.gov/cancersselected.

Types of Data

TCGA generates data on seven genomic facets of tumor and matched normal tissue:

  1. Whole exome sequence – The sequence of exons, segments of DNA that code for proteins 
  2. mRNA sequence – The sequence of messenger RNAs, genetic molecules that serve as intermediates between the DNA code and proteins
  3. microRNA sequence – The sequence of microRNAs, small RNA molecules that, while not coding for protein, are an integral part of the regulation of gene expression
  4. DNA copy number profile – Analysis of the changes in the quantity of genomic fragments, as well as identification of structural rearrangements, or changes in the position of a genomic fragment from one area to another
  5. DNA methylation profile – Analysis of the placement of methyl groups, chemical additions to DNA that regulate gene expression in a reversible process; this is one aspect of the genomic regulation known as epigenetics
  6. Whole genome sequence – The sequence of the whole complement of DNA, both protein-coding and non-coding regulatory regions
  7. Reverse-phase protein array (RPPA) expression profile – Analysis of the expression and activation status of proteins

TCGA data also include limited clinical information, histopathology slide images, and metadata about the samples (e.g., the weight of the sample).

Data Access

All data generated by the TCGA Research Network have been made rapidly available to the research community through the TCGA Data Portal.

There are two tiers of data:

  • Open Access – Data that cannot be aggregated to generate a data set unique to an individual. User certification is not required for accessing such data.
  • Controlled Access – Clinical data and other individually unique information. This tier of data requires user certification for access.
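
As an illustration only, the two-tier rule above can be restated as a small lookup. The names AccessTier and requires_certification in this sketch are hypothetical; they are not part of the TCGA Data Portal or any TCGA tool, and the sketch does not model how certification is actually granted.

```python
from enum import Enum


class AccessTier(Enum):
    """The two data tiers described above (illustrative names only)."""
    OPEN = "Open Access"
    CONTROLLED = "Controlled Access"


def requires_certification(tier: AccessTier) -> bool:
    """Return True if the tier requires user certification for access."""
    return tier is AccessTier.CONTROLLED


# Print the certification rule for each tier.
for tier in AccessTier:
    status = "required" if requires_certification(tier) else "not required"
    print(f"{tier.value}: user certification {status}")
```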

To reach the TCGA Data Portal and to learn more about its access policies, go to http://tcga-data.nci.nih.gov/tcga/.


Funding Institutions: National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), both part of NIH.

Funding: $175 million in American Recovery and Reinvestment Act funds, plus $50 million each in appropriated funds from NCI and NHGRI, over two years.

Pilot Funding: The pilot effort was funded by $50 million each from NCI and NHGRI for three years.

TCGA Research Network

Number of Researchers: More than 150

Number of Institutions: More than two dozen nationwide

Major Research Centers:

  • Biospecimen Core Resource
    • The Research Institute at Nationwide Children’s Hospital, Columbus, Ohio
  • Genome Characterization Centers
    • Baylor College of Medicine, Houston, Texas
    • Brigham & Women's Hospital and Harvard Medical School, Boston, Mass.
    • British Columbia Cancer Agency, Vancouver, B.C., Canada
    • Broad Institute, Cambridge, Mass.
    • University of North Carolina, Chapel Hill, N.C.
    • Van Andel Institute, Grand Rapids, Mich., and Johns Hopkins University, Baltimore, Md.
    • MD Anderson Cancer Center, Houston, Texas
  • Genome Sequencing Centers
    • Baylor College of Medicine, Houston, Texas
    • Broad Institute, Cambridge, Mass.
    • Washington University School of Medicine, St. Louis, Mo.
  • Genome Data Analysis Centers
    • Broad Institute, Cambridge, Mass.
    • Institute for Systems Biology, Seattle, Wash.
    • Oregon Health and Science University, Portland, Ore.
    • Memorial Sloan-Kettering Cancer Center, New York, N.Y.
    • University of California, Santa Cruz, Santa Cruz, Calif.
    • University of North Carolina at Chapel Hill, Chapel Hill, N.C.
    • University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Data Coordinating Center
    • SRA International, Bethesda, Md.
    • University of California, Santa Cruz, Santa Cruz, Calif.

To learn more about the participating institutions, go to http://cancergenome.nih.gov/newsevents/multimedialibrary/interactives/howitworks.


For a list of publications by the TCGA research network, go to http://cancergenome.nih.gov/publications/TCGANetworkPublications.