Genome Sequencing Centers
The Cancer Genome Atlas (TCGA) Genome Sequencing Centers (GSCs) performed large-scale DNA sequencing on two DNA samples from every TCGA cancer case – one from the tumor specimen and the second from either blood or non-malignant tissue to serve as a control. Supported by the National Human Genome Research Institute (NHGRI) large-scale sequencing program, the GSCs generated the enormous volume of data required by TCGA, while continually improving existing technologies and methods to expand the frontier of what can be achieved in cancer genome sequencing.
The evolution of the sequencing approaches used for TCGA can be seen in this brief timeline:
- October 2008: The TCGA publication on the glioblastoma multiforme genome used the polymerase chain reaction (PCR)/Sanger dideoxy method to sequence 601 target genes. At the same time, GSCs were validating protocols on new second-generation sequencing instruments.
- March 2009: GSCs introduced a hybrid-capture procedure and second-generation sequencing instruments (Illumina and ABI SOLiD), enabling analysis of more than 6,000 known cancer-associated target genes at production scale.
- July 2009: GSCs submitted the first of 24 whole-genome sequence datasets (i.e., the entire 6 billion nucleotides from both the tumor and blood specimens of a cancer case) from the glioblastoma multiforme and ovarian tumor projects.
- January 2010: GSCs validated whole-exome capture methods, expanding the analysis of each tumor sample from 6,000 genes to all protein-coding and RNA genes.
Whole Exome vs. Whole Genome
The GSCs analyzed the majority of TCGA samples by whole exome sequencing on second-generation sequencing instruments. In whole exome sequencing, all of the DNA that codes for protein products (the exons) is sequenced, while DNA that does not directly code for proteins is not. Approximately 10 percent of the samples from each TCGA tumor project instead underwent whole genome sequencing, which sequences every base pair of DNA and can therefore reveal alterations that lie outside the exome. All TCGA sequencing data are now available in the Genomic Data Commons.
The following programs, funded by NHGRI's Large-Scale Sequencing Research Network, were TCGA's GSCs:
- Broad Institute Genomics Platform, Broad Institute, Cambridge, Mass.
Principal Investigator: Eric Lander, Ph.D.
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas
Principal Investigator: Richard Gibbs, Ph.D.
- The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, Mo.
Principal Investigator: Richard Wilson, Ph.D.