Data Sharing and Data Management
The Cancer Genome Atlas Project (TCGA) yielded an unprecedented amount of genomic information on participant samples. The Data Coordinating Center (DCC) collected, stored and distributed the clinical and genomic data generated by the project. All information, except lower level sequencing data, was made available through the TCGA data portal.
The DCC was tasked with:
- Protecting patient privacy and confidentiality through secure access for controlled access datasets
- Developing data standards and controlled vocabularies
- Establishing informatics pipelines for dataflow from production centers to a central repository for data access
Managing Lower Level Sequence Data in CGHub
As TCGA continued to generate more data, the size of lower level sequence files made it necessary to store these data in a larger database. The Cancer Genomics Hub (CGHub) was a secure repository for storing, cataloging and accessing lower level sequence data such as sequences and alignments. CGHub was located at the University of California, Santa Cruz.
Lower level sequence data and metadata were transmitted from the Genome Sequencing Centers (GSCs) and Genome Characterization Centers (GCCs) to CGHub, where they could be accessed by the research community.
The NCI Genomic Data Commons
In June 2016, NCI's Center for Cancer Genomics launched the Genomic Data Commons (GDC), a data-sharing platform for cancer research. The GDC now stores all TCGA data in one location and provides additional resources for data analysis. The TCGA data in the GDC are also now harmonized, meaning that they are standardized and can be compared across different TCGA tumor types and with other genomics projects in the GDC. Learn more about the GDC.