Posted: June 14, 2012
Large Scale Genome Storage Facility Realized in CGHub
Liming Yang, Ph.D.
Bioinformatics Program Director, The Cancer Genome Atlas Program Office
On January 25, 2012, The Cancer Genome Atlas (TCGA) announced the beta launch of the Cancer Genomics Hub (CGHub), a new secure repository for storing, cataloging, and accessing genome and RNA sequences and alignments from TCGA. CGHub is managed by the University of California, Santa Cruz (UCSC), under a subcontract from SAIC-Frederick. CGHub plans to store the sequencing data for the National Cancer Institute’s large-scale cancer genome projects, such as TCGA, Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative. CGHub’s goal is to facilitate the use of TCGA data in scientific research projects.
Today, CGHub is out of beta and in full production. It has been designed to hold sequence data for 25,000 cases, with an average of 200 gigabytes per case. In the coming months, CGHub will add more than 100 terabytes of new data. By way of comparison, that is more than 12 times the data generated by the Human Genome Project. The team is working hard to make these valuable data available to the cancer research community. They are also developing an interactive, user-friendly web interface to complement the command-line interface available now.
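For scale, the design capacity works out as follows. This assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and that every case holds the stated 200 GB average:

$$25{,}000 \text{ cases} \times 200\ \text{GB/case} = 5{,}000{,}000\ \text{GB} = 5{,}000\ \text{TB} = 5\ \text{PB}$$

Under the same assumptions, the 100 terabytes expected in the coming months would fill roughly 2% of that design capacity (100 TB / 5,000 TB = 0.02).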
CGHub aims to be one of the largest sequence repositories in the world. As the cost of sequencing falls, ever more sequence data are generated for research and clinical use. Many predict that genome sequencing will one day be part of the standard of care in diagnosis and treatment. A sophisticated, secure system such as CGHub will therefore play a critical role in both basic and clinical cancer research.
However, CGHub can only advance cancer research if investigators take advantage of it. Primary data from TCGA are available to the cancer research community for use in any legitimate research project. CGHub will ease access to these data and help researchers get the most out of the resources TCGA provides.