Posted: June 28, 2013
TCGA Data Consumption by the Scientific Community
Julia Zhang, B.A.
Scientific Program Manager of The Cancer Genome Atlas Program Office
Since the inception of The Cancer Genome Atlas (TCGA) in 2006, almost 6,000 cases (with a case defined as tumor and a source of germline) have been characterized, representing more than 25 different tumor types. Extensive DNA/RNA/miRNA sequencing, expression, methylation, SNP and copy number data have been provided to the public via the TCGA Data Portal and the Cancer Genomics Hub (CGHub). In addition to generating the core datasets, the TCGA Research Network has published comprehensive tumor-specific marker papers that describe global analyses of the data, novel discoveries and clinical insights. The generation of comprehensive, high-quality cancer genomics data and initial integrative analysis of the data are important goals of TCGA. However, the data and reports are intended to be only starting points, and the participation of the broader research community in analyzing the datasets is necessary to realize their full value.
We therefore examined three metrics of TCGA data usage to determine whether the scientific community has been consuming and analyzing TCGA data: (1) usage of the TCGA Data Portal and the Cancer Genomics Hub, (2) grant applications that cite TCGA data in their research objectives and (3) publications that incorporate TCGA data.
Usage of the TCGA Data Portal
Figure 1. Number of unique visitors to the TCGA Data Portal from 2011 to 2012. Courtesy of NCI CBIIT.
Figure 1 shows the number of unique visitors to the TCGA Data Portal from 2011 to 2012*. Over those two years, the number of unique visitors more than doubled, from 3,386 in January 2011 to 8,267 in December 2012. This growth indicates that an increasing number of users are seeking TCGA data.
*Earlier usage statistics were unavailable.
Usage of the Cancer Genomics Hub
The Cancer Genomics Hub (CGHub) was established in late 2011 as a repository for the secure storage, cataloging and dissemination of large-scale primary sequence data for TCGA and other projects. As of December 2012, more than 150 users had downloaded almost 4,400 terabytes of data. The top users come from a variety of institutions (universities, research centers, pharmaceutical companies, etc.) and from many different countries.
Figure 2. The number of grant applications that cite TCGA data submitted to the NIH each year from 2006 to 2012.
Figure 2 shows the number of new grant applications submitted to the NIH that cite TCGA data in the research objectives. The annual number of applications increased from 46 in 2006 (at the beginning of TCGA’s pilot project) to 215 in 2012. This growth indicates that an increasing number of researchers are incorporating TCGA data into their projects and seeking funding to analyze it. As of December 2012, almost 800 unique applications citing the use of TCGA data had been submitted.
Of the almost 800 grant applications submitted, 278 were awarded, an award rate of almost 35 percent. By comparison, the average award rate for all applications submitted to the National Cancer Institute is 15 percent.
Figure 3. The number of articles that analyzed TCGA data published each year from 2008 to 2012.
Figure 3 shows that the number of published papers analyzing TCGA data increased dramatically, from three in 2008 to 157 in 2012. By the end of 2012, more than 265 papers had been published in total; by May 2013, that total had passed 350.
There has been wide speculation that almost all papers that analyze TCGA data are authored by members of the TCGA Research Network; however, this is not the case. Most of these papers (65 percent) were authored by scientists outside the TCGA consortium, dispelling the misconception that TCGA data are only accessible to “insiders.”
TCGA’s first marker paper,
The Cancer Genome Atlas Research Network. (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 455 (7216):1061-1068.
remains in the top 10 of the most cited cancer research papers since 2008 (based on an analysis from the Scopus database).
Together, these observations suggest that TCGA is on its way to meeting its goal of creating an atlas that is broadly used by the cancer research community. However, we still have a long way to go as TCGA works to complete this phase of the catalog by the end of 2014. We continue to challenge the community to use the data to develop new algorithms, form new hypotheses, initiate molecularly informed clinical trials and make discoveries that will inform how we diagnose, treat and prevent cancer in the future.