Next-Generation Pathology: TCGA Microscope Slides Helped Train an Automated Lung Cancer Diagnostic Tool

Amy E. Blum, M.A.

Image: Can you tell the difference? The image analysis pipeline developed using TCGA data predicted which of these lung adenocarcinoma slides of the same grade belonged to a patient who survived a long time after diagnosis and which belonged to a patient who survived only a short time. Image Credit: Yu et al. (2016) Nature Communications

A new fully automated pipeline can diagnose lung cancers, and distinguish more aggressive cancers from less aggressive ones, more accurately than conventional pathology methods. Using over 2,000 hematoxylin and eosin (H&E) stained pathology slides of lung cancer from The Cancer Genome Atlas (TCGA), researchers from Stanford University trained a machine-learning pipeline to create the first computational model that analyzes image features to better predict patient outcomes.

In the current model of cancer care, pathologists diagnose cancer, distinguish between different types of cancer from the same tissue, and assign tumor stages and grades by interpreting histopathology slides viewed under a microscope. However, in the case of non-small cell lung cancer, expert pathologists agree on a particular diagnosis only approximately 60 percent of the time, and clinical grading is a weak predictor of the length of time that a patient is likely to survive.

More Than What Meets the Eye

To develop a better diagnostic and predictive method, the Stanford University research team applied a machine-learning approach to lung adenocarcinoma and lung squamous cell carcinoma histopathology slides from TCGA.

The automated pipeline that the researchers built could detect almost 10,000 image features, and 240 of these features were used to diagnose non-small cell lung cancers. These features included quantitative characteristics, such as the textures of cell nuclei and pixel intensity distributions, that would be difficult for the human eye to recognize. The trained computer identified these features using algorithms that find patterns in a million-pixel grid.
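To give a sense of what a quantitative image feature looks like, here is a minimal Python sketch that summarizes the pixel intensity distribution of a small grayscale tile. It is an illustration only, not the researchers' pipeline: the function name `intensity_features`, the four-bin histogram, and the toy pixel values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def intensity_features(tile, bins=4, max_value=255):
    """Summarize a grayscale tile given as rows of integers in 0..max_value.

    Hypothetical helper for illustration; real pipelines compute thousands
    of features, including textures of segmented cell nuclei.
    """
    pixels = [p for row in tile for p in row]
    bin_width = (max_value + 1) / bins

    # Count how many pixels fall into each equal-width intensity bin.
    histogram = [0] * bins
    for p in pixels:
        histogram[min(int(p / bin_width), bins - 1)] += 1

    total = len(pixels)
    return {
        "mean": mean(pixels),
        "std": pstdev(pixels),
        # Normalize counts so tiles of different sizes are comparable.
        "histogram": [count / total for count in histogram],
    }


# Toy 2 x 4 tile with made-up intensities (not real slide data).
tile = [
    [0, 64, 128, 255],
    [10, 70, 130, 250],
]
features = intensity_features(tile)
print(features)
```

Numbers like these are easy for a computer to calculate across an entire slide, but hard for a person to estimate by eye.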

Using a statistical analysis of the model’s performance, the scientists determined that it could accurately distinguish between lung adenocarcinomas and lung squamous cell carcinomas.
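The article does not say which statistic was used. One common way to score a two-class model, assumed here purely for illustration, is the area under the ROC curve (AUC): the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are invented.

```python
def roc_auc(labels, scores):
    """Return the AUC: the chance a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = adenocarcinoma, 0 = squamous cell carcinoma.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to guessing, and 1.0 means every positive case was ranked above every negative one.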

A Glance Into the Crystal Ball

Using the same approach, coupled with patient data from TCGA and the Stanford TMA database, the researchers built predictive models of survival for lung adenocarcinoma and lung squamous cell carcinoma. According to a statistical analysis, the workflow that the team developed successfully distinguished patients who survived a long time from those who survived only a short time.

As this technology develops, quantitative analysis of pathology images may provide prognostic information that enhances precision oncology and helps guide clinical care.

Yu, K., Zhang, C., Berry, G.J., Altman, R.B., Re, C., Rubin, D.L., and Snyder, M. (2016) Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications. 7:12474. doi:10.1038/ncomms12474