Posted: October 30, 2013

Scientists Using TCGA Data Identify 21 Mutational Signatures in Cancer

Jean Hazel Mendoza

Many mutations have been implicated in human cancer, but the biological mechanisms that produce them remain largely unknown. In a study published online in Nature on August 14, 2013, researchers identified 21 signatures of mutational processes underlying 30 types of cancer. Characterizing mutational signatures may provide a greater understanding of the mechanistic basis of cancer and potentially lead to better treatments that target its root causes.

Scientists have long known that cancer is caused by mutations, changes in the genome of a cell that result from DNA damage and faulty repair processes. Using the latest sequencing technology, global initiatives such as The Cancer Genome Atlas (TCGA) have compiled catalogs of mutations from thousands of tumors. Yet, as the study’s lead researchers from the Wellcome Trust Sanger Institute (WTSI), U.K., previously described, a mutational catalog only presents an “archeological record,”1 like artifacts found during an excavation, but with few clues as to how they got there. To reconstruct the biological mechanisms behind the mutations, the research team searched for signatures—different patterns of mutations that reflect the variety of mutational processes operative in cancer.

Searching Mutational Catalogs for Signatures

The researchers analyzed almost five million mutations from about seven thousand samples of 30 different cancer classes, many from TCGA’s freely available datasets. The number of mutations varied widely among cancer types. For example, childhood cancers displayed about 0.1 mutations per megabase. By contrast, lung cancer and melanoma, which are associated with chronic exposure to tobacco and ultraviolet light, respectively, carried about 10 mutations per megabase, roughly 100 times as many. This difference in mutation number suggests that mutations accumulate over time and with exposure to carcinogens.

The scientists used an algorithm called the WTSI Mutational Signature Framework to extract the signatures suggested by the mutations.2 They looked at base substitution mutations, which replace one DNA base with another (for example, C is exchanged for A, G, or T; and T for A, G, or C). In addition, they examined the sequence context of each mutation, that is, the bases immediately adjacent to each mutated base. Thus, six classes of base substitution mutations, combined with four possible bases on each side of the mutated base, lead to a total of 96 possible substitution mutations (6 × 4 × 4 = 96).
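The count of 96 can be checked by enumeration. The following is a minimal Python sketch, not part of the WTSI framework; the label format (for example, A[C>T]G for a C>T substitution flanked by A and G) and the function name are assumptions chosen for illustration.

```python
from itertools import product

BASES = "ACGT"

# Six substitution classes, written from the C or T of the mutated base pair.
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]


def mutation_types():
    """Return every substitution type with its two flanking bases.

    Labels look like 'A[C>T]G' (an illustrative convention, assumed here).
    """
    return [
        f"{five}[{ref}>{alt}]{three}"
        for (ref, alt), five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


types = mutation_types()
print(len(types))  # 6 substitutions x 4 bases on one side x 4 on the other
```

Each mutation in a tumor's catalog falls into exactly one of these types, so a sample can be summarized as a vector of 96 counts.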

From these 96 potential mutation types, the researchers identified and validated 21 distinct mutational signatures. While some signatures displayed all 96 mutation types in roughly equal amounts, others primarily exhibited only one or two.

For example, signature 2 harbored mostly C>T and C>G mutations at the DNA sequence TpCpN (where C is the mutated base, p is the phosphate bond between bases, and N is any base). Found in more than half of the 30 cancer types, signature 2 most likely resulted from the overactivity of the APOBEC family of enzymes, cytidine deaminases that convert cytosine to uracil in DNA.
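To make the TpCpN pattern concrete, here is a small Python sketch that flags substitutions matching it. The function name and the example mutations are hypothetical; the study itself extracted signatures by decomposing whole catalogs, not by counting a single motif.

```python
def is_tpcpn_c_to_t_or_g(five_prime, ref, alt):
    """True for a C>T or C>G substitution whose 5' neighbor is T (TpCpN).

    The 3' neighbor (N) can be any base, so it is not checked.
    """
    return five_prime == "T" and ref == "C" and alt in ("T", "G")


# Hypothetical mutations: (5' base, reference base, mutant base, 3' base)
mutations = [
    ("T", "C", "T", "A"),  # TpCpA, C>T: matches
    ("T", "C", "G", "G"),  # TpCpG, C>G: matches
    ("A", "C", "T", "G"),  # 5' base is A: no match
    ("T", "C", "A", "T"),  # C>A is not part of the pattern: no match
    ("G", "T", "C", "A"),  # mutated base is T: no match
]

hits = [m for m in mutations if is_tpcpn_c_to_t_or_g(m[0], m[1], m[2])]
fraction = len(hits) / len(mutations)
print(len(hits), fraction)
```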

Some cancer types showed only one or two mutational signatures, while others like liver, uterine, and stomach cancers bore six signatures, the most found in any cancer class. This wide range suggests the broad diversity and complexity of mutational processes in different cancer types.

Signatures with Transcriptional Strand Bias

The researchers also incorporated transcriptional strand bias into their analysis. This bias refers to the tendency of mutations to occur preferentially on the transcribed strand (the DNA strand that provides the template for DNA to be copied into RNA) or the untranscribed strand. From a total of 192 conceivable mutation arrangements in this classification (96 possible substitution mutations on each strand), scientists found strong transcriptional strand bias in seven of the 21 mutational signatures.
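The 192 strand-aware classes can be enumerated the same way as the 96 substitution types. In this Python sketch the labels, the strand names as dictionary keys, and the counts are all illustrative assumptions, and the bias measure is a simple fraction, not the statistical test used in the study.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
STRANDS = ["transcribed", "untranscribed"]

# One class per (strand, substitution, 5' base, 3' base) combination.
classes = [
    (strand, f"{five}[{sub}]{three}")
    for strand, sub, five, three in product(STRANDS, SUBSTITUTIONS, BASES, BASES)
]
print(len(classes))  # 2 strands x 96 substitution types


def transcribed_fraction(counts):
    """Fraction of mutations on the transcribed strand (0.5 means no bias)."""
    total = counts["transcribed"] + counts["untranscribed"]
    return counts["transcribed"] / total


# Hypothetical counts for one substitution class in one sample.
example = {"transcribed": 30, "untranscribed": 10}
print(transcribed_fraction(example))
```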

Most notably, signature 4 showed a preference for C>A mutations on transcribed strands and was found in cancers of the lung, head and neck, and liver. Tumors of the lung and head and neck are often associated with tobacco use. Indeed, the researchers found similarities between mutational patterns in signature 4 and mutations caused by tobacco carcinogens, many of which produce bulky adducts (covalently bound complexes of a DNA base and carcinogen) in DNA, derailing normal DNA repair and potentially leading to cancer. Still, with over 60 carcinogens present in cigarette smoke, the scientists noted that other mutational processes are likely in play.

Similarly, signature 7 demonstrated a strong bias for C>T mutations on untranscribed strands and was found in malignant melanoma. This mutational pattern is characteristic of mutations usually caused by ultraviolet light, where a base erroneously bonds to the base adjacent to it instead of across from it. This type of mutation distorts the double helix and prevents normal DNA replication and transcription.

Signatures with Insertions and Deletions

Finally, the researchers expanded their analysis to include insertions and deletions, or indels, and found significant numbers of indels in three of the 21 signatures. Notably, signature 6 harbored a pattern of indels known as microsatellite instability (MI), which consists of mutations in short repeating sequences, or microsatellites. Microsatellite instability is known to play a role in cancers in which the repair of mismatched DNA base pairs is impaired. Indeed, signature 6 in colorectal cancer, which is characterized by a high prevalence of MI, strongly correlated with defective DNA mismatch repair genes.

Signature 3 displayed a second pattern of indels that involves deletions with microhomology, which are matching sequences on separated DNA strands at a double-helix breakpoint. Indels with microhomology were found in breast, ovarian, and pancreatic cancers, which have been previously linked to mutations in the BRCA1 and BRCA2 genes. Indeed, signature 3 in these cancer types was strongly associated with BRCA1 and BRCA2 mutations.

Each year, almost seven million people worldwide die from cancer, making it the leading cause of human death.3 Cancer is not a single disease, but encompasses more than 100 different classes, depending on the cell type and organ from which it originates. Through multiple layers of analysis of mutational catalogs, including those provided by TCGA, the researchers were able to identify 21 signatures of mutational processes in 30 cancer types. Investigating these mutational signatures may help piece together a clearer picture of the complex biological mechanisms underlying the broad spectrum of human cancers.


Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A., Behjati, S., Biankin, A.V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.L., et al. (2013) Signatures of mutational processes in human cancer. Nature. 500(7463):415–421.

Selected References

1 Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J., and Stratton, M.R. (2013) Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3(1):246–259.

2 Ibid. Alexandrov, L.B., et al. have made the WTSI Mutational Signature Framework freely available for download here: http://www.mathworks.com/matlabcentral/fileexchange/38724.

3 World Health Organization. (2013) Cancer. http://www.who.int/mediacentre/factsheets/fs297/en/.