In asexual populations that don’t undergo recombination, such as cancer, deleterious mutations are expected to accrue readily due to genome-wide linkage between mutations. Despite this mutational load of often thousands of deleterious mutations, many tumors thrive. How tumors survive the damaging consequences of this mutational load is not well understood. Here, we investigate the functional consequences of mutational load in 10,295 human tumors by quantifying their phenotypic response through changes in gene expression. Using a generalized linear mixed model (GLMM), we find that high mutational load tumors up-regulate proteostasis machinery related to the mitigation and prevention of protein misfolding. We replicate these expression responses in cancer cell lines and show that the viability in high mutational load cancer cells is strongly dependent on complexes that degrade and refold proteins. This indicates that upregulation of proteostasis machinery is causally important for high mutational burden tumors and uncovers new therapeutic vulnerabilities.
Statement of Significance
Cancers can successfully survive an accumulation of thousands of protein-damaging mutations. Here, we show that high mutational load tumors mitigate these damaging consequences by up-regulating complexes that buffer against protein misfolding stress – providing novel therapeutic vulnerabilities and suggesting that disruption of proteostasis is a hallmark of somatic evolution.
Tilk and colleagues present a valuable computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding - an important finding with general implications for our understanding of cancer evolution.
Cancer develops from an accumulation of somatic mutations over time. While a small subset of these mutations drive tumor progression, the vast majority of remaining mutations, known as passengers, don’t help and might hinder cancer growth. The role that passengers play in tumor progression has traditionally received little attention despite their abundance and variation across cancer types. The number of passengers in a tumor can vary by over four orders of magnitude, even within the same cancer type, from just a few to tens of thousands of point mutations1.
Whether these passengers are neutral or damaging to tumors has long been a matter of debate2–10. Some have argued that passengers are functionally unimportant to tumors given that most non-synonymous mutations are not removed by negative selection in somatic tissues2,3. This is in direct contrast to the human germ-line, where non-synonymous mutations are functionally damaging to most genes11 and signals of negative selection are pervasive3. The common explanation for why damaging protein-coding mutations are removed in the human-germline but maintained in somatic tissues is that most genes are only important for multi-cellular function at the organismal level (e.g. during development), but not during somatic growth2,12.
However, the notion that non-synonymous mutations are only selectively neutral in somatic tissues is surprising given their known functional consequences in the germ-line. Non-synonymous mutations are known to be damaging in the human germ-line due to their effects on protein folding and stability13, which ought to be shared between somatic and germline evolution. An alternative explanation is that non-synonymous mutations are indeed damaging in somatic evolution, but negative selection is too inefficient at removing them due to linkage effects driven by the lack of recombination in somatic cells10. Without recombination to break apart combinations of mutations, selection must act on beneficial drivers and deleterious passengers that arise in the same genome together. This makes it less efficient for selection to individually favor beneficial drivers or remove deleterious passengers14. As a result, a substantial number of weakly damaging passengers can accrue in cancer due to inefficient negative selection over time. In support of this model, tumors with very small numbers of passengers – where linkage effects are expected to be negligible – have recently been shown to exhibit signatures of negative selection and weed out damaging non-synonymous mutations10. In contrast, the remaining majority (>95%) of tumors, which contain much larger numbers of linked mutations, display patterns of inefficient negative selection. This provides evidence in favor of the inefficient selection model and implies that most tumors carry a correspondingly large deleterious mutational load.
If individual passengers are in fact substantially damaging in cancer, successful tumors with thousands of linked mutations must find ways to maintain their viability by mitigating this large mutational load. While paths to mitigation are difficult to predict for non-coding mutations, tumors with mutations in protein-coding genes are expected to minimize the damaging phenotypic effects of protein mis-folding stress. Here, we investigate this hypothesis by analyzing tumor tissues with paired mutational and gene expression profiles to assess how the physiological state of cancer cells change as they accumulate protein coding mutations. Using a general linear mixed effects regression model (GLMM), we leverage variation across 10,295 tumors from 33 cancer types and find that complexes that re-fold proteins (chaperones), degrade proteins (proteasome) and splice mRNA (spliceosome) are up-regulated in high mutation load tumors. We validate these results by showing that similar physiological responses occur in high mutational load cancer cell lines as well. Finally, we establish a causal connection by showing that high mutational load cell lines are particularly sensitive when proteasome and chaperone function is disrupted through downregulation of expression via short-hairpin RNA (shRNA) knock-down or targeted therapies. Collectively, these data indicate that the viability of high mutational load tumors is strongly dependent on the up-regulation of complexes that degrade and refold proteins, revealing a generic vulnerability of cancer that can potentially be therapeutically exploited.
Quantifying transcriptional response to mutational load in human tumors
We first performed a genome-wide screen to systematically identify which genes are transcriptionally upregulated in response to mutational load in human tumors. To do so, we utilized publicly available whole-exome and gene expression data from 10,295 human tumors across 33 cancer types from The Cancer Genome Atlas (TCGA)15,16. We considered multiple classes of mutations to define mutational load and investigated their degree of collinearity, focusing on protein-coding regions since the use of whole-exome data limits the ability to accurately assess mutations in non-coding regions. We find that there is a high degree of collinearity among synonymous, non-synonymous and nonsense point mutations in protein coding genes (R > 0.9) but weak collinearity between point mutations and copy number alterations (R < 0.05) (Supplemental Figure 1). Thus, we decided to focus on the aggregate effects of protein-coding mutations and for all analyses defined mutational load as log10 of the total number of point mutations in protein-coding genes. For simplicity, we used all mutations rather than focusing only on passenger mutations since identifying genuine drivers against a background of linked passenger events can be difficult, especially for tumors with many mutations.
Since gene expression can vary across tumors due to many factors, such as cancer type, tumor purity and other unknown factors, we utilized a generalized linear mixed model (GLMM) to measure the association of mutational load and gene expression while accounting for these potential confounders (Fig. 1A). Within the GLMM, tumor purity and mutational load were modeled as fixed effects whereas cancer type was modeled as a random effect since it varies across groups of patients and can be interpreted as repeated measurements across groups. The following GLMM was applied separately to each gene,
where Y is a vector of normalized expression values across all tumors, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term (Methods).
Using this approach, we applied the GLMM to all tumors in TCGA and identified 5,330 genes that are significantly up-regulated in response to mutational load (β1 > 0, FDR < 0.05). Next, we linked these genes to cellular function by performing gene set enrichment to known protein complexes (CORUM database17, Fig. 1B) and pathways (KEGG database18, Fig. 1C) using gprofiler219. As expected for tumors with many mutations, pathways and protein complexes related to cell cycle, DNA replication and DNA repair were enriched in tumors with a high mutational load. However, some of the most significant enrichment terms were for protein complexes and pathways that regulate translation (mitochondrial ribosomes), protein degradation (proteasome complex), and protein folding (CCT complex/HSP60), consistent with the hypothesis that high mutational load tumors experience protein misfolding stress. Surprisingly, we also found that the spliceosome, a large protein complex that regulates alternative splicing in cells, is up-regulated in response to mutational load. This suggests that transcription itself could also be regulated in response to protein misfolding stress, as seen in other studies20,21. In addition, we confirmed that the same proteostasis complexes are identified when performing gene set enrichment analysis only genes with the largest effect sizes from the transcriptional screen (in the upper quartile of β1 regression coefficients), which contain half the number of significant genes as identified previously (N=2,152 vs 5,330; Supplemental Figure 2).
Gene silencing through alternative splicing in high mutational load tumors
We next investigated in detail how these protein complexes could mitigate the damaging effects of protein misfolding in high mutational load tumors by examining the role of the spliceosome in gene silencing. We hypothesized that the up-regulation of the spliceosome in high mutational load tumors prevents further protein misfolding by regulating pre-mRNA transcripts to be degraded rather than translated. The down-regulation of gene expression via alternative splicing events, such as intron retention, is one known mechanism to silence genes by funneling transcripts to mRNA decay pathways.22–24
To test whether gene expression is down-regulated in high mutational load tumors through intron retention, we utilized previously called alternative splicing events in TCGA25. Alternative splicing events within this dataset were quantified through a metric called percent spliced in or PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g. for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don’t support the alternative splicing event. Thus, PSI estimates the probability of alternative splicing events only at specific exonic boundaries in the entire transcript population without requiring information on the complete underlying composition of each full length-transcript.
Using these alternative splicing calls, we reasoned that if a transcript contains an intron retention event and is downregulated in expression, the transcript is more likely to have been degraded by mRNA decay pathways. For all genes, we first quantified whether intron retention events were present based on a threshold value >80% PSI. For each gene with an intron retention event, we quantified whether the expression of the same gene was under-expressed. Each gene was counted as under-expressed if it was one standard deviation below the mean expression within the same cancer type. To control for mutations that might affect patterns of expression, (i.e., expression quantitative trait loci or eQTL effects), alternative splicing events that contained a point mutation within the same gene were removed from the analysis (which only represent ∼1% of intron retention events across all tumors; Methods). We find that relative to all transcripts with intron retention events, the number of transcripts that are under-expressed increases with tumor mutational load (Fig. 2A), suggesting that the degree of intron-retention driven mRNA decay is elevated in high mutational load tumors. This trend is robust to other PSI value thresholds (>50-90% PSI), even for other alternative splicing events (e.g., exon skipping, mutually exclusive exons, etc.) and when not filtering for potential eQTL effects (Supplemental Figure 3 and 4).
We next investigated which genes are more likely to be silenced through mRNA decay between low and high mutational load tumors. For each intron retention event, we calculated whether PSI values were significantly different in low mutational load tumors (<10 total protein-coding mutations) compared to high mutational load tumors (>1000 total protein-coding mutations) using a t-test. This approach identified 606 and 201 genes that have more and less intron retention events in high mutational load tumors, respectively. Using gene set enrichment analysis, we find that cytoplasmic ribosomes contain more intron retention events in high mutational load tumors, potentially leading to their down-regulation through mRNA decay to prevent further protein mis-folding (Fig. 2B). Genes that contain fewer intron retention events in high mutational load tumors, which are less likely to undergo mRNA decay, are primarily related to mRNA splicing.
Regulation of translation, protein folding and protein degradation in high mutational load tumors
Next, we investigated in detail how the remaining proteostasis complexes that were significant in our genome-wide screen, which regulate protein synthesis, degradation and folding, could mitigate protein misfolding in high mutational load tumors. To do so, we expanded our gene sets to include other chaperone families, all ribosomal complexes and proteasomal subunits (Fig. 3A). Using the GLMM framework detailed above, we find that the expression of nearly all individual genes in chaperone families that participate in protein folding (HSP60, HSP70 and HSP90), protein disaggregation (HSP100), and have organelle-specific roles (ER and mitochondrial) are significantly up-regulated in response to mutational load. Interestingly, however, small heat shock proteins, which don’t participate in protein folding or disaggregation, are significantly down-regulated in response to increased protein coding mutations. The role of small heat shock proteins is primarily to hold unfolded proteins in a reversible state for re-folding or degradation by other chaperones26 and thus, could possibly be down-regulated due to their inefficiency in mitigating protein misfolding.
We further examined differences in expression of different structural components of the proteasome, a large protein complex responsible for degradation of intracellular proteins. Consistent with the over-expression of chaperone families that mitigate protein mis-folding, both the 19s regulatory particle (which recognizes and imports proteins for degradation) and the 20s core (which cleaves peptides) of the proteasome are up-regulated in response to mutational load in TCGA (Fig. 3A). In addition, we find that specifically mitochondrial — but not cytoplasmic — ribosome complexes are up-regulated in high mutational load tumors. As previously reported in yeast27 and human cells28, mitochondrial ribosome biogenesis has been shown to occur under conditions of chronic protein misfolding as a mechanism of compartmentalization and degradation of proteins. In contrast, translation of proteins through cytosolic ribosome biogenesis has been previously characterized to be attenuated and slowed to prevent further protein mis-folding29. This decrease in expression of cytoplasmic ribosomes is also consistent with observed patterns of alternative splicing coupled to mRNA decay pathways in high mutational load tumors (Fig. 2B).
Finally, we performed a jackknife re-sampling procedure to confirm that specific cancer types aren’t driving patterns of association within the GLMM. This was achieved by removing each cancer type from the regression model one at a time, and re-calculating regression coefficients on the remaining set of samples. Overall, regression coefficients were stable across cancer types and trends were unchanged (Supplemental Figure 5). In addition, we also performed linear regression within cancer types and found similar expression responses to mutational load across proteostasis complexes (Supplemental Figure 6). Finally, we also confirmed that patient age was not driving patterns of association of mutational load and gene expression within the GLMM (Supplemental Figure 7). Taken together, this suggests that protein re-folding, protein disaggregation, protein degradation, and down-regulation of cytoplasmic translation are potential mechanisms to mitigate and prevent protein misfolding in high mutational load tumors.
Validating proteostasis expression responses in cancer cell lines and establishing a causal connection through perturbation experiments
We next sought to validate these results by first examining whether the expression patterns observed in human tumors replicate within cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE)30. Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e., some cancer types have <10 samples and others have >100 samples). Since GLMMs may not be able to estimate among-population variance accurately in these cases31, we utilized a simple generalized linear model (GLM) instead to measure the effect of mutational load on patterns of expression without over-constraining the model. Indeed, we find that expression patterns seen in human tumors broadly replicate in cancer cell lines (Fig. 3). Similar to the expression analysis in TCGA, we also confirmed through a jackknife re-sampling procedure that specific cancer types aren’t driving patterns of association within the GLM (Supplemental Figure 8). Finally, we further validated these trends by incorporating protein abundance estimates in CCLE, which contains the largest dataset available of RNA (n=1377) and protein (n=373) abundances that are harmonized across samples. We find similar patterns of expression and protein abundances in response to mutational load in CCLE within proteostasis complexes (Supplemental Figure 9).
Overall, this indicated that the expression patterns observed are cell autonomous (i.e., independent of organismal effects such as the immune system, age or microenvironment) and consistent across high mutational load cancer cells. Importantly, it also demonstrates that cancer cell lines are a reasonable model to causally interrogate these effects further through functional and pharmacological perturbation experiments.
To establish a causal relationship between the over-expression of proteostasis machinery and maintenance of cell viability under high mutational load, we utilized expression knock-down (shRNA) estimates from project Achilles32 for the same cancer cell lines as in CCLE. We sought to measure how mutational load impacts cell viability when protein complexes and gene families undergo a loss of function through expression knock-down. Since the shRNA screen was performed on an individual gene basis, we utilized a GLM framework that aggregates expression knock-down estimates of all genes within a given proteostasis gene family to jointly measure how mutational load impacts cell viability after loss of function. Specifically, we included an additional categorical variable of the gene name within each gene family to allow for a change in the intercept within each gene in the GLM when measuring the association of mutational load and cell viability after expression knock-down. In addition, we similarly evaluated whether specific cancer types were driving patterns of association within the GLM through jackknife re-sampling by cancer type (Fig. 4A).
Overall, we find that elevated mutational load is associated with decreased cell viability when the function of most chaperone gene families are disrupted through expression knock-down (Fig. 4A). However, only chaperones within the HSP100 family, which have the unique ability to rescue and reactivate existing protein aggregates in cooperation with other chaperone families33, show a significant negative relationship between mutational load and cell viability across almost all cancer types. Similarly, we find specificity in the vulnerability that mutational load generates when the function of the proteasome and different ribosomal complexes are disrupted (Fig. 4A). Mutational load significantly decreases cell viability only when expression knock-down of the 19s regulatory particle of the proteasome is disrupted, suggesting that targeting the protein import machinery of the proteasome is more effective than targeting the protein cleaving machinery in the 20s core. Finally, mutational load significantly increases cell viability when cytoplasmic ribosomes – which are already down-regulated in response to mutational load (Fig. 2B) – undergo a loss of function through expression knock-down. Conversely, expression knock-down of mitochondrial ribosomes significantly decreases viability with increased mutational load in cell lines, which is also consistent with the patterns of expression observed.
Since functional redundancy in the human genome can make expression knock-down estimates within individual genes noisy, we also examined how drugs targeting the function of whole complexes impacts viability with mutational load across all cancer types and when removing individual cancer types through jackknife re-sampling. To do so, we utilized drug sensitivity screening data in project PRISM34 within CCLE and used a simple GLM to measure the association of mutational load and cell viability after drug inhibition. We find that treatment with the majority of proteasome inhibitors (6/8) and ubiquitin-specific proteasome inhibitors (2/3), which target protein degradation complexes, are significantly associated with a decrease in cell viability in high mutational load cell lines. Similarly, most HSP90 inhibitors decrease cell viability with mutational load (8/10), although only a few drugs show a significant relationship. This variability in the efficacy of drugs with similar mechanisms of action likely reflects that the efficacy to disrupt the function of proteostasis machinery is dependent on the specific molecular affinity of a compound to its target and downstream effectors. While these are the only relevant proteostasis drugs in the PRISM dataset that are currently available, we anticipate that drugs targeting other chaperone machinery or splicing complexes could also target other potential vulnerabilities in high mutational load cancers. Collectively, these results indicate that elevated expression of protein degradation and folding machinery is causally related to the maintenance of viability in in high mutational load cell lines, and likely in high mutational load tumors by extension.
Lastly, we find that most drugs in the PRISM database do not significantly decrease cell viability with mutational load (Fig. 5A), suggesting that high mutational load cancer cells are not generically vulnerable to all classes of drugs. Specifically, we find that drugs which inhibit transcription, cytoskeleton organization, protein degradation, chaperones, protein synthesis and promote apoptosis are most effective at targeting high mutational load cancer cells – delineating additional potential therapeutic vulnerabilities in high mutational burden tumors (Fig. 5B).
Here, we test the hypothesis that cancer cells regulate their proteostasis machinery to mitigate the damaging effects of passenger mutations, which can destabilize and misfold proteins. Misfolded proteins can arise from non-synonymous or nonsense passengers which cause abnormal amino acid modifications or pre-mature truncations in proteins. Even synonymous passengers, which are traditionally thought to be functionally silent, can lead to misfolding of proteins through changes in mRNA stability35, translational pausing36,37, and non-optimal codon usage.38,39 As a result, protein misfolding can be damaging in cells not only due to a loss of function of the original protein, but also due to a gain in toxicity caused by the aggregation of aberrant peptides. It is intriguing to consider the possibility that the need to manage protein misfolding stress is a hallmark of somatic evolution in cancer.
To maintain viability by minimizing these cytotoxic effects, we find that high mutational load tumors – similar to yeast40, bacteria41,42, and viruses43 – up-regulate the expression of chaperones, which allow mutated proteins that would otherwise be misfolded to retain function. We find evidence suggesting that specific chaperone families that actively participate in protein re-folding (HSP60, HSP90 and HSP70) or disaggregation (HSP100) are up-regulated in response to mutational load, while other chaperone machinery that salvage proteins (Small HS) are downregulated. In addition, we find degradation of mutated proteins through up-regulation of the proteasome to be another possible strategy high mutational load tumors use to mitigate protein misfolding stress.
Finally, we find additional mechanisms that high mutational load tumors use to not just mitigate but also prevent protein misfolding. By utilizing post-transcriptional processes that couple alternative splicing with mRNA decay pathways known to occur in normal human tissues22,44,45, high mutational load tumors appear to selectively prevent protein production by regulating certain pre-mRNA transcripts to be degraded rather than translated. We find evidence suggesting that the targets of this coordinated un-productive splicing are primarily related to cytoplasmic ribosomal gene expression that controls the translation of proteins, consistent with observations in other organisms46–48. Intriguingly, we find that while cytoplasmic ribosome expression is attenuated, mitochondrial ribosome biogenesis in human tumors is up-regulated in response to mutational load. This could both be another mechanism that high mutational load tumors use to compartmentalize and degrade proteins27 and reflect the increased energetic demands of proteostasis maintenance49.
The expression responses observed here are not only consistent with protein misfolding stress in other organisms, but also cross-validate in cancer cell lines, where we find similar expression responses to mutational load. This provides further evidence of a generic, cell intrinsic phenomenon occurring that cannot be explained by extrinsic organismal effects, such as aging, changes in the immune system or microenvironment. Furthermore, we move beyond correlations of gene expression responses to mutational load and establish a causal connection by demonstrating that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability through perturbation experiments via knockdown experiments with shRNA and drug profiling.
The results presented here have many implications. First, they suggest that while there is direct selection during somatic evolution for pathogenic drivers that allow cancer cells to continually proliferate, damaging passengers that destabilize proteins must also cause cancer cells to experience second-order indirect selection for alterations that allow tumors to overcome this proteostasis imbalance. This could occur through phenotypic plasticity, shifts in methylation and chromatin structure, or through compensatory point mutations and duplications, consistent with other studies50,51. Indeed, gene duplication, where one copy can still perform the required function while the other copy is non-functional, is another known mechanism that allows cells to maintain robustness to damaging mutations in many eukaryotic organisms52,53. In support of this, whole genome-duplication, which is common in cancer, has recently been shown as another potential mechanism that tumor cells could use to maintain robustness to deleterious passengers54. However, duplication events are also known to be deleterious due to gene dosage effects that cause protein imbalance55, which could further exacerbate protein misfolding. Further experimental studies are needed to distinguish how cancer cells compensate for protein misfolding and the role that genome duplication may play in this process.
Second, the extra demands of proteostasis maintenance presents important vulnerabilities in high mutational load cancers that could be exploited. The clinical use of chaperone inhibitors for cancer treatment has been explored for over two decades56–58 but no study, to our knowledge, has compared the efficacy of chaperone inhibitor use in tumors stratified by mutational load. Similarly, the clinical use of proteasome inhibitors, which are currently only approved for the treatment of multiple myeloma and mantle-cell lymphoma59,60, has not been directed specifically to high mutational load tumors. While the efficacy of proteasome inhibitors in multiple myeloma patients is linked to the protein misfolding stress response61,62, it is currently unknown whether high mutational load tumors are more susceptible to these inhibitors. Outside of drugs in the clinic, the need for cancers to compensate for protein misfolding could also present additional vulnerabilities due to evolutionary trade-offs, where the improvement in fitness of one trait comes at the expense of another. Previous work in yeast has identified strong trade-offs between the adaptive mechanisms that allow for the tolerance of mistranslation and survival under conditions of starvation49. Whether similar conditions could be exploited in high mutational load cancer cells warrants additional further investigation.
Finally, our results contribute to an accumulating body of evidence that cancer and aging are different manifestations of related underlying evolutionary processes63–65. The same forces of mutation and inefficient selection in somatic evolution generates a persistent problem of deleterious mutation accumulation in normal somatic tissues and during tumor growth. Disruption of proteostasis is a known hallmark of aging in normal tissues66. Many transcriptional responses observed in high mutational load tumors — such as shifts in regulation of alternative splicing67, protein degradation68, and protein re-folding69— are also observed in normal aging tissues which contain somatic mutations. Despite this, aging tissues appear to utilize different strategies to deal with proteostasis disruption — such as up-regulation of chaperones in the Small HS family70and autophagy71— which are not a pre-dominant response observed here in high mutational load tumors. Whether different combinations of strategies are used by high mutational load cancer cells use to overcome their mutational load or whether all the strategies identified here are needed to maintain proteostasis is unclear. Differences in these proteostasis strategies could be due to different selection pressures during somatic evolution, the degree of mutational load required to induce a stress response, differences in energetic costs of protein maintenance, or the interplay that exists between apoptosis and proteostasis. Further studies are needed to elucidate the precise dynamics and physiological consequences of inefficient negative selection in somatic evolution, how this impacts cellular growth, and the mechanisms somatic cells use to maintain robustness to proteostasis disruption.
We thank Kathleen Houlahan, Chuan Li, José Aguilar-Rodríguez and other members of Petrov and Curtis labs for their helpful comments and discussions. S.T. was supported in part by an NIH training grant T32-HG000044-21. This work was supported in part by the National Institutes of Health (NIH) Director’s Pioneer Award: DP1CA238296 and the National Cancer Institute (NCI) Cancer Target Discovery and Development Center (U01CA217851) to C.C and by NIH grants R35GM118165 to D.A.P and GM74074 to J.F.
Conflicts of Interest
C.C. is an advisor and stockholder in Grail, Ravel, DeepCell and an advisor to Genentech, Bristol Myers Squibb, 3T Biosciences and NanoString. D.A.P. is a founder of, and stockholder equity in, D2G Oncology Inc.
Data availability and resources
Whole-exome, somatic mutation calls of 10,486 cancer patients across 33 cancer types in The Cancer Genome Atlas (TCGA) were downloaded from the Multi-Center Mutation Calling in Multiple Cancers (MC3) project16(https://gdc.cancer.gov/about-data/publications/mc3-2017). For the same patients in TCGA, RNA-seq data of log2 transformed RSEM normalized counts were downloaded from the UCSC Xena Browser72 (https://xenabrowser.net/datapages/) and copy number alterations (CNAs), including amplifications and deletions, called via ABSOLUTE were downloaded from COSMIC (v91)73 (https://cancer.sanger.ac.uk/cosmic/download). Tumor purity estimates for TCGA were downloaded from the Genomic Database Commons (GDC)74 (https://gdc.cancer.gov/about-data/publications/pancanatlas). Data for all cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) were downloaded from DepMap30 (https://depmap.org/portal/download/all/). Specifically, mutation calls (Version 21Q3) from whole-exome sequencing data, copy number alternations quantified by ABSOLUTE (Version CCLE 2019), log2 transformed TPM normalized counts (Version 21Q3) from RNA-seq data, proteomics data quantified by mass spectrometry75, shRNA data from project Achilles32 normalized using DEMETER (DEMETER2 Data v6), and primary drug sensitivity screens of replicate collapsed log fold changes relative to DMSO from project PRISM34 (Version 19Q4) were used.
The lmerTest and lmer package in R was used to apply a separate generalized linear mixed model (GLMM) for each gene in the genome to identify groups of genes whose expression is up-regulated in response to mutational load in TCGA. For each gene, expression values across all patients were z-score normalized in all analyses to ensure fair comparisons across genes. Known co-variates of tumor purity and cancer type were included in the GLMM. Tumor purity and mutational load were modeled as fixed effects, whereas cancer type was modeled as a random effect (i.e. random intercept) since it varies across groups of patients and can be interpreted as repeated measurements across groups. For all analyses, mutational load was defined as log10 of the number of synonymous, nonsynonymous and nonsense mutations per tumor. For each gene, the parameters used in the GLMM were as follows,
where Y is a vector of expression values of each tumor, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term. To examine expression responses to mutational load within a given protein complex and cancer type, the same normalization procedures were applied as above within cancer types and a separate GLM for each cancer type was ran as follows,
where Y is a vector of expression values of each tumor in a given cancer type, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, β3 is a change in the intercept for X3 which is a categorical variable of individual genes within each proteostasis complex and e is a Gaussian error term.
Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e. some cancer types have <10 samples and others have >100 samples). In these cases, mixed effects models may not be able to estimate among-population variance accurately31. Thus, for all regression-based analyses in CCLE, a simple generalized linear model (GLM) was used instead. Cell viability values across all cell lines were z-score normalized by gene in all analyses to ensure fair comparisons across genes. To assess whether the same sets of genes are up-regulated in response to mutational load in CCLE using the GLM, a similar procedure to the GLMM was performed. A separate GLM was applied for each gene with the following parameters,
where Y is a vector normalized expression values of each cell line, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, and e is a Gaussian error term. To assess whether protein abundances are similarly up-regulated in response to mutational load in CCLE in proteostasis complexes, a separate GLM was applied to each gene with the following parameters,
where Y is a vector of protein abundance values within each cell line, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, and e is a Gaussian error term. A similar GLM framework as above was used to estimate the association of mutational load and cell viability after shRNA knock-down of individual genes in proteostasis complexes with the following parameters,
where Y is a vector of normalized cell viability estimates after expression knock-down of an individual gene across all cell lines, β0 is the fixed reference intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each cell line, β2 is a change in the intercept for X2 which is a categorical variable of individual genes within each proteostasis complex, and e is a Gaussian error term. To estimate the association of mutational load and cell viability after pharmacologic inhibition of proteostasis machinery, the following GLM was applied to each relevant drug in PRISM:
where Y is a vector normalized cell viability estimates after drug inhibition across all cell ines, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, and e is a Gaussian error term.
For both the GLM and GLMM, model assumptions of homogeneity of variance were verified by plotting residuals versus fitted values in the model and residuals versus each covariate in the model. Multi-collinearity with other mutational classes (e.g. such as copy number alterations, CNAs) were considered but not found to correlate with point mutations (Supplemental Figure 1). A jackknife re-sampling procedure was used for outlier analysis and to determine whether specific cancer types are driving patterns of association within the GLM and GLMM. Briefly, each cancer type was removed from the regression model one at a time, and regression coefficients were re-estimated. Overall, regression coefficients were fairly stable across cancer types and trends remained the same (Supplemental Figure 5 and 8).
Proteostasis gene sets
Gene set enrichment analysis
All gene set enrichment analysis was performed using gprofiler2 with default parameters. For all sets of genes, significance was determined after correcting for multiple hypothesis testing (FDR < 0.05). For gene set enrichment analysis used to identify genes up-regulated in TCGA in response to mutational load, all terms in CORUM database were reported and enrichment terms in the KEGG database of diseases not related to cancer (e.g. ‘Influenza A’) were omitted from the main figures for clarity and space. For gene sets used to identify terms differentially splice in between high and low mutational load tumors, all terms in the CORUM and the REACTOME database were reported in the main figures. The full set of enrichment terms for all analyses is reported in Supplemental Table 1.
Alternative splicing analysis
Alternative splicing events were quantified through a previously established metric called PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g. for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don’t support the alternative splicing event. PSI summarizes alternative splicing events at specific exonic boundaries in the entire transcript population without needing to know the complete underlying composition of each full length-transcript.
Alternative splicing calls for all tumors in TCGA were downloaded from TCGA SpliceSeq25. Default splice event filters (percentage of samples with PSI values >75%) from the database were applied. To test whether gene expression is down-regulated in high mutational load tumors through alternative splicing, we calculated whether alternative splicing events were present based on different threshold values of percent spliced in (PSI) from 90% to 50%. (Supplemental Figure 4). For each alternative splicing event in a gene, we quantified whether the expression of the same gene was under-expressed. Each gene was counted as under-expressed if it was one standard deviation below the mean expression within each cancer type. Genes that contained a point mutation within the same alternative splicing event were removed to control for eQTL effects. We note that intron retention events removed from this analysis represent only ∼1% of intron retention events across all tumors and similar trends are found when this filtering scheme is not applied (Supplemental Figure 3). In addition, we evaluated whether this trend is robust to other alternative splicing events (i.e., Alternate Donor Sites, Alternate Promoters, Alternate Terminators, Exon Skipping Events, ME=Mutually Exclusive Exon; Supplemental Figure 4).
To investigate which genes are differentially spliced in between low and high mutational load tumors for specific alternative splicing events (i.e. intron retention), a t-test was used to calculate whether PSI values were significantly different in tumors with < 10 protein-coding mutations compared to tumors with > 1000 protein-coding mutations. Each alternative splicing event within a gene was required to have less than 25% of missing PSI values and a mean difference between the two groups of >0.01 to be considered. This approach identified 606 and 201 significant genes that have more and fewer intron retention events in high mutational load tumors, respectively, after correcting for multiple hypothesis testing (FDR < 0.05).
Drug category annotation and enrichment analysis
A separate GLM was ran for all drugs in the PRISM database to evaluate whether they are associated with mutational load and cell viability. All drugs that were negatively associated with mutational load and viability were queried on PubMed based on their reported mechanism of action in PRISM and grouped into broad categories (Supplemental Table 1). Categories of drug mechanism of action were first chosen based on their role in metabolism and known hallmarks of cancer. Additional categories not directly related to known cancer associated functional groups were made for drugs that could not otherwise be grouped (i.e. ‘Ion Channel Regulation’, Viral Replication Inhibitor’, etc.). Drugs with ambiguous mechanism of action (e.g. ‘cosmetic’, ‘coloring agent’) were grouped into ‘Other’. The abstracts of up to 10 associated papers were used to examine for evidence connecting drug mechanisms of action to 33 broad categories. In total, 700 drug mechanism of action were grouped and annotated into 33 broad categories. These broad categories were used to assess whether high mutational load cancer cell lines are generically vulnerable to drugs or whether certain categories are more likely to contain drugs effective against high mutational load cell lines. To control for differences in the number of drugs within each category, 50 drugs were randomly sampled, and the fraction of drugs significantly associated with mutational load in each category was calculated 100 times to generate confidence intervals.
Code and software availability
All code used for analysis will be made publicly available on Github under the open-source MIT License upon publication.
- 1.Mutational heterogeneity in cancer and the search for new cancer-associated genesNature https://doi.org/10.1038/nature12213
- 2.Cancer Evolution: No Room for Negative SelectionCell https://doi.org/10.1016/j.cell.2017.10.039
- 3.Universal Patterns of Selection in Cancer and Somatic TissuesCell https://doi.org/10.1016/j.cell.2017.09.042
- 4.On measuring selection in cancer from subclonal mutation frequenciesPLoS Comput. Biol https://doi.org/10.1371/journal.pcbi.1007368
- 5.The damaging effect of passenger mutations on cancer progressionCancer Res https://doi.org/10.1158/0008-5472.CAN-15-3283-T
- 6.Impact of deleterious passenger mutations on cancer progressionProc. Natl. Acad. Sci. U. S. A https://doi.org/10.1073/pnas.1213968110
- 7.Negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidomeGenome Biol https://doi.org/10.1186/s13059-018-1434-0
- 8.Identification of neutral tumor evolution across cancer typesNat. Genet https://doi.org/10.1038/ng.3489
- 9.Mol. Biol. Evolhttps://doi.org/10.1093/molbev/msy100
- 10.Most cancers carry a substantial deleterious load due to Hill-Robertson interferencebioRxiv https://doi.org/10.1101/764340
- 11.The distribution of fitness effects of new mutationsNature Reviews Genetics https://doi.org/10.1038/nrg2146
- 12.Universal Patterns of Selection in Cancer and Somatic TissuesCell https://doi.org/10.1016/j.cell.2017.09.042
- 13.Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolutionCell 134:341–52
- 14.The effect of linkage on limits to artificial selectionGenet. Res. (Camb) https://doi.org/10.1017/S001667230800949X
- 15.The cancer genome atlas pan-cancer analysis projectNat. Genet https://doi.org/10.1038/ng.2764
- 16.Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic PipelinesCell Syst 6:271–281
- 17.CORUM: The comprehensive resource of mammalian protein complexes - 2019Nucleic Acids Res https://doi.org/10.1093/nar/gky973
- 18.Using the KEGG database resourceCurr. Protoc. Bioinforma https://doi.org/10.1002/0471250953.bi0112s38
- 19.gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g: ProfilerF1000Research https://doi.org/10.12688/f1000research.24956.2
- 20.Cellular stress and RNA splicingTrends in Biochemical Sciences https://doi.org/10.1016/j.tibs.2008.11.004
- 21.The emerging role of premessenger RNA splicing in stress responses: Sending alternative messages and silent messengersRNA Biology https://doi.org/10.4161/rna.8.5.16016
- 22.The functional consequences of intron retention: Alternative splicing coupled to NMD as a regulator of gene expressionBioEssays https://doi.org/10.1002/bies.201300156
- 23.The rules and impact of nonsense-mediated mRNA decay in human cancersNat. Genet https://doi.org/10.1038/ng.3664
- 24.Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elementsNature https://doi.org/10.1038/nature05676
- 25.TCGASpliceSeq a compendium of alternative mRNA splicing in cancerNucleic Acids Res https://doi.org/10.1093/nar/gkv1288
- 26.The small heat shock proteins and their role in human diseaseFEBS Journal https://doi.org/10.1111/j.1742-4658.2005.04708.x
- 27.Cytosolic proteostasis through importing of misfolded proteins into mitochondriaNature https://doi.org/10.1038/nature21695
- 28.Ribosomal mistranslation leads to silencing of the unfolded protein response and increased mitochondrial biogenesisCommun. Biol https://doi.org/10.1038/s42003-019-0626-9
- 29.The stop-and-go traffic regulating protein biogenesis: How translation kinetics controls proteostasisJournal of Biological Chemistry https://doi.org/10.1074/jbc.REV118.002814
- 30.The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivityNature https://doi.org/10.1038/nature11003
- 31.A brief introduction to mixed effects modelling and multi-model inference in ecologyPeerJ https://doi.org/10.7717/peerj.4794
- 32.Defining a Cancer Dependency MapCell https://doi.org/10.1016/j.cell.2017.06.010
- 33.Aggregate reactivation mediated by the Hsp100 chaperonesArchives of Biochemistry and Biophysics https://doi.org/10.1016/j.abb.2012.01.012
- 34.Discovering the anticancer potential of non-oncology drugs by systematic viability profilingNat. Cancer https://doi.org/10.1038/s43018-019-0018-6
- 35.Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzymePLoS Genet https://doi.org/10.1371/journal.pgen.1007615
- 36.Silent substitutions predictably alter translation elongation rates and protein folding efficienciesJ. Mol. Biol https://doi.org/10.1016/j.jmb.2012.06.010
- 37.Transient ribosomal attenuation coordinates protein synthesis and co-translational foldingNat. Struct. Mol. Biol https://doi.org/10.1038/nsmb.1554
- 38.Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitnessProc. Natl. Acad. Sci. U. S. A https://doi.org/10.1073/pnas.1907126117
- 39.Synonymous but not the same: The causes and consequences of codon biasNature Reviews Genetics https://doi.org/10.1038/nrg2899
- 40.Why molecular chaperones buffer mutational damage: A case study with a yeast Hsp40/70 systemGenetics https://doi.org/10.1534/genetics.106.061564
- 41.Genomic buffering mitigates the effects of deleterious mutations in bacteriaNat. Genet https://doi.org/10.1038/ng1676
- 42.Endosymbiotic bacteria: GroEL buffers against deleterious mutationsNature https://doi.org/10.1038/417398a
- 43.Mechanisms of genetic robustness in RNA virusesEMBO Reports https://doi.org/10.1038/sj.embor.7400636
- 44.Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expressionGenes Dev https://doi.org/10.1101/gad.1382806
- 45.Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genesin Bioinformatics https://doi.org/10.1093/bioinformatics/btg1015
- 46.Alternative splicing and nonsense-mediated mRNA decay regulate mammalian ribosomal gene expressionNucleic Acids Res https://doi.org/10.1093/nar/gki905
- 47.Unproductively spliced ribosomal protein mRNAs are natural targets of mRNA surveillance in C. elegansGenes Dev https://doi.org/10.1101/gad.819900
- 48.Introns within ribosomal protein genes regulate the production and function of yeast ribosomesCell https://doi.org/10.1016/j.cell.2011.08.044
- 49.Evolution of Robustness to Protein Mistranslation by Accelerated Protein TurnoverPLoS Biol https://doi.org/10.1371/journal.pbio.1002291
- 50.Systematic characterization of mutations altering protein degradation in human cancersMol. Cell https://doi.org/10.1016/j.molcel.2021.01.020
- 51.Systematic analysis of alterations in the ubiquitin proteolysis system reveals its contribution to driver mutations in cancerNat. Cancer https://doi.org/10.1038/s43018-019-0001-2
- 52.Duplicate genes and robustness to transient gene knock-downs in Caenorhabditis elegansProc. R. Soc. B Biol. Sci https://doi.org/10.1098/rspb.2003.2560
- 53.Role of duplicate genes in genetic robustness against null mutationsNature
- 54.Interplay between whole-genome doubling and the accumulation of deleterious alterations in cancer evolutionNat. Genet https://doi.org/10.1038/s41588-020-0584-7
- 55.Effects of aneuploidy on cellular physiology and cell division in haploid yeastScience (80- https://doi.org/10.1126/science.1142210
- 56.Hsp90 molecular chaperone inhibitors: Are we there yet?Clinical Cancer Research https://doi.org/10.1158/1078-0432.CCR-11-1000
- 57.Hsp90 inhibitors in the clinicHandbook of experimental pharmacology
- 58.Update on Hsp90 Inhibitors in Clinical TrialCurr. Top. Med. Chem https://doi.org/10.2174/156802609789895728
- 59.Proteasome inhibitors in cancer therapyNature Reviews Clinical Oncology https://doi.org/10.1038/nrclinonc.2016.206
- 60.Next-generation proteasome inhibitors for cancer therapyTranslational Research https://doi.org/10.1016/j.trsl.2018.03.002
- 61.The proteasome load versus capacity balance determines apoptotic sensitivity of multiple myeloma cells to proteasome inhibitionBlood https://doi.org/10.1182/blood-2008-08-172734
- 62.Response of myeloma to the proteasome inhibitor bortezomib is correlated with the unfolded protein response regulator XBP-1Haematologica https://doi.org/10.3324/haematol.2011.043331
- 63.Somatic mutant clones colonize the human esophagus with ageScience (80- https://doi.org/10.1126/science.aau3879
- 64.High burden and pervasive positive selection of somatic mutations in normal human skinScience (80- https://doi.org/10.1126/science.aaa6806
- 65.Cancer-Associated Mutations but No Cancer: Insights into the Early Steps of Carcinogenesis and Implications for Early Cancer DetectionTrends in Cancer https://doi.org/10.1016/j.trecan.2019.07.007
- 66.The hallmarks of agingCell https://doi.org/10.1016/j.cell.2013.05.039
- 67.Increased intron retention is a post-transcriptional signature associated with progressive aging and Alzheimer’s diseaseAging Cell https://doi.org/10.1111/acel.12928
- 68.The role of ubiquitin-proteasome system in ageingGeneral and Comparative Endocrinology https://doi.org/10.1016/j.ygcen.2011.02.005
- 69.Aging and molecular chaperonesExp. Gerontol https://doi.org/10.1016/S0531-5565(03)00185-2
- 70.Small heat shock proteins in ageing and age-related diseasesCell Stress and Chaperones https://doi.org/10.1007/s12192-016-0761-x
- 71.Autophagy in healthy aging and diseaseNat. Aging https://doi.org/10.1038/s43587-021-00098-4
- 72.Visualizing and interpreting cancer genomics data via the Xena platformNature Biotechnology https://doi.org/10.1038/s41587-020-0546-8
- 73.The COSMIC (Catalogue of Somatic Mutations in Cancer) database and websiteBr. J. Cancer https://doi.org/10.1038/sj.bjc.6601894
- 74.Toward a Shared Vision for Cancer Genomic DataN. Engl. J. Med 375:1109–1112
- 75.Quantitative Proteomics of the Cancer Cell Line EncyclopediaCell https://doi.org/10.1016/j.cell.2019.12.023
- 76.A systematic atlas of chaperome deregulation topologies across the human cancer landscapePLoS Comput. Biol https://doi.org/10.1371/journal.pcbi.1005890