Cancers adapt to their mutational load by buffering protein misfolding stress

Susanne Tilk; Judith Frydman; Christina Curtis; Dmitri Petrov

doi:10.7554/eLife.87301.1

eLife assessment

Tilk and colleagues present a computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding – an important finding with general implications for our understanding of cancer evolution.

https://doi.org/10.7554/eLife.87301.1.sa2

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

landmark
fundamental
important
valuable
useful

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

In asexual populations that don’t undergo recombination, such as cancer, deleterious mutations are expected to accrue readily due to genome-wide linkage between mutations. Despite this mutational load of often thousands of deleterious mutations, many tumors thrive. How tumors survive the damaging consequences of this mutational load is not well understood. Here, we investigate the functional consequences of mutational load in 10,295 human tumors by quantifying their phenotypic response through changes in gene expression. Using a generalized linear mixed model (GLMM), we find that high mutational load tumors up-regulate proteostasis machinery related to the mitigation and prevention of protein misfolding. We replicate these expression responses in cancer cell lines and show that the viability in high mutational load cancer cells is strongly dependent on complexes that degrade and refold proteins. This indicates that upregulation of proteostasis machinery is causally important for high mutational burden tumors and uncovers new therapeutic vulnerabilities.

Statement of Significance

Cancers can successfully survive an accumulation of thousands of protein-damaging mutations. Here, we show that high mutational load tumors mitigate these damaging consequences by up-regulating complexes that buffer against protein misfolding stress – providing novel therapeutic vulnerabilities and suggesting that disruption of proteostasis is a hallmark of somatic evolution.

Introduction

Cancer develops from an accumulation of somatic mutations over time. While a small subset of these mutations drive tumor progression, the vast majority of remaining mutations, known as passengers, don’t help and might hinder cancer growth. The role that passengers play in tumor progression has traditionally received little attention despite their abundance and variation across cancer types. The number of passengers in a tumor can vary by over four orders of magnitude, even within the same cancer type, from just a few to tens of thousands of point mutations¹.

Whether these passengers are neutral or damaging to tumors has long been a matter of debate^2–10. Some have argued that passengers are functionally unimportant to tumors given that most non-synonymous mutations are not removed by negative selection in somatic tissues^2,3. This is in direct contrast to the human germ-line, where non-synonymous mutations are functionally damaging to most genes¹¹ and signals of negative selection are pervasive³. The common explanation for why damaging protein-coding mutations are removed in the human-germline but maintained in somatic tissues is that most genes are only important for multi-cellular function at the organismal level (e.g. during development), but not during somatic growth^2,12.

However, the notion that non-synonymous mutations are only selectively neutral in somatic tissues is surprising given their known functional consequences in the germ-line. Non-synonymous mutations are known to be damaging in the human germ-line due to their effects on protein folding and stability¹³, which ought to be shared between somatic and germline evolution. An alternative explanation is that non-synonymous mutations are indeed damaging in somatic evolution, but negative selection is too inefficient at removing them due to linkage effects driven by the lack of recombination in somatic cells¹⁰. Without recombination to break apart combinations of mutations, selection must act on beneficial drivers and deleterious passengers that arise in the same genome together. This makes it less efficient for selection to individually favor beneficial drivers or remove deleterious passengers¹⁴. As a result, a substantial number of weakly damaging passengers can accrue in cancer due to inefficient negative selection over time. In support of this model, tumors with very small numbers of passengers – where linkage effects are expected to be negligible – have recently been shown to exhibit signatures of negative selection and weed out damaging non-synonymous mutations¹⁰. In contrast, the remaining majority (>95%) of tumors, which contain much larger numbers of linked mutations, display patterns of inefficient negative selection. This provides evidence in favor of the inefficient selection model and implies that most tumors carry a correspondingly large deleterious mutational load.

If individual passengers are in fact substantially damaging in cancer, successful tumors with thousands of linked mutations must find ways to maintain their viability by mitigating this large mutational load. While paths to mitigation are difficult to predict for non-coding mutations, tumors with mutations in protein-coding genes are expected to minimize the damaging phenotypic effects of protein mis-folding stress. Here, we investigate this hypothesis by analyzing tumor tissues with paired mutational and gene expression profiles to assess how the physiological state of cancer cells change as they accumulate protein coding mutations. Using a general linear mixed effects regression model (GLMM), we leverage variation across 10,295 tumors from 33 cancer types and find that complexes that re-fold proteins (chaperones), degrade proteins (proteasome) and splice mRNA (spliceosome) are up-regulated in high mutation load tumors. We validate these results by showing that similar physiological responses occur in high mutational load cancer cell lines as well. Finally, we establish a causal connection by showing that high mutational load cell lines are particularly sensitive when proteasome and chaperone function is disrupted through downregulation of expression via short-hairpin RNA (shRNA) knock-down or targeted therapies. Collectively, these data indicate that the viability of high mutational load tumors is strongly dependent on the up-regulation of complexes that degrade and refold proteins, revealing a generic vulnerability of cancer that can potentially be therapeutically exploited.

Results

Quantifying transcriptional response to mutational load in human tumors

We first performed a genome-wide screen to systematically identify which genes are transcriptionally upregulated in response to mutational load in human tumors. To do so, we utilized publicly available whole-exome and gene expression data from 10,295 human tumors across 33 cancer types from The Cancer Genome Atlas (TCGA)^15,16. We considered multiple classes of mutations to define mutational load and investigated their degree of collinearity, focusing on protein-coding regions since the use of whole-exome data limits the ability to accurately assess mutations in non-coding regions. We find that there is a high degree of collinearity among synonymous, non-synonymous and nonsense point mutations in protein coding genes (R > 0.9) but weak collinearity between point mutations and copy number alterations (R < 0.05) (Supplemental Figure 1). Thus, we decided to focus on the aggregate effects of protein-coding mutations and for all analyses defined mutational load as log₁₀ of the total number of point mutations in protein-coding genes. For simplicity, we used all mutations rather than focusing only on passenger mutations since identifying genuine drivers against a background of linked passenger events can be difficult, especially for tumors with many mutations.

Since gene expression can vary across tumors due to many factors, such as cancer type, tumor purity and other unknown factors, we utilized a generalized linear mixed model (GLMM) to measure the association of mutational load and gene expression while accounting for these potential confounders (Fig. 1A). Within the GLMM, tumor purity and mutational load were modeled as fixed effects whereas cancer type was modeled as a random effect since it varies across groups of patients and can be interpreted as repeated measurements across groups. The following GLMM was applied separately to each gene,

where Y is a vector of normalized expression values across all tumors, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, β₂ is the fixed slope for the predictor variable X₂ which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term (Methods).

General linear mixed effects model (GLMM) identifies protein complexes and pathways up-regulated in response to mutational load in human tumors.
**(A)** Overview of the GLMM used to measure the association of mutation load with gene expression while controlling for potential co-variates (purity and cancer type). Genes with a significant, positive β₁ regression coefficient and false discovery rate (FDR) < 0.05 are used for gene set enrichment analysis. **(B-C)** Bar plots of protein complexes from the CORUM database (left) and pathways from the KEGG database (right) that are significantly enriched (p < 0.05) in response to mutational load. Length of bars denote negative log₁₀ of adjusted p-value and colors denote broad functional groups enriched in both databases.

Using this approach, we applied the GLMM to all tumors in TCGA and identified 5,330 genes that are significantly up-regulated in response to mutational load (β₁ > 0, FDR < 0.05). Next, we linked these genes to cellular function by performing gene set enrichment to known protein complexes (CORUM database¹⁷, Fig. 1B) and pathways (KEGG database¹⁸, Fig. 1C) using gprofiler2¹⁹. As expected for tumors with many mutations, pathways and protein complexes related to cell cycle, DNA replication and DNA repair were enriched in tumors with a high mutational load. However, some of the most significant enrichment terms were for protein complexes and pathways that regulate translation (mitochondrial ribosomes), protein degradation (proteasome complex), and protein folding (CCT complex/HSP60), consistent with the hypothesis that high mutational load tumors experience protein misfolding stress. Surprisingly, we also found that the spliceosome, a large protein complex that regulates alternative splicing in cells, is up-regulated in response to mutational load. This suggests that transcription itself could also be regulated in response to protein misfolding stress, as seen in other studies^20,21. In addition, we confirmed that the same proteostasis complexes are identified when performing gene set enrichment analysis only genes with the largest effect sizes from the transcriptional screen (in the upper quartile of β₁ regression coefficients), which contain half the number of significant genes as identified previously (N=2,152 vs 5,330; Supplemental Figure 2).

Gene silencing through alternative splicing in high mutational load tumors

We next investigated in detail how these protein complexes could mitigate the damaging effects of protein misfolding in high mutational load tumors by examining the role of the spliceosome in gene silencing. We hypothesized that the up-regulation of the spliceosome in high mutational load tumors prevents further protein misfolding by regulating pre-mRNA transcripts to be degraded rather than translated. The down-regulation of gene expression via alternative splicing events, such as intron retention, is one known mechanism to silence genes by funneling transcripts to mRNA decay pathways.^22–24

To test whether gene expression is down-regulated in high mutational load tumors through intron retention, we utilized previously called alternative splicing events in TCGA²⁵. Alternative splicing events within this dataset were quantified through a metric called percent spliced in or PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g. for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don’t support the alternative splicing event. Thus, PSI estimates the probability of alternative splicing events only at specific exonic boundaries in the entire transcript population without requiring information on the complete underlying composition of each full length-transcript.

Using these alternative splicing calls, we reasoned that if a transcript contains an intron retention event and is downregulated in expression, the transcript is more likely to have been degraded by mRNA decay pathways. For all genes, we first quantified whether intron retention events were present based on a threshold value >80% PSI. For each gene with an intron retention event, we quantified whether the expression of the same gene was under-expressed. Each gene was counted as under-expressed if it was one standard deviation below the mean expression within the same cancer type. To control for mutations that might affect patterns of expression, (i.e., expression quantitative trait loci or eQTL effects), alternative splicing events that contained a point mutation within the same gene were removed from the analysis (which only represent ∼1% of intron retention events across all tumors; Methods). We find that relative to all transcripts with intron retention events, the number of transcripts that are under-expressed increases with tumor mutational load (Fig. 2A), suggesting that the degree of intron-retention driven mRNA decay is elevated in high mutational load tumors. This trend is robust to other PSI value thresholds (>50-90% PSI), even for other alternative splicing events (e.g., exon skipping, mutually exclusive exons, etc.) and when not filtering for potential eQTL effects (Supplemental Figure 3 and 4).

Gene silencing is elevated in high mutational load tumors likely through the coupling of intron retention with mRNA decay.
**(A)** Counts of the number of under-expressed transcripts with intron retention events, relative to counts of all intron retention events in tumors binned by the total number of protein-coding mutations. Intron retention events with PSI > 80% are counted. Error bars are 95% confidence intervals determined by bootstrap sampling. **(B)** Barplot of significant protein complexes in the CORUM database (in red) and Reactome pathway database (in blue) with more (bottom) and less (top) intron retention events in high mutational load tumors compared to low mutational load tumors.

We next investigated which genes are more likely to be silenced through mRNA decay between low and high mutational load tumors. For each intron retention event, we calculated whether PSI values were significantly different in low mutational load tumors (<10 total protein-coding mutations) compared to high mutational load tumors (>1000 total protein-coding mutations) using a t-test. This approach identified 606 and 201 genes that have more and less intron retention events in high mutational load tumors, respectively. Using gene set enrichment analysis, we find that cytoplasmic ribosomes contain more intron retention events in high mutational load tumors, potentially leading to their down-regulation through mRNA decay to prevent further protein mis-folding (Fig. 2B). Genes that contain fewer intron retention events in high mutational load tumors, which are less likely to undergo mRNA decay, are primarily related to mRNA splicing.

Regulation of translation, protein folding and protein degradation in high mutational load tumors

Next, we investigated in detail how the remaining proteostasis complexes that were significant in our genome-wide screen, which regulate protein synthesis, degradation and folding, could mitigate protein misfolding in high mutational load tumors. To do so, we expanded our gene sets to include other chaperone families, all ribosomal complexes and proteasomal subunits (Fig. 3A). Using the GLMM framework detailed above, we find that the expression of nearly all individual genes in chaperone families that participate in protein folding (HSP60, HSP70 and HSP90), protein disaggregation (HSP100), and have organelle-specific roles (ER and mitochondrial) are significantly up-regulated in response to mutational load. Interestingly, however, small heat shock proteins, which don’t participate in protein folding or disaggregation, are significantly down-regulated in response to increased protein coding mutations. The role of small heat shock proteins is primarily to hold unfolded proteins in a reversible state for re-folding or degradation by other chaperones²⁶ and thus, could possibly be down-regulated due to their inefficiency in mitigating protein misfolding.

Protein folding, degradation, and synthesis are regulated in both high mutational load tumors (TCGA) and cell lines (CCLE).
Box plots of β₁ regression coefficients (top panels) and negative log₁₀ adjusted p-values (bottom panels) measuring the association of mutation load and the expression of individual genes in chaperone (purple), proteasome (yellow), and ribosome (green) complexes. Shown are regression coefficients from human tumors (TCGA) on the left and cell lines (CCLE) on the right. Percentages and grey lines on top panels show the quantile distribution of regression coefficients measuring the association of mutational load and expression for all genes in the genome within each dataset. Vertical grey line on bottom panels shows threshold of significance (p = 0.05).

We further examined differences in expression of different structural components of the proteasome, a large protein complex responsible for degradation of intracellular proteins. Consistent with the over-expression of chaperone families that mitigate protein mis-folding, both the 19s regulatory particle (which recognizes and imports proteins for degradation) and the 20s core (which cleaves peptides) of the proteasome are up-regulated in response to mutational load in TCGA (Fig. 3A). In addition, we find that specifically mitochondrial — but not cytoplasmic — ribosome complexes are up-regulated in high mutational load tumors. As previously reported in yeast²⁷ and human cells²⁸, mitochondrial ribosome biogenesis has been shown to occur under conditions of chronic protein misfolding as a mechanism of compartmentalization and degradation of proteins. In contrast, translation of proteins through cytosolic ribosome biogenesis has been previously characterized to be attenuated and slowed to prevent further protein mis-folding²⁹. This decrease in expression of cytoplasmic ribosomes is also consistent with observed patterns of alternative splicing coupled to mRNA decay pathways in high mutational load tumors (Fig. 2B).

Finally, we performed a jackknife re-sampling procedure to confirm that specific cancer types aren’t driving patterns of association within the GLMM. This was achieved by removing each cancer type from the regression model one at a time, and re-calculating regression coefficients on the remaining set of samples. Overall, regression coefficients were stable across cancer types and trends were unchanged (Supplemental Figure 5). In addition, we also performed linear regression within cancer types and found similar expression responses to mutational load across proteostasis complexes (Supplemental Figure 6). Finally, we also confirmed that patient age was not driving patterns of association of mutational load and gene expression within the GLMM (Supplemental Figure 7). Taken together, this suggests that protein re-folding, protein disaggregation, protein degradation, and down-regulation of cytoplasmic translation are potential mechanisms to mitigate and prevent protein misfolding in high mutational load tumors.

Validating proteostasis expression responses in cancer cell lines and establishing a causal connection through perturbation experiments

We next sought to validate these results by first examining whether the expression patterns observed in human tumors replicate within cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE)³⁰. Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e., some cancer types have <10 samples and others have >100 samples). Since GLMMs may not be able to estimate among-population variance accurately in these cases³¹, we utilized a simple generalized linear model (GLM) instead to measure the effect of mutational load on patterns of expression without over-constraining the model. Indeed, we find that expression patterns seen in human tumors broadly replicate in cancer cell lines (Fig. 3). Similar to the expression analysis in TCGA, we also confirmed through a jackknife re-sampling procedure that specific cancer types aren’t driving patterns of association within the GLM (Supplemental Figure 8). Finally, we further validated these trends by incorporating protein abundance estimates in CCLE, which contains the largest dataset available of RNA (n=1377) and protein (n=373) abundances that are harmonized across samples. We find similar patterns of expression and protein abundances in response to mutational load in CCLE within proteostasis complexes (Supplemental Figure 9).

Overall, this indicated that the expression patterns observed are cell autonomous (i.e., independent of organismal effects such as the immune system, age or microenvironment) and consistent across high mutational load cancer cells. Importantly, it also demonstrates that cancer cell lines are a reasonable model to causally interrogate these effects further through functional and pharmacological perturbation experiments.

To establish a causal relationship between the over-expression of proteostasis machinery and maintenance of cell viability under high mutational load, we utilized expression knock-down (shRNA) estimates from project Achilles³² for the same cancer cell lines as in CCLE. We sought to measure how mutational load impacts cell viability when protein complexes and gene families undergo a loss of function through expression knock-down. Since the shRNA screen was performed on an individual gene basis, we utilized a GLM framework that aggregates expression knock-down estimates of all genes within a given proteostasis gene family to jointly measure how mutational load impacts cell viability after loss of function. Specifically, we included an additional categorical variable of the gene name within each gene family to allow for a change in the intercept within each gene in the GLM when measuring the association of mutational load and cell viability after expression knock-down. In addition, we similarly evaluated whether specific cancer types were driving patterns of association within the GLM through jackknife re-sampling by cancer type (Fig. 4A).

Viability in high mutational load cell lines decreases when proteostasis machinery is disrupted.
**(A)** Heatmap of β₁ regression coefficients jointly measuring the association of mutational load and cell viability after expression knockdown of individual genes in proteostasis complexes. **(B)** Heatmap of β₁ regression coefficients measuring the association mutational load and cell viability after inhibition of proteostasis machinery via drugs. Both panels show how stable regression estimates are when including all cancer types (‘All Cancers’) shown in black boxes and when removing each individual cancer type on the y-axis. Colors denote a positive (blue), zero (grey), or negative (red) relationship of mutational load and cell viability after expression knock-down or drug inhibition. Stars denote whether the relationship is significant (* = p < 0.05; ** = p <0.005; *** = p < 0.0005).

Overall, we find that elevated mutational load is associated with decreased cell viability when the function of most chaperone gene families are disrupted through expression knock-down (Fig. 4A). However, only chaperones within the HSP100 family, which have the unique ability to rescue and reactivate existing protein aggregates in cooperation with other chaperone families³³, show a significant negative relationship between mutational load and cell viability across almost all cancer types. Similarly, we find specificity in the vulnerability that mutational load generates when the function of the proteasome and different ribosomal complexes are disrupted (Fig. 4A). Mutational load significantly decreases cell viability only when expression knock-down of the 19s regulatory particle of the proteasome is disrupted, suggesting that targeting the protein import machinery of the proteasome is more effective than targeting the protein cleaving machinery in the 20s core. Finally, mutational load significantly increases cell viability when cytoplasmic ribosomes – which are already down-regulated in response to mutational load (Fig. 2B) – undergo a loss of function through expression knock-down. Conversely, expression knock-down of mitochondrial ribosomes significantly decreases viability with increased mutational load in cell lines, which is also consistent with the patterns of expression observed.

Since functional redundancy in the human genome can make expression knock-down estimates within individual genes noisy, we also examined how drugs targeting the function of whole complexes impacts viability with mutational load across all cancer types and when removing individual cancer types through jackknife re-sampling. To do so, we utilized drug sensitivity screening data in project PRISM³⁴ within CCLE and used a simple GLM to measure the association of mutational load and cell viability after drug inhibition. We find that treatment with the majority of proteasome inhibitors (6/8) and ubiquitin-specific proteasome inhibitors (2/3), which target protein degradation complexes, are significantly associated with a decrease in cell viability in high mutational load cell lines. Similarly, most HSP90 inhibitors decrease cell viability with mutational load (8/10), although only a few drugs show a significant relationship. This variability in the efficacy of drugs with similar mechanisms of action likely reflects that the efficacy to disrupt the function of proteostasis machinery is dependent on the specific molecular affinity of a compound to its target and downstream effectors. While these are the only relevant proteostasis drugs in the PRISM dataset that are currently available, we anticipate that drugs targeting other chaperone machinery or splicing complexes could also target other potential vulnerabilities in high mutational load cancers. Collectively, these results indicate that elevated expression of protein degradation and folding machinery is causally related to the maintenance of viability in in high mutational load cell lines, and likely in high mutational load tumors by extension.

Lastly, we find that most drugs in the PRISM database do not significantly decrease cell viability with mutational load (Fig. 5A), suggesting that high mutational load cancer cells are not generically vulnerable to all classes of drugs. Specifically, we find that drugs which inhibit transcription, cytoskeleton organization, protein degradation, chaperones, protein synthesis and promote apoptosis are most effective at targeting high mutational load cancer cells – delineating additional potential therapeutic vulnerabilities in high mutational burden tumors (Fig. 5B).

Targeting proteostasis machinery is a key vulnerability in high mutational load cell lines.
**(A)** Bar plot of the number of drugs in the PRISM database significantly (black) and not significantly (grey) associated with mutational load and cell viability using a simple generalized linear model (GLM). (B) Fraction of drugs in broad functional categories significantly negatively associated with mutational load and cell viability from the GLM. Confidence intervals were determined by randomly sampling 50 drugs in each functional category 100 times. Dashed line is the median of randomly sampled drugs across all categories.

Discussion

Here, we test the hypothesis that cancer cells regulate their proteostasis machinery to mitigate the damaging effects of passenger mutations, which can destabilize and misfold proteins. Misfolded proteins can arise from non-synonymous or nonsense passengers which cause abnormal amino acid modifications or pre-mature truncations in proteins. Even synonymous passengers, which are traditionally thought to be functionally silent, can lead to misfolding of proteins through changes in mRNA stability³⁵, translational pausing^36,37, and non-optimal codon usage.^38,39 As a result, protein misfolding can be damaging in cells not only due to a loss of function of the original protein, but also due to a gain in toxicity caused by the aggregation of aberrant peptides. It is intriguing to consider the possibility that the need to manage protein misfolding stress is a hallmark of somatic evolution in cancer.

To maintain viability by minimizing these cytotoxic effects, we find that high mutational load tumors – similar to yeast⁴⁰, bacteria^41,42, and viruses⁴³ – up-regulate the expression of chaperones, which allow mutated proteins that would otherwise be misfolded to retain function. We find evidence suggesting that specific chaperone families that actively participate in protein re-folding (HSP60, HSP90 and HSP70) or disaggregation (HSP100) are up-regulated in response to mutational load, while other chaperone machinery that salvage proteins (Small HS) are downregulated. In addition, we find degradation of mutated proteins through up-regulation of the proteasome to be another possible strategy high mutational load tumors use to mitigate protein misfolding stress.

Finally, we find additional mechanisms that high mutational load tumors use to not just mitigate but also prevent protein misfolding. By utilizing post-transcriptional processes that couple alternative splicing with mRNA decay pathways known to occur in normal human tissues^22,44,45, high mutational load tumors appear to selectively prevent protein production by regulating certain pre-mRNA transcripts to be degraded rather than translated. We find evidence suggesting that the targets of this coordinated un-productive splicing are primarily related to cytoplasmic ribosomal gene expression that controls the translation of proteins, consistent with observations in other organisms^46–48. Intriguingly, we find that while cytoplasmic ribosome expression is attenuated, mitochondrial ribosome biogenesis in human tumors is up-regulated in response to mutational load. This could both be another mechanism that high mutational load tumors use to compartmentalize and degrade proteins²⁷ and reflect the increased energetic demands of proteostasis maintenance⁴⁹.

The expression responses observed here are not only consistent with protein misfolding stress in other organisms, but also cross-validate in cancer cell lines, where we find similar expression responses to mutational load. This provides further evidence of a generic, cell intrinsic phenomenon occurring that cannot be explained by extrinsic organismal effects, such as aging, changes in the immune system or microenvironment. Furthermore, we move beyond correlations of gene expression responses to mutational load and establish a causal connection by demonstrating that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability through perturbation experiments via knockdown experiments with shRNA and drug profiling.

The results presented here have many implications. First, they suggest that while there is direct selection during somatic evolution for pathogenic drivers that allow cancer cells to continually proliferate, damaging passengers that destabilize proteins must also cause cancer cells to experience second-order indirect selection for alterations that allow tumors to overcome this proteostasis imbalance. This could occur through phenotypic plasticity, shifts in methylation and chromatin structure, or through compensatory point mutations and duplications, consistent with other studies^50,51. Indeed, gene duplication, where one copy can still perform the required function while the other copy is non-functional, is another known mechanism that allows cells to maintain robustness to damaging mutations in many eukaryotic organisms^52,53. In support of this, whole genome-duplication, which is common in cancer, has recently been shown as another potential mechanism that tumor cells could use to maintain robustness to deleterious passengers⁵⁴. However, duplication events are also known to be deleterious due to gene dosage effects that cause protein imbalance⁵⁵, which could further exacerbate protein misfolding. Further experimental studies are needed to distinguish how cancer cells compensate for protein misfolding and the role that genome duplication may play in this process.

Second, the extra demands of proteostasis maintenance presents important vulnerabilities in high mutational load cancers that could be exploited. The clinical use of chaperone inhibitors for cancer treatment has been explored for over two decades^56–58 but no study, to our knowledge, has compared the efficacy of chaperone inhibitor use in tumors stratified by mutational load. Similarly, the clinical use of proteasome inhibitors, which are currently only approved for the treatment of multiple myeloma and mantle-cell lymphoma^59,60, has not been directed specifically to high mutational load tumors. While the efficacy of proteasome inhibitors in multiple myeloma patients is linked to the protein misfolding stress response^61,62, it is currently unknown whether high mutational load tumors are more susceptible to these inhibitors. Outside of drugs in the clinic, the need for cancers to compensate for protein misfolding could also present additional vulnerabilities due to evolutionary trade-offs, where the improvement in fitness of one trait comes at the expense of another. Previous work in yeast has identified strong trade-offs between the adaptive mechanisms that allow for the tolerance of mistranslation and survival under conditions of starvation⁴⁹. Whether similar conditions could be exploited in high mutational load cancer cells warrants additional further investigation.

Finally, our results contribute to an accumulating body of evidence that cancer and aging are different manifestations of related underlying evolutionary processes^63–65. The same forces of mutation and inefficient selection in somatic evolution generates a persistent problem of deleterious mutation accumulation in normal somatic tissues and during tumor growth. Disruption of proteostasis is a known hallmark of aging in normal tissues⁶⁶. Many transcriptional responses observed in high mutational load tumors — such as shifts in regulation of alternative splicing⁶⁷, protein degradation⁶⁸, and protein re-folding⁶⁹— are also observed in normal aging tissues which contain somatic mutations. Despite this, aging tissues appear to utilize different strategies to deal with proteostasis disruption — such as up-regulation of chaperones in the Small HS family⁷⁰and autophagy⁷¹— which are not a pre-dominant response observed here in high mutational load tumors. Whether different combinations of strategies are used by high mutational load cancer cells use to overcome their mutational load or whether all the strategies identified here are needed to maintain proteostasis is unclear. Differences in these proteostasis strategies could be due to different selection pressures during somatic evolution, the degree of mutational load required to induce a stress response, differences in energetic costs of protein maintenance, or the interplay that exists between apoptosis and proteostasis. Further studies are needed to elucidate the precise dynamics and physiological consequences of inefficient negative selection in somatic evolution, how this impacts cellular growth, and the mechanisms somatic cells use to maintain robustness to proteostasis disruption.

Acknowledgements

We thank Kathleen Houlahan, Chuan Li, José Aguilar-Rodríguez and other members of Petrov and Curtis labs for their helpful comments and discussions. S.T. was supported in part by an NIH training grant T32-HG000044-21. This work was supported in part by the National Institutes of Health (NIH) Director’s Pioneer Award: DP1CA238296 and the National Cancer Institute (NCI) Cancer Target Discovery and Development Center (U01CA217851) to C.C and by NIH grants R35GM118165 to D.A.P and GM74074 to J.F.

Conflicts of Interest

C.C. is an advisor and stockholder in Grail, Ravel, DeepCell and an advisor to Genentech, Bristol Myers Squibb, 3T Biosciences and NanoString. D.A.P. is a founder of, and stockholder equity in, D2G Oncology Inc.

Methods

Data availability and resources

Whole-exome, somatic mutation calls of 10,486 cancer patients across 33 cancer types in The Cancer Genome Atlas (TCGA) were downloaded from the Multi-Center Mutation Calling in Multiple Cancers (MC3) project¹⁶(https://gdc.cancer.gov/about-data/publications/mc3-2017). For the same patients in TCGA, RNA-seq data of log₂ transformed RSEM normalized counts were downloaded from the UCSC Xena Browser⁷² (https://xenabrowser.net/datapages/) and copy number alterations (CNAs), including amplifications and deletions, called via ABSOLUTE were downloaded from COSMIC (v91)⁷³ (https://cancer.sanger.ac.uk/cosmic/download). Tumor purity estimates for TCGA were downloaded from the Genomic Database Commons (GDC)⁷⁴ (https://gdc.cancer.gov/about-data/publications/pancanatlas). Data for all cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) were downloaded from DepMap³⁰ (https://depmap.org/portal/download/all/). Specifically, mutation calls (Version 21Q3) from whole-exome sequencing data, copy number alternations quantified by ABSOLUTE (Version CCLE 2019), log₂ transformed TPM normalized counts (Version 21Q3) from RNA-seq data, proteomics data quantified by mass spectrometry⁷⁵, shRNA data from project Achilles³² normalized using DEMETER (DEMETER2 Data v6), and primary drug sensitivity screens of replicate collapsed log fold changes relative to DMSO from project PRISM³⁴ (Version 19Q4) were used.

Statistical analysis

The lmerTest and lmer package in R was used to apply a separate generalized linear mixed model (GLMM) for each gene in the genome to identify groups of genes whose expression is up-regulated in response to mutational load in TCGA. For each gene, expression values across all patients were z-score normalized in all analyses to ensure fair comparisons across genes. Known co-variates of tumor purity and cancer type were included in the GLMM. Tumor purity and mutational load were modeled as fixed effects, whereas cancer type was modeled as a random effect (i.e. random intercept) since it varies across groups of patients and can be interpreted as repeated measurements across groups. For all analyses, mutational load was defined as log₁₀ of the number of synonymous, nonsynonymous and nonsense mutations per tumor. For each gene, the parameters used in the GLMM were as follows,

where Y is a vector of expression values of each tumor, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, β₂ is the fixed slope for the predictor variable X₂ which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term. To examine expression responses to mutational load within a given protein complex and cancer type, the same normalization procedures were applied as above within cancer types and a separate GLM for each cancer type was ran as follows,

where Y is a vector of expression values of each tumor in a given cancer type, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, β₂ is the fixed slope for the predictor variable X₂ which is a vector of the purity of each tumor, β₃ is a change in the intercept for X₃ which is a categorical variable of individual genes within each proteostasis complex and e is a Gaussian error term.

Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e. some cancer types have <10 samples and others have >100 samples). In these cases, mixed effects models may not be able to estimate among-population variance accurately³¹. Thus, for all regression-based analyses in CCLE, a simple generalized linear model (GLM) was used instead. Cell viability values across all cell lines were z-score normalized by gene in all analyses to ensure fair comparisons across genes. To assess whether the same sets of genes are up-regulated in response to mutational load in CCLE using the GLM, a similar procedure to the GLMM was performed. A separate GLM was applied for each gene with the following parameters,

where Y is a vector normalized expression values of each cell line, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, and e is a Gaussian error term. To assess whether protein abundances are similarly up-regulated in response to mutational load in CCLE in proteostasis complexes, a separate GLM was applied to each gene with the following parameters,

where Y is a vector of protein abundance values within each cell line, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, and e is a Gaussian error term. A similar GLM framework as above was used to estimate the association of mutational load and cell viability after shRNA knock-down of individual genes in proteostasis complexes with the following parameters,

where Y is a vector of normalized cell viability estimates after expression knock-down of an individual gene across all cell lines, β₀ is the fixed reference intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each cell line, β2 is a change in the intercept for X₂ which is a categorical variable of individual genes within each proteostasis complex, and e is a Gaussian error term. To estimate the association of mutational load and cell viability after pharmacologic inhibition of proteostasis machinery, the following GLM was applied to each relevant drug in PRISM:

where Y is a vector normalized cell viability estimates after drug inhibition across all cell ines, β₀ is the fixed intercept, β₁ is the fixed slope for the predictor variable X₁ which is a vector of mutational load values for each tumor, and e is a Gaussian error term.

Model validation

For both the GLM and GLMM, model assumptions of homogeneity of variance were verified by plotting residuals versus fitted values in the model and residuals versus each covariate in the model. Multi-collinearity with other mutational classes (e.g. such as copy number alterations, CNAs) were considered but not found to correlate with point mutations (Supplemental Figure 1). A jackknife re-sampling procedure was used for outlier analysis and to determine whether specific cancer types are driving patterns of association within the GLM and GLMM. Briefly, each cancer type was removed from the regression model one at a time, and regression coefficients were re-estimated. Overall, regression coefficients were fairly stable across cancer types and trends remained the same (Supplemental Figure 5 and 8).

Proteostasis gene sets

Genes for chaperone complexes were identified from⁷⁶ and genes that are co-chaperones were not considered. Proteasome and ribosomal complexes were identified from CORUM¹⁷.

Gene set enrichment analysis

All gene set enrichment analysis was performed using gprofiler2 with default parameters. For all sets of genes, significance was determined after correcting for multiple hypothesis testing (FDR < 0.05). For gene set enrichment analysis used to identify genes up-regulated in TCGA in response to mutational load, all terms in CORUM database were reported and enrichment terms in the KEGG database of diseases not related to cancer (e.g. ‘Influenza A’) were omitted from the main figures for clarity and space. For gene sets used to identify terms differentially splice in between high and low mutational load tumors, all terms in the CORUM and the REACTOME database were reported in the main figures. The full set of enrichment terms for all analyses is reported in Supplemental Table 1.

Alternative splicing analysis

Alternative splicing events were quantified through a previously established metric called PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g. for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don’t support the alternative splicing event. PSI summarizes alternative splicing events at specific exonic boundaries in the entire transcript population without needing to know the complete underlying composition of each full length-transcript.

Alternative splicing calls for all tumors in TCGA were downloaded from TCGA SpliceSeq²⁵. Default splice event filters (percentage of samples with PSI values >75%) from the database were applied. To test whether gene expression is down-regulated in high mutational load tumors through alternative splicing, we calculated whether alternative splicing events were present based on different threshold values of percent spliced in (PSI) from 90% to 50%. (Supplemental Figure 4). For each alternative splicing event in a gene, we quantified whether the expression of the same gene was under-expressed. Each gene was counted as under-expressed if it was one standard deviation below the mean expression within each cancer type. Genes that contained a point mutation within the same alternative splicing event were removed to control for eQTL effects. We note that intron retention events removed from this analysis represent only ∼1% of intron retention events across all tumors and similar trends are found when this filtering scheme is not applied (Supplemental Figure 3). In addition, we evaluated whether this trend is robust to other alternative splicing events (i.e., Alternate Donor Sites, Alternate Promoters, Alternate Terminators, Exon Skipping Events, ME=Mutually Exclusive Exon; Supplemental Figure 4).

To investigate which genes are differentially spliced in between low and high mutational load tumors for specific alternative splicing events (i.e. intron retention), a t-test was used to calculate whether PSI values were significantly different in tumors with < 10 protein-coding mutations compared to tumors with > 1000 protein-coding mutations. Each alternative splicing event within a gene was required to have less than 25% of missing PSI values and a mean difference between the two groups of >0.01 to be considered. This approach identified 606 and 201 significant genes that have more and fewer intron retention events in high mutational load tumors, respectively, after correcting for multiple hypothesis testing (FDR < 0.05).

Drug category annotation and enrichment analysis

A separate GLM was ran for all drugs in the PRISM database to evaluate whether they are associated with mutational load and cell viability. All drugs that were negatively associated with mutational load and viability were queried on PubMed based on their reported mechanism of action in PRISM and grouped into broad categories (Supplemental Table 1). Categories of drug mechanism of action were first chosen based on their role in metabolism and known hallmarks of cancer. Additional categories not directly related to known cancer associated functional groups were made for drugs that could not otherwise be grouped (i.e. ‘Ion Channel Regulation’, Viral Replication Inhibitor’, etc.). Drugs with ambiguous mechanism of action (e.g. ‘cosmetic’, ‘coloring agent’) were grouped into ‘Other’. The abstracts of up to 10 associated papers were used to examine for evidence connecting drug mechanisms of action to 33 broad categories. In total, 700 drug mechanism of action were grouped and annotated into 33 broad categories. These broad categories were used to assess whether high mutational load cancer cell lines are generically vulnerable to drugs or whether certain categories are more likely to contain drugs effective against high mutational load cell lines. To control for differences in the number of drugs within each category, 50 drugs were randomly sampled, and the fraction of drugs significantly associated with mutational load in each category was calculated 100 times to generate confidence intervals.

Code and software availability

All code used for analysis will be made publicly available on Github under the open-source MIT License upon publication.

Supplemental Figures

No collinearity of point mutations and copy number alterations in human tumors (TCGA) and cancer cell lines (CCLE).
Heatmap of Pearson’s correlation coefficients between different classes of mutations in A. CCLE (cancer cell lines) and B. TCGA (human tumors). Colors denote magnitude of correlation coefficients and whether the relationship is positive (red), negative (blue) or negligible (white). CNAs are defined as the combined number of amplifications and deletions, while SNVs are the combined number of all point mutations.

Genes significantly expressed from the transcriptional screen mostly fall into the upper quartile of effect sizes, which are enriched for proteostasis complexes.
A. Volcano plot of positive β₁ regression coefficients and negative log₁₀ adjusted p-values measuring the association of mutation load and the expression of individual genes from the transcriptional screen in Fig.1A. Colors denote the lower (Q1 in red), median (Q2 in green) and upper (Q3 in red) quartiles of each positive beta coefficient from the regression model. Genes that are significantly expressed from the transcriptional screen mostly fall into the upper quartile. B. Barplot of significant protein complexes in the CORUM database identified using gene set enrichment analysis only on genes that fall into the upper quartile of effect sizes. Genes in the upper quartile of effect sizes contain half of the genes that were identified as significant previously (n=2,152 vs n=5,330), yet still identify protein degradation, translation and chaperones as the top significant protein complexes.

Intron retention events that overlap with mutations do not account for the association of gene silencing in high mutational load tumors.
A. Counts of the number of intron retention events filtered (in red) due to overlap with a mutation present in the same gene (and thus corresponding to potential eQTLs) compared the number of remaining alternative splicing events with no overlap with a mutation (in blue). Alternative splicing events filtered represent ∼1% of all alternative splicing events across all tumors. **B-C**. Counts of the number of under-expressed transcripts with intron retention events, relative to counts of all intron retention events in tumors binned by the total number of protein-coding mutations. Shown are when trends when **(B)** not filtering alternative splicing events due to overlap with mutations and **(C)** when events are filtered (same as Fig. 2A). Intron retention events with PSI > 80% are counted. Error bars are 95% confidence intervals determined by bootstrap sampling. These results further support the prediction that gene silencing is elevated in high mutational load tumors and likely mediated by the coupling of intron retention with mRNA decay

The number of under-expressed transcripts increases with the mutational load of tumors for different PSI value thresholds and alternative splicing events.
A. Counts of the number of under-expressed transcripts with intron retention events, relative to counts of all intron retention events in tumors binned by the total number of protein-coding mutations. Intron retention events with different PSI thresholds are shown colored. B. Counts of the number of under-expressed transcripts that contain different classes alternative splicing events, relative to counts of all alternative splicing events of the same class in tumors binned by the total number of protein-coding mutations. Alternative splicing events of different classes are shown colored (AA=Alternate Acceptor Sites, AD=Alternate Donor Sites, AP=Alternate Promoter, AT=Alternate Terminator, ES=Exon Skip, ME=Mutually Exclusive Exons, RI= Retained Intron). Error bars are 95% confidence intervals determined by bootstrap sampling.

Association between expression in proteostasis complexes and mutational load is not driven by a single cancer type in TCGA.
Box plots of regression coefficients from the GLMM measuring the association of the expression of each individual gene with the mutational load of tumors in TCGA colored by different proteostasis complexes. Shown are regression estimates after removing each individual cancer type (x-axis) and re-running the GLMM.

Linear regression analysis within cancer types in TCGA captures similar expression responses to mutational load across proteostasis complexes.
Heatmap of β₁ regression coefficients measuring the effect of mutational load on gene expression in proteostasis complexes while controlling for tumor purity within cancer types which have enough samples to accurately measure effect sizes (N>150) and contain a sufficiently large enough mutational load to potentially generate a proteostasis response (median protein coding mutations > 25). ‘MutLoad’ shows log10 of the median number of protein coding mutations for each cancer type.

Association between the expression in proteostasis complexes and mutational load is not driven by patient age.
Boxplots of regression coefficients from the GLMM measuring the association of the expression of each individual gene with the mutational load of tumors from TCGA colored by different proteostasis complexes. Shown are regression coefficients when running the GLMM on tumors stratified by different age groups (x-axis).

Association between the expression in proteostasis complexes and mutational load is not driven by a single cancer type in CCLE.
Box plots of regression coefficients from the GLM measuring the association of the expression of each individual gene with the mutational load of tumors colored by different proteostasis complexes. Shown are regression estimates after removing each cancer type in CCLE (x-axis) and re-running the GLM.

Similar patterns of expression and protein abundances in response to mutational load in CCLE within genes that regulate protein folding, degradation, and synthesis.
Box plots of β₁ regression coefficients measuring the association of mutation load and protein abundance (right) or gene expression (left) of individual genes in chaperone (purple), proteasome (yellow), and ribosome (green) complexes. Shown are regression coefficients from cancer cell lines (CCLE), which contains the largest dataset available of RNA (n=1377) and protein (n=373) abundances which are harmonized across samples. Percentages and grey lines on top panels show the quantile distribution of regression coefficients measuring the association of mutational load and expression for all genes in the genome within each dataset.

References

1.
1. Lawrence M. S.
2. et al.
2013Mutational heterogeneity in cancer and the search for new cancer-associated genesNature https://doi.org/10.1038/nature12213 Google Scholar
2.
1. Bakhoum S. F.
2. Landau D. A.
2017Cancer Evolution: No Room for Negative SelectionCell https://doi.org/10.1016/j.cell.2017.10.039 Google Scholar
3.
1. Martincorena I.
2. et al.
2017Universal Patterns of Selection in Cancer and Somatic TissuesCell https://doi.org/10.1016/j.cell.2017.09.042 Google Scholar
4.
1. Bozic I.
2. Paterson C.
3. Waclaw B.
2019On measuring selection in cancer from subclonal mutation frequenciesPLoS Comput. Biol https://doi.org/10.1371/journal.pcbi.1007368 Google Scholar
5.
1. McFarland C. D.
2. et al.
2017The damaging effect of passenger mutations on cancer progressionCancer Res https://doi.org/10.1158/0008-5472.CAN-15-3283-T Google Scholar
6.
1. McFarland C. D.
2. Korolev K. S.
3. Kryukov G. V.
4. Sunyaev S. R.
5. Mirny L. A.
2013Impact of deleterious passenger mutations on cancer progressionProc. Natl. Acad. Sci. U. S. A https://doi.org/10.1073/pnas.1213968110 Google Scholar
7.
1. Zapata L.
2. et al.
2018Negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidomeGenome Biol https://doi.org/10.1186/s13059-018-1434-0 Google Scholar
8.
1. Williams M. J.
2. Werner B.
3. Barnes C. P.
4. Graham T. A.
5. Sottoriva A.
2016Identification of neutral tumor evolution across cancer typesNat. Genet https://doi.org/10.1038/ng.3489 Google Scholar
9.
1. Neutral Theory and the Somatic Evolution of Cancer
2018Mol. Biol. Evolhttps://doi.org/10.1093/molbev/msy100 Google Scholar
10.
1. Tilk S.
2. Curtis C.
3. Petrov D. A.
4. McFarland C. D.
2019Most cancers carry a substantial deleterious load due to Hill-Robertson interferencebioRxiv https://doi.org/10.1101/764340 Google Scholar
11.
1. Eyre-Walker A.
2. Keightley P. D.
2007The distribution of fitness effects of new mutationsNature Reviews Genetics https://doi.org/10.1038/nrg2146 Google Scholar
12.
1. Martincorena I.
2. et al.
2017Universal Patterns of Selection in Cancer and Somatic TissuesCell https://doi.org/10.1016/j.cell.2017.09.042 Google Scholar
13.
1. Drummond D. A.
2. Wilke C. O.
2008Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolutionCell 134:341–52Google Scholar
14.
1. Hill W. G.
2. Robertson A.
2008The effect of linkage on limits to artificial selectionGenet. Res. (Camb) https://doi.org/10.1017/S001667230800949X Google Scholar
15.
1. Weinstein J. N.
2. et al.
2013The cancer genome atlas pan-cancer analysis projectNat. Genet https://doi.org/10.1038/ng.2764 Google Scholar
16.
1. Ellrott K.
2. et al.
2018Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic PipelinesCell Syst 6:271–281Google Scholar
17.
1. Giurgiu M.
2. et al.
2019CORUM: The comprehensive resource of mammalian protein complexes - 2019Nucleic Acids Res https://doi.org/10.1093/nar/gky973 Google Scholar
18.
1. Tanabe M.
2. Kanehisa M.
2012Using the KEGG database resourceCurr. Protoc. Bioinforma https://doi.org/10.1002/0471250953.bi0112s38 Google Scholar
19.
1. Peterson H.
2. Kolberg L.
3. Raudvere U.
4. Kuzmin I.
5. Vilo J.
2020gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g: ProfilerF1000Research https://doi.org/10.12688/f1000research.24956.2 Google Scholar
20.
1. Biamonti G.
2. Caceres J. F.
2009Cellular stress and RNA splicingTrends in Biochemical Sciences https://doi.org/10.1016/j.tibs.2008.11.004 Google Scholar
21.
1. Dutertre M.
2. Sanchez G.
3. Barbier J.
4. Corcos L.
5. Auboeuf D.
2011The emerging role of premessenger RNA splicing in stress responses: Sending alternative messages and silent messengersRNA Biology https://doi.org/10.4161/rna.8.5.16016 Google Scholar
22.
1. Ge Y.
2. Porse B. T.
2014The functional consequences of intron retention: Alternative splicing coupled to NMD as a regulator of gene expressionBioEssays https://doi.org/10.1002/bies.201300156 Google Scholar
23.
1. Lindeboom R. G. H.
2. Supek F.
3. Lehner B.
2016The rules and impact of nonsense-mediated mRNA decay in human cancersNat. Genet https://doi.org/10.1038/ng.3664 Google Scholar
24.
1. Lareau L. F.
2. Inada M.
3. Green R. E.
4. Wengrod J. C.
5. Brenner S. E.
2007Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elementsNature https://doi.org/10.1038/nature05676 Google Scholar
25.
1. Ryan M.
2. et al.
2016TCGASpliceSeq a compendium of alternative mRNA splicing in cancerNucleic Acids Res https://doi.org/10.1093/nar/gkv1288 Google Scholar
26.
1. Sun Y.
2. MacRae T. H.
2005The small heat shock proteins and their role in human diseaseFEBS Journal https://doi.org/10.1111/j.1742-4658.2005.04708.x Google Scholar
27.
1. Ruan L.
2. et al.
2017Cytosolic proteostasis through importing of misfolded proteins into mitochondriaNature https://doi.org/10.1038/nature21695 Google Scholar
28.
1. Shcherbakov D.
2. et al.
2019Ribosomal mistranslation leads to silencing of the unfolded protein response and increased mitochondrial biogenesisCommun. Biol https://doi.org/10.1038/s42003-019-0626-9 Google Scholar
29.
1. Stein K. C.
2. Frydman J.
2019The stop-and-go traffic regulating protein biogenesis: How translation kinetics controls proteostasisJournal of Biological Chemistry https://doi.org/10.1074/jbc.REV118.002814 Google Scholar
30.
1. Barretina J.
2. et al.
2012The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivityNature https://doi.org/10.1038/nature11003 Google Scholar
31.
1. Harrison X. A.
2. et al.
2018A brief introduction to mixed effects modelling and multi-model inference in ecologyPeerJ https://doi.org/10.7717/peerj.4794 Google Scholar
32.
1. Tsherniak A.
2. et al.
2017Defining a Cancer Dependency MapCell https://doi.org/10.1016/j.cell.2017.06.010 Google Scholar
33.
1. Zolkiewski M.
2. Zhang T.
3. Nagy M.
2012Aggregate reactivation mediated by the Hsp100 chaperonesArchives of Biochemistry and Biophysics https://doi.org/10.1016/j.abb.2012.01.012 Google Scholar
34.
1. Corsello S. M.
2. et al.
2020Discovering the anticancer potential of non-oncology drugs by systematic viability profilingNat. Cancer https://doi.org/10.1038/s43018-019-0018-6 Google Scholar
35.
1. Kristofich J. C.
2. et al.
2018Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzymePLoS Genet https://doi.org/10.1371/journal.pgen.1007615 Google Scholar
36.
1. Spencer P. S.
2. Siller E.
3. Anderson J. F.
4. Barral J. M.
2012Silent substitutions predictably alter translation elongation rates and protein folding efficienciesJ. Mol. Biol https://doi.org/10.1016/j.jmb.2012.06.010 Google Scholar
37.
1. Zhang G.
2. Hubalewska M.
3. Ignatova Z.
2009Transient ribosomal attenuation coordinates protein synthesis and co-translational foldingNat. Struct. Mol. Biol https://doi.org/10.1038/nsmb.1554 Google Scholar
38.
1. Walsh I. M.
2. Bowman M. A.
3. Soto Santarriaga I. F.
4. Rodriguez A.
5. Clark P. L.
2020Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitnessProc. Natl. Acad. Sci. U. S. A https://doi.org/10.1073/pnas.1907126117 Google Scholar
39.
1. Plotkin J. B.
2. Kudla G.
2011Synonymous but not the same: The causes and consequences of codon biasNature Reviews Genetics https://doi.org/10.1038/nrg2899 Google Scholar
40.
1. Bobula J.
2. et al.
2006Why molecular chaperones buffer mutational damage: A case study with a yeast Hsp40/70 systemGenetics https://doi.org/10.1534/genetics.106.061564 Google Scholar
41.
1. Maisnier-Patin S.
2. et al.
2005Genomic buffering mitigates the effects of deleterious mutations in bacteriaNat. Genet https://doi.org/10.1038/ng1676 Google Scholar
42.
1. Fares M. A.
2. Ruiz-González M. X.
3. Moya A.
4. Elena S. F.
5. Barrio E.
2002Endosymbiotic bacteria: GroEL buffers against deleterious mutationsNature https://doi.org/10.1038/417398a Google Scholar
43.
1. Elena S. F.
2. Carrasco P.
3. Daròs J. A.
4. Sanjuán R.
2006Mechanisms of genetic robustness in RNA virusesEMBO Reports https://doi.org/10.1038/sj.embor.7400636 Google Scholar
44.
1. Pan Q.
2. et al.
2006Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expressionGenes Dev https://doi.org/10.1101/gad.1382806 Google Scholar
45.
1. Green R. E.
2. et al.
2003Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genesin Bioinformatics https://doi.org/10.1093/bioinformatics/btg1015 Google Scholar
46.
1. Cuccurese M.
2. Russo G.
3. Russo A.
4. Pietropaolo C.
2005Alternative splicing and nonsense-mediated mRNA decay regulate mammalian ribosomal gene expressionNucleic Acids Res https://doi.org/10.1093/nar/gki905 Google Scholar
47.
1. Mitrovich Q. M.
2. Anderson P.
2000Unproductively spliced ribosomal protein mRNAs are natural targets of mRNA surveillance in C. elegansGenes Dev https://doi.org/10.1101/gad.819900 Google Scholar
48.
1. Parenteau J.
2. et al.
2011Introns within ribosomal protein genes regulate the production and function of yeast ribosomesCell https://doi.org/10.1016/j.cell.2011.08.044 Google Scholar
49.
1. Kalapis D.
2. et al.
2015Evolution of Robustness to Protein Mistranslation by Accelerated Protein TurnoverPLoS Biol https://doi.org/10.1371/journal.pbio.1002291 Google Scholar
50.
1. Tokheim C.
2. et al.
2021Systematic characterization of mutations altering protein degradation in human cancersMol. Cell https://doi.org/10.1016/j.molcel.2021.01.020 Google Scholar
51.
1. Martínez-Jiménez F.
2. Muiños F.
3. López-Arribillaga E.
4. Lopez-Bigas N.
5. Gonzalez-Perez A.
2020Systematic analysis of alterations in the ubiquitin proteolysis system reveals its contribution to driver mutations in cancerNat. Cancer https://doi.org/10.1038/s43018-019-0001-2 Google Scholar
52.
1. Conant G. C.
2. Wagner A.
2004Duplicate genes and robustness to transient gene knock-downs in Caenorhabditis elegansProc. R. Soc. B Biol. Sci https://doi.org/10.1098/rspb.2003.2560 Google Scholar
53.
1. Gu Z.
2. et al.
2003Role of duplicate genes in genetic robustness against null mutationsNature Google Scholar
54.
1. López S.
2. et al.
2020Interplay between whole-genome doubling and the accumulation of deleterious alterations in cancer evolutionNat. Genet https://doi.org/10.1038/s41588-020-0584-7 Google Scholar
55.
1. Torres E. M.
2. et al.
2007Effects of aneuploidy on cellular physiology and cell division in haploid yeastScience (80- https://doi.org/10.1126/science.1142210 Google Scholar
56.
1. Neckers L.
2. Workman P.
2012Hsp90 molecular chaperone inhibitors: Are we there yet?Clinical Cancer Research https://doi.org/10.1158/1078-0432.CCR-11-1000 Google Scholar
57.
1. Pacey S.
2. Banerji U.
3. Judson I.
4. Workman P.
2006Hsp90 inhibitors in the clinicHandbook of experimental pharmacology Google Scholar
58.
1. Kim Y.
2. et al.
2009Update on Hsp90 Inhibitors in Clinical TrialCurr. Top. Med. Chem https://doi.org/10.2174/156802609789895728 Google Scholar
59.
1. Manasanch E. E.
2. Orlowski R. Z.
2017Proteasome inhibitors in cancer therapyNature Reviews Clinical Oncology https://doi.org/10.1038/nrclinonc.2016.206 Google Scholar
60.
1. Park J. E.
2. Miller Z.
3. Jun Y.
4. Lee W.
5. Kim K. B.
2018Next-generation proteasome inhibitors for cancer therapyTranslational Research https://doi.org/10.1016/j.trsl.2018.03.002 Google Scholar
61.
1. Bianchi G.
2. et al.
2009The proteasome load versus capacity balance determines apoptotic sensitivity of multiple myeloma cells to proteasome inhibitionBlood https://doi.org/10.1182/blood-2008-08-172734 Google Scholar
62.
1. Ling S. C. W.
2. et al.
2012Response of myeloma to the proteasome inhibitor bortezomib is correlated with the unfolded protein response regulator XBP-1Haematologica https://doi.org/10.3324/haematol.2011.043331 Google Scholar
63.
1. Martincorena I.
2. et al.
2018Somatic mutant clones colonize the human esophagus with ageScience (80- https://doi.org/10.1126/science.aau3879 Google Scholar
64.
1. Martincorena I.
2. et al.
2015High burden and pervasive positive selection of somatic mutations in normal human skinScience (80- https://doi.org/10.1126/science.aaa6806 Google Scholar
65.
1. Kennedy S. R.
2. Zhang Y.
3. Risques R. A.
2019Cancer-Associated Mutations but No Cancer: Insights into the Early Steps of Carcinogenesis and Implications for Early Cancer DetectionTrends in Cancer https://doi.org/10.1016/j.trecan.2019.07.007 Google Scholar
66.
1. López-Otín C.
2. Blasco M. A.
3. Partridge L.
4. Serrano M.
5. Kroemer G.
2013The hallmarks of agingCell https://doi.org/10.1016/j.cell.2013.05.039 Google Scholar
67.
1. Adusumalli S.
2. Ngian Z. K.
3. Lin W. Q.
4. Benoukraf T.
5. Ong C. T.
2019Increased intron retention is a post-transcriptional signature associated with progressive aging and Alzheimer’s diseaseAging Cell https://doi.org/10.1111/acel.12928 Google Scholar
68.
1. Löw P.
2011The role of ubiquitin-proteasome system in ageingGeneral and Comparative Endocrinology https://doi.org/10.1016/j.ygcen.2011.02.005 Google Scholar
69.
1. Soti C.
2. Csermely P.
2003Aging and molecular chaperonesExp. Gerontol https://doi.org/10.1016/S0531-5565(03)00185-2 Google Scholar
70.
1. Charmpilas N.
2. Kyriakakis E.
3. Tavernarakis N.
2017Small heat shock proteins in ageing and age-related diseasesCell Stress and Chaperones https://doi.org/10.1007/s12192-016-0761-x Google Scholar
71.
1. Aman Y.
2. et al.
2021Autophagy in healthy aging and diseaseNat. Aging https://doi.org/10.1038/s43587-021-00098-4 Google Scholar
72.
1. Goldman M. J.
2. et al.
2020Visualizing and interpreting cancer genomics data via the Xena platformNature Biotechnology https://doi.org/10.1038/s41587-020-0546-8 Google Scholar
73.
1. Bamford S.
2. et al.
2004The COSMIC (Catalogue of Somatic Mutations in Cancer) database and websiteBr. J. Cancer https://doi.org/10.1038/sj.bjc.6601894 Google Scholar
74.
1. Grossman R. L.
2. et al.
2016Toward a Shared Vision for Cancer Genomic DataN. Engl. J. Med 375:1109–1112Google Scholar
75.
1. Nusinow D. P.
2. et al.
2020Quantitative Proteomics of the Cancer Cell Line EncyclopediaCell https://doi.org/10.1016/j.cell.2019.12.023 Google Scholar
76.
1. Hadizadeh Esfahani A.
2. Sverchkova A.
3. Saez-Rodriguez J.
4. Schuppert A. A.
5. Brehme M.
2018A systematic atlas of chaperome deregulation topologies across the human cancer landscapePLoS Comput. Biol https://doi.org/10.1371/journal.pcbi.1005890 Google Scholar

Article and author information

Author information

Susanne Tilk
Department of Biology, Stanford University, Stanford, CA, USA
ORCID iD: 0000-0002-9156-9360
- Correspondence to: tilk@stanford.edu, cncurtis@stanford.edu, petrov@stanford.edu
Judith Frydman
Department of Biology, Stanford University, Stanford, CA, USA
ORCID iD: 0000-0003-2302-6943
Christina Curtis
Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA, Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA, Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
ORCID iD: 0000-0003-0166-3802
- Correspondence to: tilk@stanford.edu, cncurtis@stanford.edu, petrov@stanford.edu
Dmitri Petrov
Department of Biology, Stanford University, Stanford, CA, USA, Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
ORCID iD: 0000-0002-3664-9130
- Correspondence to: tilk@stanford.edu, cncurtis@stanford.edu, petrov@stanford.edu

Version history

Preprint posted: February 28, 2023
Sent for peer review: March 12, 2023
Reviewed Preprint version 1: May 23, 2023
Version of Record published: November 25, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.87301. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

views: 4,111
downloads: 219
citations: 6

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Significance of findings

Strength of evidence

Abstract

Statement of Significance

Introduction

Results

Quantifying transcriptional response to mutational load in human tumors

General linear mixed effects model (GLMM) identifies protein complexes and pathways up-regulated in response to mutational load in human tumors.

Gene silencing through alternative splicing in high mutational load tumors

Gene silencing is elevated in high mutational load tumors likely through the coupling of intron retention with mRNA decay.

Regulation of translation, protein folding and protein degradation in high mutational load tumors

Protein folding, degradation, and synthesis are regulated in both high mutational load tumors (TCGA) and cell lines (CCLE).

Validating proteostasis expression responses in cancer cell lines and establishing a causal connection through perturbation experiments

Viability in high mutational load cell lines decreases when proteostasis machinery is disrupted.

Targeting proteostasis machinery is a key vulnerability in high mutational load cell lines.

Discussion

Acknowledgements

Conflicts of Interest

Methods

Data availability and resources

Statistical analysis

Model validation

Proteostasis gene sets

Gene set enrichment analysis

Alternative splicing analysis

Drug category annotation and enrichment analysis

Code and software availability

Supplemental Figures

No collinearity of point mutations and copy number alterations in human tumors (TCGA) and cancer cell lines (CCLE).

Genes significantly expressed from the transcriptional screen mostly fall into the upper quartile of effect sizes, which are enriched for proteostasis complexes.

Intron retention events that overlap with mutations do not account for the association of gene silencing in high mutational load tumors.

The number of under-expressed transcripts increases with the mutational load of tumors for different PSI value thresholds and alternative splicing events.

Association between expression in proteostasis complexes and mutational load is not driven by a single cancer type in TCGA.

Linear regression analysis within cancer types in TCGA captures similar expression responses to mutational load across proteostasis complexes.

Association between the expression in proteostasis complexes and mutational load is not driven by patient age.

Association between the expression in proteostasis complexes and mutational load is not driven by a single cancer type in CCLE.

Similar patterns of expression and protein abundances in response to mutational load in CCLE within genes that regulate protein folding, degradation, and synthesis.

References

Article and author information

Author information

Susanne Tilk

Judith Frydman

Christina Curtis

Dmitri Petrov

Version history

Cite all versions

Copyright

Metrics