Optimal cancer evasion in a dynamic immune microenvironment generates diverse post-escape tumor antigenicity profiles
Abstract
The failure of cancer treatments, including immunotherapy, continues to be a major obstacle in preventing durable remission. This failure often results from tumor evolution, both genotypic and phenotypic, away from sensitive cell states. Here, we propose a mathematical framework for studying the dynamics of adaptive immune evasion that tracks the number of tumor-associated antigens available for immune targeting. We solve for the unique optimal cancer evasion strategy using stochastic dynamic programming and demonstrate that this policy results in increased cancer evasion rates compared to a passive, fixed strategy. Our foundational model relates the likelihood and temporal dynamics of cancer evasion to features of the immune microenvironment, where tumor immunogenicity reflects a balance between cancer adaptation and host recognition. In contrast with a passive strategy, optimally adaptive evaders navigating varying selective environments result in substantially heterogeneous post-escape tumor antigenicity, giving rise to immunogenically hot and cold tumors.
Editor's evaluation
This study presents a valuable mathematical model for the adaptive dynamics of cancer evolution in response to immune recognition. The mathematical analysis is rigorous and convincing, and overall the framework presented could be used in the future as a solid base for analytically tracking tumor evasion strategies. The work will be of interest to evolutionary cancer biologists and potentially may also have implications for the design of clinical interventions.
https://doi.org/10.7554/eLife.82786.sa0
Introduction
Cancer dynamics, encompassing both genotypic evolution and phenotypic progression, lies at the heart of treatment failure and disease recurrence, and therefore represents a significant and stubborn therapeutic hurdle. Prior research efforts have made substantial progress in detailing the mathematics of acquired drug resistance (Iwasa et al., 2006; Michor et al., 2004; Komarova, 2006) and the complementary roles of phenotypic and genotypic changes (Gupta et al., 2019). Recently, there has been much renewed interest in therapies that utilize the adaptive immune system to confer durable remission (Couzin-Frankel, 2013; Waldman et al., 2020). These latter breakthroughs have generated considerable interest in quantifying the cancer-immune interaction (Mayer et al., 2019; Sontag, 2017; George et al., 2017). As with targeted therapeutic resistance via compensatory evolution or adaptive rewiring (Bergholz and Zhao, 2021), tumors can similarly evade the immune system via either elimination or downregulation of tumor-associated antigens (TAAs) normally detectable by the T cell repertoire (Rosenthal et al., 2019). However, several key features distinguish immune-specific evasion from classical drug resistance (Komarova, 2006). Dynamical changes in cancer genotypes and phenotypes, while problematic for conventional therapies, create additional TAAs that may subsequently be recognized by distinct T cells (Yarchoan et al., 2017). Thus, the evolving diversity of the T cell repertoire, consisting of billions of unique clones each with a distinct T cell receptor, provides adaptive immunity and immunotherapy the unique advantage of repeated tumor recognition opportunities (George and Levine, 2021; Lakatos et al., 2020; Qi et al., 2014), making long-term evasion more challenging.
Previous research efforts have investigated the diversity of evolutionary trajectories and the extent of cancer-immune coevolution occurring in early disease progression (George and Levine, 2018; George and Levine, 2020). These works were based on increasing evidence of significant and sustained tumor evolution driven by immune surveillance (Turajlic et al., 2018; Jamal-Hanjani et al., 2017). Immunosurveillance via distinct T cell clones imposes an adaptive, stochastic recognition environment on developing cancer populations (Desponds et al., 2016) that can result in cancer elimination, escape, or equilibrium (Schreiber et al., 2002; Dunn et al., 2004). Equilibrium results in cancer coexistence with the immune system over large time scales (Turajlic et al., 2018), thereby motivating the need for a more complete understanding of the interplay between immune recognition and cancer evolution for effective therapeutic design. In addition to parsing this complexity, the precise extent to which a cancer population may actively evade repeated immune recognition attempts is at present unknown.
Previous modeling efforts have assumed that cancer adaptation occurs passively, that is, without behavior predicated on knowledge of the current immune microenvironment (IME). However, it is well known that cancer populations commonly undergo phenotypic changes capable of altering their immunogenicity (Tripathi et al., 2016); these changes could be coupled to sensing of the IME in a manner similar to cancer mechanical, chemical, and stress sensing (Lee et al., 2019; Damaghi et al., 2013; Rosenberg, 2001). Moreover, direct experimental evidence demonstrates genetic adaptation in bacterial systems capable of sensing stress and consequently varying the per-cell mutation rate (Al Mamun et al., 2012; Rosenberg and Queitsch, 2014); there appear to be similar stress pathways in cancer (Bindra et al., 2007). Therefore, an alternative to passive evolution is for cancer populations to actively sense and evade recognition in the current environment en route to metastasis in a manner that maximally benefits survival, which we refer to henceforth as the ‘optimal escape hypothesis.’ Understanding the extent and associated features of optimized tumor evasion is a crucial first step to identifying the best therapeutic approach, particularly for T cell immunotherapies that may be temporally varied.
Here, we introduce a mathematical framework, which we call ‘Tumor Evasion via adaptive Antigen Loss’ (TEAL), to quantify the aggressiveness of an evolutionary strategy executed by a cancer population faced with a varying recognition environment. This framework enables a dynamical analysis of both passive and optimized evasion strategies. The TEAL model describes a discrete-time stochastic process tracking the number of targets available to a recognizing adaptive immune system. We apply dynamic programming (Bellman and Dreyfus, 1959; Ross, 2014) in order to solve the corresponding time-homogeneous Bellman equation detailing the tumor optimal evasion strategy for a specific example of the assumed penalty for attempting to avoid immune detection. In doing so, we obtain an exact analytical characterization of the evasion policy that maximizes long-run population survival, which we show is the unique solution. We can then quantify the enhancement in survival for optimal threats relative to their passive counterparts under a variety of temporally varying recognition environments. Surprisingly, we find that optimized strategies exhibit substantial diversity in their dynamical behavior, distinguishing them from threats with a fixed evolutionary strategy. Notably, immune recognition efficiency and the IME are predicted to influence the likelihood for tumors to either accumulate or lose therapeutically actionable TAAs prior to their escape. The TEAL model represents a first attempt to explicitly represent, and in the future test, the optimal escape hypothesis in order to frame cancer evasion as a dynamic and informed strategy aimed at maximizing population survival.
Model development
In greatest generality, our model consists of an evading clonal population that may be targeted over time by a recognizing system. We assume henceforth that the recognition-evasion pair consists of the T cell repertoire of the adaptive immune system and a cancer cell population, recognizable by a minimal collection of $s_n$ TAAs present on the surface of cancer cells in sufficient abundance for recognition to occur over some time interval $n$. Our focus is on a clonal population, recognizing that subclonal TAA distributions in this model may be studied by considering independent processes in parallel for each clone.
Experimental evidence and prior modeling suggest that tumors may be kept in an ‘equilibrium’ state of small population size prior to either escape or elimination, with repeated epochs of recognition and evasion (Dunn et al., 2004; Turajlic et al., 2018; George and Levine, 2020). We adopt a coarse-grained strategy and assume that during each epoch, the immune system has an opportunity to independently recognize each of the $s_n$ TAAs with probability $q$, and that the cancer population can lose recognized TAAs, each with probability $\pi_n$, which we refer to as the antigen loss rate. The antigen loss rate is either fixed or chosen by the cancer population using information available in the current period. If the immune system cannot detect any of the available TAAs in a given period, then the cancer population escapes detection. On the other hand, if $r_n>0$ antigens are detected by the adaptive immune system in this time frame, then the cancer population is effectively targeted. This leads to cancer elimination unless the population is able to lose each of the $r_n$ recognized antigens during the same period. This loss of recognition would presumably arise in a subpopulation that would then expand at the expense of the successfully targeted cells. If evasion balances recognition and all detected antigens are lost, then equilibrium (non-escape, non-elimination) ensues, and the process repeats in the next period with a new number of target antigens given by a state transition equation
where $\beta$ represents the basal rate of new antigen accumulation, and $f_n$ represents the addition of new TAA targets dependent on the antigen loss rate $\pi_n$ in the current state. We shall refer to $f_n$ as the (intertemporal) penalty term, the idea being that changes that lead to antigen loss will out of necessity give rise to the creation of new TAAs, in the form of either overexpressed/mislocalized self-peptides or tumor neoantigens.
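Putting the verbal description above into symbols, the equilibrium update would plausibly take the following form. This is a reconstruction from the surrounding definitions (all $r_n$ recognized antigens are lost, $\beta$ new antigens arrive at the basal rate, and $f_n$ is added as a penalty), offered as a reading aid and not as a quotation of the numbered equation:

```latex
s_{n+1} = s_n - r_n + \beta + f_n
```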
The model therefore defines a discrete-time process that involves changes to both the tumor and the immune system. The process ends in cancer elimination if the cancer population is unable to match all of the $r_n$ recognized antigens at any period. The process ends in cancer escape if at any period the number of recognized antigens is zero ($r_n=0$). This framework mirrors the outcomes resulting from known tumor-immune interactions, a process that leads via immunoediting to cancer escape, elimination, or equilibrium (Schreiber et al., 2002; Dunn et al., 2002; Dunn et al., 2004; Koebel et al., 2007). Here, tumor antigenicity is represented by the total number of post-escape TAAs. We do not distinguish between different types of TAA loss, which may occur through a number of mechanisms, including somatic mutation, epigenetic regulation, or phenotypic alteration.
Passive evader
In the passive case, the cancer population does not change its evasion rate so that ${\pi}_{n}=p$ is fixed and independent of any of the parameters governing the recognition landscape. For this case, we shall also use the simple assumption that the net antigen accumulation and penalty $\beta +f$ is a fixed constant.
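The passive process described above can be sketched as a short Monte Carlo simulation. This is a minimal illustration, not the authors' code: the function names, the integer constant `c` standing in for $\beta + f$, and the update `s - r + c` (recognized antigens removed, `c` new ones added) are assumptions made for the example.

```python
import random

def simulate_passive(s0, q, p, c, max_periods=1000, rng=None):
    """Run one passive trajectory (hypothetical helper, not from the paper).

    Returns (outcome, period, s), where outcome is 'escape', 'elimination',
    or 'equilibrium' (still undecided after max_periods).
    """
    if rng is None:
        rng = random.Random()
    s = s0
    for n in range(max_periods):
        # Each of the s TAAs is recognized independently with probability q.
        r = sum(rng.random() < q for _ in range(s))
        if r == 0:
            return "escape", n, s
        # Each recognized TAA is lost independently with fixed probability p.
        lost = sum(rng.random() < p for _ in range(r))
        if lost < r:
            return "elimination", n, s
        # Assumed update: recognized TAAs are removed and c new ones arrive.
        s = s - r + c
    return "equilibrium", max_periods, s

def escape_fraction(trials, seed=0, **kwargs):
    """Fraction of simulated trajectories that end in escape."""
    rng = random.Random(seed)
    outcomes = [simulate_passive(rng=rng, **kwargs)[0] for _ in range(trials)]
    return outcomes.count("escape") / trials
```

For instance, `escape_fraction(10000, s0=5, q=0.7, p=0.6, c=1)` would estimate the escape probability for one arbitrary parameter choice; the values are illustrative only.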
Optimal evader
In the optimized case, $\pi_n$ is chosen in order to maximize the overall evasion probability as a function of parameters realizable to the cancer at period $n$. We assume that the number of TAAs, $s_n$, and the size of the recognized subset, $r_n$, are knowable by the cancer prior to strategy selection. In addition, we postulate that the intertemporal penalty scales directly with $\pi_n$, a reasonable assumption given, for example, the direct relationship between mutagenesis and passenger mutation accumulation (Pon and Marra, 2015; McFarland et al., 2014). While many functional forms of $f_n(\pi_n, r_n, s_n)$ would be reasonable, we assume in general that the penalty is $\pi_n$-linear:
To make our system analytically solvable, we use a specific choice in which $h_m$ scales monotonically as a function of both $r_n$ and $s_n$ and $h_m \propto r_n$ in the large-$r_n$ limit (see ‘Methods’). Since the number of recognizable (and thus actively targeted) TAAs reflects, all else being equal, an active IME hostile to cancer, we assume that the subsequent total TAA addition, $\beta + f_n$, depends on the current level of immune detection, thereby taking into account the increased cost of surviving in, for example, an inflammatory IME. The temporal dynamics of the TEAL process are illustrated in Figure 1A and Figure 1—figure supplement 1.
Varying environments
Using the above framework, we subject both passive and active cancer evasion tactics to temporally varying recognition profiles. We partition pre-escape dynamics into four cases based on immune recognition $q$ and basal TAA arrival $\beta$, from which we characterize the distribution of escape time, cumulative mutational burden, and predicted post-escape tumor immunogenicity.
Results
The following section presents the main findings of our analysis (full mathematical details are provided in the ‘Methods’ section). For $s_n$ available and $r_n$ recognized TAAs, we have that $r_n \sim \text{Binom}(s_n, q)$. Conditional on recognition ($r_n>0$), the number of downregulated antigens, $\ell_n$, is given by $\ell_n \sim \text{Binom}(r_n, \pi_n)$. Recognition therefore occurs with probability $\mathbb{P}(r_n>0) = 1-(1-q)^{s_n}$. Similarly, non-elimination occurs following recognition with probability $\mathbb{P}(\ell_n = r_n) = \pi_n^{r_n}$. A decision tree for the TEAL process is illustrated in Figure 1B (passive and active decision trees used in the analysis are depicted in Figure 1—figure supplements 2–4).
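The two binomial statements above can be combined into single-period outcome probabilities when the loss rate does not depend on $r_n$. Summing $\binom{s}{k} q^{k}(1-q)^{s-k}\pi^{k}$ over $k \ge 0$ gives $(1-q+q\pi)^{s}$, and the $k=0$ term is the escape probability. The sketch below assumes a loss rate held fixed within the period; the function name is hypothetical.

```python
def period_outcome_probabilities(s, q, pi):
    """Return (escape, equilibrium, elimination) probabilities for one period.

    Assumes s available TAAs, per-TAA recognition probability q, and a loss
    rate pi that is the same for every value of r (illustrative helper).
    """
    # No TAA is recognized (r = 0): the population escapes.
    p_escape = (1.0 - q) ** s
    # Per TAA: either not recognized (1 - q) or recognized and lost (q * pi).
    per_taa_ok = 1.0 - q * (1.0 - pi)
    # r > 0 and every recognized TAA is lost: equilibrium continues.
    p_equilibrium = per_taa_ok ** s - p_escape
    # At least one recognized TAA is retained: elimination.
    p_elimination = 1.0 - per_taa_ok ** s
    return p_escape, p_equilibrium, p_elimination
```

With `s=1`, `q=0.5`, `pi=0.5` the three probabilities are 0.5, 0.25, and 0.25, which can be checked by enumerating the three branches of the decision tree by hand.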
Passive evasion strategy
For a passive evader, the TAA loss rate is fixed so that $\pi_n = p$. It can be shown (see Methods section ‘Distribution of lost antigens’) that the dynamics governed by Equation 1 in the passive case can be represented by their mean trajectories while the cancer population is in equilibrium, given by
where $\eta \equiv 1-q(1-p)$ is the probability of equilibrium (non-escape, non-elimination) between the cancer and immune compartments for a single TAA given the existence of at least one available TAA. These dynamics may be approximated by
where ${\mathbb{E}}_{n}[\cdot ]$ is the conditional expectation given the information available at time $n$. The approximation given by Equation 4 is a lower estimate of tumor antigenicity and is accurate as long as $p$ and $q$ are not both small and in particular for choices that give rise to large tie probability (Figure 1—figure supplements 6 and 10).
Optimal evasion strategy
In contrast to the above case where $\pi_n$ was fixed at $p$, here the antigen loss rate is variable and selected optimally given the current state of total $s_n$ and recognized $r_n$ antigens. The use of dynamic programming to address the optimal long-term evasion policy relies on a defined value function (Bellman and Dreyfus, 1959). We shall focus on the case where the cancer population is assigned normalized values of 1 at any period resulting in escape and 0 otherwise. The corresponding stationary Bellman equation takes the form
where the value function $J_n = J(s_n, r_n, \pi_n)$ represents the maximal attainable value at period $n$ (Methods section ‘Dynamic programming solution’). It can be shown that
with
satisfies Equation 5. Here, $0<\delta_n\le 1$ is a free parameter that varies inversely with the risk aversion of the evader (larger values imply a bolder strategy). One advantage of the dynamic programming approach is that it reduces an infinite-period optimization problem to a sequence of single-period optimizations. The corresponding optimal policy is given by the sequence
Plots of $\pi_n^{*}$ are given for various $r_n$ in Figure 1C and Figure 1—figure supplement 11. As expected, under this closed-form strategy the optimal antigen loss rate $\pi_n^{*}$ increases with both $q$ and $r_n$. We take $\delta_n=1$ in subsequent analysis (so that the optimal strategy when $s_n=r_n=1$ is $\pi_n^{*}=1$).
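The reduction to single-period optimizations can also be explored numerically. The sketch below runs value iteration on a truncated state space for an escape problem of the same general shape (value 1 on escape, survival probability $\pi^{r}$ after recognition). It does not reproduce the closed-form solution: the penalty `round(h * r * pi)`, the integer `beta`, the cap `s_max`, the action grid, and all names are assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(k, n, q):
    """Probability of k successes in n independent trials with success rate q."""
    return comb(n, k) * q**k * (1.0 - q) ** (n - k)

def solve_teal_value(q, beta, h, s_max=30, grid=21, sweeps=200):
    """Value iteration for an assumed TEAL-like escape problem.

    V[s] approximates the best achievable escape probability, evaluated before
    recognition is drawn in a period with s TAAs. policy[s][r] is the
    maximizing loss rate on the action grid (hypothetical sketch).
    """
    actions = [i / (grid - 1) for i in range(grid)]
    V = [0.0] * (s_max + 1)
    policy = [[0.0] * (s_max + 1) for _ in range(s_max + 1)]
    for _ in range(sweeps):
        new_V = []
        for s in range(s_max + 1):
            total = binom_pmf(0, s, q)  # r = 0: escape, worth 1
            for r in range(1, s + 1):
                best, best_pi = 0.0, 0.0
                for pi in actions:
                    # Assumed pi-linear penalty, rounded to an integer count.
                    s_next = s - r + beta + round(h * r * pi)
                    s_next = max(0, min(s_max, s_next))  # truncate the state space
                    value = pi**r * V[s_next]  # all r lost w.p. pi^r, then continue
                    if value > best:
                        best, best_pi = value, pi
                policy[s][r] = best_pi
                total += binom_pmf(r, s, q) * best
            new_V.append(total)
        V = new_V
    return V, policy
```

Starting from `V = 0`, each sweep can only raise the values, which stay bounded by 1, so the iterates converge for the truncated problem. The truncation at `s_max` and the rounding of the penalty both distort the dynamics, so the output should be read as qualitative.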
Active evasion strategies enhance population survival rates
For a fixed TAA arrival, Equations 3 and 4 describe a mean-reverting process. Consequently, the mean number of TAAs approaches a stable equilibrium
as long as the cancer neither escapes nor is eliminated. In the optimal case, a similar equilibrium value ${s}_{\mathrm{\infty}}$ may be calculated:
In this case, stability is more complex: if immune recognition is sufficiently effective, meaning $q>q^{\ast}=1-e^{-1}$, then Equation 10 is a stable equilibrium exhibiting mean reversion similar to that of the passive case. On the other hand, recognition impairment ($q<q^{\ast}$) gives rise to an instability, which results in a system harboring an initial number of targets $s_0$ being driven either to escape if $s_0<s_\infty$ or to large accumulations (and likely elimination) if $s_0>s_\infty$ (Figure 5—figure supplement 2).
We proceed by contrasting active and passive escape rates assuming no recognition impairment, and discuss the implications of immune impairment in a later section. Simulations of passive and optimized strategies with passive evasion rates matching mean optimal evasion rates ($p=\mathbb{E}[\pi_n^{\ast}]|_{s_\infty}$) are compared in Figure 2. Despite identical mean TAA evolution (Figure 2A) and comparable intertemporal penalties, the optimized strategy results in substantially higher cancer escape probability (150%) compared to the passive case. Moreover, optimized strategies generate wider escape time distributions, thus illustrating an adaptive evader’s sustained effort to thwart elimination prior to escape (Figure 2B).
Arbitrary recognition landscape
The above describes the dynamics of passive and optimized cancer coevolution during adaptive immune recognition with constant governing parameters. We can more generally apply this approach to understand how an evasion strategy affects the likelihood and timing of cancer escape under a variety of temporally varying recognition landscapes. Such landscapes could, for example, be imposed by a clinician temporally modulating an immunotherapeutic intervention and are routinely proposed in the setting of traditional therapies, where attempted strategies have included a variety of cyclical burst approaches (Foo and Michor, 2009; Eigl et al., 2005). A similar approach could be taken with regard to timing and dosage of adoptive T cell immunotherapy. An advantage of our dynamic programming approach is the ability to study optimal evasion strategies for arbitrary recognition landscapes (Figure 3A). We simulate TEAL dynamics and find that optimized immune evaders are more successful in evading detection than their passive counterparts across various recognition landscapes (Figure 3B). Evasion, when it occurs in the optimized case, does so largely after a sustained interaction with the recognizing threat (Figure 3C). Collectively, our results detail the dynamics of sustained cancer-immune coevolution via TAA loss in threats capable of adopting adaptive evasion strategies in the presence of complex treatment modulation (George and Levine, 2020; Turajlic et al., 2018).
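For the passive counterpart, the effect of a time-varying landscape on escape timing can be computed exactly by propagating the distribution over TAA counts from period to period. The sketch below is illustrative only: `q_landscape` is a hypothetical per-period sequence of recognition probabilities, `c` is an assumed constant integer TAA addition, and the update `s - r + c` is an assumption about the equilibrium transition.

```python
from math import comb

def passive_escape_by_period(s0, q_landscape, p, c):
    """Exact per-period escape probabilities for a passive evader.

    q_landscape: sequence of recognition probabilities, one per period.
    Returns a list whose n-th entry is the probability of escaping in period n.
    """
    dist = {s0: 1.0}  # probability mass over TAA counts while in equilibrium
    escape_by_period = []
    for q in q_landscape:
        escaped = 0.0
        next_dist = {}
        for s, mass in dist.items():
            escaped += mass * (1.0 - q) ** s  # r = 0: escape
            for r in range(1, s + 1):
                # Exactly r TAAs recognized and all r of them lost.
                prob = comb(s, r) * (q * p) ** r * (1.0 - q) ** (s - r)
                s_next = max(0, s - r + c)  # assumed equilibrium transition
                next_dist[s_next] = next_dist.get(s_next, 0.0) + mass * prob
        escape_by_period.append(escaped)
        dist = next_dist  # mass not carried forward was eliminated
    return escape_by_period
```

A cyclical landscape such as `[0.9, 0.9, 0.2, 0.2] * 5` could be passed as `q_landscape` to compare how bursts of strong recognition shift the escape-time distribution; the values are arbitrary.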
Optimal evaders under effective immune recognition accrue mutations at a fixed rate
One consequence of mean reversion is that the rate of mutation accumulation over time, $\lambda (n)$, is linear in $n$ (Methods Section Mean optimal transitions):
The prediction of constant accumulation is consistent with empirically observed cancer mutation behavior (Lawrence et al., 2013; Alexandrov et al., 2013). This is not what holds in the impaired case (as will be discussed later), thus suggesting that early cancer progression often proceeds in an environment with effective immune recognition. Additionally, our formula shows that larger mutation rates can be caused by large evasion penalties or by reduced immune recognition. Of course, the TEAL model does not consider any specific features that determine the values of the effective parameters. Instead, its utility is in quantifying the overall effect of reducing antigen detection resulting from, for example, transitions to an immune impaired microenvironment.
Post-escape tumor antigenicity determined by a balance between recognition aggressiveness and local penalties in the immune microenvironment
The prior section related recognition and penalty to observed mutation rates. We now consider their combined effects on tumor immunogenicity following immune escape. The TEAL model represents immunogenicity by the number of available TAAs at the time of cancer detection, an important predictor of immunotherapeutic efficacy (Martin et al., 2016; Samstein et al., 2019; Goodman et al., 2017). We apply the TEAL model to simulate evading cancer populations, focusing exclusively on trajectories that result in tumor escape, to characterize the distribution of available TAAs. This is performed first for increasing immune recognition rates $q$ (Figure 4A) and then for increasing penalty term $\beta$ (Figure 4B). Our results demonstrate that larger penalties result in higher post-escape TAA levels, while efficient immune recognition depletes available TAAs. The presumptive reason for this latter observation is that escape in the presence of strong immune recognition biases the tumor to have low numbers of TAAs. This prediction agrees with recent empirical observations that strong immune selective pressure in early cancer development results in tumor neoantigen depletion and is prognostic of poor clinical outcome (Rosenthal et al., 2019; Lakatos et al., 2020).
Variation in the tumor microenvironment drives the generation of immune hot vs. cold tumors under optimal evasion
In the passive evader case, antigenicity fluctuates around a stable equilibrium that varies directly with penalty and inversely with recognition. The adaptive case gives rise to more complex behavior resulting from impairments in immune recognition or changes in penalty (Figure 5—figure supplements 1 and 2). These changes are important manifestations of disease progression, which may alter the immunogenic landscape via impairments in immune recognition, such as MHC downregulation, costimulation alteration, T cell exclusion, or the establishment of a pro-tumor IME via, for example, M2 macrophage polarization (Liu et al., 2021; Goswami et al., 2017). Although many factors may affect recognition rates, for simplicity we shall refer to larger vs. smaller immune recognition rates $q$ as infiltrated vs. excluded.
On the other hand, the generation of new TAA targets is expected to vary substantially across tumor type, for example, due to differing somatic mutation rates. Within a given tumor subtype, variations in the hostility of the IME, resulting from a large variety of possible mechanisms (metabolic, mechanical, cytokine, environment), require cancer populations to undergo greater degrees of adaptation to survive; in our approach, this greater degree of adaptation comes with a greater penalty. Consequently, we relate large vs. small local penalty terms $\beta$ to anti-tumor vs. pro-tumor IMEs. Conceptually, the baseline state (infiltrated anti-tumor IME) may give rise to three alternative states (excluded anti-tumor IME, infiltrated pro-tumor IME, or excluded pro-tumor IME), based on progression.
Toward this end, we simulate the TEAL model under the above conditions and record post-escape TAA distributions. As already explained, our results predict that infiltrated ($q>q^{\ast}$) environments lead to an absorbing equilibrium state in the intervening period prior to escape, while exclusion ($q<q^{\ast}$) results in unstable equilibria. Interestingly, the sign of this equilibrium, and hence the long-term immunogenic trajectory, depends on the sign of $\beta$ (Equations 88 and 89). The baseline infiltrated anti-tumor case ($q>q^{\ast}$, $\beta>0$) yields a positive and stable, mean-reverting TAA steady state, generating immunogenically ‘warm’ tumors. Excluded anti-tumor IMEs ($q<q^{\ast}$, $\beta>0$) exhibit low recognition and large TAA arrival, resulting in an unstable TAA steady state that leads to increased immunogenicity over time, resulting in ‘hot’ tumors. Furthermore, the infiltrated pro-tumor ($q>q^{\ast}$, $\beta<0$) case demonstrates preserved recognition with low TAA arrival and generates an unphysiological negative stable steady state, thereby predicting that trajectories reduce immunogenicity to zero over time, yielding ‘cold’ tumors. Lastly, excluded pro-tumor IMEs ($q<q^{\ast}$, $\beta<0$), having compromises in both recognition and TAA arrival rate, result in an unstable state, above which trajectories accumulate additional TAAs over time, becoming immunogenically ‘hot,’ and below which the populations are predicted to reduce the number of recognizable TAAs over time, becoming ‘cold’ (Figure 5A and B). Substantial heterogeneity in the distributions of escape time predicts sustained interactions in the unimpaired case (Figure 5—figure supplement 3). Tumor exclusion leads to hot tumors so that escape, should it occur, must do so on average prior to the accumulation of many TAAs. Conversely, a pro-tumor IME with immune recognition drives TAA depletion, so escape occurs relatively early. These results are summarized in Figure 5C.
Discussion
The underlying evolutionary dynamics of adaptive populations lies at the heart of many important clinical challenges, including antibiotic resistance, acquired drug resistance, immunotherapy failure, and tumor immune escape. Quantitative analytic modeling will continue to provide improved insight into these complex issues by generating fast and affordable predictions and a convenient theoretical framework for hypothesis testing. To date, virtually all of the current models of cancer evolution and the tumor-immune interaction have assumed passive acquired evolution without allowing the tumor to sense and optimally respond to the current fitness landscape in order to maximize future survival. The ‘optimal escape hypothesis’ is, in our opinion, worth exploring in light of the myriad examples of treatment failure and adaptive resistance.
Our analysis centered on the ability of cancer populations to adaptively respond to a measured immune state, and we have primarily focused on studying subsequent mutations resulting in the disruption of existing (targeted) tumor-associated antigenic targets and on the generation of new ones. It is important to note that independent empirical observations support the ability of cancer cells to sense their IME, and perhaps even the level of CD8+ killing that occurs therein. At the signaling level, IL-6 secreted by CTLs, macrophages, and dendritic cells in response to immune recognition has been shown to directly activate ataxia-telangiectasia mutated (ATM), a factor implicated in response to DNA damage, and this has been associated with increased metastasis and multidrug resistance in lung cancer (Jiang et al., 2015; Yan et al., 2014). IFN-gamma released by activated CD8+ tumor-infiltrating lymphocytes activates the cell-intrinsic STING pathway in response to DNA damage in cancer, implicating an altered TME from activated CD8+ T cells that is measurable by the cancer (Xiong et al., 2022). Lastly, at the level of individual TCR interactions with recognized tumor cells, granzyme B release has been directly linked to DNA damage and associated CHK2 and p53 stress responses, and studies have demonstrated hSMG1 stress-activated proteins upregulated in cancer cells following granzyme B treatment (Meslin et al., 2011). Moreover, granzyme release in the microenvironment serves as a signaling molecule promoting a proinflammatory response from other immune cells (Cullen et al., 2010). The relatively acute response and short half-lives of downstream effectors (e.g., minutes for p53 and hours for CHK1) provide a tunable response based on the current level of immune targeting through stress-induced mutagenesis (Bindra et al., 2007; Rosenberg, 2001; Rosenberg and Queitsch, 2014) that in our analysis directly influences tumor-associated antigen availability.
Toward this end, we propose and analyze the TEAL model for studying and comparing passive and optimal escape mechanisms in the tumor-immune interaction. We focused our dynamic programming approach on a particular set of relations to provide analytical insight into this process. We do note, however, that the Bellman function approach to dynamic programming can be numerically implemented to obtain solutions for arbitrary functional forms of the penalty function, thereby enabling analysis of more complex assumptions where analytic progress becomes intractable. As expected, threats adopting optimal evasion strategies largely outperform their passive counterparts by increasing the rate of immune escape over prolonged cycles of cancer-immune coevolution. In the setting of the tumor-immune interaction, the resulting TAAs available for targeting, a proxy for clinical post-detection immunotherapeutic efficacy, are augmented when cancer populations accrue large penalties for evasion and, perhaps surprisingly, when immune recognition is impaired.
Evasion dynamics of passive and active evaders are similar in some ways while different in others. Similarities include the mean-reverting stationary dynamics of both strategies under efficient immune recognition. However, the TEAL model predicts, for adaptive threats in an excluded pro-tumor IME, the emergence of an unstable state, resulting in either accrual or depletion of TAAs in a manner that depends on the current TAA abundance. This splitting behavior into ‘hot’ and ‘cold’ tumors offers insight into the microenvironmental features generating spatial immunogenic diversity within solid tumors and is consistent with prior observations (Huss et al., 2021; Jia et al., 2022; Meiller et al., 2021; Lakatos et al., 2020). This argues that TAA-depleted tumors share in common the tendency for their evasion strategies to incur fewer antigenic penalties. Our results suggest the possibility of altering the tumor IME to increase the immunogenicity of immune-cold tumors by making evasion more costly in a manner reminiscent of mutational meltdown (Gabriel et al., 1993). We remark that these dynamics are worth considering in the case of adoptive T cell-based immunotherapies, marked by their potential for exerting substantial coevolutionary pressure on a developing malignancy (George and Levine, 2021). We also predict that impaired immune recognition leads to TAA accumulation, consistent with experimental observations in lung cancer wherein patients with HLA loss of heterozygosity harbored larger mutational burdens, an indirect measure of TAAs of our model (McGranahan and Swanton, 2017). Lastly, active evader variable mutation rates also distinguish this case from passive evaders with fixed mutation rates, and this feature is analogous to that observed in bacterial colonies faced with antibiotic selective pressure (Windels et al., 2019).
More generally, the TEAL framework provides a mechanistic basis for several empirical observations. First, our results would suggest that the lower observed TAA availability of hematological malignancies vs. immune-protected solid tumors, such as melanoma (Lawrence et al., 2013), occurs as a result of greater immune accessibility and possible immunoediting of liquid cancers. Second, our model predicts enhanced immune interactions, both natural and treatment-derived, resulting from increasing the cost of immune evasion in the evading cancer population in order to enrich the TAAs following escape. This supports the utility of neoadjuvant radiation therapy (McGranahan et al., 2016) or chemotherapy (Mouw et al., 2017) in inducing immunogenicity. Orthogonal efforts to quantify cancer evolution have similarly predicted the benefit of larger evasion rates resulting in mutational meltdown (McFarland et al., 2014). Integrated together, the TEAL model can predict the balance of generated TAAs given the relative influences of recognition and evasion penalty.
Tumor antigen depletion is a concerning consequence of immunotherapy since increased recognition is desirable and required for tumor elimination. In solid tumors, one contributor to this problem is T cell exclusion (Pai et al., 2020). However, should effective treatment and robust tumor recognition lead to relapse, the resulting tumor has a greater chance of being TAA-depleted (Rosenthal et al., 2019). Other strategies that fall in this group include those that effectively reduce recognition, like the presence of T-regulatory cells. Our results suggest that this detrimental effect of targeting can be offset by increasing the ‘hostility of the IME.’ Such strategies make tumor adaptation more penalizing, for example by fostering an antitumor environment through M1 macrophage polarization or the inactivation of tumor-associated macrophages (Liu et al., 2021; Goswami et al., 2017).
Of course, this foundational model is not without limitations. At present, we have assumed that the recognition agent is not employing an optimized strategy informed by optimal cancer evasion. Instead, we have detailed our results for arbitrarily imputed recognition landscapes, which is useful for predicting the response of an aggressive evader like cancer to particular immunotherapeutic interventions, such as hematopoietic stem cell transplant and adoptive T cell therapy, where the clinician has temporal control over treatment. Identification of such optimal treatment strategies upon quantification of disease evasion aggressiveness is of paramount importance. In this foundational model, we demonstrated the dynamics of immune recognition of an adaptive population of cancer cells expressing a purely clonal pattern of antigens. Our model implicitly equates antigen loss with the progression of a subpopulation currently adapted to evade immune targeting (either by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens). Here, we tracked the fittest clone represented by a core set of clonal antigens. We remark that heterogeneous populations each having a distinct subclonal signature can also be tracked, but the corresponding antigen-driven selection and fitness cost to each clone would be coupled through shared antigens (see ‘Methods’). Finally, we note that this extended approach implicitly assumes that antigen detection rates over a given period are subclone size-independent, given that antigens are tracked over a period where each of the clones with comparable fitness would be detectable by the immune system during their growth trajectory en route to attempted escape.
Lastly, cancers characterized by coevolutionary dynamics resulting in large variability in population size prior to escape or elimination would require in general that recognition and evasion parameters depend on the current period. While possible to incorporate, we have for foundational understanding assumed these to be constant. In this discrete-time evolutionary model, the intertemporal period considered represents the time period between the earliest moment that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape (George and Levine, 2020). This effectively gives $q$ a probabilistic representation for the total rate of opportunity to recognize a given TAA during cancer progression. Implementing this model in cancer subtype-specific contexts thus requires a consideration of the per-cell division rates, for example.
We detailed strategies that affect the number of TAAs present following escape. In addition to quantity, variations in individual TAA antigenicity could affect overall immunogenicity, but we do not as yet take this into account. In future work, individual antigenicities could be built in by allowing individual TAA contributions to $s_n$ and $q$ to depend on the particular TAA. Many additional features contribute to the immune landscape. Here, we focused on TAA availability and effects of general immune recognition rates and IME hostility on TAA accrual. Future efforts may incorporate additional cancer-specific features, including antigen presentation, immunomodulatory gene expression, and measured immune signatures present in the IME.
These optimized dynamics are proposed in the absence of the precise mechanistic details of cancer decision-making. Further studies linking changes in the evasion rates to cell signaling are necessary next steps toward elucidating a possible mechanism of optimal evasion. Our framework serves as a tool for evaluating the extent of evasion aggressiveness in a variety of observed disease contexts, including cancer. Differentiating the dynamics of passive and adaptive evasion mechanisms is a first step to understanding this difference, its importance underscored by the large implications such an understanding would have on our approach to treatment.
The TEAL model represents a framework broadly applicable for studying population behavior consistent with optimized collective decision-making, and subsequent experimental validation or refutation is of highest priority. Future work aims to apply this framework to personalizing optimal interventions that maximize disease elimination probabilities. Consequently, stochastic analysis and optimal control theory are indispensable tools for better understanding the complex cancer-immune interaction. Defeating an evolving cancer population has provided a persistent challenge to researchers and clinicians, with the majority of progress heralded by fundamental discoveries on cancer behavior, and additional insights require a more detailed understanding of cancer evasion. The possibility that cancer population-level strategies are somewhat informed by the present recognition threat would have a radical effect on our own optimal treatment approach.
Methods
Passive evader in an adaptive environment
Let $\mathcal{S}_n$ denote the set of tumor antigens recognizable by the immune system and present at period $n$ on a population of cancer cells, and let $s_n = |\mathcal{S}_n|$ count their number ($|\mathcal{A}|$ denotes the cardinality of set $\mathcal{A}$). From one period to the next, each of the $s_n$ detectable antigens may be detected by the immune system independently and with identical probability $q$ per antigen. We let $\mathcal{R}_n \subseteq \mathcal{S}_n$ denote the collection of antigens that are recognized by the immune system at time $n$. As the immune system targets and begins to eliminate cells via the $\mathcal{R}_n$ antigens, the cancer population has an opportunity to lose or downregulate each of the $r_n = |\mathcal{R}_n|$ recognized antigens in a similarly independent and identical manner. The rate of antigen loss $\pi_n$ may in general vary as a function of time and environmental features (considered in Section Active evader in an adaptive environment). In this section, we assume it is passively fixed and denote this rate as $p$. We denote the collection of antigens that are lost by the cancer population at time $n$ by $\mathcal{L}_n \subseteq \mathcal{S}_n$. We track the number of recognized and lost antigens at time $n$ by $r_n$ and $\ell_n = |\mathcal{L}_n|$, respectively, so that $\ell_n \le r_n \le s_n$.
The system evolves as follows (Figure 1—figure supplements 1 and 2): If $\mathcal{R}_n = \varnothing$, then the immune system is unable to recognize any tumor antigen at time $n$ and so the process ends in cancer escape. Since in this case the immune system loses, we denote this event by $L_n$. If $\mathcal{R}_n \ne \varnothing$, then the immune system recognizes the threat by at least one TAA and one of two outcomes results: The first possibility is that the cancer population successfully downregulates or loses all of the targeted antigens, expressed as $\mathcal{L}_n = \mathcal{R}_n$, and survives to the next time step. We call this a tie and denote the event by $E_n$. Alternatively, the cancer population is unable to lose every recognized antigen and subsequently becomes eliminated. This means the immune system has won, so we denote this event by $W_n$. Although the recognition and evasion probabilities may in general be clonally and temporally dependent, we assume fixed probabilities for the recognition, $q$, and evasion, $p$, of individual antigens. In the event of a tie, $s_n - r_n$ antigens remain, with the addition of a basal antigen arrival rate $\beta$ and a possibly noisy penalty term $f_n$ to reflect the production of new antigens as the population evolves. For simplicity, we assume $\beta$ to be constant and the $f_n$ to be a sequence of independent, identically distributed (IID) random variables with mean $f$. While it is in general possible that the distributions of $r_n$ and $\ell_n$ be both state- and time-dependent, we focus on the foundational example above.
This process is identical to the following game between two players, hereafter referred to as the ‘Recognizer’ (immune system) and the ‘Evader’ (threat): the Recognizer starts off with a collection, $\mathcal{S}_0$, of $s_0$ coins and begins her turn by flipping each coin with IID success probability $q$. If she has no success ($|\mathcal{R}_0| = 0$), she loses (denoted by event $L_0$) and the game ends. If $r_0 > 0$ of her coins land on heads, then the next turn goes to the Evader, who proceeds to flip his $r_0$ coins with IID success probability $p$ in an attempt to match the Recognizer’s successful coin flips. The Evader must succeed in all coin flips ($\mathcal{L}_0 = \mathcal{R}_0$) for the turn to end in a tie (equilibrium between Evader and Recognizer), given by event $E_0$. Otherwise, he loses and the game ends with a Recognizer win (event $W_0$). If a tie occurs, then both players restart the game, but only after the removal from $\mathcal{S}_0$ of the $r_0$ coins that landed on heads for both players as well as the addition of a random number $f_0$ of new coins. The Evader wins by default if a new turn begins and there are no longer any remaining coins to flip.
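The coin game described above can be sketched as a short simulation. This is an illustrative sketch rather than the authors' code: the function names `play_round` and `play_game`, the fixed integer `arrivals` (standing in for the new coins added after a tie), and the `max_periods` cutoff are all assumptions introduced here.

```python
import random

def play_round(s, q, p, rng):
    """Play one period with s coins; return (outcome, number recognized)."""
    # Recognizer flips each of the s coins with success probability q.
    r = sum(rng.random() < q for _ in range(s))
    if r == 0:
        return "escape", 0            # event L_n: Recognizer loses
    # Evader must match every one of the r recognized coins.
    lost = sum(rng.random() < p for _ in range(r))
    if lost == r:
        return "tie", r               # event E_n: equilibrium
    return "eliminated", r            # event W_n: Recognizer wins

def play_game(s0, q, p, arrivals=0, max_periods=1000, seed=0):
    """Repeat rounds until escape or elimination (arrivals is a hypothetical constant)."""
    rng = random.Random(seed)
    s = s0
    for n in range(max_periods):
        if s == 0:
            return "escape", n        # no coins left: Evader wins by default
        outcome, r = play_round(s, q, p, rng)
        if outcome != "tie":
            return outcome, n
        s = s - r + arrivals          # remove matched coins, add new ones
    return "undecided", max_periods

print(play_game(10, q=0.5, p=0.9, arrivals=2))
```

Replacing the constant `arrivals` with a random draw would mimic the noisy penalty term; the constant is used here only to keep the sketch short.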
Probability of equilibrium
It is immediately apparent that this game is unfair to the Evader if $s_0$ is much larger than 1, unless the recognition probability $q$ is low and the evasion probability $p$ is high. We motivate the following analysis with this in mind and proceed to characterize the dynamics of this stochastic process. Clearly, the numbers of recognized and lost antigens during each period are binomially distributed, with respective distributions $r_n \sim \text{Binomial}(s_n, q)$ and $\ell_n \mid r_n \sim \text{Binomial}(r_n, p)$.
The event that the immune and cancer systems are in equilibrium (non-escape and non-extinction) may be written as $E_n = [\mathcal{L}_n = \mathcal{R}_n] \cap [\mathcal{R}_n \ne \varnothing]$.
One might expect that the number of antigens lost at time $n$ is affected by knowledge of whether or not the game continues to be played. The distribution of $\ell_n$ conditioned on equilibrium may be characterized by conditioning on the number of recognized antigens at time $n$. To this end, let $F_{n,r} = [r_n = r]$ denote the event that $r$ antigens are recognized at period $n$, with $\mathbb{P}(F_{n,r}) = \binom{s_n}{r} q^{r} (1-q)^{s_n - r}$.
We remark that events $\{F_{n,r}\}_r$ are disjoint and exhaustive; in other words, for sample space $\Omega$, $\bigcup_{r=0}^{s_n} F_{n,r} = \Omega$ and $F_{n,r} \cap F_{n,r'} = \varnothing$ for $r \ne r'$.
Additionally, we note that equilibrium cannot occur if no antigens are recognized (i.e., $F_{n,0} = [\mathcal{R}_n = \varnothing]$). Lastly, $\mathbb{P}(E_n \mid F_{n,r}) = p^{r}$ for $r \ge 1$,
since if $r$ antigens are recognized then $\mathcal{L}_n = \mathcal{R}_n$ occurs if and only if each of the $\ell_n = r_n$ recognition positions is exactly matched by an evasion. We will make use of the following variables to simplify subsequent results: $\gamma \equiv 1 - q$ and $\eta \equiv 1 - q + pq$.
Here, $\eta$ may be interpreted as the probability of the complement of the following event: ‘recognition occurs without matched evasion for a single antigen.’ In other words, $\eta$ is the probability that equilibrium exists at one antigen position provided that there is at least one available antigen for immune targeting. This event occurs in one of two disjoint ways for a single antigen: either there is no recognition, and so equilibrium occurs regardless of evasion, or there is recognition that must also be matched by evasion. The joint distribution of recognized and lost antigens is given by the probability mass function $\mathbb{P}(r_n = r, \ell_n = \ell) = \binom{s_n}{r} q^{r} (1-q)^{s_n - r} \binom{r}{\ell} p^{\ell} (1-p)^{r - \ell}$ for $0 \le \ell \le r \le s_n$.
The probability that equilibrium occurs and the process continues at period $n$ is given by $\mathbb{P}(E_n) = (1 - q + pq)^{s_n} - (1-q)^{s_n} = \eta^{s_n} - \gamma^{s_n}$,
which is equal to the probability of equilibrium occurring at every position minus the probability that all of the $s_n$ antigens are not recognized, since at least one recognized antigen is required for equilibrium to occur.
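As a numerical check on the verbal description above, the following sketch compares a closed form for the equilibrium probability against a direct sum over the number of recognized antigens. The closed form $\eta^{s} - \gamma^{s}$ with $\gamma = 1-q$ and $\eta = 1-q+pq$ is our reading of that description, and the function names are hypothetical.

```python
from math import comb

def p_equilibrium_closed(s, q, p):
    """Closed form: equilibrium at every position minus 'nothing recognized'."""
    gamma = 1.0 - q          # per-antigen miss probability
    eta = gamma + p * q      # no recognition, or recognition matched by evasion
    return eta**s - gamma**s

def p_equilibrium_sum(s, q, p):
    """Direct sum of P(r_n = r) * p**r over r = 1..s (r = 0 is escape)."""
    return sum(
        comb(s, r) * q**r * (1.0 - q) ** (s - r) * p**r
        for r in range(1, s + 1)
    )

for s in (1, 5, 10):
    print(s, p_equilibrium_closed(s, 0.3, 0.8), p_equilibrium_sum(s, 0.3, 0.8))
```

The two columns agree because the binomial theorem collapses the sum; the $r = 0$ term that is excluded is exactly $\gamma^{s}$.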
Breakeven probability
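The process is usually more favorable for the Recognizer. The Recognizer loses at period $n$ if there are zero recognition events, and this occurs with probability $\mathbb{P}(L_n) = (1-q)^{s_n} = \gamma^{s_n}$.
The Recognizer wins at period $n$ if she does not lose or tie, which occurs with probability $\mathbb{P}(W_n) = 1 - \mathbb{P}(L_n) - \mathbb{P}(E_n) = 1 - (1 - q + pq)^{s_n}$.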
If $q$ and $s_n$ are given, then the evasion probability $p$ required for equal probabilities of Recognizer failure and success, or the breakeven probability, follows from setting $(1-q)^{s_n} = 1 - (1-q+pq)^{s_n}$ and is given by $p_{\text{even}} = \left[\left(1 - (1-q)^{s_n}\right)^{1/s_n} - (1-q)\right]/q$,
and exists whenever $p_{\text{even}} > 0$. We plot $p_{\text{even}}$ as a function of recognition probability $q$ for various numbers of TAAs, $s$ (Figure 1—figure supplement 5A). The ‘fair-game’ line indicates where the breakeven evasion probability is always equal to the recognition probability. Regions where the breakeven probability localizes above the fair-game line favor the Recognizer since there the evasion rates $p$ must be higher than recognition rates $q$ for the game to be fair. Alternatively, areas below the breakeven curve favor the Evader. It is clear from Figure 1—figure supplement 5B that the process favors recognition for a majority of parameter choices $(p,q)$ in all cases except for when $s=1$. Thus, the process is largely unfair and mostly favors the Recognizer over the Evader when $p=q$ so long as $s$ is not small. In order for the Evader to have a reasonable chance of success, either the evasion probability must be very large or the number of TAAs must remain small.
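To illustrate how quickly the game tilts toward the Recognizer as $s$ grows, the sketch below computes a breakeven evasion probability by equating the single-period loss and win probabilities. The expressions $(1-q)^{s}$ for a Recognizer loss and $1-(1-q+pq)^{s}$ for a Recognizer win are assumptions derived from the verbal definitions above, and the function names are hypothetical.

```python
def p_lose(s, q):
    """Probability that none of the s antigens is recognized."""
    return (1.0 - q) ** s

def p_win(s, q, p):
    """Probability that some recognized antigen is not matched by evasion."""
    return 1.0 - (1.0 - q + p * q) ** s

def p_even(s, q):
    """Evasion probability equating p_lose and p_win; None if not positive.

    Assumes 0 < q <= 1 and s >= 1.
    """
    gamma = 1.0 - q
    value = ((1.0 - gamma**s) ** (1.0 / s) - gamma) / q
    return value if value > 0 else None

for s in (1, 2, 5, 10):
    print(s, p_even(s, 0.5))
```

Under these assumptions, the breakeven value for $q = 0.5$ climbs rapidly toward 1 as $s$ increases, matching the statement that the Evader needs either a very large evasion probability or a small number of TAAs.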
Tracking distinct clones
The above describes a clonal population harboring a core minimal set of TAAs for which recognition and downregulation ultimately determine cancer escape, elimination, or equilibrium. Our model can, however, be adapted to study the more general scenario involving a clonal hierarchy of heterogeneous cancer cells. We illustrate this by considering a population of cells with a set $C$ of $c = |C|$ core clonal TAAs, together with distinct groups of cells with subclonal collections of TAAs $S_1$ and $S_2$ (having size $s_1 = |S_1|$ and $s_2 = |S_2|$, respectively). The relevant populations therefore have antigen sets given by $P_1 = C \cup S_1$ and $P_2 = C \cup S_2$. The basic event considered in the foundational model, $[r_n > 0]$, must now be replaced by the event that recognition occurs in both $P_1$ and $P_2$; in the absence of recognition of both subclones, the cancer escapes. Recognition happens either if there is a recognition event $r$ in $C$ or if there are simultaneous recognition events $r_1$ in $S_1$ and $r_2$ in $S_2$. Assuming that TAA recognition occurs independently as before with probability $q$, the total probability of relevant recognition, originally $(1 - \gamma^{s_n})$, is now given by $(1 - \gamma^{c}) + \gamma^{c}(1 - \gamma^{s_1})(1 - \gamma^{s_2})$. The first term characterizes the coupling of the fate of both subclones should a common TAA be recognized, while the latter term represents the parallel recognition process required to control each subclone separately via subclonal TAA recognition. Lastly, assuming that recognition proceeds either by a shared TAA in $C$ or instead by subclonal TAAs in both $S_1$ and $S_2$, the probabilities of elimination and progression then follow exactly as before. In the remainder of the discussion, we will, for baseline understanding, only track a core set of clonal antigens on the fittest clone.
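The two-subclone recognition probability can be checked by enumerating every recognition pattern. In this sketch the exponents are taken to be the subclonal set sizes $s_1$ and $s_2$, which is our assumption about the intended expression; the function names are hypothetical.

```python
from itertools import product

def p_recognition_two_subclones(c, s1, s2, q):
    """Core recognition, or simultaneous recognition in both subclonal sets."""
    gamma = 1.0 - q
    return (1.0 - gamma**c) + gamma**c * (1.0 - gamma**s1) * (1.0 - gamma**s2)

def p_recognition_brute_force(c, s1, s2, q):
    """Enumerate all 2**(c+s1+s2) recognition patterns and add up the relevant ones."""
    total = 0.0
    for flips in product((0, 1), repeat=c + s1 + s2):
        prob = 1.0
        for hit in flips:
            prob *= q if hit else (1.0 - q)
        core = any(flips[:c])                 # a shared clonal TAA is recognized
        sub1 = any(flips[c:c + s1])           # a TAA private to subclone 1
        sub2 = any(flips[c + s1:])            # a TAA private to subclone 2
        if core or (sub1 and sub2):
            total += prob
    return total

print(p_recognition_two_subclones(2, 2, 3, 0.3))
print(p_recognition_brute_force(2, 2, 3, 0.3))
```

Setting `s1 = s2 = 0` recovers the single-clone expression $1-\gamma^{c}$, which is a quick sanity check on the coupling term.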
Distribution of lost antigens
The process transitions at period $n$ if and only if equilibrium occurs, which means that the number of lost antigens matches the number recognized and is strictly positive. In other words, $E_n = [\ell_n = r_n > 0]$.
The survival probability as a function of $q$ and $p$ is plotted for various choices of $s$ in Figure 1—figure supplement 6. From this, we find that equilibrium occurs with high probability for large evasion rates, $p$, as well as for recognition rates $q$ that vary inversely with the number of recognizable antigens. This coincides with conditions that do not disadvantage the Evader so that the equilibrium probability is maintained. We remark that recognition and evasion rates in general vary with the IME. We shall subsequently restrict our attention to large evasion probabilities ($p>1/2$).
Exact dynamics
Let $I_F$ denote the usual indicator random variable on event $F$: $I_F(\omega) = 1$ if $\omega \in F$ and $I_F(\omega) = 0$ otherwise.
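If $r_n$ is unknown, then the distribution of $\ell_n$ follows that of $r_n$ on a strictly positive outcome normalized to the probability of surviving: $\mathbb{P}(\ell_n = \ell \mid E_n) = \binom{s_n}{\ell} (pq)^{\ell} (1-q)^{s_n - \ell} / \mathbb{P}(E_n)$ for $1 \le \ell \le s_n$.
In this case, the mean number of lost antigens conditioned on a tie becomes $\mathbb{E}[\ell_n \mid E_n] = pq\, s_n (1 - q + pq)^{s_n - 1} / \mathbb{P}(E_n)$.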
Of course, for any realized number of recognized antigens $r_n$ at period $n$ (event $F_{n,r} = [r_n = r]$), the number of lost antigens conditional on equilibrium $\ell_n$ is completely determined since $\mathbb{P}(\ell_n = \ell \mid E_n \cap F_{n,r}) = I_{[\ell = r]}$,
so that the conditional mean number of lost antigens must match exactly those recognized: $\mathbb{E}[\ell_n \mid E_n \cap F_{n,r}] = r$.
Mean transition behavior
The state transition equation for this process is given by Equation 1: $s_{n+1} = s_n - \ell_n + \beta + f_n$,
where $\beta + f_n$ represents the arrival of new antigens through a basal production rate $\beta$ plus additional antigens $\{f_n\}_n$ that possibly depend on the evasion strategy employed. In our model, we will assume that the $\{f_n\}_n$ are IID random penalties with mean $\mathbb{E}[f_n] = f$ and finite variance (e.g., Poisson-distributed). Given this, we will now characterize the mean transition behavior conditioned on equilibrium and the information available at the present moment. We write $\mathbb{E}_n[\cdot]$ to denote the conditional expectation with respect to date-$n$ information.
Exact dynamics
The mean number of detectable antigens evolves according to the difference equation (Equation 3):
which gives Equation 3 and follows since $s_n$ is measurable at period $n$ and independent of $E_n$, while $f_n$ is independent of period $n$ and $E_n$. This process is mean stationary at $s_n = \mu$ whenever
giving
Plots of fixed points of Equation 3 are illustrated in Figure 1—figure supplement 7 for $p>1/2$ and $q$ away from zero for small total mean antigen accumulation rates $\beta +f$. As expected, increases in $(\beta +f)$ result in higher equilibria. In the large $\mathbb{P}\left({E}_{n}\right)$ region of interest, increased $q$ results in a lower number of detectable antigens at equilibrium since more are recognized during each period.
Approximate dynamics
If $r_n$ is explicitly given, then the mean transition equation simplifies to $\mathbb{E}_n[s_{n+1} \mid E_n \cap F_{n,r}] = s_n - r + \beta + f$,
since $s_n$ is measurable at period $n$, while $f_n$ is independent of period $n$ and $E_n \cap F_{n,r}$. We can use this to approximate the exact recognition dynamics described above by assuming $r_n = \mathbb{E}_n[r_n] = q s_n$. In this case, we have Equation 4: $\mathbb{E}_n[s_{n+1}] = (1 - q)s_n + \beta + f$.
The equilibrium may be given explicitly as $\tilde{\mu} = (\beta + f)/q$.
We distinguish the approximate equilibrium $\tilde{\mu}$ from that of the exact case $\mu$, the latter incorporating a correction term arising from the fact that knowledge of equilibrium occurring requires a larger average value of $r_n$ above $q s_n$, since equilibrium occurs only when $r_n > 0$. We remark that the steady states given by Equations 30 and 32 are close to one another for small penalty (Figure 1—figure supplement 8) and parameter regions that overlap with those having large equilibrium probabilities ($p\sim 1$, $q>0.5$; Figure 1—figure supplement 6), which intuitively suggests that a process driven by its mean overlaps well with one conditional on equilibrium provided the escape and elimination probabilities are small. We obtain good agreement between averages of large-scale simulations of the process and the predicted exact and approximate steady states for $p,q>0.5$ and small penalty (Figure 1—figure supplement 9). Of course, the mean dynamics are also approximate since $q s_n$ is in general non-integer-valued. With this in mind, we focus on the dynamics given by Equation 31.
Here, $r_n$ is binomially distributed conditional on the number of current antigens, so that $\mathbb{E}_n[r_n] = q s_n$ and $\text{Var}_n(r_n) = q(1-q) s_n$.
We define the following zero-mean noise variable, $\epsilon_n \equiv (q s_n - r_n) + (f_n - f)$,
and rewrite Equation 1 as $s_{n+1} = (1-q) s_n + (\beta + f) + \epsilon_n$.
This is none other than a first-order autoregressive, or AR(1), process with innovation terms $\epsilon_n$ comprising endogenous noise due to the variance in the number of recognized antigens and exogenous noise generated by fluctuations in the random penalty term.
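The mean-reverting behavior of this AR(1) recursion can be illustrated numerically. The sketch below is an illustration under simplifying assumptions, not the simulation used for the figures: arrivals are a fixed integer $\beta + f$ per period, escape and elimination are ignored so that only the transition recursion is exercised, and the function names are hypothetical.

```python
import random

def mean_recursion(s0, q, arrival, periods=200):
    """Iterate the mean map m -> (1 - q) * m + arrival."""
    m = float(s0)
    for _ in range(periods):
        m = (1.0 - q) * m + arrival
    return m

def simulate_mean_antigens(s0, q, arrival, periods=100, runs=2000, seed=1):
    """Average final antigen count over many runs of s -> s - r + arrival."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        s = s0
        for _ in range(periods):
            r = sum(rng.random() < q for _ in range(s))   # r ~ Binomial(s, q)
            s = s - r + arrival                           # fixed integer arrivals
        total += s
    return total / runs

q, arrival = 0.5, 3           # arrival plays the role of beta + f
print("predicted:", arrival / q)
print("mean map :", mean_recursion(10, q, arrival))
print("simulated:", simulate_mean_antigens(10, q, arrival))
```

Both the deterministic mean map and the Monte Carlo average settle near $(\beta+f)/q$, which is the fixed point the text refers to as the approximate equilibrium.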
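The process is stable for all but trivial choices of probability $\gamma$. The mean behavior evolves according to $\mathbb{E}[s_{n+1}] = \gamma\, \mathbb{E}[s_n] + \beta + f$,
which ultimately gives Equation 9: $\lim_{n \to \infty} \mathbb{E}[s_n] = (\beta + f)/(1 - \gamma) = (\beta + f)/q$,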
thus showing agreement in mean with the fixed point given by Equation 32. Of course, ${s}_{n}=\stackrel{~}{\mu}=(\beta +f)/q$ satisfies the martingale property:
and the process tends toward steady state with expected intertemporal difference
The variance at stationarity, $V\text{ar}\left({s}_{n}\right)$, can be calculated by solving for the fixed point of
giving
Recognizer success probability
For the event ${W}_{n}$ (resp. ${L}_{n}$) that the Recognizer wins (resp. loses) at period $n$, and for the event ${E}_{n}$ of equilibrium at period $n$, we have
These relationships, along with the implicit evolution given by Equation 32, are used to approximate ultimate Recognizer success probabilities for all possible $p$ and $q$ against several choices of initial antigen number $s_0$ and mean antigen arrival rate $\beta + f$, and are compared with simulations using the actual transitions via Equation 29 (Figure 1—figure supplement 10). We find good agreement between these methods in characterizing the final outcome over a variety of parameter choices, where accuracy is highest in the relevant parameter region of interest. In particular, the left column of Figure 1—figure supplement 10 details the likelihood that a (static) threat is controlled in the special case where no penalty is assumed.
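The approximation described above can be sketched by propagating the mean antigen number forward and accumulating the per-period win and loss probabilities, weighted by the probability of having tied in every earlier period. The per-period expressions and the mean map $s \mapsto (1-q)s + (\beta + f)$ are our assumptions from the preceding definitions, and `recognizer_success` is a hypothetical name.

```python
def recognizer_success(s0, q, p, arrival, max_periods=500, tol=1e-15):
    """Return approximate (P[Recognizer wins], P[Recognizer loses])."""
    gamma = 1.0 - q
    eta = gamma + p * q
    s = float(s0)             # mean antigen number, allowed to be non-integer
    alive = 1.0               # probability that every period so far was a tie
    win = lose = 0.0
    for _ in range(max_periods):
        p_lose = gamma**s                 # nothing recognized: escape
        p_tie = eta**s - gamma**s         # everything recognized is evaded
        p_win = 1.0 - eta**s              # some recognized antigen is kept
        win += alive * p_win
        lose += alive * p_lose
        alive *= p_tie
        if alive < tol:
            break
        s = gamma * s + arrival           # approximate mean transition
    return win, lose

print(recognizer_success(10, q=0.6, p=0.9, arrival=1.0))
```

Any probability mass left in `alive` after `max_periods` corresponds to games still undecided, so the two returned values need not sum exactly to 1.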
Mutation accumulation rate and tumor antigen availability
The above analysis was motivated by a desire to explain both genetic and non-genetic possibilities leading to recognition evasion. We can consider applying this model to strictly describe genetic evasion in the form of somatic mutations leading either to the generation of (recognizable) tumor-associated antigens or to escape via the removal of these antigens. Using the above framework, mutations, denoted by $\lambda$, accumulate across each period in proportion to the sum of antigens downregulated to enhance escape and antigens gained via basal arrival and penalty. Thus their rate of accumulation may be expressed by
Together with the fact that ${\mathrm{\ell}}_{n}={r}_{n}$ during progression, we have for the mean rate of mutant accumulation
ultimately giving
which predicts that the rate of mutational acquisition is linear in time, consistent with empirical observation (Alexandrov et al., 2013; Lawrence et al., 2013). Heuristically, tumors that survive while accumulating an average of $\beta +f$ targetable alterations must balance those gains by $\beta +f$ additional evasion events. This theory predicts, perhaps surprisingly, that the mutation rate is a direct reflection of the penalty paid for cancer progression as a function of the basal antigen arrival rate and contributions from the local environment. Tumors having a more difficult time surviving in a hostile or restrictive environment would be predicted to have higher rates of mutation. In this context, high mutational signatures are predicted to be correlated with tumors that are more susceptible to recognition. For a passive Evader, our theory predicts that the observed mutation rate depends only on basal arrival and mean penalty term for cancer progression, unaffected by recognition rate. On the other hand, the stationary number of available antigens, approximated by $\stackrel{~}{\mu}=(\beta +f)/q$, varies directly with evasion penalty and inversely with antigen recognition rate. Moreover, mutation or adaptation accumulation is expected to converge to a stable steady state for all allowable recognition, evasion, and penalty rates.
Active evader in an adaptive environment
In the previous section, we considered the predicted dynamical behavior when the Evader is assumed to adopt a fixed strategy. In that case, if the number of detectable antigens is moderately large ($s_0 \sim 10$), then the game is biased against the Evader for most combinations of evasion and recognition success probabilities (Section Breakeven probability). Additionally, mean transitions in the number of recognizable antigens obey an AR(1) process tending toward the quotient of the mean penalty and recognition rate (Section Mean transition behavior). Moreover, this behavior predicts that the observed mutation accumulation rate is linear in time and proportional to the sum of basal antigen creation rate and mean penalty term (Section Mutation accumulation rate and tumor antigen availability). Here, we allow for the Evader to optimally select his evasion rate $\pi_n$ at each period (Figure 1—figure supplement 3). Larger success rates come at the cost of adding back more recognition opportunities in the subsequent time step, so that the Evader employs a strategy to maximize his survival or likelihood of escape. This framework is motivated by the observation that cancer threats are known to accumulate perhaps mildly deleterious mutations that occur passively during evolution to obtain rare ‘driver’ mutations (McFarland et al., 2014). The novelty here is that we propose a unifying theoretical framework to investigate the resulting strategy employed by a cancer population if the choice of evasion is planned based on knowledge of the current antigen landscape and hostility, or number of recognized targets.
In contrast with the prior section, which considered temporal evolution as a function of fixed evasion rate $p$ and random penalty f_{n}, here, the evasion rate ${\pi}_{n}$ may depend on time, and for simplicity we consider deterministic penalties. In order to properly frame this problem in a manner suitable to handle via dynamic programming, we define the necessary parameters, expectation, and value functions below. We assume that the process evolves according to state transition equation,
and that conditional expectations are taken with respect to ${\mathcal{F}}_{n}$, the natural filtration (Karatzas and Shreve, 1998) with respect to the underlying process.
If at time $n$ the total $s_n$ and recognized $r_n$ targets are known, then the Evader’s objective is to select a policy $\pi \equiv \{\pi_n, \pi_{n+1}, \dots\}$ that maximizes the sum of present and future rewards, $R(s_n, r_n, \pi_n)$, which in general depend on the current state, $s_n$, as well as the Recognizer, $r_n$, and Evader, $\pi_n$, actions. The value function is defined to be the maximal attainable sum of expected future rewards, given by
Problems that may be framed in this context have been well-studied and utilize a rich theory of stochastic dynamic programming, originally proposed by Bellman (Bellman, 1954; Bellman and Dreyfus, 1959). Bellman’s Principle of Optimality and Bellman equation for a stationary solution (independent of starting time) are given via backward induction by
Equation 49 states that the maximal attainable value at period $n$ is given by the sum of the maximal attainable value at the next time step, $J(s_{n+1})$, and the $n$-period reward of strategy $\pi_n$ obeying Equation 48. For the problem at hand, we assume that the Evader receives a normalized reward of either $R_n = 1$ if it escapes at any time period (there is no temporal discount for escape at later periods), or $R_n = 0$ if it is eliminated. In this case, we may draw a decision tree for the $n$-period problem in terms of the value function $J$, current antigen number $s_n$, Recognizer antigen recognition miss probability $\gamma = 1 - q$, number of recognized antigens $r_n$, and Evader strategy, $\pi_n$ (Figure 1—figure supplement 4). Here, $\pi_n$ represents the $n$-period probability of antigen loss by the Evader.
Using the dynamic programming principle, the Bellman equation under uncertainty takes the form given by Equation 5:
Under a particular choice of assumed penalty and transition equation, we can calculate an exact, closed-form solution to the dynamic program in Equation 5. This solution generates an optimal policy, given by $\pi^* = \{\pi_1^*, \pi_2^*, \dots, \pi_n^*, \dots\}$, a sequence of optimal decisions, in addition to the maximal value at each time assuming the optimal policy, given by $J(s_n)_{\pi_n^*}$.
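Independently of the closed-form solution, the structure of the Evader's dynamic program can be explored numerically with value iteration on a truncated integer state space. The sketch below is not the authors' solution: the penalty `int(h * r * pi)`, the cap `s_max`, the policy grid, and the function name are all hypothetical choices made only so that the recursion is concrete. It encodes a reward of 1 for escape, 0 for elimination, and a tie probability of $\pi^{r}$ when $r$ antigens are recognized.

```python
from math import comb

def value_iteration(q, beta=0, h=0.5, s_max=12, grid=10, sweeps=50):
    """Approximate escape probability J(s) for s = 0..s_max by repeated Bellman sweeps."""
    gamma = 1.0 - q
    policies = [k / grid for k in range(1, grid + 1)]   # candidate evasion rates
    J = [1.0] + [0.0] * s_max          # J(0) = 1: nothing left to recognize
    for _ in range(sweeps):
        new_J = [1.0]
        for s in range(1, s_max + 1):
            value = gamma**s           # no antigen recognized: immediate escape
            for r in range(1, s + 1):
                prob_r = comb(s, r) * q**r * gamma ** (s - r)
                best = 0.0
                for pi in policies:
                    penalty = int(h * r * pi)            # hypothetical pi-linear penalty
                    nxt = min(s_max, s - r + beta + penalty)
                    best = max(best, pi**r * J[nxt])     # tie, then continue optimally
                value += prob_r * best
            new_J.append(value)
        J = new_J
    return J

J = value_iteration(q=0.5)
print([round(v, 3) for v in J])
```

Because the Evader observes $r_n$ before acting, the maximization sits inside the sum over $r$; moving it outside would model an Evader who commits to a rate before seeing the Recognizer's flips.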
Constitutive relations for intertemporal penalty
We make the following assumptions in our setting to make this problem more tractable. The first assumption is that the penalty function is time-homogeneous and deterministic:
Conditional on progressing to the next period, the transition equation takes the following form:
In cases where we wish to emphasize the dependence of the transition equation on $\pi_n$, we will denote $s_{n+1}$ by $g(\pi_n)$ so that
The second assumption is that this penalty is $\pi_n$-linear, given by Equation 2:
for positive $h_m$.
In order to analytically characterize the solution, we assume that r_{n} is known prior to choosing ${\pi}_{n}$ (${r}_{n}\in {\mathcal{F}}_{n}$). In the analogous coin game, the Evader is allowed to see the success of his opponent, the Recognizer, prior to choosing a strategy. In this case, the dynamic program has a solution if we also assume that the linear penalty term can be represented by
with $c \equiv -\ln\gamma > 0$ and $0 < \delta_n \le 1$. This assumption implies that the marginal penalty of increasing $\pi_n$ is asymptotically proportional to the number of recognized antigens. This is reasonable to assume, for example, in cases where significant immune system recognition and tumor killing create an environment that makes subsequent adaptation more costly, resulting possibly from increased inflammation. The constant $\delta_n$, a free variable, is inversely related to aversion of the Evader strategy so that larger values imply a bolder evasion strategy for all else held constant. This parameter may in general vary temporally and as a function of disease subtype.
Dynamic programming solution
In the above case, we may find an exact solution to the optimal programming problem. Since ${r}_{n}\in {\mathcal{F}}_{n}$ (the filtration generated by the evolution of s_{n} and the Recognizer action at time $n$), the stationary Bellman equation takes the form
For simplicity in the subsequent definition, we drop the period index, rewriting Equation 54 as
Using $c \equiv -\ln\gamma$, the first-order condition (FOC) is
In expanded form, the FOC becomes
From Equation 2, we have that
We postulate that the solution takes the form of Equation 6:
so that
This, together with Equation 59, reduces Equation 58 to
Thus, the optimal Evader success probability, ${\pi}^{*}$, is given by
Under Evader optimal strategy, the transition equation in Equation 51 becomes
We next confirm that this satisfies the Bellman equation (Equation 55). The above solution implies
which ultimately yields
Equating coefficients and applying this logic to each policy gives Equation 7:
The optimal policy (Figure 1—figure supplement 11) is given by (Equation 8) the sequence
We henceforth refer to ${\delta}_{n}$ as the aversion parameter. Large values of ${\delta}_{n}$ imply low aversion. It can be interpreted as the selected strategy in the simplest case where ${\delta}_{n}=\delta >0$ and ${s}_{n}={r}_{n}=1$ since
Rearranging Equation 8 gives
Solution uniqueness
Proposition
The above value function is unique.
Proof
We consider value functions $V(s)$ in the space of functions that are continuous in $\pi $ and bounded in $s$. We take $\|V\|_{\mathrm{\infty}}\equiv \underset{s}{sup}|V(s)|$. From the previous section, we have identified such a function $J$ so that
Assume that $V(s)$ is another solution. For fixed ${s}_{n}$, let ${\pi}^{*}$ be such that
We can rewrite the following term:
where $\tilde{\gamma},\,{\gamma}^{{k}_{s}}<1$. Then
Note that
is increasing in $\pi $ (since $\tilde{\gamma}<1$) so that $C(\pi )\le 1-{\gamma}^{k}\tilde{\gamma}\equiv K<1$. Thus,
An identical argument, with the roles of $V$ and $J$ reversed, gives
and so
Therefore,
Thus,
□
Mean optimal transitions
From Equation 63, the mean optimal transitions are
The mean increment, $\mathrm{\Delta}{s}_{n}$, assuming the process is driven by ${r}_{n}\sim \text{Binomial}({s}_{n},q)$, becomes
We next consider two cases. In the first case, the basal antigen creation rate $\beta $ scales linearly with the number of currently recognized antigens, and in the second case we instead assume that it is fixed.
${r}_{n}$-linear basal antigen creation rate
This case considers $\beta =\alpha {r}_{n}$. Here, larger recognition in the current period results in larger exogenous penalty, and hence easier targeting, in the next period. Consequently, the number of detectable antigens in the future is directly influenced by both the tumor evasion strategy ${\pi}^{*}$ and the extent of recognition resulting from immune targeting, ${r}_{n}$. In this case (Figure 5—figure supplement 1), we have that
so that the process satisfies the martingale condition
for critical alpha
Mutation accumulation rate
In the trivial case where $\alpha ={\alpha}_{c}$, $s$ is constant and so mutation accumulation is predicted to be linear. Contributions by optimal evasion to the mutation rate are expected to exponentially decrease (resp. increase) over time if $\alpha <{\alpha}_{c}$ (resp. $\alpha >{\alpha}_{c}$).
In this case, the dynamics and resultant mutation accumulation are determined by $\alpha $ relative to ${\alpha}_{c}$, and only those $\alpha $ close to the threshold generate behavior resembling linear mutation accumulation. Given this, the added penalty $\beta ({r}_{n})=\alpha {r}_{n}$ due to the number of recognized antigens appears to be a less reasonable assumption based on empirical mutation rates (Lawrence et al., 2013; Alexandrov et al., 2013). We next consider the case for which the basal antigen creation rate is independent of $r$.
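The threshold behavior in $\alpha $ can be illustrated with a minimal sketch. It is not the authors' code. It assumes minimal aversion (${\delta}_{n}=1$), an optimal transition of the form ${s}_{n}={s}_{n-1}+(1/c-1){r}_{n-1}+\beta $ with $c=-\mathrm{ln}(1-q)$, and it replaces ${r}_{n}$ by its unconditioned binomial mean $q{s}_{n}$. Under those assumptions the mean map is multiplicative and the critical value works out to $1-1/c$; this expression follows from the stated assumptions and may differ in form from the paper's own equation. All function names are hypothetical.

```python
import math


def recognition_constant(q):
    """c = -ln(1 - q), where q is the per-antigen recognition probability."""
    return -math.log(1.0 - q)


def critical_alpha(q):
    """Value of alpha at which the assumed mean map leaves s unchanged."""
    return 1.0 - 1.0 / recognition_constant(q)


def mean_antigen_path(s0, q, alpha, n_periods):
    """Mean number of recognizable antigens when beta = alpha * r_n.

    Assumed mean map: E[s_{n+1} | s_n] = s_n * (1 + q * (alpha - alpha_c)).
    """
    growth = 1.0 + q * (alpha - critical_alpha(q))
    return [s0 * growth ** n for n in range(n_periods + 1)]


if __name__ == "__main__":
    q = 0.8
    a_c = critical_alpha(q)
    cases = [("below", a_c - 0.1), ("at", a_c), ("above", a_c + 0.1)]
    for label, alpha in cases:
        path = mean_antigen_path(10.0, q, alpha, 20)
        print(f"alpha {label} threshold: s_0 = {path[0]:.2f}, s_20 = {path[-1]:.2f}")
```

In this sketch the mean antigen count is flat only at the threshold and grows or decays geometrically on either side of it, which is the sensitivity to $\alpha $ described above.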
${r}_{n}$-independent basal antigen creation rate
In this case, $\mathrm{\Delta}{s}_{n}$ from Equation 83 becomes
The recognition dynamics of this case are more complex and partition into three regimes based on recognition relative to a critical threshold ${q}^{*}=1-1/e$ (for which $c=1$ and Equation 87 gives $\mathrm{\Delta}{s}_{n}=\beta $): effective immune recognition, critical recognition, and impaired recognition.
Effective immune recognition
Here, $q>{q}^{\ast}$, giving $c>1$. In this case, the Recognizer exerts a large recognition rate on the evading tumor. If $\beta \le 0$, then the equilibrium ${s}^{\ast}$, for which $\mathrm{\Delta}{s}_{n}=0$, is nonpositive, and ${s}_{n}$ is driven to 0. If $\beta $ is positive, then there exists a stable, positive antigen state:
Trajectories assuming a variety of initial conditions are given with ${s}^{*}=10$ in Figure 5—figure supplement 2A.
Impaired immune recognition
In contrast with effective recognition, here $q<{q}^{\ast}$, giving $c<1$, and the equilibrium points are unstable. Moreover, if $\beta \ge 0$, then by reasoning similar to the above, ${s}^{*}\le 0$ so that ${s}_{n}$ is driven to become very large. Alternatively, if $\beta <0$, then the equilibrium state is
so that collectively the equilibrium value is given by Equation 10.
Critical immune recognition
At criticality $q={q}^{*}$, $c=1$, and Equation 83 simplifies to
In this special case, all randomness imparted to the process by ${r}_{n}$ is eliminated by a critical offset in the number of recognized antigens and the net addition of new antigens so that the long-term behavior of the process is completely determined by $\beta $. Predictably, $\beta >0$ (resp. $\beta <0$) results in net expansion (resp. depletion) of antigens over time, and $\beta =0$ is stationary. The sign of $\beta $ may change as a function of the tumor IME. For example, immune exclusion and the resulting attenuated inflammation may both decrease $q$ and $\beta $ as well as genetic aberrations involving mismatch repair (MMR) deficiency and microsatellite instability. Other alterations, such as modulated MHC expression, or MHC loss of heterozygosity (LOH), may affect $q$ in isolation (Rosenthal et al., 2019).
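The three regimes can be explored numerically with a minimal sketch, offered as an illustration rather than the authors' implementation. It assumes ${\delta}_{n}=1$, $c=-\mathrm{ln}(1-q)$, and a mean increment of the form $\mathrm{\Delta}{s}_{n}=\beta +(1/c-1)q{s}_{n}$, obtained by replacing ${r}_{n}$ with its unconditioned binomial mean $q{s}_{n}$. The equilibrium expression $\beta c/((c-1)q)$ follows from that assumed increment. The function names are hypothetical, and the floor at zero antigens is an added convenience.

```python
import math


def recognition_constant(q):
    """c = -ln(1 - q); equals 1 at the critical rate q* = 1 - 1/e."""
    return -math.log(1.0 - q)


def mean_increment(s, q, beta):
    """Assumed mean change in antigen number over one period."""
    c = recognition_constant(q)
    return beta + (1.0 / c - 1.0) * q * s


def equilibrium(q, beta):
    """Antigen number at which the assumed mean increment vanishes.

    Returns None at criticality, where the increment equals beta for every s.
    """
    c = recognition_constant(q)
    if math.isclose(c, 1.0):
        return None
    return beta * c / ((c - 1.0) * q)


def iterate_mean(s0, q, beta, n_periods):
    """Iterate the mean map, flooring the antigen number at zero."""
    s = float(s0)
    path = [s]
    for _ in range(n_periods):
        s = max(0.0, s + mean_increment(s, q, beta))
        path.append(s)
    return path


if __name__ == "__main__":
    # Effective recognition (q > q*): trajectories approach s* = 10.
    q = 0.8
    c = recognition_constant(q)
    beta = 10.0 * (c - 1.0) * q / c
    for s0 in (2.0, 25.0):
        print(s0, "->", round(iterate_mean(s0, q, beta, 50)[-1], 3))
```

Choosing $q<{q}^{\ast}$ with $\beta <0$ in the same functions gives trajectories that move away from the equilibrium in either direction, matching the unstable case described above.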
Mutation accumulation rate
Critical and impaired immune recognition dynamics follow a similar behavior to that detailed in Section Mean optimal transitions. The effective recognition case bears a resemblance to the approximate dynamics of the informed Evader in Section Mean transition behavior. Here, by an argument similar to that in Section Mutation accumulation rate and tumor antigen availability, once equilibrium is achieved, we have that
Studying the process at ${s}_{0}={s}^{*}$ given by Equation 88, and ${f}_{n}^{*}={r}_{n}/c$, we have that
This implies Equation 11:
Therefore, linear mutation accumulation as a function of time ensues for an effective Recognizer as in the passive Evader case (Equation 46), this time as a function not only of the basal antigen creation rate $\beta >0$ but also of $q$ through $c$. We recall that under effective recognition, ${q}^{\ast}<q<1$ (equivalently $1<c<\mathrm{\infty}$), which ultimately gives via Equation 11
Dynamics summary
The assumption that the basal antigen production depends on recognition, $\beta =\alpha {r}_{n}$, results in exponential growth or decay in the number of recognizable antigens (and therefore mutation rate), and linear mutation accumulation could occur only for a very narrow range of parameter values $\alpha \sim {\alpha}_{c}$. It is for this reason that the ${r}_{n}$-linear constitutive assumption is less realistic.
For basal antigen rates $\beta $ that are ${r}_{n}$-independent, mutations are predicted to accumulate linearly under effective immune recognition, in a similar manner to that observed in the passive Evader case. In contrast with that case, however, an active Evader executes an optimal strategy to maximize the overall escape probability. This predicts that one effect of a dynamic evasion that optimally maximizes escape probability is a concomitant increase in the mutation accumulation rate relative to the passive case via a correction term $c/(c-1)$. This enhancement becomes negligible when recognition is very aggressive ($q\to 1$) and becomes large when $q$ approaches the critical detection rate.
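As a hedged consistency check, and not a reproduction of the authors' derivation, the correction term can be recovered from two relations used in this section, the optimal penalty ${f}_{n}^{*}={r}_{n}/c$ and a transition of the form ${s}_{n}={s}_{n-1}+(1/c-1){r}_{n-1}+\beta $. The added assumptions are that mean recognition at equilibrium is $q{s}^{*}$ and that the mean number of new antigens per period, written $\nu $ here, is $\beta +{f}^{*}$:

```latex
\begin{align}
0 &= \beta + \left(\frac{1}{c} - 1\right) q\, s^{*}
  \quad\Longrightarrow\quad
  s^{*} = \frac{\beta\, c}{(c-1)\, q}, \\
\nu &= \beta + \frac{q\, s^{*}}{c}
  = \beta + \frac{\beta}{c-1}
  = \beta\, \frac{c}{c-1}.
\end{align}
```

Under these assumptions, $c/(c-1)\to 1$ as $q\to 1$ (so $c\to \mathrm{\infty}$), and the factor diverges as $q$ decreases toward ${q}^{\ast}$ (so $c\to 1$ from above), in line with the limiting behavior stated above.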
Interestingly, the active evasion strategy predicts that mutation accumulation rates vary as a function of recognition pressure, in contrast with the passive evasion model. Additionally, disease progression may affect immune recognition (changes in $q$) and tumor evasion penalty (changes in $\beta $). While the number of recognizable TAAs for the passive case continues to evolve according to the mean-reverting process, there is a dramatic discontinuity in active systems whereby recognition rates below a critical threshold may result in unstable behavior prior to escape (Figure 5—figure supplement 2).
Optimal evasion strategy
From Equations 6–8, we have
and
Thus,
We note that ${s}_{n}={s}_{n-1}+(1/c-1){r}_{n-1}+\beta $; therefore
where
and
By iteratively applying Equation 98, we ultimately obtain the value function in terms of the history of the environmental landscape, ${\{{r}_{n}\}}_{n}$
We remark that this simplifies for constant ${\delta}_{n}=\delta $, which we will typically take as 1.
Critical recognition
At the critical value of recognition ${q}^{*}=1-1/e$ ($c=1$), the dynamics become deterministic. Here, the value of the present state depends only on the initial number of detectable antigens and number of periods that have elapsed and is independent of the history of recognized antigens ${\{{r}_{n}\}}_{n}$.
At criticality, the value of the present state depends only on the initial number of detectable antigens and number of periods that have elapsed, and not on the number of recognized antigens.
Noncritical recognition
We recall that the value function carries meaning as the maximal attainable expected future value. Under effective recognition, $c>1$, which implies that ${\gamma}^{\mathbb{C}r}$ is increasing in $r$, so that the value function (Equation 101) has an exponent that increases.
We are motivated to consider either mild or aggressive recognition of Section 5.2.4. We will assume that there is minimal aversion so that ${\delta}_{n}=1$.
Predicted dynamical behavior
From Section Mean optimal transitions, the dynamical behavior of the number of recognizable TAAs, or immunogenicity, of an active Evader is determined by $\beta $ and $q$. Disease progression may ultimately affect immune recognition (reducing $q$) and/or basal tumor antigen creation (reducing $\beta $). $\beta $ is expected to vary widely across tumor types. Within a given tumor subtype, the extent of environmental hostility is expected to require additional tumor adaptation that may manifest as additional TAA targets. Therefore, larger (resp. smaller) evasion penalties $\beta $ correspond with antitumor (resp. protumor) IME. Similarly, larger (resp. smaller) $q$ corresponds to infiltrated (resp. excluded) environments, and from this we model four possible states: antitumor-infiltrated, antitumor-excluded, protumor-infiltrated, and protumor-excluded. The model predicts that infiltrated ($q>{q}^{\ast}$) environments lead to an absorbing equilibrium state in the intervening period prior to escape, while excluded ($q<{q}^{\ast}$) environments result in unstable equilibria. Interestingly, the sign of the equilibrium, and hence the behavior, depends on $\beta $, and leads to dramatically diverse behavior in the antigenicity of a dominant tumor clone as it progresses via immune recognition. This case is meaningful as long as the intertemporal penalty under the optimal strategy, $\beta +{f}_{n}^{*}$, remains nonnegative whenever there is at least one recognition event. This is equivalent to the condition that ${f}_{n}^{\ast}+\beta \ge 1/\mathrm{ln}{\gamma}^{-1}+\beta >0$, which is assumed in all examples that follow. These results are summarized in Figure 5 and organized below. The corresponding immunogenicity and cumulative mutations following escape are given by Figure 4, with the timing of escape and example trajectories given by Figure 5—figure supplement 3.
Antitumor-infiltrated ($q>{q}^{\ast}$, $\beta >0$): This stable steady state is positive, so that the process is mean-reverting, and generates immunogenically ‘warm’ tumors.
Antitumor-excluded ($q<{q}^{\ast}$, $\beta >0$): Here, recognition is low, while the arrival of new TAAs is large. This unstable steady state is negative, so that all trajectories tend to increase their immunogenicity over time, resulting in ‘hot’ tumors.
Protumor-infiltrated ($q>{q}^{\ast}$, $\beta <0$): In this case, recognition is large while the arrival of new TAAs is low. This stable steady state is negative, so that all trajectories tend to reduce their immunogenicity to zero over time, yielding ‘cold’ tumors.
Protumor-excluded ($q<{q}^{\ast}$, $\beta <0$): Lastly, if both recognition and new TAA arrival rates are low, then there is a positive unstable state, above which trajectories accumulate additional TAAs over time, becoming ‘hot,’ and below which the populations are predicted to reduce the number of recognizable TAAs over time, becoming ‘cold.’
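The four states listed above can be written as a small lookup, shown here as a minimal sketch rather than the authors' code. The state names and warm/hot/cold labels come from the list. The threshold used in the protumor-excluded case, ${s}^{*}=\beta c/((c-1)q)$ with $c=-\mathrm{ln}(1-q)$, is an assumption based on replacing ${r}_{n}$ by its unconditioned mean $q{s}_{n}$ in a transition of the form ${s}_{n}={s}_{n-1}+(1/c-1){r}_{n-1}+\beta $. The function names and the handling of boundary cases are hypothetical.

```python
import math

# Critical recognition rate q*, at which c = -ln(1 - q) = 1.
Q_CRIT = 1.0 - 1.0 / math.e


def unstable_threshold(q, beta):
    """Assumed equilibrium s*; positive when q < q* and beta < 0."""
    c = -math.log(1.0 - q)
    return beta * c / ((c - 1.0) * q)


def classify_ime(q, beta, s0=None):
    """Map (q, beta) to one of the four IME states and a predicted tumor type.

    Boundary cases (q at q*, or beta equal to 0) fall outside the four
    states and raise ValueError in this sketch.
    """
    if math.isclose(q, Q_CRIT) or beta == 0:
        raise ValueError("critical recognition or zero basal rate")
    infiltrated = q > Q_CRIT
    antitumor = beta > 0
    if infiltrated and antitumor:
        return ("antitumor-infiltrated", "warm")
    if antitumor:
        return ("antitumor-excluded", "hot")
    if infiltrated:
        return ("protumor-infiltrated", "cold")
    # Protumor-excluded: the outcome depends on the initial TAA count.
    if s0 is None:
        return ("protumor-excluded", "hot or cold")
    outcome = "hot" if s0 > unstable_threshold(q, beta) else "cold"
    return ("protumor-excluded", outcome)


if __name__ == "__main__":
    for q, beta in [(0.8, 1.0), (0.4, 1.0), (0.8, -1.0), (0.4, -1.0)]:
        print(q, beta, classify_ime(q, beta))
```

Passing an initial antigen count `s0` resolves the protumor-excluded case into ‘hot’ or ‘cold’ according to which side of the assumed unstable threshold the trajectory starts on.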
These predicted dynamics parallel the observation that tumors under active immunosurveillance via effective recognition undergo significant immunoediting. Our results predict that the resulting tumor becomes ‘warm’ or ‘cold’ depending on the extent of new TAA arrival during active evasion. On the one hand, impaired recognition leads to diverse behavior dependent on the rate at which new TAAs are acquired during active evasion. If this acquisition rate is large, then the tumor accumulates TAAs over time to become ‘hot.’ On the other hand, tumors subject to reduced selection pressures may evolve as immune-hot or immune-cold tumors, consistent with previous observations (Lakatos et al., 2020). Moreover, the effect of reducing immune recognition leads to an accumulation of TAAs over time, consistent with experimental observations in lung cancer wherein patients with HLA loss of heterozygosity harbored larger mutational burdens, an indirect measure of the TAA number in our model (McGranahan and Swanton, 2017). Our predictions suggest that immunogenicity ultimately depends on the number of detectable TAAs at the time of impaired immune recognition, suggesting that TAA-depleted tumors share in common the tendency for their evasion strategies to incur less antigenic penalties. Our results would predict the utility of altering the tumor microenvironment to increase the immunogenicity of immune-cold tumors by making evasion more costly in a manner reminiscent of mutational meltdown (Gabriel et al., 1993). We remark that these dynamics are worth considering in the case of adoptive T cell-based immunotherapies, which have a large potential for exerting substantial coevolutionary pressure on a developing malignancy (George and Levine, 2021).
Survival benefit of active evasion
From the above analysis, immunogenicity dynamics of an active Evader are closest to those of a mean-reverting passive Evader under the antitumor-infiltrated case. Given this, we study the dynamics under active and passive evasion as well as the distribution of escape times and probability of escape (Figure 2). For a reasonable comparison, we fix $q$ and ${s}^{*}$ for each case, and the passive evasion rate $p$ is chosen to match the stationary mean optimal evasion rate ${\pi}^{*}$. In our simulations, escape occurs 1.6 times more frequently under active evasion. Moreover, active evasion exhibits a broader distribution of elimination and escape times (Mean Passive Escape = 6.0, Var Passive Escape = 25.0, Mean Passive Elimination = 6.1, Var Passive Elimination = 30.1; Mean Active Escape = 7.2, Var Active Escape = 35.8, Mean Active Elimination = 6.7, Var Active Elimination = 38.0). Our results demonstrate that active evasion allows an Evader to adapt to the observed recognition and, despite continual penalty, allows an Evader to ‘outwait’ a Recognizer in order to undergo escape.
Exogenous recognition
One powerful advantage of this approach is that the theoretical predictions are not limited by the underlying distribution of ${r}_{n}$ driving the process. In fact, the optimal policies and value function can handle any temporally varying recognition landscape, ${\{{r}_{n}\}}_{n}$, so long as $0\le {r}_{n}\le {s}_{n}$. We consider the effects of step, cyclical, increasing, and decreasing recognition landscapes on the relative evasion probability for populations adopting either a passive or active strategy (Figure 3).
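The landscapes named above can be generated and fed through the antigen transition with a minimal sketch. It is an illustration only and makes several assumptions: the transition takes the form ${s}_{n}={s}_{n-1}+(1/c-1){r}_{n-1}+\beta $ with ${\delta}_{n}=1$, the constant $c$ is supplied directly as a parameter, antigen numbers are treated as real-valued, and each ${r}_{n}$ is clipped to satisfy $0\le {r}_{n}\le {s}_{n}$. It tracks only the antigen count and does not model escape or elimination events, so it does not reproduce the evasion probabilities of Figure 3. All function names are hypothetical.

```python
import math


def step_landscape(n, low, high, switch):
    """Recognition jumps from `low` to `high` at period `switch`."""
    return [low if k < switch else high for k in range(n)]


def cyclical_landscape(n, mean, amplitude, period):
    """Recognition oscillates sinusoidally around `mean`."""
    return [mean + amplitude * math.sin(2.0 * math.pi * k / period)
            for k in range(n)]


def ramp_landscape(n, start, stop):
    """Linearly increasing (stop > start) or decreasing (stop < start) recognition."""
    if n == 1:
        return [start]
    return [start + (stop - start) * k / (n - 1) for k in range(n)]


def antigen_path(s0, c, beta, landscape):
    """Antigen count under the assumed optimal transition for a given landscape."""
    s = float(s0)
    path = [s]
    for r in landscape:
        r = min(max(r, 0.0), s)  # enforce 0 <= r_n <= s_n
        s = max(0.0, s + (1.0 / c - 1.0) * r + beta)
        path.append(s)
    return path


if __name__ == "__main__":
    landscapes = {
        "step": step_landscape(20, 1.0, 4.0, 10),
        "cyclical": cyclical_landscape(20, 3.0, 2.0, 5),
        "increasing": ramp_landscape(20, 1.0, 5.0),
        "decreasing": ramp_landscape(20, 5.0, 1.0),
    }
    for name, landscape in landscapes.items():
        final = antigen_path(10.0, 1.6, 1.0, landscape)[-1]
        print(f"{name}: s_20 = {final:.2f}")
```

With $c=1$ the recognition history drops out of the transition in this sketch, so every landscape yields the same path.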
In addition to arbitrary recognition landscapes, our dynamic programming approach may be applied to understand the effects of immunotherapeutic intervention, whereby immune escape can be modeled as a range of possible behaviors on the spectrum from passive evasion to the most aggressive (active) evasion. For example, the active evasion dynamics assuming an antitumor-infiltrated case are similar to those of passive evasion. In both cases, the process escapes with immunogenicity values that fluctuate around a stationary ${s}^{*}$. We can recover the relationship between ${s}^{*}$ and mutation rate $\nu (n)=\mathrm{\Delta}\lambda /\mathrm{\Delta}n$ via Equations 32 and 46 for the passive case and Equations 88 and 11 for the active case. In both cases, the result is similar:
demonstrating that immunogenicity, and thus the success likelihood of immunotherapeutic intervention, varies directly with mutation rate and inversely with recognition rate. This theory predicts that escape to a cold tumor is more likely when ${s}^{*}$ is close to 0 and is akin to complete evasion as modeled in George and Levine (2018), contrasting with temporary evasion that may be recognized subsequently (George and Levine, 2020). All else equal, higher mutational rates can lead to higher predicted efficacy via higher ${s}^{*}$, but this is not the only determinant, as concomitantly high rates of recognition can drive ${s}^{*}$ down, thereby reducing predicted efficacy. From Equation 103, a better immunotherapy prognosis occurs when the mutational rate is higher and the recognition rate is low, since ${s}^{*}$ is predicted to be large in this case. Figure 5—figure supplement 4 summarizes the behavior of an adaptive Evader subject to a temporally varying recognition pressure.
Data availability
All data generated or analyzed in this study are included in the supplementary data files. Source code is publicly available as a git repository (George, 2022).
References

The theory of dynamic programming. Bulletin of the American Mathematical Society 60:503–515. https://doi.org/10.1090/S000299041954098488
Functional approximations and dynamic programming. Mathematical Tables and Other Aids to Computation 13:247. https://doi.org/10.2307/2002797
Regulation of DNA repair in hypoxic cancer cells. Cancer Metastasis Reviews 26:249–260. https://doi.org/10.1007/s1055500790613
Granzymes in cancer and immunity. Cell Death and Differentiation 17:616–623. https://doi.org/10.1038/cdd.2009.206
pH sensing and regulation in cancer. Frontiers in Physiology 4:370. https://doi.org/10.3389/fphys.2013.00370
Cancer immunoediting: from immunosurveillance to tumor escape. Nature Immunology 3:991–998. https://doi.org/10.1038/ni1102991
The three Es of cancer immunoediting. Annual Review of Immunology 22:329–360. https://doi.org/10.1146/annurev.immunol.22.012703.104803
Muller’s ratchet and mutational meltdowns. Evolution; International Journal of Organic Evolution 47:1744–1757. https://doi.org/10.1111/j.15585646.1993.tb01266.x
Stochastic modeling of tumor progression and immune evasion. Journal of Theoretical Biology 458:148–155. https://doi.org/10.1016/j.jtbi.2018.09.012
Tumor mutational burden as an independent predictor of response to immunotherapy in diverse cancers. Molecular Cancer Therapeutics 16:2598–2608. https://doi.org/10.1158/15357163.MCT170386
Immunological tumor heterogeneity and diagnostic profiling for advanced and immune therapies. Advances in Cell and Gene Therapy 4:e113. https://doi.org/10.1002/acg2.113
Tracking the evolution of non-small-cell lung cancer. The New England Journal of Medicine 376:2109–2121. https://doi.org/10.1056/NEJMoa1616288
Heterogeneity of the tumor immune microenvironment and its clinical relevance. Experimental Hematology & Oncology 11:24. https://doi.org/10.1186/s4016402200277y
Brownian motion. In: Brownian Motion and Stochastic Calculus. New York, NY: Springer. pp. 47–127. 10.1007/9781461209492
Stochastic modeling of drug resistance in cancer. Journal of Theoretical Biology 239:351–366. https://doi.org/10.1016/j.jtbi.2005.08.003
Evolutionary dynamics of neoantigens in growing tumors. Nature Genetics 52:1057–1066. https://doi.org/10.1038/s4158802006871
Cancer mechanobiology: microenvironmental sensing and metastasis. ACS Biomaterials Science & Engineering 5:3735–3752. https://doi.org/10.1021/acsbiomaterials.8b01230
New insights into M1/M2 macrophages: key modulators in cancer progression. Cancer Cell International 21:389. https://doi.org/10.1186/s12935021020892
hSMG-1 is a granzyme B-associated stress-responsive protein kinase. Journal of Molecular Medicine 89:411–421. https://doi.org/10.1007/s0010901007080
DNA damage and repair biomarkers of immunotherapy response. Cancer Discovery 7:675–693. https://doi.org/10.1158/21598290.CD170226
The paradox of cancer immune exclusion: Immune oncology next frontier. In: Marincola FM, Lee PP, editors. Tumor Microenvironment. Cham: Springer. pp. 173–195. https://doi.org/10.1007/9783030388621
Driver and passenger mutations in cancer. Annual Review of Pathology 10:25–50. https://doi.org/10.1146/annurevpathol012414040312
Evolving responsively: adaptive mutation. Nature Reviews. Genetics 2:504–515. https://doi.org/10.1038/35080556
Immunodominance and tumor escape. Seminars in Cancer Biology 12:25–31. https://doi.org/10.1006/scbi.2001.0401
A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nature Reviews. Immunology 20:651–668. https://doi.org/10.1038/s4157702003065
Targeting neoantigens to augment antitumour immunity. Nature Reviews. Cancer 17:209–222. https://doi.org/10.1038/nrc.2016.154
Decision letter

Yuval Elhanati, Reviewing Editor; Memorial Sloan Kettering Cancer Center, United States

Aleksandra M Walczak, Senior Editor; CNRS, France
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Optimal Cancer Evasion in a Dynamic Immune Microenvironment Generates Diverse Post-Escape Tumor Antigenicity Profiles" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
As you can see below, the scope and mathematical effort in your manuscript were greatly appreciated by all reviewers. The model is mathematically rigorous and addresses an important and timely problem. There were some doubts about the applicability of your model to cancer evolution in real situations, but as a conceptual paper, it can contribute much to the advancement of the field. Even then, it needs some more work to be truly beneficial for the community.
1) Clarity of the paper should be improved. This includes a better discussion of the underlying assumptions, less technical terms and clearer ones, and a tighter exploration of the important parameters of the model, possibly with a phase diagram or similar graphical aid.
2) The process of Clonal cancer evolution should be better discussed in relation to the model, which describes the dynamics of mutations but does not follow clones.
3) The claim that the cancer cells are sensing the immune system is a bold and intriguing one, and as such need to be better supported, or otherwise, it is hard to justify.
Reviewer #1 (Recommendations for the authors):
The actual evolutionary process in the tumor happens on the level of cells and clones. Tumor clones proliferate and compete, each with its own fitness under the changing selection pressure from the immune system. However, the model here does not address tumor cells or clones, but rather the presented antigens as independent particles. While this is not an unreasonable choice, it does require better discussion and justification.
A major ingredient of the model is the penalty for immune evasion – in which while evading immune recognition of certain TAAs, a tumor will develop new ones as potential new targets for the immune system. This is presented in an opaque way, with several assumptions and specific mathematical forms. There should be some tradeoff between immune escape and loss of function, that would lead to such a penalty, and it should be made more concrete and transparent.
The idea that cancer can sense and respond to the immune system and the tumor microenvironment is exciting and intriguing but therefore requires stronger evidence. It would be valuable to make more connections to known observations to support this type of model. This is especially true since some of the model "predictions", like the correlation between lower immune surveillance and tumor mutational burden, are known from simpler principles.
And more generally, language and notation should be improved throughout the manuscript, making it easier to follow. Some jargon should be discouraged or explained, like "daten" or "mean evolution dynamics" (section 3.1). A phase-space-like diagram of the different parameters' regimens of the model (maybe β and q?) would also be extremely useful.
The definition of eta is confusing. The manuscript states: 'eta may be interpreted as the probability of the complement of the following event: "recognition occurs without matched evasion for a single antigen". In other words, eta is the probability of a tie at a single antigen position.' But the complement of that event also includes the probability of no recognition, on top of a tie, unless I misunderstood what tie means. Regardless, tie is a confusing term here and should at least be explained better.
Reviewer #2 (Recommendations for the authors):
Reviewer #3 (Recommendations for the authors):
Developing such conceptual models is important and has the potential to inspire the wider field. However, I fear that some of that full potential might not be reached without additional work rewriting the text for greater clarity and precision. The following are some suggestions that might be helpful in that regard.
To start with, I had a hard time following some of the terminology. There are some instances where different terms are used to refer to the same concept. For instance, in Figure 1C the terminology changes between the legend (evasion rate) and plot (optimal downregulation attempt probability). Furthermore, the terminology seems overloaded: π is referred to as an evasion probability, but maybe one would want to reserve this term for the complete evasion of immune recognition by cancer. π might be more simply referred to as a rate of antigen loss, or similar. I was also wondering whether it would make sense to directly include β in Eq.1 and separate it from the penalty term. I understood penalty to refer to the increase in antigen creation rate when π is higher. From the equation, β instead could be more aptly named a basal rate of antigen creation. Lastly, notations should be uniformized. Specifically, I noted that in the methods the rate at which TAAs are lost is denoted by p instead of pi, if I understand correctly, which can cause confusion for the reader.
https://doi.org/10.7554/eLife.82786.sa1
Author response
Essential revisions:
As you can see below, the scope and mathematical effort in your manuscript were greatly appreciated by all reviewers. The model is mathematically rigorous and addresses an important and timely problem. There were some doubts about the applicability of your model to cancer evolution in real situations, but as a conceptual paper, it can contribute much to the advancement of the field. Even then, it needs some more work to be truly beneficial for the community.
1) Clarity of the paper should be improved. This includes a better discussion of the underlying assumptions, less technical terms and clearer ones, and a tighter exploration of the important parameters of the model, possibly with a phase diagram or similar graphical aid.
We share in the reviewer’s enthusiasm for our model’s mathematical rigor and conceptualization, and we are grateful for the helpful suggestions on improving model clarity.
Following the reviewers’ suggestions, we have made several significant changes to the manuscript to clarify the underlying assumptions. First, we have changed the language describing the evasion probability and the rate of Tumor-Associated Antigen (TAA) loss (π_{n}). We have also restructured the presentation of the evasion penalty to separate out the decision-dependent (π_{n}) term, now given by f_{n}, from the decision-independent contribution given by β, which is more aptly referred to as the basal antigen creation rate throughout the manuscript. The first use of fixed π_{n} equal to p has been clarified in the Methods section.
Two of the reviewers had suggested that there is a degree of confusion in discussing the tie probabilities in this process. In an effort to clarify our approach, we now replace that language with the idea of tumor-immune equilibrium in order to keep the discussion grounded in Schreiber’s widely accepted conceptual framework of the ‘Three E’s (Elimination, Escape, Equilibrium) of Immunoediting’. The language has now been changed in the Results and Methods sections and described explicitly in the Introduction:
“Immunosurveillance via distinct T cell clones imposes an adaptive, stochastic recognition environment on a developing cancer population (Desponds et al. PNAS 2016) that can result either in cancer elimination, escape, or equilibrium (Dunn et al. Annu Rev Immunol 2004). Equilibrium results in cancer coexistence with the immune system over large time scales (Turajlic et al. Cell 2018) thereby motivating the need for a more complete understanding of the interplay between immune recognition and cancer evolution for effective therapeutic design.”
We appreciate the editor’s suggestion of a phase diagram to graphically depict the important parameter regions in question and have now added this graphical aid to Figure 5 to clarify the relevant regimes corresponding to those cases.
2) The process of Clonal cancer evolution should be better discussed in relation to the model, which describes the dynamics of mutations but does not follow clones.
It has been correctly noted that our model does not attempt to deal with possible subclonal structure of the tumor. This possibility introduces additional complications into the model formulation and we wanted to first establish the basic idea of “adaptive evasion” in the simplest context. In an attempt to illustrate how reasonable future generalizations of our model could include nontrivial subclonal heterogeneity in tumor antigens tracked through time, we now describe how one would go about enhancing the existing model to address this. The following has been added to the Methods and Discussion sections:
Addition to Methods: “The above describes a clonal population harboring a core minimal set of TAAs for which recognition and downregulation ultimately determine cancer escape, elimination, or equilibrium. Our model can however be adapted to study the more general scenario involving a clonal hierarchy of heterogeneous cancer cells. We illustrate this by considering a population of cells with a set C of c =  C  core clonal TAAs, together with distinct groups of cells with subclonal collections of TAAs S_{1} and S_{2} (having size s_{1} = S_{1} and s_{2} = S_{2}, respectively). The relevant populations therefore have antigen sets given by P_{1} = CUS_{1} and P_{2} = CUS_{2}. The basic event considered in the foundational model, [r_{n} > 0], must now be replaced by the event that recognition occurs in both P_{1} and P_{2}; in the absence of recognition of both subclones, the cancer escapes. Recognition happens either if there is a recognition event r in C or if there are simultaneous recognition events r_{1} in S_{1} and r_{2} in S_{2}. Assuming that TAA recognition occurs independently as before with probability q, the total probability of relevant recognition, originally (1−γ^{sn}), is now given by (1−γ^{c}) +γ^{c}(1−γ^{r1})(1− γ^{r2}). The first term characterizes the coupling of the fate of both subclones should a common TAA be recognized, while the latter term represents the parallel recognition process required to control each subclone separately via subclonal TAA recognition. Lastly, assuming that recognition proceeds either by a shared TAA in C or instead by subclonal TAAs in both S_{1} and S_{2}, then the probability of elimination and progression proceed identically as before. In the remainder of the discussion, we will, for baseline understanding, only track a core set of clonal antigens on the fittest clone.”
Addition to Discussion: “In this foundational model, we demonstrated the dynamics of immune recognition of an adaptive population of cancer cells expressing a purely clonal pattern of antigens. Our model implicitly treats antigen loss and the progression of a subpopulation currently adapted to evade immune targeting (either by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens) as equivalent. Here, we tracked the fittest clone represented by a core set of clonal antigens. We remark that heterogeneous populations each having a distinct subclonal signature can also be tracked, but the corresponding antigen-driven selection and fitness cost to each clone would be coupled through shared antigens (see Methods). Finally, we note that this extended approach implicitly assumes that antigen detection rates over a given period are subclone size-independent, given that antigens are tracked over a period where each of the clones with comparable fitness would be detectable by the immune system during their growth trajectory en route to attempted escape.”
3) The claim that the cancer cells are sensing the immune system is a bold and intriguing one, and as such needs to be better supported; otherwise, it is hard to justify.
We are grateful to the editor and reviewers for requesting that this point be made more explicitly. The rheostat for stress that was discussed in the initial draft was specific to stress responses that lead to increased mutagenesis in yeast and hypoxia-driven responses in cancer. While this is indicative of a cell’s capability of an adaptive response, it did not specifically relate this possibility to the actions of the immune system. What we have now included is a direct connection between T-cell activity and this response, which is known to occur both indirectly, in the setting of immune-mediated cytokine release following T cell recognition, and as a result of T cell targeting directly. In an attempt to present concrete connections to empirical observations in support of this phenomenon, we have added the following paragraph to the Discussion section:
“Our analysis centered on the ability of cancer populations to adaptively respond to a measured immune state, and we have primarily focused on studying subsequent mutations resulting in the disruption of existing (targeted) tumor-associated antigenic targets and on the generation of new ones. It is important to note that independent empirical observations support the ability of cancer cells to sense their immune microenvironment (IME), and perhaps even the level of CD8^{+} killing that occurs therein. At the signaling level, IL-6 secreted by CTLs, macrophages, and dendritic cells in response to immune recognition has been shown to directly activate ataxia-telangiectasia mutated (ATM), a factor implicated in response to DNA damage, and this has been associated with increased metastasis and multidrug resistance in lung cancer (Jiang et al. Oncotarget 2015; Yan et al. Cancer Science 2014). IFNγ released by activated CD8^{+} tumor-infiltrating lymphocytes activates the cell-intrinsic STING pathway in response to DNA damage in cancer, implicating an altered TME from activated CD8^{+} T cells that is measurable by the cancer (Xiong et al. Oncoimmunology 2022). Lastly, at the level of individual TCR interactions with recognized tumor cells, granzyme B release has been directly linked to DNA damage and associated CHK2 and p53 stress responses, and studies have demonstrated hSMG1 stress-activated proteins upregulated in cancer cells following granzyme B treatment (Meslin et al. J Mol Med 2011). Moreover, granzyme release in the microenvironment serves as a signaling molecule promoting a pro-inflammatory response from other immune cells (Cullen et al. Cell Death Differentiation 2010). The relatively acute response and short half-lives of downstream effectors (minutes for p53 and hours for CHK1, for example) provide a tunable response based on the current level of immune targeting through stress-induced mutagenesis (Bindra et al. Cancer Met Rev 2007; Rosenberg Nat Rev Genetics 2001; Rosenberg, Queitsch. Science 2014) that in our analysis directly influences tumor-associated antigen availability.”
Reviewer #1 (Recommendations for the authors):
The actual evolutionary process in the tumor happens on the level of cells and clones. Tumor clones proliferate and compete, each with its own fitness under the changing selection pressure from the immune system. However, the model here does not address tumor cells or clones, but rather the presented antigens as independent particles. While this is not an unreasonable choice, it does require better discussion and justification.
We thank the reviewer for their helpful comments. Our model implicitly treats antigen loss and the progression of a subpopulation currently adapted to evade immune targeting as equivalent, whether that progression arises by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens. The foundational analysis was conducted on a clonal antigenic structure assuming a minimal collection of TAAs that required targeting for a change in equilibrium to either escape or elimination. Because we studied, for foundational understanding, the case where a single clonal signature was tracked in time, we underexplained how such a model would be implemented in more complicated cases.
The next most complicated scenario involves a heterogeneous population of cancer cells with disjoint neoantigen profiles. In this case, a parallel process can be studied wherein the effects of recognition in one environment are decoupled from the other (relevant to, for example, spatially distinct subpopulations). This description however misses the case where such disparate populations evolve to express shared antigens, or in the case where there are both clonal and subclonal antigen targets. Here, our model can still be applied in parallel to study distinct clones but requires additional structure. Namely, in this case we would need to incorporate nontrivial coupling between the possible recognition/selection against certain antigens shared across clones. For example, control of a population with clonal antigens {a, b} but having unique subclones having either antigens {w,x} or {y,z} could be considered by studying the process in parallel, and control in the next periods would require recognition/selection against either (1) at least one of {w,x} and at least one of {y,z}, or (2) at least one of {a,b}. In this more general framework, the arrival of new subclones with distinct features from the parent clone in question could also be incorporated and studied across time periods. This strategy of subdividing more complicated evolutionary structures has now been further elaborated on in the Methods section, and we have expounded these points in the discussion (see additions given under Editor Comment 2).
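The coupled control condition in the example above can be sketched in a few lines. This is only an illustration of the stated rule (a hit on a clonal antigen, or a hit in every subclonal set); the function name and the set-based representation are hypothetical and are not part of the published model.

```python
def is_controlled(recognized, clonal, subclones):
    """Return True if the recognized antigens suffice to control every subclone.

    Control holds if (2) at least one clonal antigen is recognized, or
    (1) at least one antigen is recognized in each subclonal set.
    """
    recognized = set(recognized)
    if recognized & set(clonal):
        return True
    # With no clonal hit, every subclone must be hit separately.
    return bool(subclones) and all(recognized & set(s) for s in subclones)


if __name__ == "__main__":
    clonal = {"a", "b"}
    subclones = [{"w", "x"}, {"y", "z"}]
    print(is_controlled({"a"}, clonal, subclones))       # clonal hit
    print(is_controlled({"w", "y"}, clonal, subclones))  # one hit per subclone
    print(is_controlled({"w", "x"}, clonal, subclones))  # second subclone missed
```

In this representation, the arrival of a new subclone would correspond to appending another antigen set to the list before the next period is evaluated.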
A major ingredient of the model is the penalty for immune evasion – in which while evading immune recognition of certain TAAs, a tumor will develop new ones as potential new targets for the immune system. This is presented in an opaque way, with several assumptions and specific mathematical forms. There should be some tradeoff between immune escape and loss of function, that would lead to such a penalty, and it should be made more concrete and transparent.
Indeed, the balance between TAAs lost and TAAs generated was discussed in an excessively complicated way. To add clarity, we have isolated the basal arrival of TAAs from the penalty term. In doing so, the assumed functional form for the penalty is simplified to be directly proportional to the rate of TAA loss and distinct from any r_{n},π_{n}-independent TAA additions, which are fully characterized by the β term. We have also clarified these points early in the model development section at the point where β and penalty terms f_{n} are described: XXX
The idea that cancer can sense and respond to the immune system and the tumor microenvironment is exciting and intriguing but therefore requires stronger evidence. It would be valuable to make more connections to known observations to support this type of model. This is especially true since some of the model "predictions", like the correlation between lower immune surveillance and tumor mutational burden, are known from simpler principles.
We thank the reviewer for this helpful suggestion, since our original presentation omitted all but one sentence on the precise molecular details of how this may occur. The rheostat on stress that was previously discussed was specific to stress responses that lead to increased mutagenesis in yeast and hypoxia-driven responses in cancer. What should have been included was a direct connection between T-cell activity and this response, which is known to occur both indirectly, in the setting of immune-mediated cytokine release following T cell recognition, and as a result of T cell targeting directly. In an attempt to make concrete connections to empirical observations in support of this phenomenon, we have added these points in the Discussion section (see additions given under Editor Comment 3).
And more generally, language and notation should be improved throughout the manuscript, making it easier to follow. Some jargon should be discouraged or explained, like "daten" or "mean evolution dynamics" (section 3.1). A phase-space-like diagram of the model's different parameter regimes (maybe β and q?) would also be extremely useful.
We thank the reviewer for suggesting helpful additions to aid in presentation clarity. Specifically, we have removed excessive jargon (‘daten’) and have expanded upon what we mean by mean evolution dynamics. Adding a phase-space diagram as a visual map of the model’s key regimes is a great suggestion, and we have now added this to Figure 5.
The definition of eta is confusing. The manuscript states: 'eta may be interpreted as the probability of the complement of the following event: "recognition occurs without matched evasion for a single antigen". In other words, eta is the probability of a tie at a single antigen position.' But the complement of that event also includes the probability of no recognition, on top of a tie, unless I misunderstood what tie means. Regardless, tie is a confusing term here and should at least be explained better.
η = 1−q(1−p) is by definition the complement of q(1−p), which is the probability of a single recognition event without antigen loss (at that one position). The reviewer is correct that the complementary event can occur either via no antigen recognition (so a ‘tie’ occurs at that position by default), or by recognition at that position being matched by antigen loss.
To make this clearer and to remove the ‘tie’ interpretation from the description, we now explain that balanced recognition and evasion is referred to as ‘equilibrium’, which needs to occur for any and all antigens that are recognized for the process to avoid escape or elimination, but that we can also characterize the probability that a single antigen is in equilibrium. In this way, η represents the probability that equilibrium exists in one antigen position provided that there is at least one available antigen for immune targeting.
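The two routes to single-antigen equilibrium described above can be checked with a short calculation. The sketch assumes, as in the response, that recognition occurs with probability q and antigen loss with probability p, and additionally assumes these are independent at a given position; the function names are hypothetical.

```python
def eta(q, p):
    """Single-antigen equilibrium probability, eta = 1 - q(1 - p)."""
    return 1.0 - q * (1.0 - p)


def eta_by_cases(q, p):
    """Same quantity built from the two cases named in the response.

    Case 1: the antigen is not recognized (probability 1 - q).
    Case 2: the antigen is recognized and the recognition is matched
            by antigen loss (probability q * p, assuming independence).
    """
    no_recognition = 1.0 - q
    recognition_matched_by_loss = q * p
    return no_recognition + recognition_matched_by_loss


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    q, p = 0.4, 0.25
    print(eta(q, p), eta_by_cases(q, p))
```

The agreement of the two functions reflects the identity 1 − q(1 − p) = (1 − q) + qp, which is why the complement of "recognition without matched loss" includes the no-recognition case.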
Reviewer #3 (Recommendations for the authors):
Developing such conceptual models is important and has the potential to inspire the wider field. However, I fear that some of that full potential might not be reached without additional work rewriting the text for greater clarity and precision. The following are some suggestions that might be helpful in that regard.
To start with, I had a hard time following some of the terminology. There are some instances where different terms are used to refer to the same concept. For instance, in Figure 1C the terminology changes between the legend (evasion rate) and plot (optimal downregulation attempt probability). Furthermore, the terminology seems overloaded: π is referred to as an evasion probability, but maybe one would want to reserve this term for the complete evasion of immune recognition by cancer. π might be more simply referred to as a rate of antigen loss, or similar. I was also wondering whether it would make sense to directly include β in Eq.1 and separate it from the penalty term. I understood penalty to refer to the increase in antigen creation rate when π is higher. From the equation, β instead could be more aptly named a basal rate of antigen creation. Lastly, notations should be uniformized. Specifically, I noted that in the methods the rate at which TAAs are lost is denoted by p instead of pi, if I understand correctly, which can cause confusion for the reader.
We thank the reviewer for their thorough review and thoughtful suggestions. We have addressed some of the issues above, which we believe enhances the clarity of the new draft. In taking the reviewer’s suggestion, we now describe π as the rate of antigen loss, reserving ‘evasion probability’ for discussions involving complete immune evasion.
We very much like the suggestion of isolating and emphasizing the basal rate of antigen creation β, separate from the penalty (in the strict sense) incurred for choosing larger π_{n}. This is a clearer way to present the Results section, which already assumed β to be r_{n},π_{n}-independent. Toward this end, we have added this to Eq. 1 and discuss it thereafter, and we have made the additional updates to the other Results sections where β and f_{n} are mentioned.
Our reason for describing the TAA loss rate with p and π_{n} separately is to emphasize when it is simply a passive feature (π_{n} = p fixed) versus when it is actively chosen. This was distinguished in the Model Development section but was previously confusing given the initial use of p followed by π_{n}. For clarity, we have now elaborated at the first mention of the antigen loss rate: “The rate of antigen loss π_{n} may in general vary as a function of time and environmental features (considered in Sec. 5.2). In this section, we assume it is passively fixed and denote this rate as p.”
https://doi.org/10.7554/eLife.82786.sa2
Article and author information
Author details
Funding
Cancer Prevention Research Institute of Texas (RR210080)
 Jason T George
National Science Foundation (PHY2019745)
 Herbert Levine
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
JTG thanks Kerry E Back, Philip A Ernst, Thomas J George, and Richard A Tapia for their helpful discussions on stochastic dynamic programming and optimization. JTG was supported by the Cancer Prevention Research Institute of Texas (RR210080). JTG is a CPRIT Scholar in Cancer Research. HL is supported by the National Science Foundation (NSF) grant NSF PHY2019745.
Senior Editor
 Aleksandra M Walczak, CNRS, France
Reviewing Editor
 Yuval Elhanati, Memorial Sloan Kettering Cancer Center, United States
Version history
 Preprint posted: August 5, 2022
 Received: August 17, 2022
 Accepted: March 24, 2023
 Version of Record published: April 25, 2023 (version 1)
Copyright
© 2023, George and Levine
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.