Design of an optimal combination therapy with broadly neutralizing antibodies to suppress HIV1
Abstract
Infusion of broadly neutralizing antibodies (bNAbs) has shown promise as an alternative to antiretroviral therapy against HIV. A key challenge is to suppress viral escape, which is more effectively achieved with a combination of bNAbs. Here, we propose a computational approach to predict the efficacy of a bNAb therapy based on the population genetics of HIV escape, which we parametrize using highthroughput HIV sequence data from bNAbnaive patients. By quantifying the mutational target size and the fitness cost of HIV1 escape from bNAbs, we predict the distribution of rebound times in three clinical trials. We show that a cocktail of three bNAbs is necessary to effectively suppress viral escape, and predict the optimal composition of such bNAb cocktail. Our results offer a rational therapy design for HIV, and show how genetic data can be used to predict treatment outcomes and design new approaches to pathogenic control.
Editor's evaluation
This paper will be of interest to scientists within the fields of statistical and biological physics, immunology, and vaccinology. The mathematical/statistical framework is rigorously constructed based on key concepts from population genetics and highthroughput viral genetic sequence data. The results provide important insights into the failures of past treatment regimens with broadly neutralizing antibodies to suppress viral escape in clinical trial participants. The results also present exciting and highly testable predictions of improved treatment strategies for combatting HIV through passive bnAb immunization.
https://doi.org/10.7554/eLife.76004.sa0Introduction
Recent discoveries of highly potent broadly neutralizing antibodies (bNAbs) provide new opportunities to successfully prevent, treat, and potentially cure infections from evolving viruses such as HIV1 (Walker et al., 2011; Walker et al., 2009; Liao et al., 2013; Mouquet and Nussenzweig, 2013; Klein et al., 2013; Kwong et al., 2013; Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018; Sok and Burton, 2018; Zwick et al., 2001; Burton et al., 2012, influenza Sparrow et al., 2016, and the Dengue virus Ekiert and Wilson, 2012; Durham et al., 2019). bNAbs target vulnerable regions of a virus, such as the CD4 binding site of HIV env protein, where escape mutations can be costly for the virus (Walker et al., 2009; Chen et al., 2009; Zhou et al., 2010; Walker et al., 2011; Liao et al., 2013; West et al., 2014; Burton and Hangartner, 2016). As a result, eliciting bNAbs is the goal of a universal vaccine against the otherwise rapidly evolving HIV1. Apart from vaccination, bNAbs can also offer significant advances in therapy against both HIV1 and influenza (West et al., 2014; Caskey et al., 2016; Gruell and Klein, 2018; Durham et al., 2019). Specifically, augmenting current antiretroviral therapy (ART) drugs with bNAbs may provide the next generation of HIV therapies (Horwitz et al., 2013; Gruell and Klein, 2018).
Recent studies have used bNAb therapies to curb infections by the Simian immunodeficiency virus (SHIV) in nonhuman primates (Shingai et al., 2013; Barouch et al., 2013; Julg et al., 2017), and HIV1 infections in human clinical trials (Caskey et al., 2015; Bar et al., 2016; Caskey et al., 2017; Baron et al., 2018). Monotherapy trials with potent bNAbs, including 3BNC117 (Caskey et al., 2015, VRC01 Bar et al., 2016, and 10–1074 Caskey et al., 2017) indicate that administering bNAbs is safe and can suppress viral load in patients. Nonetheless, in each trial, escape mutants emerge resulting in a viral rebound after about 20 days past infusion of the bNAb. However, in trials that administered a combination of 10–1074 and 3BNC117, viral rebound was substantially suppressed (Shingai et al., 2013; Baron et al., 2018). The success of combination therapy is not surprising. For example, combinations of drugs has been repeatedly used against infectious agents, including current HIV ART cocktails and combination antibiotic treatments against Tuberculosis (Lienhardt et al., 2012).
The principle behind combination therapy with either drugs or antibodies is clear: It is harder for a pathogen population to acquire resistance against multiple treatment targets simultaneously than to acquiring resistance against each target separately. But deciding on combination therapy means navigating an enormous number of possible treatment options.
Experimental data from neutralization assays against pseudoviruses together with modeling and machine learning techniques have been used to statistically characterize the efficacy of bNAbs and their combinations against different variants of HIV (Wagh et al., 2016; Yu et al., 2019). Using these neutralization models, an optimal combination therapy was proposed based on their breadth, potency of neutralization, and other relevant measures (Wagh et al., 2016; Yu et al., 2019). In another study, pharmacokinetic dynamics was coupled with druginteraction models to determine an optimal dosing strategies (Mayer et al., 2022). These modeling approaches shed light on how combinations of bNAbs that can neutralize a panel of viruses. However, the key obstacle in bNAb therapy is the possibility of viral escape.
To characterize viral escape, mechanistic models, partly inspired by previous work on HIV escape from the antiretro viral therapy (ART), have been developed to explain the dynamics of viremia in patients, following passive infusion of bNAbs. By making a fit to the trial data, these models are used to infer parameters related to the efficacy of a bNAb in clearing virions, reducing viral load and infectivity, and also to infer the characteristics of the HIV and the Tcell populations, such as the initial viral population size, the death rates of uninfected and infected Tcells, and the number of virions released by an infected Tcell (Perelson et al., 1996; Ribeiro and Bonhoeffer, 2000; Rong et al., 2010; Rong et al., 2007; Tomaras et al., 2008; Lu et al., 2016; Reeves et al., 2020; Saha and Dixit, 2020; CardozoOjeda and Perelson, 2021; Stephenson et al., 2021). These detailed mechanistic models cannot easily generalize from one trial to another in order to predict the efficacy of a new bNAb mono or combination therapy.
Evolution of the HIV1 population is another key factor to consider in modeling the dynamics and escape of viruses in response to therapy. Studies on population genetics of HIV1 have found rapid intrapatient evolution and turnover of the virus (Lemey et al., 2006; Zanini et al., 2015) and have indicated that the efficacy of drugs in ART can severely impact the mode of viral evolution and escape (Feder et al., 2016). Despite the complex evolutionary dynamics of HIV1 within patients due to individualized immune pressure (Nourmohammad et al., 2019), genetic linkage (Zanini et al., 2015), recombination (Neher and Leitner, 2010; Zanini et al., 2015), and epistasis between loci (Bonhoeffer et al., 2004; Zhang et al., 2020), the genetic composition of a population can still provide valuable information about the evolutionary significance of specific mutations, especially in highly vulnerable regions of the virus. For example, analysis of genomic covariation in the Gag protein of HIV1 has been successful in predicting fitness effect of mutations in relatively conserved regions of the virus, which could inform the design of rational Tcell therapies that target these vulnerable regions (Ferguson et al., 2013). In the context of bNAb therapy trials, an evolutionary model accounting for the intrinsic fitness cost associated with escape variants against a specific bNAb has been used to characterize HIV1 dynamics and escape following bNAb infusion (Meijers et al., 2021).
In this study, we present a coarsegrained evolutionary model of viral response to bNAb infusion that uses genetic data of HIV1 in untreated patient to predict bNAb therapy outcome by characterizing the chances of viral escape from a given bNAb in patients. Specifically, we develop a statistical inference framework that uses the high throughput longitudinal survey of viral sequences collected from 11 ARTnaive patients over about 10 years of infection (Zanini et al., 2015) to characterize the evolutionary fate of escape mutations and to predict patient outcomes in recent mono and combination therapy trials with 10–1074 and 3BNC117 bNAbs (Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018), and a trial with PGT121 bNAb (Stephenson et al., 2021). Using the accumulated intrapatient genetic variation from deep sequencing of HIV1 populations in ART–naive patients (Zanini et al., 2015), we can estimate the diversity and the fitness effects of mutations at sites mediating escape. These variables parametrize our individualbased model for viral dynamics and characterize the expected path for a potential escape of HIV1 populations in response to bNAb therapies in patients enrolled in the clinical trials. Although our coarsegrained model does not accurately reproduce the detailed dynamics of viremia in each patient and lacks the mechanistic insight of richer models proposed by Perelson et al., 1996; Rong et al., 2010; Rong et al., 2007; Tomaras et al., 2008; CardozoOjeda and Perelson, 2021; Stephenson et al., 2021, it still accurately predicts the distribution of viral rebound times in response to passive bNAb infusions — a key measure the efficacy for a bNAb clinical trial.
Our prediction for the viral rebound time in response to a bNAb relies on only a few patientspecific parameters (i.e. the genetic diversity of patients prior to treatment), and is primarily done based on the inferred genetic parameters from the deep sequencing of HIV1 populations in a separate cohort of ARTnaive patient. Therefore, our model could be used to guide therapy trial design with bNAbs against which viral escape variants are previously characterized. To this end, we use our approach to assess a broader panel of nine bNAbs, for which escape sites can be identified from prior deep mutational scanning experiments (Dingens et al., 2019), to characterize the therapeutic efficacy of each of these bNAbs and to propose optimal combination therapies that can efficiently curb an HIV1 infection. Our results showcase how the wealth of genetic data can be leveraged to guide rational therapy approaches against HIV. Importantly, this approach is potentially applicable to therapy designs against other evolving pathogens, such as chronic viruses like HCV, resistant bacteria, or cancer tumor cells.
Model
HIV1 response to therapy
After infusion of bNAbs in a patient, the antibodies bind and neutralize the susceptible strains of HIV. The neutralized subpopulation of HIV1 no longer infects Tcells, and the plasma RNA copynumber associated with this neutralized population decays. The dynamics of viremia in HIV1 patients off ART following a bNAb therapy with 3BNC117 (Caskey et al., 2015), 10–1074 (Caskey et al., 2017), and their combination (Baron et al., 2018 ) are shown in Figure 3—figure supplements 1–3. With competition of the neutralized strains removed, the resistant subpopulation grows until the viral load typically recovers to a level close to the pretreatment state (i.e. the carrying capacity); see Figure 1A. The time it takes for the viral load to recover is the rebound time—a key quantity that characterizes treatment efficacy within a patient. Although the details of the viremia dynamics, especially at beginning and at the end of the therapy, may be complex (Lu et al., 2016; Reeves et al., 2020; Saha and Dixit, 2020; Meijers et al., 2021), the rebound time can be approximately modeled using a logistic growth after bNAb infusion (t > 0),
with the initial condition set for pretreatment fraction of resistant subpopulation $x={N}_{r}(0)/({N}_{r}(0)+{N}_{s}(0))$, where ${N}_{r}(0)$ and ${N}_{s}(0)$ denote the size of resistant and susceptible subpopulations at time $t=0$, respectively. Here, $\gamma $ is the growth rate of the resistant population, $r$ is the neutralization rate impacting the susceptible subpopulation, and ${N}_{k}$ is the carrying capacity (Figure 1A, Methods). In our analysis, we set $\gamma =1/3{\text{days}}^{1}$ or a doubling time of $\sim 2$ days, the known HIV1 growth rate in patients (Perelson et al., 1996). We infer the neutralization rate $r$ as a global parameter for each trial, since it depends on the neutralization efficacy of a bNAb at the concentration used in the trial. We will infer the patientspecific pretreatment fraction of resistant subpopulation $x$, using a population genetics based approach based on which we characterize the mutational target size and selection cost of escape in the absence of a bNAb (see below). The maximumlikelihood fits of $N(t)$ to the viremia measurements in Figures 2 and 3, Figure 3—figure supplements 1–3 specifies the initial resistant fraction x_{p} and the rebound time ${T}_{p}$ in each patient $p$, which in this simple model, is given by ${T}_{p}={\gamma}^{1}\mathrm{log}{x}_{p}$ (Methods).
The rebound times following passive infusion of 3BNC117 (Caskey et al., 2015 and 10–1077 Caskey et al., 2017) bNAbs range from 1 to 4 weeks, with a small fraction of patients exceeding the monitoring time window in the studies (late rebounds past 56 days). The distribution of rebound times summarizes the escape response of the virus to a therapy and directly relates to the distribution for the pretreatment fraction of resistant variants $P(x)$ across patients $P(T)\sim {x}^{1}P(x)$.
Stochastic evolutionary dynamics of HIV1 subject to bNAb therapy
The fate of an HIV1 population subject to bNAb therapy depends on the composition of the pretreatment population with resistant and susceptible variants, and the establishment of resistant variants following the treatment. To capture these effects, we construct an individualbased stochastic model for viral rebound (Figure 1B). We specify a coarsegrained phenotypic model, where a viral strain of type $a$ is defined by a binary state vector ${\overrightarrow{\rho}}^{a}=[{\rho}_{1}^{a},\mathrm{\dots},{\rho}_{\mathrm{\ell}}^{a}]$, with $\mathrm{\ell}$ entries for potentially escapemediating epitope sites; the binary entry of the state vector at the epitope site $i$ represents the presence (${\rho}_{i}^{a}=1$) or absence (${\rho}_{i}^{a}=0$) of an escape mediating mutation against a specified bNAb at this site of variant $a$. We assume that a variant is resistant to a given antibody if at least one of the entries of its corresponding state vector is nonzero. For multivalent treatment, a virus must be resistant to all antibodies comprising the treatment to have a positive growth after infusion.
At each generation, a virus with phenotype $a$ can undergo one of three processes: birth, death and mutation to another type $b$, with rates ${\beta}_{a}$, ${\delta}_{a}$, and ${\mu}_{a\to b}$, respectively (Figure 1B). The net growth rate of the viral subpopulation with phenotype $a$ is the birth rate minus the death rate, ${\gamma}_{a}={\beta}_{a}{\delta}_{a}$ (Figure 1C). The total rate of events (birth and death) per virion $\lambda ={\beta}_{i}+{\delta}_{i}$ modulates the amount of stochasticity in this birthdeath process (Methods), which we assume to be constant across phenotypic variants. The continuous limit for this birthdeath process results in a stochastic evolutionary dynamics for the subpopulation size ${N}_{a}$,
where $\eta (t)$ is a Gaussian random variable with mean $\u27e8\eta (t)\u27e9=0$ and correlation $\u27e8\eta (t)\eta ({t}^{\prime})\u27e9=\delta (t{t}^{\prime})$ (Methods). Here, f_{a} denotes the intrinsic fitness of variant $a$ and its net growth rate $\gamma $ is mediated by a competitive pressure $\varphi =\frac{1}{{N}_{k}}{\sum}_{b}{N}_{b}{f}_{b}$ with the rest of the population constrained by the carrying capacity ${N}_{k}$, such that ${\gamma}_{a}={f}_{a}\varphi $. In the presence of a bNAb, birth is effectively halted for susceptible variants and their death rate is set by the neutralization rate of the antibody, resulting in a net growth rate, ${\gamma}_{\text{sus.}}=r$. At the carrying capacity, the competitive pressure is the mean population fitness $\varphi =\overline{f}$, making the net growth rate of the whole population zero (Figure 1C). When susceptible variants are neutralized by a bNAb, the competitive pressure $\varphi $ drops, and as a result, the resistant variants can rebound to carrying capacity, at growth rates near their intrinsic fitness.
To connect the birthdeath model (Equation 2) with data, we should relate the simulation parameters of a birthdeath process to molecular observables. We have already made a connection between the birth and death rates of a variant and its intrinsic fitness in Equation 2. In addition, the neutral diversity $\theta $ of a population at steadystate can be expressed as $\theta =2{N}_{k}\mu /\lambda $, where $\mu \approx {10}^{5}/\text{day}$ is the pernucleotide mutation rate, which we infer from intrapatient longitudinal HIV1 sequence data (Zanini et al., 2015) (Methods). For consistency, we set the total rate of events $\lambda $ to be at least as large as the fastest process in the dynamics, which in this case is the growth rate of resistant viruses $\gamma \approx {(3\text{days})}^{1}$, we choose $\lambda ={(0.5\text{days})}^{1}$ (Methods). Therefore, the key parameters of the birthdeath model, that is, $\beta ,\delta ,\text{and}{N}_{k}$ can be expressed in terms of the intrinsic fitness of the variants f_{a} and the neutral diversity $\theta $, which we will infer from data.
Results
Population genetics of HIV1 escape from bNAbs
HIV1 escape from different bNAbs has been a subject of interest for vaccine and therapy design, and a number of escape variants against different bNAbs have been identified in clinical trials or in infected individuals (Lynch et al., 2015; Caskey et al., 2015; Scheid et al., 2016; Caskey et al., 2017; Baron et al., 2018). This invivo data is often complemented with information from cocrystallized structures of bNAbs with the HIV1 envelope protein (Pancera et al., 2017), and invitro deep mutational scanning (DMS) experiments, in which the relative change in the growth rate of tens of thousands of viral mutants are measured in the presence of different bNAbs (Dingens et al., 2019; Dingens et al., 2017; Schommers et al., 2020). We identify escape mutations against each of the bNAbs in this study by using information from clinical trials, the characterized binding sites, and the DMS assays (Methods); the list of escape mutations against each bNAb is given in Appendix 1—table 1.
The rise and establishment of an escape variant against a specific bNAb depend on three key factors, (i) neutral genetic diversity of the viral population, (ii) the mutational target size for escape from the bNAb (i.e. the number of paths leading to escape, weighted by their respective probabilities), and (iii) the intrinsic fitness associated with such mutations. Although viremia traces in clinical trials and growth experiments can be used to model the escape dynamics (Haddox et al., 2018; Lynch et al., 2015; Lu et al., 2016; Reeves et al., 2020; Saha and Dixit, 2020; Meijers et al., 2021), they do not offer a comprehensive statistical description for HIV1 escape as they are limited by the number of enrolled individuals. Alternatively, mutation and fitness characteristics of such escapemediating variants can be inferred from a broader cohort of untreated and bNAbnaive patients (Illingworth et al., 2020; Louie et al., 2018). We will infer statistical parameters for our coarsegrained fitness model (Figure 1E) from the large amount of highthroughput HIV1 sequence data from Zanini et al., 2015 (see Figure 2A and B for details) and use them to parameterize the birthdeath model (Figure 1B).
Diversity of the viral population
The neutral genetic diversity $\theta =2{N}_{k}\mu /\lambda $ (i.e. the number of segregating alleles) is an observable that relates to key population genetics parameters, that is, the pernucleotide mutation rate $\mu $, the population carrying capacity ${N}_{k}$, and the total number of events per virus in the birthdeath process $\lambda $, which determines the noise amplitude in the evolutionary dynamics (Methods). ${N}_{k}$ and $\lambda $ together determine the effective population size ${N}_{e}={N}_{k}/\lambda $. We use synonymous changes as a proxy for diversity associated with the neutral variation in an HIV1 population at a given time point within a patient. By developing a maximumlikelihood approach based on the multiplicities of different synonymous variants, we can accurately infer the neutral diversity of a population from the large survey of synonymous sites in the HIV1 genome (Methods and Figure 2—figure supplement 1). Importantly, we infer the neutral diversity of transition ${\theta}_{\text{ts}}$ and transversion ${\theta}_{\text{tv}}$ mutations separately, and consistent with previous work (Feder et al., 2016; Zanini et al., 2017), find that transitions occur with a rate of about 8 times larger than transversions (Figure 2C, Figure 2—figure supplement 1BE).
Our inference indicates that the neutral diversity grows over the course of an infection in untreated HIV1 patients from Zanini et al., 2015 (Figure 2D). The patients enrolled in the three bNAb trials (Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018) show a broad range of neutral diversity prior to bNAb therapy (Figure 2D). In addition to the circulating viruses in a patient’s sera, the viral reservoir, which consists of replicationcompetent HIV1 in latently infected cells or unsampled tissue, can also contribute to a bNAb escape in a patient. Evidence that the latent reservoir can contribute to HIV escape from bnAbs is directly visible in trials as the failure of pretrial sequencing to exclude patients who do not harbor escape variants. We model the effect of the reservoir as augmenting the neutral diversity by a constant multiplicative factor ${r}_{\text{resv.}}$, so that patients with more diverse sera, representing usually longer infections, are also expected to have correspondingly more diverse reservoir populations. By fitting the observed rebound data, we infer the reservoir factor ${r}_{\text{resv.}}\simeq 2.07$ (Methods, Figure 4—figure supplement 1). We use the augmented genetic diversity of HIV1 prior to the bNAb therapy in each trial to generate the rebound time and the probability of HIV1 escape in patients.
Mutational target size for escape
We define the mutational target size for escape from a bNAb as the number of trajectories that connect the susceptible codon to codons associated with escape variants, weighed by their probability of occurrences (Methods). The connecting paths with only single nucleotide transitions or transversions dominate the escape and can be represented as connected graphs shown in Figure 1D. To characterize the target size of escape for each bNAb, we determine the forward mutation rate $\mu \equiv {\mu}_{\text{sus.}\to \text{res.}}$ from the susceptible codons to the resistant (escape) codons, and the reverse mutation rate ${\mu}^{\u2020}\equiv {\mu}_{\text{sus.}\leftarrow \text{res.}}$ back to the susceptible variant (Figure 1D, Methods). The mutational target sizes vary across bNAbs, with HIV1 escape being most restricted from 10E8 ($\mu /{\mu}_{\text{ts}}=1.8$ where ${\mu}_{\text{ts}}$ is the singlenucleotide transition substitution rate) and most accessible in the presence of 10–1074 ($\mu /{\mu}_{\text{ts}}=4.9$); see Figure 4C and Appendix 1—table 1 for the list of mutational target size for escape against all bNAbs in this study.
Fitness effect of escape mutations
Since bNAbs target highly conserved regions of the virus, we expect HIV1 escape mutations to be intrinsically deleterious for the virus (Ferguson et al., 2013; Meijers et al., 2021), and incur a fitness cost relative to pretreatment baseline f_{0}. We assume that fitness cost associated with escape mutations are additive and backgroundindependent so the fitness of a variant $a$ in the absence of bNAb follows, ${f}_{a}={f}_{0}{\sum}_{i}{\mathrm{\Delta}}_{i}{\sigma}_{i}^{a}$, where ${\mathrm{\Delta}}_{i}$ is the cost associated with the presence of a escape mutation at site $i$ (i.e., for ${\sigma}_{i}^{a}=1$); see Figure 1D.
Interestingly, we observe the escape variants against different bNAbs to be circulating in the HIV1 populations from the cohort of ART and bNAbnaive patients (Zanini et al., 2015, Figure 2B). We use this data (Zanini et al., 2015) and extract the multiplicity of susceptible and escape variants in HIV1 populations at each sampled time point from a given patient. We use a single locus approximation under strong selection to represent the stationary distribution of the underlying frequency of escape alleles $x$ in each patient from (Zanini et al., 2015), $P(x;\sigma ,\theta ,{\theta}^{\u2020})\sim {x}^{1+\theta}{(1x)}^{1+{\theta}^{\u2020}}\mathrm{exp}[\sigma x]$, given the (scaled) fitness difference between the susceptible and the escape variants $\sigma =2{N}_{e}({f}_{\text{sus}}{f}_{\text{mut}})$; see Methods.
Based on the statistics of escape and susceptible variants in all patients, we define a likelihood function that determines a Bayesian posterior for selection $\sigma $ associated with escape at each site (Methods). We found that it is statistically more robust to infer the strength of selection relative to a reference diversity measure $\sigma /{\theta}_{\text{ts}}=({f}_{\text{sus.}}{f}_{\text{res.}})/{\mu}_{\text{ts}}$, for which we choose the transition rate (Methods). This approach generates unbiased selection estimates in simulations and is robust to effects of linkage and recombination (Methods and Model robustness Figure 4—figure supplement 1, Figure 4—figure supplement 2). The inferred values of the scaled fitness costs $\sigma /{\theta}_{\text{ts}}$ are shown for the escapemediating sites of the trial bNAbs in Figure 2E, and are reported in Appendix 1—table 1.
Predicting the efficacy of bNAb therapy in clinical trials
Monotherapy trials with 10–1074 (Caskey et al., 2017, 3BNC117 Caskey et al., 2015,PGT121 Stephenson et al., 2021), and the combination therapy with 10–1074+3BNC117 in Baron et al., 2018 have shown variable outcomes. In some patients, bNAb therapy did not suppress the viral load, whereas in others suppression was efficient and no rebound was observed up to 56 days after infusion (end of surveillance in these trials); see Figure 3A for examples of patients with different rebound times, Figure 3—figure supplements 1–3 for the viremia traces in all patients, and Figure 3B and C for the distributions and the summary statistics of the rebound times in patients in different trials.
Although we infer a large intrinsic fitness cost for a virus to harbor an escape allele (Figure 2E), these variants can emerge or already be present due to the large intrapatient diversity of HIV1 populations (Figure 2C), or a larger mutational target size for these escape variants. Deep sequencing data in untreated (likely bNAbnaive) patients shows circulation of resistant variants against a panel of bNAbs in the majority of patients (Figure 2B). Our goal is to predict the efficacy of a bNAb trial, using the fitness effect and the mutational target size for escape from a given bNAb, both of which we infer from the highthroughput HIV1 sequence data collected from bNAbnaive patients in Zanini et al., 2015 (Figure 2E, Appendix 1—table 1). In addition, we modulate these measures with the patientspecific neutral diversity $\theta $ inferred from whole genome sequencing of HIV1 populations in each patient prior to bNAb therapy (Figure 2D). These quantities parametrize the birthdeath process for viral escape in a bNAb therapy (Figure 1A), which we use to characterize the distribution of rebound times in a given trial (Methods).
For both the 3BNC117 and the 10–1074 trials (Caskey et al., 2015; Caskey et al., 2017), we see an excellent agreement between our predictions of the rebound time distribution and data; see Figure 3B, and Methods and Figure 4—figure supplement 3 for statistical accuracy of this comparison. For the PGT121 trial, the genomic data from patients’ HIV1 populations are insufficient and therefore, we used the neutral diversity ${\theta}_{\text{ts}}$ estimated from the other three trials to predict the associated rebound time distribution (Appendix 4). Still, we see a good agreement between our predictions of the rebound time distribution and data; see Appendix 4—figure 1.
By assuming an additive fitness effect for escape from 10 to 1074 and 3BNC117, we also accurately predict the distribution of rebound times in the combination therapy (Baron et al., 2018, Figure 3B). The agreement of our results with data for combination therapy is consistent with the fact that the escape mediating sites from 10 to 1074 and 3BNC117 are spaced farther apart on the genome than 100 bp, beyond which linkage disequilibrium diminishes due to frequent recombination in HIV (Zanini et al., 2015). Importantly, in all the trials, our evolutionary model accurately predicts the fraction of participants for whom we should expect a late viral rebound (more than 56 days past bNAb infusion)—the quantity that determines the efficacy of a treatment.
Apart from the overall statistics of the rebound times, our stochastic model also enables us to characterize the relative contributions of the pretreatment standing variation of the HIV1 population versus the spontaneous mutations emerging during a trial to viral escape from a given bNAb. Given the large population size of HIV1 and a high mutation rate ($\mu ={10}^{5}$ per generation), spontaneous mutations generate a fraction $x}^{(\mu )$ of resistant variants during a trial, which we can express as,
In the best case scenario, there are no resistant virions prior to treatment i.e., $x(0)=0$. Since the neutralization rate $r$ and the growth rate $\gamma $ are comparable, this deterministic approach predicts that mutations can generate a resistant fraction of ${x}^{(\mu )}\approx {10}^{5}$ during a trial. However, stochastic effects from random birth and death events play an important role in the fate and establishment of these resistant variants. The probability of extinction for a variant at frequency $x$ can be approximated as $p(\text{extinct})\approx 1{e}^{x/{x}_{\text{ext}}}$ (see Extinction Probability); here ${x}_{\text{ext}}=\frac{{\mu}_{ts}}{\gamma {\theta}_{ts}}\approx {10}^{4}$ and a variant with fraction $x$ that falls below this critical value is likely to go extinct (Figure 3D). Since the total integrated mutational flux fraction during a trial is ${x}^{(\mu )}\sim {10}^{5}$, mutational flux rarely decides the outcome of patient treatment. Indeed, we infer that spontaneous mutations contribute to less than 4% of escape events in all the three trials and escape is primarily attributed to the standing variation from the serum or the reservoirs prior to therapy (Figure 3E). A similar conclusion was previously drawn based on a mechanistic model of escape in VRC01 therapy trials (Saha and Dixit, 2020).
Devising optimal bNAb therapy cocktails
Clinical trials with bNAbs have been instrumental in demonstrating the potential role of bNAbs as therapy agents and in measuring the efficacy of each bNAb to suppress HIV. Still, these clinical trials can only test a small fraction of the potential therapies that can be devised. It is therefore important that trials test therapies that have been optimized based on surrogate estimates of treatment efficacy. The accuracy of our predictions for the rebound time of a HIV1 population subject to bNAb therapy suggests a promising approach to the rational design of therapies based on genetic data of HIV populations collected from bNAbnaive patients.
Here, we use viral genome sequences to infer the efficacy of therapies with bNAbs, for which clinal trials are not yet performed. To do so, we first need to identify the routes of HIV escape from these bNAbs. We use deep mutational scanning data (DMS) on HIV1 subject to 9 different bNAbs from Dingens et al., 2019 together with information from literature to identify the escape mediating variants from each of these bNAbs (Methods and Table S1). We then determine the mutational target size and the fitness effect of these escape variants using highthroughput sequences of HIV1 in bNAbnaive patients from Zanini et al., 2015; these inferred values are reported in Figure 4A and B and Table S1. Using the inferred fitness and mutational parameters and by setting the pretreatment neutral diversity $\theta $ to be comparable to that of the patients in the previous three trials with 10–1074 and 3BNC117 (Figure 2C), we simulate treatment outcomes for these 9 bNAbs and their twofold and threefold combinations.
Interestingly, we infer that escape from monotherapies is almost certain and a combination of at least three antibodies is necessary to limit the probability of early rebound (<56 days) to below 1% (Figure 4C). When considering each of the nine antibodies, interesting patterns emerge. We find that the mutational target size and the fitness cost of escape, estimated as the harmonicmean selection cost of individual sites, obey a roughly linear relationship (Figure 4B). As all these bNAbs have similar overall breadth (i.e. they neutralize over 70% of panel strains), this result suggests that for an antibody to be broad, its escape mediating variants should either be rare (i.e. small mutational target size) or intrinsically costly (i.e. incurring a high fitness cost), but it is not necessary to satisfy both of these requirements. For instance, we find that resistant variants against 10E8 weaker negative selection but escape target size is small, while PGT151 has a larger escape target size but makes up for it by having resistant variants with unavoidably high fitness cost.
The fitnesslimited versus the mutationlimited strategies have different implications for the design of combination cocktails. The small mutational target size of 10E8 makes it the best candidate antibody for monotherapy among the antibodies we consider because the escape variants against this antibody are less likely to circulate in a patient’s serum prior to treatment. However, in combination, 10E8 appears less often in top ranked therapies than PGT151. PGT151 is unremarkable on its own because of a relatively large target size, but the high cost of escape makes it especially promising in combination therapies. Overall, fitnesslimited bNAbs like PGT151 are more effective against high diversity viral populations, while mutationlimited bNAbs such as 10E8 are more effective against low diversity viral populations. Indeed, the best ranked therapy, namely the combination of PG9, PGT151, and VRC01, combines antibodies that target different regions of the virus and also have both types of fitness and mutationlimited strategies for coverage against the full variability of viral diversities found in pretreatment individuals (Figure 4C) participating in the clinical trials.
Escape from bNAbs in multivalent therapy is also influenced by the nonindependence of escape pathways. Several of the bNAbs (e.g. 10–1074 and PG121) in this study target the same (or structurally adjacent) epitopes and escape from them is mediated by the same mutations. The sharing of mutational pathways invalidates the assumption of independence and may make our predictions too optimistic. However, the best performing antibody combinations in Figure 4C target multiple epitopes to reduce the chances of collective escape. Therefore, the assumptions of independent fitness effects and independent mutational pathways in this case are not consequential for the main predictions of our model (i.e., the choice of bNAb combinations). In vitro data shown in Kong et al., 2015 suggest that additive and independent effects are the norm, but that small but consistent synergistic effects may imply that our assumption of sitewise independence is conservative. More data on HIV escape from combinations of bNAbs would be informative for further modeling efforts and relevant for longterm therapy design.
Discussion
HIV therapy with passive bNAb infusion has become a promising alternative to antiretroviral drugs for suppressing and preventing the disease in patients without a need for daily administration. The current obstacle is the frequent escape of the virus seen in mono and even combination bNAb therapy trials (Caskey et al., 2015; Bar et al., 2016; Caskey et al., 2017; Baron et al., 2018). The key is to identify bNAb cocktails that can target multiple vulnerable regions on the virus in order to reduce the likelihood for the rise of resistant variants with escapemediating mutations in all these regions. Identifying an optimal bNAb cocktail can be a combinatorially difficult problem, and designing patient trials for all the potential combinations is a costly pursuit.
Here, we have proposed a computational approach to predict the efficacy of a bNAb therapy trial based on population genetics of HIV escape, which we parametrize using highthroughput HIV1 sequence data collected from a separate cohort of bNAbnaive patients (Zanini et al., 2015). Specifically, we infer the mutational target size for escape and the fitness cost associated with escapemediating mutations in the absence of a given bNAb. These quantities together with the neutral diversity of HIV1 within a patient parametrize our stochastic model for HIV dynamics subject to bNAb infusion, based on which we can accurately predict the distribution of rebound times for HIV in therapy trials with 10–1074, 3BNC117 and their combination, as well as a trial with PGT121 bNAb. Consistent with previous work on VRC01 (Saha and Dixit, 2020), we found that viral rebounds in bNAb trials are primarily mediated by the escape variants present either in the patients’ sera or their latent reservoirs prior to treatment, and that the escape is not likely to be driven by the emergence of spontaneous mutations that establish during the therapy.
One key measure of success for a bNAb trial is the suppression of early viral rebound. Our model can accurately predict the rebound times of HIV1 subject to three distinct therapies (Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018), based on the fitness and the mutational characteristics of escape variants inferred from highthroughput HIV1 sequence data. This approach enables us to characterize routes of HIV1 escape from other bNAbs, for which therapy trials are not available, and to design optimal therapies. We used deep mutational scanning data Dingens et al., 2019 to identify escapemediating variants against 9 different bNAbs for HIV. Our genetic analysis shows that bNAbs gain breadth and limit viral escape either due to their small mutational target size for escape or because of the large intrinsic fitness cost incurred by escape mutations. bNAbs with mutationlimited strategy are more effective at preventing escape in patients with low viral genetic diversity, while bNAbs with selectionlimited strategy are more effective at high viral diversity.
Combination therapy with more than two bnAbs (or drugs in ART) has long been shown to be more effective in suppressing early viral rebound, both in theory and practice (Perelson et al., 1996; Feder et al., 2016; Wagh et al., 2016; Klein et al., 2012; Mendoza et al., 2018; Yu et al., 2019). In addition to corroborating this conclusion quantitatively, we provide a method for assessing new bnAbs for which escape mutations are known. Our method can be understood as a tool to navigate the combinatorial explosion of higher order cocktails for which we cannot possibly test all combinations. By assessing the evolvability of resistance against different combinations we can identify the best therapies to target for clinical trial. Specifically, we show that to suppress the chance of viral rebound to below 1%, we show that a combotherapy with 3 bNAbs with a mixture of mutation and selectionlimited strategies that target different regions of the viral envelope is necessary. Such combination can counter the full variation of viral diversity observed in patients. We found that PG9, PG151, and VRC01, which respectively target V2 loop, Interface, and CD4 binding site of HIV envelope, form an optimal combination for a 3bNAb therapy to limit HIV escape in patients infected with clade B of the virus.
The statistical agreement between our coarsegrained model and the observed distribution of the viral rebound times in trials (Figure 3B) implies that many of the mechanistic details are of secondary importance in predicting viral escape. Nonetheless, our approach falls short of reproducing the detailed characteristics of viremia traces in patients, especially at very short or very long times, during which the dynamics of Tcell response or the decay of bNAbs could play a role (Lu et al., 2016; Reeves et al., 2020; Saha and Dixit, 2020). The relationship between the shortterm suppression of the virus, which is the focus of this analysis, and the longterm treatment success is complicated by reestablishment of HIV from the latent reservoirs and different modes of intrahost HIV evolution (Liu et al., 2019; Margolis et al., 2017).
One strategy to achieve a longer term treatment success is by combining bNAb therapy with ART. One main advantage of bNAb therapy is the fact that it can be administered once every few months, in contrast to ART, which should be taken daily and missing a dose could lead to viral rebound. Although multivalent bNAb therapy reduces the chances of shortterm viral escape, viral escape remains a real obstacle for longer term success of a treatment with bNAbs. Alternatively, (fewer) bNAbs can be administered in combination with ART (Horwitz et al., 2013; Gruell and Klein, 2018), whereby ART could lower the replication rate of the HIV population, reducing the viral diversity and the chances of viral escape. Specifically, we can expect that emergence and establishment of rare (i.e., strongly deleterious) escape variants against bNAbs to be less likely in ART+ patients, which suggests that fitnesslimited bNAbs should be more effective in conjunction with ART. More data would be necessary to understand the longterm efficacy of such augmented therapy, and specifically the role of viral reservoirs in this context. A modeling approach could then shed light on how ART administration and bNAb therapy could be combined to efficiently achieve viral suppression.
Limitations
We rest our analysis primarily on the predictive power of the observed variant frequencies in the untreated patients. Our model weighs these frequencies with respect to the viral diversity in a mathematically and biologically consistent way. However, we ignore the dynamics of antibody concentration and IC_{50} neutralization during treatments, the details of Tcell dynamics during infection (Perelson, 2002), and also the evolutionary features of the genetic data, such as epistasis between loci (Bonhoeffer et al., 2004; Zhang et al., 2020, genetic linkage Zanini et al., 2015, and codon usage bias Meintjes and Rodrigo, 2005).
In our model of viral escape, we neglect the possibility of incomplete escape of the virus due to the reduced neutralization efficacy of bNAbs as their concentrations decay during trials. In Appendix 3, we show that this simplifying assumption is valid as long as the IC_{50} is not the same order of magnitude as the initial dosage concentration of the infused bNAb. Notably, the data from therapy trials used in this study fall into the regime for which we can neglect the impact of incomplete neutralization (Appendix 3—figure 2). However, taking into account the dependence of viral fitness on bNAb concentration and its neutralization efficacy, as in the model proposed by Meijers et al., 2021, could improve the longterm predictive power of our approach.
Our predictions are limited by our ability to identify the escape variants for each bNAb, either based on the in vivo trial data and patient surveillance, or invitro assays such as DMS experiments. In Appendix 4, we compare the accuracy of our predictions for the rebound time distributions of HIV using DMSinferred versus trialinferred escape variants against the 10–1074 and the PGT121 bNAbs, and we find good agreements between the two approaches in both cases. However, it should be noted that identifying escape variants from DMS experiments lead to a more optimistic prediction for treatment success during the first 8 weeks of trials (i.e. they suggest a later rebound). One reason for this discrepancy may be related to the distinct genetic composition of viruses in the two approaches. DMS experiments characterize HIV escape by introducing mutations on a single genetic background (Dingens et al., 2019; Dingens et al., 2017; Schommers et al., 2020) (e.g. the HIV1 strain BF520.W14M.C2 in Dingens et al., 2017), whereas clinical data contain diverse populations of viruses between and within individuals. It is more likely for diverse viral populations to contain variants in which positive epistasis between escape mutations and the background genome is present. Therefore, it is reasonable to expect that genetic data originating in clinical trials show more pathways of escape, resulting in a faster viral rebound following bNAb infusion. Nevertheless, DMS experiments are less costly for identifying escape variants compared to trials, and they provide a baseline to assess the efficacy of different bNAbs for therapy and further invivo investigations. More experiments would be necessary for a more systematic understanding of the limitations of each approach.
It should be noted that our analysis in Figure 4C only focuses on one aspect of therapy optimization, that is, the suppression of escape. Other factors, including potency (neutralization efficacy) and halflife of the bNAb, or the patient’s toleration to bNAbs at different dosages should also be taken into account for therapy design. For example, the bNAb 10E8, which we identified as of the most promising monotherapy candidates in Figure 4C, is shown to be poorly tolerated by patients with short halflife (Kwon et al., 2016), making it undesirable for therapy purposes. Thus, the bNAb candidates shown in Figure 4C should be taken as a guideline to be complemented with further assessment of efficacy and safety for therapy design.
Outlook
Our approach showcases that, when feasible, combining highthroughput genetic data with ecological and population genetics models can have surprisingly broad applicability, and their interpretability can shed light into the complex dynamics of pathogens subject to therapy. Application of similar methods to therapy design to curb the escape of cancer tumors against immune or chemotherapy, the resistance in bacteria against antibiotics, or the escape of seasonal influenza against vaccination is a promising avenue for future work. However, we expect that more sophisticated methods for inferring fitness from evolutionary trajectories may be necessary to capture the dynamical response of these populations.
Methods
Data and code accessibility
The code for the algorithms used in this work and the data are available on GitHub at https://github.com/StatPhysBio/HIVTreatmentOptimization (copy archived at swh:1:rev:0194cf6e554996a066633e99dd53cd5901da552e, Chelate, 2022a) and in the Julia package https://github.com/StatPhysBio/EscapeSimulator (copy archived at swh:1:rev:9a343f598820bafddfc7ea4547cefa90bf96fd6e, Chelate, 2022b).
Description of molecular data
Data from bNAb trials
In this study, we considered four clinical trials for passive therapy with bNAbs:
Monoclonal therapy with 3BNC117 bNAb Caskey et al., 2015 and sequences reported in Schoofs et al., 2016. There were 16 patients enrolled, 13 of whom were off antiretroviral therapy (ART).
Monoclonal therapy with 10–1074 bNAb Caskey et al., 2017, with 19 patients enrolled, 16 of whom were off ART.
Combination therapy with 10–1074+3BNC117 bNAb Baron et al., 2018, with 7 patients off ART.
Monoclonal therapy with PGT121 with 17 patients off ART. Stephenson et al., 2021. Far fewer viral sequences were available for this trial, and therefore, we could not reliably infer the neural diversity of viruses within patients, as we did for other trials. We only used this trial to compare our predictions based on the DMSinferred with the trialinferred escape pathways in Appendix 4.
All sequenced patients across all trials were infected with distinct HIV1 clade B viral strains. We limited our analyses to those patients not on ART at the time of treatment initiation. In these studies, the injected bNAb level falls off over time within patients and therefore, we only considered dynamics within an 8week window since infusion. This assures that rebound is not confounded by a drop in bNAb below sensitivestrain neutralizing levels of $\text{IC}}_{{50}_{S}}<2\mu \text{g/m$.
We used singlegenome sequence data of env collected from all patients in each trial to characterize the diversity of HIV1 population within each patient shown in Figure 2 available from Zanini et al., 2015 and accessible through European Nucleotide Archive (Accession no: PRJEB9618). The patient sequence data for each trial is available through Caskey et al., 2015 GeneBank accession number: KX016803, Caskey et al., 2017 GenBank accession numbers KY323724.1  KY324834.1, and Baron et al., 2018, GenBank accession numbers MH632763  MH633255.
Longitudinal HIV1 sequence data from untreated patients
Single nucleotide polymorphism (SNP) data was obtained from Zanini et al., 2015 and aligned to the HXB2 reference using HIValign tool (Gaschen et al., 2001). The dataset includes 11 patients observed for 5–8 years of infection, with HIV1 sequence data sampled over 6–12 time points per patient (Figure 2).
Patient 4 and patient 7 were excluded from the original analysis done in Zanini et al., 2015 because of suspected superinfection and failure to amplify early samples, respectively. These patients were included in our analysis, since (i) superinfection poses no additional difficulties for our treefree procedure, and (ii) only time points with measurable viral diversity entered into our selection likelihood, which automatically limits our analysis to samples with highquality sequences.
All patients were infected with clade B of HIV, except for patient 6 (clade C), and patient 1 (clade 01_AE). We assessed the robustness of our inference to exclusion of these patients from our analysis in Effect of genomic linkage on the inference of selection. Overall, our inference was not strongly affected by this choice (Model robustness Figure 4—figure supplement 1), and therefore, we included these patients in our main analyses to enhance the statistics with larger data.
For our analysis we considered only data reported in single nucleotide polymorphism (SNP) counts. The number of raw SNP counts are the result of amplification and must be converted into estimates for the number of preamplification template fragments. Zanini et. al. reported that on average about 10^{2} templates of amplicon were associated with the fragments of the envelope (env) protein Zanini et al., 2015. We converted raw SNP counts into templates by normalizing to 120 counts and rounding the resulting number to the nearest integer.
Identifying escapemediating variants against bNAbs
The starting point for our analysis of HIV response to a given a bNAb is the description of the escapemediating amino acids in the HIV env protein. We use a combination of methods to identify the escape variants for a given bNAb. First, we use deep mutational scanning (DMS) data of HIV1 in the presence of a bNAb from Dingens et al., 2019 to identify these mutations. These DMS experiments have created libraries of all single mutations from a given genomic background of HIV1 and tested the fitness of these variants (i.e. growth on Tcell culture) in the absence and presence of nine different bNAbs, including the 10–1074 and 3BNC117 Dingens et al., 2019.
Escape variants in DMS data are identified as those which are strongly selected for only in the presence of a bNAb. Specifically, we identify escape variants as those which show 3logschange in their frequency in the presence versus absence of a bNAb. DMS data reflects in vitro escape in cell culture. However, some of these variants may not be viable in vivo. To identify the reasonable candidates of escape in vivo, we limit our set to the variants that are also observed in the circulating viral strains of untreated HIV1 patients from Zanini et al., 2015. It should be noted that since we use HIV1 sequence data from Zanini et al., 2015 to infer selection on escape mediating variants in the absence of a bNAb, the candidates of escape that are not observed in the dataset Zanini et al., 2015 would be inferred to be strongly deleterious, and hence, unlikely to contribute to our predictions of viral rebound.
Our analysis of DMS data results in a set of escape mediating amino acids for 10–1074 that is consistent with the escape variants that emerge in response to the bNAb trial (Caskey et al., 2017; Stephenson et al., 2021 Appendix 4). However, the DMS data is very noisy for bNAbs that target CD4 binding site of HIV, that is, 3BNC117 and VRC01 (Dingens et al., 2019). One reason for this observation may be that the CD4 binding site is crucial for the entry of HIV to the host’s Tcells and mutations in this region are highly deleterious. As a results, only a small number of variants with mutations in this region can survive in the absence of a bNAb in a DMS experiment. Growth in the absence of a bNAb is the first step in the DMS experiments, which is then followed by exposure of the replicated variants to a bNAb. Therefore, a low multiplicity of variants in the absence of a bNAb could result in a noisy pattern of growth of the small subpopulation in the next stage of the experiment, in which growth is subject to a CD4targeting bNAb.
For the CD4 binding site antibodies 3BNC117 and VRC01, we used additional data to call the escape variants. For 3BNC117, we used a combination of trialpatient sequences (Caskey et al., 2015; Scheid et al., 2016), that is posttreatment enrichment, along with contact site information compiled in the crystallographic studies to narrow down candidate sites (Zhou et al., 2015; LaBranche et al., 2018). For VRC01, we assumed a similar escape pattern to 3BNC117 but included sites known from other studies (Lynch et al., 2015) and the clear DMS signal at HXB2 site 197. The sites we called were similar to those identified using humanizedmouse models of HIV infection (Horwitz et al., 2013), although more complex mutational patterns were seen in the softrandomization scanning of Otsuka et al., 2018. Although the complete list of escape substitutions are unknown and backgrounddependent (Otsuka et al., 2018), the escape profiles which are most important are those that are most likely to be seen consistently in data and to be correctly identified. The list of substitutions are shown in Appendix 1—table 1.
Statistics and dynamics of viral rebound
Inference of growth parameters from dynamics of viremia
The concentration of viral RNA copies in blood serum is a delayed reflection of the total viral population size $N(t)$, containing a resistant and susceptible subpopulations, with respective sizes ${N}_{r}(t)$ and ${N}_{s}(t)$. After infusion of bNAbs in a patient, the susceptible subpopulation decays due to neutralization by bNAbs and the resistant subpopulation grows and approaches the carrying capacity ${N}_{k}$, with the dynamics,
Here, $\gamma $ is the growth rate of the resistant population, and $r$ is the neutralization rate impacting the susceptible subpopulation. By setting the initial condition for fraction of resistant subpopulation prior to treatment (at time $t=0$) $x={N}_{r}(0)/({N}_{r}(0)+{N}_{s}(0))$, we can characterize the evolution of the total viremia in a patient. This dynamics is governed by the combined processes of neutralization by the infused bNAb and the viral rebound (Figure 1A), which entails,
We use Equation 5 to define the evolution of blood concentration of viral RNA sequences which is observed indirectly via noisy viremia measurement data from Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018. To connect the data with the simple model of viral dynamics in Equation 5, we fit the initial frequency of resistant mutants $x$ for each patient separately, and fit a global estimate for the decay rate of susceptible variants $r$ shared across all patients in a trial, using a joint maximumlikelihood procedure. In addition, we fix the growth rate $\gamma $ to 1/3 days, corresponding to a doubling time of approximately 2 days García et al., 1999. Our analyses indicate that the initial viremia decline lagged treatment by about 1 day (Figure 3—figure supplements 1–3), consistent with previous findings (Ioannidis et al., 2000), and therefore, we included a 1 day lag between the fitted viremia response model and the treatment.
$t$ The number of viral RNA copies in a blood sample is subject to count fluctuations with respect to the true number of circulating virions in a given volume of the blood. We use a Poisson sampling model to define a likelihood for our model of viral population. The likelihood of observing viral counts in a sample collected from patient at time is given by a Poisson distribution,
with rate parameter set by the model value of the viral multiplicity ${N}_{p}(t)$ (Equation 5). We use the Poisson likelihood in Equation 6 to characterize an error model to fit the parameters of the viral dynamics in Equation 5. However, since the mean and variance of the Poisson distribution are related, combining data with different mean values ${N}_{p}(t)$ at different times and from different patients can cause inconsistencies in evaluations of errors in our fits. To overcome this problem, we use a variance stabilizing transformation (McCullagh and Nelder, 2019) and define a change in variable ${\widehat{n}}_{p}(t)=\sqrt{{n}_{p}(t)}$. This transformed variable has a constant variance, and in the limit of largesample size, it is Gaussian distributed with a mean and variance given by, ${\hat{n}}_{p}(t)\sim \mathcal{N}(\sqrt{\lambda},1/4)$. The constant variance of the transformed variable enables us to combine data from all patients and time points, irrespective of the sample’s viral loads, and fit the model parameters $(r,{x}_{p})$ using (nonlinear) leastsquares fitting of the function
Here, ${N}_{p}(tr,{x}_{p})$ is the model estimate of viremia in patient $p$ at time $t$ (Equation 5), given the pretreatment fraction of resistant variants x_{p}, and the decay rate $r$.
Note that the viremia measurements have a minimum sensitivity threshold of 20 RNA copies per ml. We treat the data points below the threshold of detection as missing data and if ${n}_{p}(t)$ is below the threshold of detection we impute ${n}_{p}(t)=\text{min}(20,{N}_{t})$.
The fitted viremia curves for patients enrolled in the three bNAb trials under consideration are shown in Figure 3—figure supplements 1–3, and the respective decay rates $r$ for each experiment are,
Individualbased model for viral population dynamics
To encode for different viral variants, we specify a coarsegrained phenotypic model, where a viral strain of type $a$ is defined by a binary state vector ${\overrightarrow{\rho}}^{a}=[{\rho}_{1}^{a},\mathrm{\dots},{\rho}_{\mathrm{\ell}}^{a}]$, with $\mathrm{\ell}$ entries for potentially escapemediating epitope sites; the binary entry of the state vector at the epitope site $i$ represents the presence (${\rho}_{i}^{a}=1$) or absence (${\rho}_{i}^{a}=0$) of a escape mediating mutation against a specified bNAb at site $i$ of variant $a$. We assume that a variant is resistant to a given antibody if at least one of the entries of its corresponding state vector is nonzero.
We define an individualbased stochastic birthdeath model (Wilkinson, 2019) to capture the competitive dynamics of different HIV variants within a population. This dynamic model will allow us to predict the distribution of rebound times under any combination of antibodies.
We assume that a viral strain of type $a$ can undergo one of three processes: birth, death and mutation to another type $b$ with rates ${\beta}_{a}$, ${\delta}_{a}$, and ${\mu}_{a\to b}$, respectively:
We specify an intrinsic fitness f_{a} for a given variant a, defined as the growth rate of the virus in the absence of neutralizing antibody or competition. Since bNAbs target highly vulnerable regions of the virus, we expect that HIV1 escape mutations to be in trinsically deleterious for the virus and to confer a fitness cost relative to the susceptible viral variants prior to the infusion of bNAbs. Assuming that fitness cost of escape is additive across sites and backgroundindependent, we can express the fitness of a variant as , where Δ is the cost associated with the presence of an escape mutation at site of variant (i.e., for = 1).
We assume that growth is selflimiting via a competition for host Tcells. This competition enforces a carrying capacity, which sets the steadystate population size ${N}_{k}$. Competition is mediated through a competitive pressure term $\varphi =\frac{{\sum}_{a}{N}_{a}{f}_{a}}{{N}_{K}}$ which attenuates the net growth rate ${\gamma}_{a}$ so that ${\gamma}_{a}={f}_{a}\varphi $. At the carrying capacity, the competitive pressure equals the mean population fitness $\varphi =\overline{f}$, making the net growth rate of the population zero.
The net growth rate of a variant $a$ is given by its birth rate minus the death rate: ${\gamma}_{a}={\beta}_{a}{\delta}_{a}$. We assume that the total rate of events (i.e. the sum of birth and death events) is equal for all types, that is, $\lambda ={\beta}_{a}+{\delta}_{a},\forall a$. Assuming that $\lambda $ is constant is to be agnostic about the mechanism of a fitness decrease, attributing fitness loss equally to (i) an increase in the death rate and (ii) a decrease in the birth rate.
Because the absolute magnitude of $\beta $ and $\delta $ asymptotically converge in the continuum limit for a surviving population, that is, ${lim}_{N\to \mathrm{\infty}}\beta /\delta =1$, it is impossible to distinguish between (i) and (ii) in the continuous limit. Choosing constant $\lambda $ simplifies both theoretical calculations and the simulation algorithm.
This leads to the following equations for the birth and the death rates:
In the presence of an antibody, birth is effectively halted for susceptible variants, resulting in birth and death rate for a susceptible variant $s$,
so that the susceptible phenotype decays at rate $r$.
We assume that mutations occur independently at each site,
where ${1}_{s}$ is the vector which has only one nonzero entry at site $i$, and ${\mu}_{i}$ and ${\mu}_{i}^{\u2020}$ are the forward the backward mutation rates at site $i$, respectively. We characterize the state of a population by vector $\mathit{n}=({n}_{1},\dots {n}_{M})$, where n_{a} is the number of type $a$ variants within the population. We approximate the evolution of the population state distribution using a FokkerPlanck equation (see Appendix 2).
Of special interest for our analysis is the steadystate density of frequencies, which we use to describe the initial state of the population (before treatment) and to infer selection intensity. In the steady state, the population is fluctuating around carrying capacity ${\sum}_{a}{n}_{a}\approx {N}_{k}$ and we can represent the population state via allele frequencies ${x}_{a}={n}_{a}/{N}_{k}$. In the simple case of a biallelic problem, the equilibrium allele frequency distribution ${P}_{\text{eq}}(x)$ follows the Wrightequilibrium distribution (Crow and Kimura, 2010) with modified rates,
where $Z$ is the normalization factor, f_{1} is the intrinsic fitness of the variant of interest, f_{0} is the fitness of the competing variant, $\mu $ and ${\mu}^{\u2020}$ are the forward and backward mutation rates, ${N}_{k}$ is the carrying capacity, and $\lambda $ is the total rate of events in the birthdeath process, which sets a characteristic time scale over which the impact of selection and mutations can be measured. In this case, we can define an ‘effective population size’ that sets the effective size of a bottleneck and the natural time scale of evolution as ${N}_{e}={N}_{k}/\lambda $, and specify a scaled selection factor $\sigma ={N}_{e}s={N}_{e}({f}_{0}{f}_{1})$, and scaled forward mutation and backward mutation rates (diversity) $\theta =2{N}_{e}\mu $, and ${\theta}^{\u2020}=2{N}_{e}{\mu}^{\u2020}$. The normalization factor is given by,
where ${}_{1}F_{1}(\cdot )$ denotes a Kummer confluent hypergeometric function and $\mathcal{B}(\theta ,{\theta}^{\u2020})=\frac{\mathrm{\Gamma}[\theta ]\mathrm{\Gamma}[{\theta}^{\u2020}]}{\mathrm{\Gamma}[\theta +{\theta}^{\u2020}]}$ is the Euler beta function.
Extinction Probability
The logistic dynamics describing a patient’s viremia over time in Equation 5 is the deterministic approximation to the underlying birthdeath process. However, the resistant population can also go extinct due stochastic effects, which in turn contribute to the probability of late rebound in a population. To capture this effect, we derive an approximate closed form expression for the probability of extinction.
Using the standard birthdeath process generating function theory Allen, 2010 the probability $P(\text{extinct}{n}_{i})$ that a population consisting of n_{i} resistant variants of type $i$ go extinct can be expressed as,
To characterize the probability of extinction for a population of size ${N}_{k}$ with pretreatment fraction of ${i}^{th}$ resistant variants x_{i}, we can convolve the extinction probability in Equation 14 with a Binomial probability density for sampling n_{i} resistant variants from ${N}_{k}$ trials. Given that the pretreatment fraction of resistant variants is small ${x}_{i}\ll 1$ and ${N}_{k}$ is large, this Binomial distribution can be well approximated by a Poisson distribution, $\text{Poiss}({n}_{i};{N}_{k}{x}_{i})$ with rate ${N}_{k}{x}_{i}$, resulting in an extinction probability,
Using the expressions for the growth in the absence of competition, ${\beta}_{i}{\delta}_{i}={\gamma}_{i}={f}_{i}$ (since $\varphi =0$), and assuming that fitness is small relative to the total rate of birth and death events ${f}_{i}\ll \lambda $, we can use the approximation ${\beta}_{i}=(\lambda +{f}_{i})/2\approx \lambda /2$, to arrive at,
${x}_{\text{ext}}$ where the characteristic escape threshold can be written in terms of concrete genetic observables,
Figure 3 shows that this threshold can well separate the fate of stochastic evolutionary trajectories, simulated with relevant parameters for intrapatient HIV evolution.
Numerical simulations of the birthdeath process
To treat the full viral dynamics including mutations, and transient competition effects, we can exactly simulate the viral dynamics defined by our individual based model. Below are the key steps in this simulation.
Population initialization
At the starting point, we set the population size (i.e. the carrying capacity in the simulations) ${N}_{k}$ as a free parameter chosen to be large enough to make discretization effects small. The population is then evolved through time using an exact stochastic sampling procedure (the Gillespie algorithm Gillespie (1977)). Simulating the outcome of this stochastic evolution generates the distribution of rebound times and the probability of late rebound—the key quantities related to treatment efficacy.
The input to our procedure is a list of antibodies for which we specify (i) the escape mediating sites for each antibody, and the (invariant) quantities describing (ii) the sitespecific cost of escape $\frac{\sigma}{{\theta}_{\text{ts}}}$, and (iii) the forward and backward mutation rates $(\frac{\theta}{{\theta}_{\text{ts}}},\frac{{\theta}^{\u2020}}{{\theta}_{\text{ts}}})$. To simulate the trial outcome for each patient, we use the neutral population diversity ${\theta}_{\text{ts}}$ directly inferred from the patients; see Inference of mutation rates and the neutral diversity within a population. From this, we construct the list of $L$ site parameters (concatenated across all antibodies) for selection and diversity: ${\sigma}_{1:L}$, ${\theta}_{1:L}$, ${\theta}_{1:L}^{\u2020}$.
We assume that at the start of the simulation, populations are in the steady state and that the potential escape sites are at linkage equilibrium. The approximate linkage equilibrium assumption is justified since the distance between these escape sites along the HIV1 genome is greater than the characteristic recombination length scale $\approx 100\text{bp}$ of the virus (Zanini et al., 2015). As a result, we draw an independent frequency x_{i} from the stationary distribution ${P}_{\text{eq}}(x{\sigma}_{i},{\theta}_{i},{\theta}_{i}^{\u2020})$ in Equation 12 to describe the state of a give site $i$, and use these frequencies to construct the initial viral genotypes ${\rho}^{v}$ for each virus $v$ in our initial population; see Algorithm 1 (Appendix 5). In simulations, we show that this assumption does not bias our results even when ${\theta}_{\text{ts}}$ is fluctuating and recombination is absent (Section 6.1 and Figure 4—figure supplement 2).
To sample from the stationary distribution itself, we define a novel Gibbssampling procedure (Geman and Geman, 1984) for generating the allele frequencies of the escape variants for the initial state of the population $x\sim {P}_{\text{eq}}(x\sigma ,\theta ,{\theta}^{\u2020})$ (Equation 12). To characterize this procedure, we expand the exponential selection factor ${e}^{\sigma (1x)}$ in the original distribution, which results in,
Here, ${Q}_{\text{eq}}(x,k\sigma ,\theta ,{\theta}^{\u2020})$ is a joint distribution over $(x,k)$, and the desired distribution over the allele frequency $x$ can be achieved by marginalizing the joint distribution over the discrete variable $k$. We can also express the conditional distributions for $x$ and $k$ as,
We use these conditional distributions to define a joint Gibbs sampler for ${Q}_{\text{eq}}$. We summarize the resulting $(x,k)\sim {Q}_{\text{eq}}(x,k\sigma ,\theta ,{\theta}^{\u2020})$ in the joint Gibbs sampler in Algorithm 2 (Appendix 5). This chain mixes extremely quickly and avoids calculation of the hypergeometric function for the normalization factor (Equation 20), which is computationally costly; see Algorithm 2 (Appendix 5).
Simulation of the evolutionary process
We use a Gillespie algorithm to simulate the evolutionary process, where we break up the reaction calculation into two parts: randomly choosing a viral strain ${\rho}_{i}$ from the population and then determining whether it reproduces or dies based on its fitness f_{i} and escape status; see Algorithm 3 (Appendix 5).
Determining the simulation parameters of the birthdeath process from genetic data
We set the intrinsic growth rate (fitness) of the wildtype virus, in the absence of competition to be $\gamma ={(3\text{days})}^{1}$, consistent with intrapatient doubling time of the virus (García et al., 2001; Ioannidis et al., 2000; García et al., 1999). We infer the neutralization rate $r$ by fitting the viremia curves (Figure 3) in the trials under study, and use the averaged decay rate $r=0.31$ for simulations, fitted using Equation 8. For the absolute mutation rate ${\mu}_{\text{ts}}$ (per nucleotide per day), we use $1.1\times {10}^{5}$ which is the average of the reported values for transitions per site per day from Zanini et al., 2017. Using the covariance of neutral diversity in twofold and fourfold synonymous sites, we determine the transition/transversion diversity ratio to be ${\theta}_{\text{ts}}/{\theta}_{\text{tv}}=7.8$ (Figure 2—figure supplement 1). We use these values to determine the forward and backward mutation rates ${\mu}_{s}$ and ${\mu}_{s}^{\u2020}$ for each site (see Inference of mutational target size for each bNAb).
Generally, the trial patients show viral populations with larger neutral diversity at the start of the trial compared to the patients enrolled in the highthroughput study of Zanini et al., 2015 (Figure 2). We account for differences in the genetic makeup of the patients enrolled in the trial by directly estimating the neutral diversity ${\theta}_{\text{ts}}$ from the synonomous sitefrequency data of patients, before the start of the trial. The estimated viral diversity ${\theta}_{\text{ts}}$, coupled with the mutation rate $\mu =1.1\times {10}^{5}\text{/day / nt}$, and the total rate of birth and death events in the viral population $\lambda $, set the carrying capacity ${N}_{k}$ for a given individual,
It should be noted that the value of ${N}_{k}$, as the number of viruses in our individualbased simulations, is not related to the maximal viral load in the viremia measurements (i.e. steady state copy number per ml) as this relationship depends on the microscopic details of the population dynamics.
The total rate of birth and death events $\lambda =\beta +\delta $ tunes the amount of stochasticity, that is, more events cause noisier dynamics. Notably, stochasticity can be linked to the size of the population ${N}_{k}$, which is directly coupled to $\lambda $ (Equation 21). We set the value of $\lambda $ selfconsistently by requiring the minimum frequency of a variant in our simulations ${x}_{\text{min}}=1/{N}_{k}=\frac{2{\mu}_{\text{ts}}}{\lambda}{\theta}_{\text{ts}}^{1}$ to be smaller than the escape threshold ${x}_{\text{ext}}=\frac{{\mu}_{\text{ts}}}{\gamma}{\theta}_{\text{ts}}^{1}$ due to stochasticity (Equation 17). We set $\lambda =2{\text{day}}^{1}$ so that ${x}_{\text{min}}=\frac{1}{3}{x}_{\text{ext}}$. Increasing $\lambda $ results in an increase in the size of population ${N}_{k}$ in our simulations, which is computationally costly, without qualitatively changing the statistics of the rebound trajectories.
Inference of mutation rates and the neutral diversity within a population
Previous work has indicated an order of magnitude difference between the rate of transitions (mutations within a nucleotide class) and transversions (outclass mutations) in HIV (Nielsen, 2006; Zanini et al., 2017; Theys et al., 2018; Feder et al., 2017). Therefore, to infer the neutral diversity parameter ${\theta}_{\text{ts}}$, we also account for the differences between transition and transversion rates.
Consider the set of sequences sampled from a patient’s viral population at a particular time. Two neutral alleles that are linked by a symmetric mutational process ${\mu}_{1\to 2}={\mu}_{2\to 1}$ have a simple count likelihood. The probability to see allele 1 with multiplicity $n$ and allele 2 with multiplicity $m$ is given by a binomial distribution $\text{Binom}(n,mx)$ with parameter $x$ denoting the probability for occurrence of allele 1, convolved with the neutral biallelic frequency distribution ${P}_{\text{eq}}(x\sigma =0,\theta )$ from Equation 12. Using this probability distribution, we can evaluate the loglikelihood $\mathcal{L}(\theta n,m)$ for the neutral diversity $\theta $ given the observations $(n,m)$ for the multiplicities of the two alleles in the population,
Inference of netral diversity, mutational target size, and selection from genetic data
To estimate the transition diversity, we only use twofold synonymous sites, and treat each site independently but with a shared diversity parameter ${\theta}_{\text{ts}}$. For example, consider neutral variations for two amino acids glutamine and phenylalanine. The third position in a codon for both of these amino acids are twofold synonymous, as the two possible codons for glutamine are CAG and CAA, and for phenylalanine are TTT, and TTC. Now consider that in the data a conserved glutamine has $n=3$ G’s and $m=97$ A’s in the third codon position, and a conserved phenylalanine has $n=10$ T’s and $m=90$ C’s at its third codon position. In this case, the combined loglikelihood for the shared diversity parameter is $\mathcal{L}(\theta \text{data})=\mathcal{L}(\theta 3,97)+\mathcal{L}(\theta 10,90)$. Extending to all of sites in the env protein, the maximumlikelihood estimator for the transition diversity ${\theta}_{\text{ts}}$ can be evaluated by maximizing the likelihood summed over all conserved twofold synonymous sites,
In Figure 2—figure supplement 1 (panel A), we show that the maximum likelihood estimation method described above has better properties than the more commonly used estimator of the variance $\overline{x(1x)}$ Stoddart and Taylor, 1988.
In a similar way, the likelihood for the transversion ${\theta}_{\text{tv}}$ is determined from polymorphic data at all conserved fourfold synonymous sites. One such example is the third position in a glycine codon, where (GGT, GGC, GGA, GGG) translate to the same amino acid. The maximumlikelihood estimator for the transversions is
The factor of 2 in the argument of the likelihood accounts for the multiplicity of mutational pathways, e.g. from a $G$ nucleotide there are two transversion possibilities, $G\to C$ and $G\to A$ for moving from one allele to the other Kimura, 1981.
Using this likelihood approach, we can infer the neutral diversities ${\theta}_{\text{ts}}^{*}$ and ${\theta}_{\text{tv}}^{*}$ for each patient at each time point from the polymorphism in twofold and fourfold synonymous sites. To characterize the ratio of transition to transversion rates, we use linear regression on the entire patient population and sample history and infer a constant ratio ${\mu}_{\text{ts}}/{\mu}_{\text{tv}}={\theta}_{\text{ts}}/{\theta}_{\text{tv}}=7.8$ (Figure 2—figure supplement 1B). In Figure 2—figure supplement 1, we also show that the estimate for this ratio is relatively consistent across different data sources, produced even by different sequencing technologies. The previously reported relative rate of transitions to transversions, based on the estimates of sequence divergence along phylogenetic trees of HIV1 is ${\mu}_{\text{ts}}/{\mu}_{\text{tv}}=5.6$ (Zanini et al., 2017), which is similar to our maximum likelihood estimate.
Inference of mutational target size for each bNAb
The nucleotide triplets which encode for amino acids at an escape site undergo substitutions which can change the amino acid type and create an escape variant. The changes in the state of an amino acid codon can be modeled as a Markov jump process and can be visualized as a weighted graph where the nodes represent codon states, and edges represent single nucleotide substitutions linking two codon states (Figure 1D). In our mutational model, these edges have weights associated with either the mutation rates for transitions ${\mu}_{\text{ts}}$ or transversions ${\mu}_{\text{tv}}$. We call this the codon substitution graph.
The codon states can be clustered into three distinct classes: (i) codons which are fatal $F$, (ii) wildtype (i.e. susceptible to neutralization by the bNAb) $W$, and escape mutants (i.e. resistant to the bNAb) $M$. We expect the escape mutants to be at a selective disadvantage compared to the resistant wildtype, and that the most common escape codons to be those which are adjacent to wildtype states.
The mutational target size is determined by the density of paths from the wildtype $W$ to the escape mutants $M$.
The functions $[cd=\text{ts}]$ or $[cd=\text{tv}]$ are 1 when the two codons are separated by a transition or transversion, and are zero otherwise. Note that since we only have an estimate for the ratio of the transition to transversion rates ${\mu}_{\text{tv}}/{\mu}_{\text{ts}}$, we can only determine the scaled mutational target sizes, $\widehat{\mu}=\mu /{\mu}_{\text{ts}}$ and ${\widehat{\mu}}^{\u2020}={\mu}^{\u2020}/{\mu}_{\text{ts}}$, which are sufficient for inference of selection in the next section. The full list of mutational target sizes inferred for the bNAbs in this study are shown in Appendix 1—table 1.
When discussing the mutational target size of escape from a given bNAb, we refer to the total mutation rate from the susceptible (wild type) to the escape variant as,
where the sum runs over all the mediating escape sites $i$, and can be interpreted as the average number of accessible escape variants. In the strong selection regime, we can write the frequency of escape mutants as,
Inference of selection for escape mutations against each bNAb
Here, we develop an approximate likelihood approach to infer the selection ratio $\widehat{\sigma}=\frac{\sigma}{{\theta}_{\text{ts}}}$, using the highthroughput sequence data from bNAbnaive HIV1 patients from Zanini et al., 2017. The quantity $\widehat{\sigma}$ is a dimensionless ratio which is independent of the coalescence timescale ${N}_{e}$, and therefore, represents a stable target for inference.
We assume that the probability to sample $m$ escape mutants and $s$ susceptible (wildtype) alleles at a given site in the genome of HIV in a population sampled from a patient at a given time point follows a binomial distribution $\text{Binom}(m,sx)$, governed by the underlying frequency $x$ of the mutant allele. In addition, the frequency $x$ of the allele of interest itself is drawn from the equilibrium distribution ${P}_{\text{eq}}(x\sigma ,\theta ,{\theta}^{\u2020})$ (Equation 12), governed by the diversity ${\theta}_{\text{ts}}$ inferred from the neutral sites, the estimated mutational target sizes $\widehat{\mu}=\mu /{\mu}_{\text{ts}},{\widehat{\mu}}^{\u2020}={\mu}^{\u2020}/{\mu}_{\text{ts}}$, and the unknown selection ratio $\widehat{\sigma}=\sigma /{\theta}_{\text{ts}}$. As a result, we can characterize the probability $P(m,s{\theta}_{\text{ts}},\widehat{\mu},{\widehat{\mu}}^{\u2020},\widehat{\sigma})$ to sample $m$ escape mutants and $s$ susceptibletype alleles, given the scaled selection and diversity parameters as,
Here, $Z(\sigma ,\theta ,{\theta}^{\u2020})$ is a confluent hypergeometric function of the model parameters that sets the normalization factor for the allele frequency distribution ${P}_{\text{eq}}(x)$ (Equation 13). It should be noted that the viral population is in fact out of equilibrium, due to constant changes in immune pressure evolution from the Bcell and Tcell populations (Nourmohammad et al., 2016). Although we are ignoring these significant complications, we later use the same equilibrium distribution in a consistent way to generate standing variation in simulations. For the model to make accurate predictions, it is not necessary that the equilibrium model be exactly correct, but only that it is rich enough to provide a consistent description for the distribution of mutant frequencies observed across viral populations.
We will use the probability density in Equation 29 to define a loglikelihood function in order to infer the scaled selection $\widehat{\sigma}=\sigma /{\theta}_{\text{ts}}$ from data. To do so, we first express the logarithm of this probability density as,
where the constant factors (const.) are independent of selection, and $\mathbb{E}[\cdot {]}_{\text{Beta}(\cdot )}$ denotes the expectation of the argument over a Beta distribution with parameters specified in the subscript. The expression in Equation 30 implies that we can evaluate the likelihood of selection strength by computing the difference between the logarithms of the expectation for ${e}^{\sigma x}$ over allele frequencies drawn from two neutral distributions (Beta distributions), with parameters $(\theta ,{\theta}^{\u2020})$ and $(\theta +m,{\theta}^{\u2020}+s)$, respectively. This approach is more attractive as it would not require direct evaluation of the confluent hypergeometric functions for the normalization factors in Equation 29. Estimating these normalization factors is computationally intensive for large values of $\sigma $, since many terms in the underlying hypergeometric series should be taken into account to stably compute them. However, evaluating the expectations via sampling from these two neutral distributions has the disadvantage that it is subject to variations across simulations. We reduce the variance of our estimate of $\mathrm{log}P(m,s\sigma ,\theta ,{\theta}^{\u2020})$ in Equation 30 by using a mixtureimportance sampling scheme (Owen and Zhou, 2000) with the details shown in Algorithm 4 (Appendix 5).
We use data collected across all time points and from all patients to infer reliable estimates for selection strengths. However, allele frequencies are correlated across time points within patients (Figure 2B), and thus, sequential measurements are not independent data points. Our estimates indicate a coalescence time of about ${N}_{e}\sim {10}^{3}$ days based on the estimates for the mutation rate ${\mu}_{\text{ts}}={10}^{5}$ /nt/day, and the neutral diversity ${\theta}_{\text{ts}}=2{N}_{e}{\mu}_{\text{ts}}=0.01$. This coalescence time is much longer that the typical separation between sampled time points within a patient ($\sim {10}^{2}$ days), suggesting that sequential samples collected from each individual in this data are correlated. Therefore, we treat each patient as effectively a single observation, using the timeaveraged likelihood for the (scaled) selection factor $\widehat{\sigma}$:
where $p$ and $t$ denote patient identity and sampled time points, respectively, and ${T}_{p}$ is the total number of time points sampled in patient $p$. We use the likelihood in Equation 31 to generate samples from the posterior distribution for selection strengths under a flat prior, with a standard MetropolisHastings algorithm Hastings, 1970. Since the prior is constant, this procedure amounts to simply accepting or rejecting samples based on the likelihood ratio of Equation 31. We used the centerednormal distribution with standard deviation of 50 ($\times {\mu}_{\text{ts}}$ in absolute units) as the proposal density for the jumps in the Markov chain.
Prior work has also inferred the fitness effect of mutations in HIV, but our approach differs in important aspects. For instance, maximum entropy models have been used to infer the preference of different amino acids in the Gag and the env proteins of HIV1 from their prevalences across sequences sampled from different patients( Louie et al., 2018; Ferguson et al., 2013). The inferred preference values can explain the invitro growth rate (fitness) of the associated viral strains, especially for sites that are relatively conserved and are not the drivers of antigenic evolution in HIV1. However, further modeling would be needed to quantitively map these inferred amino acid preferences onto population genetics measures that can be used to characterize the evolutionary dynamics of an HIV population. Other work has used longitudinal HIV1 sequence data to infer selection and to characterize the role of selective sweeps due to immune pressure, genetic hitchhiking and recombination in the turnover of HIV1 populations within patients (Zanini et al., 2017; Neher and Leitner, 2010; Illingworth et al., 2020; Haddox et al., 2018). In contrast, our work focuses on the expected composition of the population in a viremic patient prior to the start of treatment as opposed to the history of a viral population. Our approach uses a selfconsistent formalism for inference of the population genetics parameters (e.g. population diversity and selection strength) and for the evolutionary simulations used to predict outcomes. Therefore, we can directly interpret the fitted parameters in terms of both the viral dynamics and the pretreatment state of HIV1 populations within patients.
Predicting trial outcomes from genetically informed evolutionary models
Predicting rebound times
We expect different distributions of patient outcomes depending on whether they have been recently infected and thus have relatively low viral diversity, or whether their infection is longstanding with a diverse viral population. To construct the distribution of initial population diversities ${\theta}_{\text{ts}}$ for simulating trial outcomes, we apply the ${\theta}_{\text{ts}}$ inference procedure (Equation 22) to pretreatment sequence datasets available from the clinical trials under consideration. In Figure 2D this set of pretrial ${\theta}_{\text{ts}}$ is compared to the longitudinal inpatient ${\theta}_{\text{ts}}$ from Zanini et al., 2017. We used random draws from the inferred ${\theta}_{\text{ts}}$ values for patients to generate ${\theta}_{\text{ts}}$ for simulations.
We found that there was considerably more viral escape and nonresponders in our simulations than in the observed data as shown in Figure 4—figure supplement 3A. This is in addition to the fact that the patients were screened to have only susceptible variants to the antibodies used in trials (Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018). In theory, there should be zero nonresponders, as such patients should have been excluded by screening. The overprediction of both nonresponders and late rebounds is a signature of undercounting the effective diversity of the viral populations.
The failure of both screening and our naive prediction in undercounting the diversity in the viral population can be explained by an effective viral reservoir. Viral variants which mediate rebound can come from compartments such as including bone marrow, lymph nodes, and organ tissues, and can be genetically distinct from those sample from the plasma Tcells during screening (Chaillon et al., 2020; Wong and Yukl, 2016; Chun et al., 2005). This reservoir of viral diversity can reappear in plasma after infusion of a bNAb and could in part contribute to treatment failure (AvettandFenoel et al., 2007; Shan and Siliciano, 2013; Sharkey et al., 2011; Tagarro et al., 2018).
Determining patient diversity enhancement due to latent reservoirs
We model the effect of reservoirs as a simple inflation of the diversity observed by a multiplicative factor $\xi $. We fit $\xi $ directly to trial observations, using a disparitybased approach by minimizing an empirical divergence estimator Ekström, 2008 between the observed and simulated data. To do so, we characterize the Hellinger distance Lindsay, 1994; Simpson, 1987 between the true distribution of rebound times $P(t)$ and the rebound times $Q(t\xi )$ generated by simulations with a given reservoir factor $\xi $,
Algorithm 5 (Appendix 5) defines the procedure that we use to estimate the Hellinger distance D_{H}(Q(t)P(tξ)). Specifically, we use n_{q} quantiles of the observed data x_{(i)} ~ Q to partition the space of observations into discrete outcomes
where ${P}_{(i)}$ is a constant by construction, and ${P}_{(i)}(\xi )$ is estimated by simulations, and $[\cdot ]$ is the Iverson bracket Graham et al., 1989; see Algorithm 5 (Appendix 5).
To simulate data for this analysis, we generate $S$ rebound times (${T}_{1:S}$) by simulations, given the scaled diversity values $\xi {\theta}_{\text{ts}}$. We then find the optimal value ${\xi}^{*}$ by minimizing the disparity with the observed rebound times ${t}_{1:p}$ by bruteforce search,
Here, ReboundDisparity is the function defined by Algorithm 5 (Appendix 5); see Ekström, 2008 for details. We find the optimal reservoir factor to be ${\xi}^{*}=2.1$, which we use in subsequent therapy prediction. The disparity over various values of $\xi $ for different trials is shown in Figure 4—figure supplement 3.
Simulating outcomes of clinal trials
Given the reservoircorrected estimate of the diversity $\xi \theta $ and the posterior samples for selection factors $\widehat{\sigma}$, we now summarize how we simulated the outcome of clinical trials.
For a given bNAb, we draw the selection factor at each of the escape mediating sites from the corresponding Bayesian posterior on $\widehat{\sigma}$; the posterior distributions are shown in Figure 4A, and summarized in Appendix 1—table 1. We also use the mutational target size (forward $\mu $ and backward ${\mu}^{\u2020}$ rates) associated with each of the escape mediating sites of a given bNAb; see Appendix 1—table 1. The result can be summarized in a mutation / selection matrix ${\widehat{M}}_{a}$ for a given bNAb $a$, where each column corresponds to an escape mediating site $i$ against the bNAb,
The elements of the matrix ${\widehat{M}}_{a}$ are the scaled mutation and selection factors, i.e., ${\widehat{\mu}}_{i}={\mu}_{i}/{\mu}_{\text{ts}},{\widehat{\mu}}_{i}^{\u2020}={\mu}_{i}/{\mu}_{\text{ts}},\text{and}{\widehat{\sigma}}_{i}={\sigma}_{i}/{\theta}_{\text{ts}}$, where the absolute value of mutation rate is set to ${\mu}_{\text{ts}}=1.11\times {10}^{5}/\text{nt}/\text{day}$ from Zanini et al., 2017.
For each patient in our simulated trial, we then draw diversity ${\theta}_{\text{ts}}$ from the patient pool, and scale it by our fitted $\xi =2.1$, resulting in patientspecific selection and mutation factors,
These parameters are then used to initialize the state of an HIV population within a patient according to (Appendix 5), and to determine the absolute rates in Algorithm 3 (Appendix 5) for the population evolution according to Equation 38. The decay rate is set to the fitted trial average of $r=0.31{\text{days}}^{1}$ (Equation 8). The carrying capacity ${N}_{k}$ is set according to Equation 21. This determines all parameters of the birthdeath process simulating the intrapatient evolution of HIV, which are used in Algorithm 3 (Appendix 5).
We evolve a population through time until 56 days have elapsed since treatment, or until the escape fraction relative to the carrying capacity x_{t} is above 0.8; see Algorithm 3 (Appendix 5). After ${x}_{t}>.8$ the evolution is governed by the deterministic equations, and the stochastic simulation ends. The rebound time $T$, defined as the intersection of the exponential envelope and the carrying capacity, can then be calculated analytically as,
The resulting distribution for rebound times are shown as model predictions in Figure 3B–D.
Rebound times generated in this fashion were also used to estimate the probability of late rebound to characterize the efficacy of a given bNAb in curbing viral rebound. The probability of late rebound was estimated from 10^{4} simulated patients. The interdecile quantiles (0.1–0.9) of early rebound (lt_{56} days) probability over 200 values of scaled selection coefficients $\widehat{\sigma}$ drawn from the posteriors in Figure 4A are shown in Figure 4C.
Model robustness
Effect of genomic linkage on the inference of selection
In our inference of selection (Equation 31, Figure 4A), we assume that the escapemediating sites are at linkage equilibrium and that the distribution of allele frequencies can be approximated by a skewed Beta distribution (Equation 12), reflecting the equilibrium of allele frequencies. In reality, despite recombination, the HIV genome exhibits linkage effects, especially at nearby sites Zanini et al., 2015, and the viral populations experience changing selective pressures by the immune system Feder et al., 2021; Theys et al., 2018; Nourmohammad et al., 2019, and the transient population bottlenecks during therapy Feder et al., 2016.
To test the limits on the validity of our inference procedure, we applied it to in silico populations generated by fullgenome forwardtime simulations (Appendix 5, Algorithm 3) in the presence and absence of recombination. To do so, we considered an ensemble of ten patients with 100 genomes sampled at 10 time points, and used two diversity parameters ${\theta}_{\text{ts}}=0.01$ and ${\theta}_{\text{ts}}=0.1$, to cover the range reflected in patient data (Figure 2D, Figure 4—figure supplement 2).
One relevant scenario to consider is the impact of other selected sites in the genome on the distribution of alleles at the escape mediating sites against bNAbs. The sites under a strong constant selection are likely to be already fixed (or at a high a frequency) at their favorable state in the population. However, the strong selection on a large fraction of antigenic sites can be thought as timevarying, due to the changing pressure imposed by the immune system or therapy. To capture this effect, we simulated whole genome evolution in which linked sites were under strong selection ($0.1\times \text{growth rate}$), and where the sign of selection changed after exponentially distributed waiting times (i.e. as a Poisson process); this model of fluctuating selection has been used in the context of influenza evolution Strelkowa and Lässig, 2012, and for somatic evolution of Bcell repertoires in HIV patients Nourmohammad et al., 2019. The resulted evolutionary dynamics in this case can involve strong selective sweeps and clonal interference due to the continuous rise of beneficial mutations (in the linked sites) within a population.
To test the robustness of our selection inference, we evaluated the distribution of maximum likelihood estimates (MLEs) for the selection values $\widehat{\sigma}=\sigma /{\theta}_{t}s$ at the escape mediating sites, inferred from the ensemble of sequences obtain from simulated data with linkage. Figure 4—figure supplement 2 shows that even for fully linked genomes (zero recombination) our MLE estimate of selection has little bias relative to the true values used in the simulations. Adding recombination into the simulations only further attenuates the effect of linkage (Figure 4—figure supplement 2), making the estimates more accurate.
The reason that selective sweeps of linked beneficial mutations have only minor effects on our inference of selection for the escape mediating sites is twofold: First, the primary mechanism by which selective sweeps change the strength of selection at linked sites is via a reduction in the effective population size Hill and Robertson, 2009; Comeron et al., 2008. However, variations in the effective population size are already accounted for in our inference procedure: The selection likelihood in Equation 31 is conditioned on the measured neutralsite diversity, $\theta =2{N}_{e}\mu $. The change in the effective population size impacts the selection coefficient $\sigma =2{N}_{e}\mathrm{\Delta}$ and the diversity $\theta =2{N}_{e}\mu $ in the same way, and therefore, the (scaled) selection parameter $\widehat{\sigma}=\sigma /\theta $ that we infer from data remains insensitive to changes in the effective population size.
The second reason for the robustness of our selection inference to linkage is due to the fact that a beneficial allele in a linked locus can appear on a genetic background with or without a susceptible variant, leading to the rise of either variants in the population. As a results, the impact of such hitchhiking remains a secondary issue in inference of selection at the escape sites, for which an ensemble of populations from different patients with distinct evolutionary histories of HIV1are used.
Robustness of selection inference to intrapatient temporal correlations of HIV alleles
To infer the selection effect of mutations from the longitudinal deep sequencing data of Zanini et al., 2015, we use time averaging of the likelihood (Equation 31) to avoid conflating our results due to temporal correlations between the circulating alleles within patients (Figure 2B). We can view this choice as being one choice among two extremes: (i) to treat each patient as effectively one independent data point so that all patients are given the same weight or (ii) to treat each time point as independent, giving patients with more time points a higher weight. These two choices correspond to different loglikelihood functions for the (scaled) selection factor $\widehat{\sigma}$:
where ${T}_{p}$ is the total number of time points from patient $p$, and ${P}_{p}({m}_{t},{s}_{t}{\theta}_{\text{ts}}(t),\widehat{\sigma})$ is the probability to observe $m$ escape mutants, and $s$ susceptible variants at time $t$ in patient $p$, given the neutral diversity ${\theta}_{\text{ts}}(t)$ and the scaled selection factor $\widehat{\sigma}$.
We find that both of these approaches result in similar posteriors for selection $\widehat{\sigma}$ (Model robustness Figure 4—figure supplement 1A and C), although the $t$averaged likelihood has a higher uncertainty due to fewer independent time points. Thus, our inference of selection is insensitive to the exact choice of the likelihood function given in Equation 40, yet our timeaveraged approach remains the more conservative choice between the two.
Statistical significance of the reservoircorrected diversity
In Equation 36, we introduced the reservoir factor ${\xi}^{*}=2.1$ to account for the diversity of HIV1that is not sampled from a patient’s plasma prior to therapy, which resulted in a better fit of the rebound time distributions (Figure 3) compared to a reservoirfree model (Figure 4—figure supplement 3A). Here, we quantify the importance of the reservoir factor with a statistical test on the null hypothesis, ${\xi}_{0}=1$. Specifically, we perform a hypothesis test to test the necessity of using an inflated diversity $\xi {\theta}_{\text{ts}}$ relative to using the bare diversity observed in pretrial sequence data ${\theta}_{\text{ts}}$. To do so, we construct a disparitybased test statistic Ekström, 2008, which is analogous to the likelihood ratio test statistic.
Recall that the optimal reservoir factor ${\xi}^{*}=\mathrm{arg}{\mathrm{min}}_{\xi}R(\xi {t}_{1:p})$ was obtained by minimizing the disparity function $R(\xi {t}_{1:p})$ across measurements of rebound times ${t}_{1:p}$ from all the $p$ patients in data (Equation 36). We can estimate the test statistic for the reduction in disparity between the null hypothesis, ${\xi}_{0}=1$ and the fitted reservoir factor ${\xi}^{*}$ as,
We can then determine the pvalue by estimating the quantile of the observed test statistic ${\mathrm{\Delta}}_{R}({t}_{1:p})$ relative to that inferred from the distribution of ${\mathrm{\Delta}}_{R}({T}_{1:p})$ obtained from simulations under null hypothesis ${\xi}_{0}=1$ (Fisher, 1956). Specifically,
where $\underset{{T}_{1:p}{\xi}_{0}}{\mathbb{E}}\phantom{\rule{thinmathspace}{0ex}}(\cdot )$ denotes the expectation over the rebound times ${T}_{1:p}$ obtained from 1000 realizations of simulated populations each with $p$ patients, and under the null hypothesis ${\xi}_{0}=1$ is the Iverson bracket that takes value 1 when its argument is true and 0, otherwise (Equation 34). The observed ${\mathrm{\Delta}}_{R}$ (Equation 41), the distribution of simulated values of ${\mathrm{\Delta}}_{R}({T}_{1:p}{\xi}_{0})$ under the null hypothesis, and the resulting $\text{pvalue}=0.004$ are shown in Figure 4—figure supplement 3B, C.
It should be noted that here we use the disparity measure because the corresponding likelihood function for the reservoir factor is inaccessible through forward simulations of populations. However, a general analogy exists between our approach and the more commonly used likelihood approach. Specifically, in an analogous likelihoodratio test, the test statistic ${\mathrm{\Delta}}_{L}={\mathrm{max}}_{\xi}\mathrm{log}p(\xi {t}_{1:p})\mathrm{log}p({\xi}_{0}{t}_{1:p})$ would be asymptotically ${\chi}^{2}$distributed with one degree of freedom under the null hypothesis Fisher, 1954, and the quantile under the nullhypothesis (pvalue) would be estimated by inverting the ${\chi}^{2}$ cumulative distribution function (i.e. a ${\chi}^{2}$ test).
Robustness of selection inference to strains from different clades of HIV
The longitudinal deep sequencing data of Zanini et al., 2015 is collected from 11 patients, 9 of whom are infected with clade B strains of HIV1, which is the dominant clade circulating in Europe Spira et al., 2003. All of the clinical trials we considered Caskey et al., 2015; Caskey et al., 2017; Baron et al., 2018 are from patients carrying clade B strains. For the results presented in the main text, we included all the 11 patients in our analysis. Here, we test wether our inference of selection is sensitive to the choice of including or excluding nonclade B patients in our analysis. We therefore repeated our inference procedure for selection by excluding the two nonclade B patients. Model robustness Figure 4—figure supplement 1A,B shows a strong agreement between the Bayesian posterior for selection factors in the two cases, with a slight increase in uncertainty for the case with only clade B patients. This increased uncertainty is related to the reduction in sample size by excluding the nonclade B patients from data. Nonetheless, the richness of the intrapatient diversity makes the inference robust to the exclusion of one or two patients.
Robustness of predictions for trial efficacy to the inferred values of selection strength
How sensitive are the outcomes of our predictions for the rebound time distributions (Figure 1) to the exact values of inferred selection strengths we used for our simulations? We addressed this question by performing a disparity analysis similar to that for the diversity $\theta $ in Equation 41. Specifically, we assessed whether we might need to rescale our inferred selection strength $\sigma /{\theta}_{\text{ts}}$ by a multiplicative factor ${\xi}_{s}$ (Figure 4—figure supplement 4).
In contrast to diversity, the reduction in disparity for adjustment of selection with a factor ${\xi}_{s}$ is small (Figure 4—figure supplement 4A) and not statistically significant ($\text{pvalue}=0.49$; Figure 4—figure supplement 4B), and could be attributed to count noise. Still, we cannot discount the possibility that selection was slightly overestimated, possibly due to the effect of compensatory mutations in linked genome, the interplay between the reservoir and the inference of selection, or other biological factors. Nonetheless, in absolute terms, the null hypothesis (i.e. ${\xi}_{s}=1$) cannot be rejected and we have no statistical justification for adding an adjustment factor for selection inferred from untreated patients.
Robustness of rebound time predictions to methods of identifying escape mutations
Our predictions rely on identifying escape variants against each bNAb, either based on in vivo trial data and patient surveillance, or in vitro DMS assays. To test the sensitivity of our results to these methods, we compare the predictions of rebound time distributions when identifying escape sites from the DMS data versus the trial data. In Appendix 4, we perform this comparison for the 10–1074 and the PGT121 bNAbs; we do not include the 3BNC117 bNAb in this analysis since it targets the CD4 binding site of HIV1and the DMS data is unreliable for identifying escape variants against it. As shown in Appendix 4—figure 1, both trialinferred and DMSinferred escape sites result in good predictions for rebound time distributions. However, its appears that using the DMSinferred escape sites could lead to a more optimistic prediction for treatment success (i.e. a later rebound). This inconsistency may be due to the fact that DMS data is collected in vitro, and other biological factors could be influencing the in vivo escape patterns. Moreover, the differences in the genetic composition of the HIV1strains circulating in patients enrolled in trials and the strains used for the DMS experiments could lead to different epistatic interactions that can enhance or reduce the chances of escape. For a systematic understanding of these differences and limitations, more experiments would be necessary. Nonetheless, DMS data can provides baseline to gauge the efficacy of different bNAbs for therapy and further in vivo investigation.
Appendix 1
Appendix 2
FokkerPlanck description of the birthdeath process
The individualbased birthdeath model introduced above specifies the stochastic dynamics of a population state over time. We characterize the state of a population by vector $\mathit{n}=({n}_{1},\dots {n}_{M})$, where n_{a} is the number of type $a$ variants within the population. Using the concept of chemical reactions, suitable for Gillespie algorithm Wilkinson, 2019; Gillespie, 2002, we can determine the propensity ${a}_{r}(\mathit{n})$ for a given reaction $r$ (i.e., birth, death, or mutation) in a population of state $\mathit{n}$, which in turn determines the rate at which the reactions occur (Equation 9). We denote the resulting change in the state of a population due to reaction $r$ by $\mathit{\nu}}_{r$. Taken together, the impact of the reactions in the birthdeath model can be summarized as, Reaction Rate parameter Propensity State change
where ${\widehat{e}}_{i}$ is a vector of size $M$ equal to size of the population state vector, in which the ${i}^{th}$ element equal to one and the rest are zero. For example, a mutation reaction $\mu (a\to b)$ destroys a variant $a$ and creates a variant $b$, resulting in the following change in the state vector,
The reactions in Equation 43 specify a Master equation for the change in the probability of the population state $p(\mathit{n})$,
where $\mathit{n}=\sum _{i}{n}_{i}{\hat{e}}_{i}$ is the state vector. Using a KramersMoyal expansion Risken, 1989, we arrive at a FokkerPlanck approximation for the change in the probability distribution of the population state $p(\mathit{n})$,
We can identify the drift (i.e. the deterministic force) and diffusion tensors of the FokkerPlanck operator:
To better demonstrate the structure of this birthdeath operator, consider a biallelic case (e.g. susceptible and resistant) with a twodimensional state vector, $\mathit{n}=\left(\begin{array}{c}{n}_{0}\\ {n}_{1}\end{array}\right)$. The drift and the diffusion tensors associated with this process follow,
Note that in Equation 49 we neglect the stochasticity due to mutations since the magnitude of the associated noise is much smaller than the noise due to the birth and death (i.e., genetic drift). From this Fokkerplanck equation we directly recover the equilibrium distribution which links the parameters of our model to the genetic observables.
In the steady state, the population is fluctuating around carrying capacity ${\sum}_{a}{n}_{a}\approx {N}_{k}$ and we can represent the population state via allele frequencies ${x}_{a}={n}_{a}/{N}_{k}$. In the simple case of a biallelic problem, the equilibrium allele frequency distribution ${P}_{\text{eq}}(x)$ follows the Wrightequilibrium distribution Crow and Kimura, 2010 with modified rates,
Appendix 3
Incomplete escape of HIV1 from bNAbs
In our model, we assume that during therapy the susceptible HIV1 subpopulation is completely neutralized by bNAbs and is removed at an exponential rate from the viral population. On the other hand, the resistant variants grow in accordance with their intrinsic fitness in the presence of bNAbs. However, this binary categorization is an approximation, and binding kinetics between viral epitopes and bNAbs implies that neutralization is a probabilistic process and could be incomplete, resulting in the removal of only a fraction of the susceptible variants. The degree of this neutralization depends on the concentration of the infused bNAbs in a patient’s serum, and the binding affinity between bNAbs and the susceptible viral epitopes. In a simple model of binding kinetics, the binding affinity is characterized by the half maximal inhibitory concentration, or IC_{50}. The probability that a (susceptible) viral variant ‘$i$’ does not bind to a neutralizing bNAb with serum concentration [A] is given by,
where ${\text{IC}}_{50}^{i}$ is the half maximal inhibitory concentration of the bNAb against viral variant $i$. We model the growth rate ${\gamma}_{i}(\text{[A]})$ of the susceptible variant $i$ in the presence of a neutralizing bNAb with concentration [A] as a weighted sum of the viral growth rate in the absence of bNAbs $\gamma}_{i}^{0$, and the decay rate $r$ of the virus due to neutralization; the respective weights are set according to the probability that a viral variant is bound (or unbound) to a bNAb (Equation 51), which results in,
In the course of a trial, the concentration of an infused bNAb decays exponentially $\text{[A]}(t)={\text{[A]}}_{0}{e}^{\frac{t}{{\tau}_{a}}}$, with a characteristic time of about ${\tau}_{a}\simeq 20\text{days}$Stephenson et al., 2021; here ${\text{[A]}}_{0}$ is the concentration of the bNAb upon infusion. By combining Equation 52 with the bNAb’s exponential decay over time, we find that the growth rate of variant $i$ over time is given by
The growth rate of the susceptible variant is determined by the ratio ${\text{[A]}}_{0}/{\text{IC}}_{50}^{i}$ between the initial antibody concentration and the half maximal inhibitory concentration of the bNAb.
As shown in Appendix 3—figure 1, if the initial antibody concentration is an order of magnitude below the IC_{50}(orange line), the dynamics of a patient’s viremia closely resembles that of a fully resistant population (solid black line). On the other hand, if the initial bNAb dosage is an order of magnitude above the IC_{50}(blue line), the dynamics behaves as though the viral population is completely susceptible (dotted line), resulting in a late viral rebound (more than 8 weeks). We find that incomplete neutralization is most relevant when the initial bNAb concentration is roughly similar to IC_{50}, that is, ${\text{[A]}}_{0}/{\text{IC}}_{50}\approx 1$ (green line). Therefore, our binary categorization is applicable so long as the distribution of ${\text{IC}}_{50}^{i}$ has low density around initial serum concentrations ${\text{[A]}}_{0}$. The likelihood of ${\text{IC}}_{50}^{i}$ matching the antibody dose determines the rate at which our results will be biased by incomplete neutralization.
To compare IC_{50}with the initial bNAb concentration, we use the neutralization data from the 10–1074 bNAb trial Caskey et al., 2017. For this trial, TZMbl neutralization assays against different bNAbs were obtained on 114 pseudoviruses expressing envelope proteins derived from circulating viruses in patients on day 0 (55 pseudoviruses) and week 4 (59 pseudoviruses) after 10–1074 infusion. Expectantly, a fraction of viruses from week 4 were resistant (i.e. large IC_{50}) to the 10–1074 bNAb used in the trial (orange histogram in Appendix 3—figure 2A). However, almost all viruses were susceptible to the (control) 3BNC117 bNAb, which the viruses had not been previously exposed to (orange histogram in Appendix 3—figure 2B).
For comparison, the distribution for the initial concentration ${\text{[A]}}_{0}$ (or equivalently the maximum concentration) of the infused 10–1074 bNAb in patients enrolled in this trial is shown in Appendix 3—figure 2 (blue histograms), and is peaked around $200\mu \text{g/ml}$. The IC_{50}values in this trial are much lower (higher) for susceptible (resistant) variants compared to the initial bNAb concentrations. Therefore, our simplified model assuming that a viral variant is either fully resistant or susceptible to a bNAb (i.e. no incomplete escape) is a reasonable approach for capturing the statistics of treatment failure at the concentrations tested in these trials. Nonetheless, developing a genotypetoneutralization model such as the ones developed by Wagh et al., 2016; Yu et al., 2019, may allow for a more nuanced approach to modeling of neutralization in future work.
Appendix 4
Comparison of DMSinferred and trialinferred escape pathways
In this appendix we compare the sensitivity of our rebound time predictions to the methodology used to identify sites of escape (i.e., data from DMS experiments versus therapy trials). To do so, we focus on two bNAbs, 10–1074 and PGT121 for which we have access to both the trial Caskey et al., 2017; Stephenson et al., 2021 and the DMS data Dingens et al., 2019; Dingens et al., 2017; Schommers et al., 2020. The inferred resistant and susceptible variants at the identified sites for both of these bNAbs based on their respective trials and the DMS data are reported in Appendix 4—table 1.
Although there is a lot of commonalities among the two approaches, the two methods can be sensitive to different mutational pathways in certain sites. This is in part expected as the DMS procedure can generate many rare variants, but all on a common genetic background background; in this case the experiments scan only single point mutations away from the HIV1 strain BF520.W14M.C2 Dingens et al., 2017. On the other hand, the clinical trial data shows escape variants on the genetic background of a diverse viral population circulating in a patient, and the fate of a mutation can be strongly determined by the epistatic interactions with the background genome that it appears on.
To assess the sensitivity of our analyses, we make predictions for the distribution of rebound times, using the escape pathways identified based on the DMS versus the trial data in Appendix 4—table 1. Overall, we see good agreements between the two approaches, but DMSinferred escape pathways predict slightly too many patients with late rebound (${T}_{p}>56$ days) compared to the trialbased inference, i.e., DMSbased predictions are more optimistic (Appendix 4—figure 1). This deviation is likely to be related to the diversity of viral genetic backgrounds in clinical trials, since more escape pathways could be realized through positive epistasis, leading to a faster viral rebound.
Appendix 5
Simulation algorithms
Algorithm 1. Population initialization 

procedure Population Initialization ($N}_{k},{\sigma}_{1:L},{\theta}_{1:L},{\theta}_{1:L}^{\u2020$) ⊳ Generates a population of size of $N}_{k$ from equilibrium. $\theta$, and $\sigma$ are the parameters defining the equilibrium values of the population state. Returns the initial vector of genotypes for $i\in 1:L$ do ${x}_{i}\sim {P}_{\text{eq}}(x{\sigma}_{i},{\theta}_{i},{\theta}_{i}^{\u2020})$ end for for $v\in 1:{N}_{k}$ do ${\rho}_{i}^{v}\sim \text{Bernoulli}({x}_{i})$ end for return $\rho}^{1:{N}_{k}$ end procedure 
Algorithm 2. Gibbs Sampler for Allele Frequencies 

procedure Equlibrium Sampler ($\sigma ,\theta ,{\theta}^{\u2020}\text{Samples}$) ⊳Generates a stream of nonindependent but rapidly mixing samples $X\sim p(x\sigma ,\theta ,{\theta}^{\u2020})$. Default BurnIn is 10. $N\leftarrow \text{Samples}+\text{BurnIn}$ ${K}_{0}\leftarrow \text{Round}(\sigma )$ for $n\in 1:N$ do ${X}_{n+1}\sim \text{Beta}(\theta ,{\theta}^{\u2020}+{K}_{n})$ ⊳Sample the mutant fraction ${K}_{n+1}\sim \text{Poisson}(\sigma (1{X}_{n+1}))$ ⊳Sample the auxiliary parameter end for return $X}_{\text{BurnIn}:N$ ⊳Return only the mutant frequency $X$ part of the chain (marginalize over $k$) end procedure 
Algorithm 3. Population time step 

procedure EvolvePopulation ($t,{\rho}^{1:N}\lambda ,\gamma ,r,$) ⊳ Acts on a time $t$ and a list of $N$ genotypes $\rho}^{1:N$. Inherits dependency on other parameters from the fitness function $F(G)$ and the $\text{Mutate}(G)$ operator which depend on $\mathrm{\Delta}}_{1:L},{\mu}_{1:L},{\mu}_{1:L}^{\u2020$ and $\gamma$ and the population diversity measure $\theta}_{ts$. $\begin{array}{l}\varphi \leftarrow \frac{1}{N}\sum _{i}F({g}_{i})\\ {t}^{\prime}\leftarrow t+\frac{\text{RandExp()}}{\lambda N}\\ i\sim \text{Rand}(1:N)\\ G\leftarrow {\rho}^{i}\end{array}$ if $\text{IsEscaped}(G)$ then ⊳ If the virus is escaped $D\sim \text{Bernoulli}(\frac{\lambda F(G)+\varphi}{2\lambda})$ ⊳ Determine if the virus dies ($D=\text{true}$) or lives ($D=\text{false}$). if $D$ then $\begin{array}{l}{N}^{\prime}\leftarrow N1\\ {\rho}^{1:{N}^{\prime}}\leftarrow {\rho}^{1:\stackrel{~}{i}:N}\end{array}$ ⊳ Delete genotype at position $i$ else $\begin{array}{l}{N}^{\prime}\leftarrow N+1\\ {\rho}^{1:{N}^{\prime}}\leftarrow \text{Append}({\rho}^{1:N},G)\end{array}$ ⊳ Duplicate genotype at position $i$ end if else ⊳ If the virus is neutralized $D\sim \text{Bernoulli}(\frac{r}{\lambda})$ ⊳ remove it at the appropriate rate if $D$ then $\begin{array}{l}{N}^{\prime}\leftarrow N1\\ {\rho}^{1:{N}^{\prime}}\leftarrow {\rho}^{1:\stackrel{~}{i}:N}\end{array}$ ⊳ Delete genotype at position $i$ end if end if $j\sim \text{Rand}(1:{N}^{\prime})$ ⊳ Choose a random virus to mutate ${\rho}^{j}\leftarrow \text{Mutate}({\rho}^{j})$ ⊳ Apply mutation operator with intensity $\mu /\lambda$ return $({t}^{\prime},{\rho}^{1:{N}^{\prime}})$ ⊳ Return the new time and the new population. end procedure 
Inference algorithms
Algorithm 4. Importance sampled loglikelihood given a single datapoint 

procedure SigmaLikelihood ($\hat{\mu},{\hat{\mu}}^{\u2020},m,s,{\theta}_{ts},N$) ⊳ Takes mutant $m$ and susceptible $s$ counts for a particular site at a single timepoint. Returns an approximate loglikelihood function $l(\hat{\sigma})=\mathrm{log}P(m,s{\theta}_{ts},\hat{\sigma})+c$, up to an additive constant. SigmaLikelihood is a closure that returns a oneparameter function. We used N =103 samples. $\begin{array}{l}\theta \leftarrow \hat{\mu}{\theta}_{ts}\\ {\theta}^{\u2020}\leftarrow {\hat{\mu}}^{\u2020}{\theta}_{ts}\end{array}$ for $i\in 1:N$ do ${x}_{i}\sim \text{Beta}(\theta ,{\theta}^{\u2020})$ ⊳ Sample from the neutral distribution $w}_{i}\leftarrow \frac{\mathcal{B}(\theta +m,{\theta}^{\u2020}+s)}{\mathcal{B}(\theta ,{\theta}^{\u2020})}\frac{1}{{x}_{i}^{m}(1{x}_{i}{)}^{w}$ ⊳ Importance weight ratio ${y}_{i}\sim \text{Beta}(\theta +m,{\theta}^{\u2020}+s)$ ⊳ sample from the neutral distribution, conditioned on the observations $v}_{i}\leftarrow \frac{\mathcal{B}(\theta +m,{\theta}^{\u2020}+s)}{\mathcal{B}(\theta ,{\theta}^{\u2020})}\frac{1}{{y}_{i}^{m}(1{y}_{i}{)}^{w}$ ⊳ Importance weight ratio end for $Z}_{0}(\hat{\sigma}):=\frac{1}{N}\sum _{i}{e}^{\hat{\sigma}{\theta}_{ts}{x}_{i}}\frac{1}{1+{w}_{i}}+\frac{1}{N}\sum _{i}{e}^{\hat{\sigma}{\theta}_{ts}{y}_{i}}\frac{1}{1+{v}_{i}$ ⊳ importance sampling mean $Z}_{1}(\hat{\sigma}):=\frac{1}{N}\sum _{i}{e}^{\hat{\sigma}{\theta}_{ts}{x}_{i}}\frac{1}{1+{w}_{i}^{1}}+\frac{1}{N}\sum _{i}{e}^{\hat{\sigma}{\theta}_{ts}{y}_{i}}\frac{1}{1+{v}_{i}^{1}$ ⊳ Note the inversion in the weighting factor compared to Z_{0}. return $l(\hat{\sigma}):=\mathrm{log}{Z}_{1}(\hat{\sigma})\mathrm{log}{Z}_{0}(\hat{\sigma})$ ⊳ Return a loglikelihood function. The same random variable realizations $x}_{1:N$ and $y}_{1:N$ are cached in memory and used for each function evaluation, making $l(\hat{\sigma})$ continuous and differentiable. end procedure 
Algorithm 5. Reboundtime Disparity 

procedure ReboundDisparity ($t}_{1:P},{T}_{1:S$) ⊳ Takes the observed rebound times ${t}_{1:P}\sim Q$ from a set of trial patients and simulated late rebound times ${T}_{1:S}\sim P$ and returns a disparity estimator. ⊳ First estimate the probabilities in the truncated observation categories ${P}_{(NR)}\leftarrow \frac{1}{P}\sum _{p}[{t}_{p}<1\mathrm{d}\mathrm{a}\mathrm{y}]$ ⊳ Count the fraction of nonresponders in trial ${Q}_{(NR)}\leftarrow \frac{1}{S}\sum _{s}[{T}_{s}<1\mathrm{d}\mathrm{a}\mathrm{y}]$ ⊳... and in simulation ${P}_{(LR)}\leftarrow \frac{1}{P}\sum _{p}[{t}_{p}\ge 56\text{days}]$ ⊳ Count the fraction of late rebounds in trial ${Q}_{(LR)}\leftarrow \frac{1}{S}\sum _{s}[{T}_{s}\ge 56\text{days}]$ ⊳... and in simulation ⊳ Then construct a histogram over the continuous datapoints (i.e. t ∈ [0, 56]) and estimate the probability in each bin ${t}_{1:{P}^{\prime}}=\text{SortAscending}(Filter[1\le t<56]({t}_{1:p}))$ ⊳ Select only the observed rebound times $t}_{0}\leftarrow \mathrm{\infty$ and $t}_{{P}^{\prime}+1}\leftarrow \mathrm{\infty$ for $p\in 1:{P}^{\prime}$ do $Q}_{(p)}\leftarrow \frac{1}{P$ ⊳ By design, each histogram bin contains one observed data point, and gets 1 /P mass ${P}_{(p)}\leftarrow \frac{1}{S}\sum _{s}[max(1,\frac{1}{2}({t}_{p1}+{t}_{p}))\le {T}_{s}<min(56,\frac{1}{2}({t}_{p+1}+{t}_{p}))\text{days}]$ ⊳ Use the midpoints of adjacent points to construct the boundaries of histogram bins, and determine probability mass in each bin. end for return $P\sum _{i\in NR,LR,1:{P}^{\prime}}({Q}_{(i)}^{1/2}{P}_{(i)}^{1/2}{)}^{2}$ ⊳ Return the discretized estimate of the Hellinger distance, scaled by the number of patients end procedure 
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Reference to the previously published data used in this manuscript is provided. Modelling code is uploaded on GitHub at https://github.com/StatPhysBio/HIVTreatmentOptimization, (copy archived at swh:1:rev:0194cf6e554996a066633e99dd53cd5901da552e) and in the Julia package https://github.com/StatPhysBio/EscapeSimulator (copy archived at swh:1:rev:9a343f598820bafddfc7ea4547cefa90bf96fd6e).

European Nucleotide ArchiveID PRJEB9618. Project: PRJEB9618.

NCBI GenBankID KX016803. HIV1 isolate 2A1_DB7_02 from USA envelope glycoprotein (env) gene, partial cds.

NCBI GenBankID KY323724KY324834. Antibody 101074 suppresses viremia in HIV1infected individuals.

NCBI GenBankID MH632763. Safety and antiviral activity of combination HIV1 broadly neutralizing antibodies in viremic individuals.
References

BookAn Introduction to Stochastic Processes with Applications to BiologyBoca Raton: CRC press.https://doi.org/10.1201/b12537

Effect of HIV antibody VRC01 on viral rebound after treatment interruptionThe New England Journal of Medicine 375:2037–2050.https://doi.org/10.1056/NEJMoa1608243

Evidence for positive epistasis in HIV1Science 306:1547–1550.https://doi.org/10.1126/science.1101786

Broadly neutralizing antibodies to HIV and their role in vaccine designAnnual Review of Immunology 34:635–659.https://doi.org/10.1146/annurevimmunol041015055515

Broadly neutralizing antibodies for HIV1 prevention or immunotherapyThe New England Journal of Medicine 375:2019–2021.https://doi.org/10.1056/NEJMp1613362

Antibody 101074 suppresses viremia in HIV1infected individualsNature Medicine 23:185–191.https://doi.org/10.1038/nm.4268

HIV persists throughout deep tissues with repopulation from multiple anatomical sourcesThe Journal of Clinical Investigation 130:1699–1712.https://doi.org/10.1172/JCI134815

SoftwareHIVTreatmentOptimization, version swh:1:rev:0194cf6e554996a066633e99dd53cd5901da552eSoftware Heritage.

SoftwareEscapeSimulator, version swh:1:rev:9a343f598820bafddfc7ea4547cefa90bf96fd6eSoftware Heritage.

HIVinfected individuals receiving effective antiviral therapy for extended periods of time continually replenish their viral reservoirThe Journal of Clinical Investigation 115:3250–3255.https://doi.org/10.1172/JCI26197

Comprehensive mapping of HIV1 escape from a broadly neutralizing antibodyCell Host & Microbe 21:777–787.https://doi.org/10.1016/j.chom.2017.05.003

Broadly neutralizing antibodies against influenza virus and prospects for universal therapiesCurrent Opinion in Virology 2:134–141.https://doi.org/10.1016/j.coviro.2012.02.005

Alternatives to maximum likelihood estimation based on spacings and the Kullback–Leibler divergenceJournal of Statistical Planning and Inference 138:1778–1791.https://doi.org/10.1016/j.jspi.2007.06.031

BookStatistical Methods for Research Workers (Twelfth ed)New York: Hafner Publishing.

Stochastic relaxation, gibbs distributions, and the bayesian restoration of imagesIEEE Transactions on Pattern Analysis and Machine Intelligence 6:721–741.https://doi.org/10.1109/tpami.1984.4767596

Exact stochastic simulation of coupled chemical reactionsThe Journal of Physical Chemistry 81:2340–2361.https://doi.org/10.1021/j100540a008

Concrete mathematics: A foundation for computer scienceComputers in Physics 3:106.https://doi.org/10.1063/1.4822863

The effect of linkage on limits to artificial selectionGenetical Research 8:269–294.https://doi.org/10.1017/S0016672300010156

Broadly neutralizing antibodies and the search for an HIV1 vaccine: the end of the beginningNature Reviews Immunology 13:693–701.https://doi.org/10.1038/nri3516

New drugs for the treatment of tuberculosis: needs, challenges, promise, and prospects for the futureThe Journal of Infectious Diseases 205:S241–S249.https://doi.org/10.1093/infdis/jis034

Efficiency versus robustness: the case for minimum hellinger distance and related methodsThe Annals of Statistics 22:1081–1114.https://doi.org/10.1214/aos/1176325512

Development of CART cells for longterm eradication and surveillance of HIV1 reservoirCurrent Opinion in Virology 38:21–30.https://doi.org/10.1016/j.coviro.2019.04.004

HIV antibodies for treatment of HIV infectionImmunological Reviews 275:313–323.https://doi.org/10.1111/imr.12506

Optimizing clinical dosing of combination broadly neutralizing antibodies for HIV preventionPLOS Computational Biology 18:e1010003.https://doi.org/10.1371/journal.pcbi.1010003

Evolution of relative synonymous codon usage in Human Immunodeficiency Virus type1Journal of Bioinformatics and Computational Biology 3:157–168.https://doi.org/10.1142/s0219720005000953

Recombination rate and selection strength in HIV intrapatient evolutionPLOS Computational Biology 6:e1000660.https://doi.org/10.1371/journal.pcbi.1000660

BookStatistical Methods in Molecular EvolutionNew York: Springer.https://doi.org/10.1007/0387277331

Fierce selection and interference in bcell repertoire response to chronic HIV1Molecular Biology and Evolution 36:2184–2194.https://doi.org/10.1093/molbev/msz143

Safe and effective importance samplingJournal of the American Statistical Association 95:135–143.https://doi.org/10.1080/01621459.2000.10473909

How HIV1 entry mechanism and broadly neutralizing antibodies guide structurebased vaccine designCurrent Opinion in HIV and AIDS 12:229–240.https://doi.org/10.1097/COH.0000000000000360

Modelling viral and immune system dynamicsNature Reviews. Immunology 2:28–36.https://doi.org/10.1038/nri700

Mathematical modeling to reveal breakthrough mechanisms in the HIV Antibody Mediated Prevention (AMP) trialsPLOS Computational Biology 16:e1007626.https://doi.org/10.1371/journal.pcbi.1007626

BookThe FokkerPlanck EquationBerlin, Heidelberg: SpringerVerlag.https://doi.org/10.1007/9783642615443

Emergence of HIV1 drug resistance during antiretroviral treatmentBulletin of Mathematical Biology 69:2027–2060.https://doi.org/10.1007/s1153800792033

Rapid emergence of protease inhibitor resistance in hepatitis C virusScience Translational Medicine 2:30ra32.https://doi.org/10.1126/scitranslmed.3000544

Minimum hellinger distance estimation for the analysis of count dataJournal of the American Statistical Association 82:802–807.https://doi.org/10.1080/01621459.1987.10478501

Recent progress in broadly neutralizing antibodies to HIVNature Immunology 19:1179–1188.https://doi.org/10.1038/s4159001802357

Impact of clade diversity on HIV1 virulence, antiretroviral drug sensitivity and drug resistanceThe Journal of Antimicrobial Chemotherapy 51:229–240.https://doi.org/10.1093/jac/dkg079

Early and highly suppressive antiretroviral therapy are main factors associated with low viral reservoir in european perinatally HIVInfected childrenJAIDS Journal of Acquired Immune Deficiency Syndromes 79:269–276.https://doi.org/10.1097/QAI.0000000000001789

Tissue reservoirs of HIVCurrent Opinion in HIV and AIDS 11:362–370.https://doi.org/10.1097/COH.0000000000000293
Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (1310)
 Armita Nourmohammad
National Science Foundation (2045054)
 Armita Nourmohammad
Max Planck Institute for Dynamics and Selforganization (Open access funding)
 Colin LaMont
 Armita Nourmohammad
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are thankful to Jesse Bloom and Adam Dingens for their input on the Deep Mutational Scanning data of HIV, and Michael Lässig and Matthijs Meijers for helpful discussions. This work has been supported by the NSF CAREER award (grant No: 2045054), DFG grant (SFB1310) for Predictability in Evolution, and the MPRG funding through the Max Planck Society.
Version history
 Received: December 1, 2021
 Preprint posted: December 5, 2021 (view preprint)
 Accepted: July 4, 2022
 Accepted Manuscript published: July 19, 2022 (version 1)
 Version of Record published: September 12, 2022 (version 2)
Copyright
© 2022, LaMont et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,570
 views

 508
 downloads

 11
 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Evolutionary Biology
 Immunology and Inflammation
CD4^{+} T cell activation is driven by fivemodule receptor complexes. The T cell receptor (TCR) is the receptor module that binds composite surfaces of peptide antigens embedded within MHCII molecules (pMHCII). It associates with three signaling modules (CD3γε, CD3δε, and CD3ζζ) to form TCRCD3 complexes. CD4 is the coreceptor module. It reciprocally associates with TCRCD3pMHCII assemblies on the outside of a CD4^{+} T cells and with the Src kinase, LCK, on the inside. Previously, we reported that the CD4 transmembrane GGXXG and cytoplasmic juxtamembrane (C/F)CV+C motifs found in eutherian (placental mammal) CD4 have constituent residues that evolved under purifying selection (Lee et al., 2022). Expressing mutants of these motifs together in T cell hybridomas increased CD4LCK association but reduced CD3ζ, ZAP70, and PLCγ1 phosphorylation levels, as well as IL2 production, in response to agonist pMHCII. Because these mutants preferentially localized CD4LCK pairs to nonraft membrane fractions, one explanation for our results was that they impaired proximal signaling by sequestering LCK away from TCRCD3. An alternative hypothesis is that the mutations directly impacted signaling because the motifs normally play an LCKindependent role in signaling. The goal of this study was to discriminate between these possibilities. Using T cell hybridomas, our results indicate that: intracellular CD4LCK interactions are not necessary for pMHCIIspecific signal initiation; the GGXXG and (C/F)CV+C motifs are key determinants of CD4mediated pMHCIIspecific signal amplification; the GGXXG and (C/F)CV+C motifs exert their functions independently of direct CD4LCK association. These data provide a mechanistic explanation for why residues within these motifs are under purifying selection in jawed vertebrates. The results are also important to consider for biomimetic engineering of synthetic receptors.

 Evolutionary Biology
Understanding how plants adapt to changing environments and the potential contribution of transposable elements (TEs) to this process is a key question in evolutionary genomics. While TEs have recently been put forward as active players in the context of adaptation, few studies have thoroughly investigated their precise role in plant evolution. Here, we used the wild Mediterranean grass Brachypodium distachyon as a model species to identify and quantify the forces acting on TEs during the adaptation of this species to various conditions, across its entire geographic range. Using sequencing data from more than 320 natural B. distachyon accessions and a suite of population genomics approaches, we reveal that putatively adaptive TE polymorphisms are rare in wild B. distachyon populations. After accounting for changes in past TE activity, we show that only a small proportion of TE polymorphisms evolved neutrally (<10%), while the vast majority of them are under moderate purifying selection regardless of their distance to genes. TE polymorphisms should not be ignored when conducting evolutionary studies, as they can be linked to adaptation. However, our study clearly shows that while they have a large potential to cause phenotypic variation in B. distachyon, they are not favored during evolution and adaptation over other types of mutations (such as point mutations) in this species.