1. Evolutionary Biology
Download icon

Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies

  1. Angela M Phillips
  2. Katherine R Lawrence
  3. Alief Moulana
  4. Thomas Dupic
  5. Jeffrey Chang
  6. Milo S Johnson
  7. Ivana Cvijovic
  8. Thierry Mora
  9. Aleksandra M Walczak
  10. Michael M Desai  Is a corresponding author
  1. Department of Organismic and Evolutionary Biology, Harvard University, United States
  2. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, United States
  3. Quantitative Biology Initiative, Harvard University, United States
  4. Department of Physics, Massachusetts Institute of Technology, United States
  5. Department of Physics, Harvard University, United States
  6. Department of Applied Physics, Stanford University, United States
  7. Laboratoire de physique de ÍÉcole Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, France
Research Article
  • Cited 0
  • Views 209
  • Annotations
Cite this article as: eLife 2021;10:e71393 doi: 10.7554/eLife.71393

Abstract

Over the past two decades, several broadly neutralizing antibodies (bnAbs) that confer protection against diverse influenza strains have been isolated. Structural and biochemical characterization of these bnAbs has provided molecular insight into how they bind distinct antigens. However, our understanding of the evolutionary pathways leading to bnAbs, and thus how best to elicit them, remains limited. Here, we measure equilibrium dissociation constants of combinatorially complete mutational libraries for two naturally isolated influenza bnAbs (CR9114, 16 heavy-chain mutations; CR6261, 11 heavy-chain mutations), reconstructing all possible evolutionary intermediates back to the unmutated germline sequences. We find that these two libraries exhibit strikingly different patterns of breadth: while many variants of CR6261 display moderate affinity to diverse antigens, those of CR9114 display appreciable affinity only in specific, nested combinations. By examining the extensive pairwise and higher order epistasis between mutations, we find key sites with strong synergistic interactions that are highly similar across antigens for CR6261 and different for CR9114. Together, these features of the binding affinity landscapes strongly favor sequential acquisition of affinity to diverse antigens for CR9114, while the acquisition of breadth to more similar antigens for CR6261 is less constrained. These results, if generalizable to other bnAbs, may explain the molecular basis for the widespread observation that sequential exposure favors greater breadth, and such mechanistic insight will be essential for predicting and eliciting broadly protective immune responses.

Introduction

Vaccination harnesses the adaptive immune system, which responds to new pathogens by mutating antibody-encoding genes and selecting for variants that bind the pathogen of interest. However, influenza remains a challenging target for immunization: most antibodies elicited by vaccines provide protection against only a subset of strains, largely due to the rapid evolution of the influenza surface protein hemagglutinin (HA) (Wiley et al., 1981; Smith et al., 2004). After nearly two decades of studies, numerous broadly neutralizing antibodies (bnAbs) have been isolated from humans, with varying degrees of cross-protection against diverse strains (Corti et al., 2017; Throsby et al., 2008; Dreyfus et al., 2012; Corti et al., 2011; Schmidt et al., 2015). Still, we do not fully understand many factors affecting how and when bnAbs are produced. In particular, affinity is acquired through a complex process of mutation and selection (Victora and Nussenzweig, 2012), but the effects of mutations on binding affinity to diverse antigens are not well characterized.

For example, consider two well-studied influenza bnAbs that display varying levels of breadth: CR9114 is one of the broadest anti-influenza antibodies ever found, neutralizing strains from both groups of influenza A and strains from influenza B, while CR6261 is limited to neutralizing strains from Group 1 of influenza A (Throsby et al., 2008; Dreyfus et al., 2012; Ekiert et al., 2009; Lingwood et al., 2012). Both antibodies were isolated from vaccinated donors, derive from very similar germline sequences (IGHV1 –69 and IGHJ6), and bind the conserved HA stem epitope (Figure 1—figure supplement 3; Throsby et al., 2008; Dreyfus et al., 2012; Ekiert et al., 2009). Each antibody heavy chain has many mutations (18 amino acid changes for CR9114, 14 for CR6261, Figure 1A), including seven positions that are mutated in both, yet the contributions of these mutations to affinity against different antigens remain unclear (Dreyfus et al., 2012; Avnir et al., 2014).

Figure 1 with 7 supplements see all
Binding landscapes.

(A) Sequence alignment comparing somatic heavy chains to reconstructed germline sequences. Mutations under study (purple, numbered) and excluded mutations (black) are indicated; residues are numbered by IMGT unique numbering. (B) Influenza hemagglutinin phylogenetic tree with selected antigens and breadth of CR9114 (black box) and CR6261 (gray box) indicated. (C, E), Scatterplots of the (C) CR9114 library binding affinities against three antigens, with 2D planes shown below, and (E) CR6261 library binding affinities against two antigens. (D, F) Distributions of library binding affinities for (D) CR9114 and (F) CR6261 for each antigen (gray histogram, right) separated by number of somatic mutations (boxplots, left). Numbers and percentages of variants with measurable binding are indicated at right. (G), Force-directed graph of CR9114 H1 –logKD. Each variant (node) is connected to its 16 single-mutation neighbors (edges not shown for clarity); edges are weighted such that variants with similar genotypes and –logKD tend to cluster. Nodes are colored by binding affinity to H1 (top; showing all 65,091 nodes), H3 (lower left inset; showing only the region containing nodes with –logK> 6), and Flu B (lower right inset; showing only the region containing nodes with –logKD > 6).

Figure 1—source data 1

CR9114 library -logKD to H1, H3, and influenza B.

Biological triplicates, mean, and standard error reported.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig1-data1-v2.csv
Figure 1—source data 2

CR6261 library -logKD to H1 and H9.

Biological triplicates, mean, and standard error reported.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig1-data2-v2.csv
Figure 1—source data 3

Isogenic flow cytometry measurements of –logKD for select CR9114 and CR6261 variants.

Inferred –logKD and standard deviation for each replicate of isogenic FACS, alongside inferred –logKD mean and SEM from Tite-Seq using the mean bin and maximum likelihood (ML, shown only for CR9114) inference methods.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig1-data3-v2.xlsx

Beyond single mutational effects, it remains unknown whether there are correlated effects or strong trade-offs between binding to different antigens (pleiotropy), or non-additive interactions between mutations (epistasis). Such epistatic and pleiotropic effects can constrain the mutational pathways accessible under selection, as has been observed for other proteins (Weinreich et al., 2006; Starr et al., 2017; Ortlund et al., 2007; Podgornaia and Laub, 2015; Gong et al., 2013; Sailer and Harms, 2017b; Miton and Tokuriki, 2016; Poelwijk et al., 2019; Bank et al., 2015). Epistasis in antibody-antigen interactions remains significantly understudied (Adams et al., 2019; Pappas et al., 2014; Braden et al., 1998) and most deep mutational scanning studies have focused on antigens (Doud et al., 2018; Wu et al., 2020; Starr et al., 2021). In contrast to typical protein evolution, antibody affinity maturation proceeds by discrete rounds of mutation and selection (Victora and Nussenzweig, 2012), typically with more than one nucleotide mutation occurring between selective rounds (Unniraman and Schatz, 2007). In addition, antibodies are inherently mutationally tolerant (Braden et al., 1998; Chen et al., 1999; Burks et al., 1997; Corti and Lanzavecchia, 2013; Klein et al., 2013), generating opportunities for interactions that scale combinatorially. Thus, if epistatic and pleiotropic constraints exist for antibodies, they could affect the likelihood of producing bnAbs under different antigen selection regimes (Pappas et al., 2014) and may account for the low frequencies of bnAbs in natural repertoires (Corti et al., 2017). Characterizing the prevalence of these constraints on bnAb evolution may provide valuable insight for improving vaccination strategies (Yewdell, 2013; Henry et al., 2018).

To date, studies of antibody binding have been limited to small numbers of individual sequences, deep mutational scans of single mutations, and mutagenesis of small regions (Pappas et al., 2014; Braden et al., 1998; Burks et al., 1997; Adams et al., 2016; Koenig et al., 2017; Forsyth et al., 2013; Wu et al., 2017; Xu et al., 2015; Madan et al., 2021; Schmidt et al., 2015), due in part to practical constraints on library scale and the throughput of affinity assays. This has limited our ability to comprehensively characterize binding landscapes for naturally isolated bnAbs, which often involve many mutations spanning framework (FW) and complementarity-determining regions (CDR) (Corti et al., 2017; Corti and Lanzavecchia, 2013; Klein et al., 2013).

We overcome these challenges by generating combinatorially complete libraries of up to ∼105 antibody sequences and assaying their binding affinities in a high-throughput yeast-display system (Adams et al., 2016). This approach enables us to infer the contributions of individual mutations as well as hundreds of pairwise and higher order interactions between mutations, revealing that these interactions can restrict evolutionary pathways leading to greater breadth. In particular, we find that mutational effects on binding affinity to diverse antigens display a nested structure, where increasingly large groups of specific mutations are required to gain affinity to divergent antigens, resulting in highly constrained paths to broad affinity. This pattern is not observed for more similar antigens, where many mutational paths to broad affinity are accessible. Further, these nested patterns of mutational effects provide new molecular insight into why sequential exposure to diverse antigens often favors greater breadth (Wang et al., 2010; Krammer et al., 2012; Wang et al., 2015; Wang, 2017; Sachdeva et al., 2020; Molari et al., 2020; Sprenger et al., 2020). Together, this work provides the first comprehensive characterization of antibody affinity landscapes and advances our understanding of the molecular constraints on bnAb evolution.

Results

Binding affinity landscapes of CR9114 and CR6261

Here, we characterize the binding affinity landscapes of the two well-studied bnAbs noted above: CR9114 and CR6261. Specifically, we made all combinations of a set of mutations separating the germline and somatic sequences for CR9114 (16 mutations totaling 65,536 variants) and CR6261 (11 mutations totaling 2048 variants). These libraries include all heavy-chain mutations in these antibodies, except a few select mutations distant from the paratope (Figure 1, Figure 1—figure supplement 7, and see Materials and methods). Both antibodies engage antigens solely through their heavy-chain regions (Dreyfus et al., 2012; Ekiert et al., 2009), and thus are well-suited for yeast display as single-chain variable fragments (see Materials and methods) (Boder and Wittrup, 1997).

We use the Tite-Seq method (Adams et al., 2016), which integrates flow cytometry and sequencing (Figure 1—figure supplement 1), to assay equilibrium binding affinities of each scFv sequence in these libraries against select antigens that span the breadth of binding for each antibody (Figure 1B). For CR6261, we chose two divergent group 1 HA subtypes (H1 and H9; see Figure 1—figure supplement 1), while for CR9114, we chose the three highly divergent subtypes present in the vaccine (H1 from group 1, H3 from group 2, and influenza B; see Figure 1—figure supplement 1, Throsby et al., 2008). Inferred affinities outside our titration boundaries (10−11–10−6 M for H3 and influenza B, 10−12–10−7 M for H1 and H9) are pinned to the boundary, as deviations beyond these boundaries are likely not physiologically relevant (Batista and Neuberger, 1998). Antibody expression is not strongly impacted by sequence identity, although some mutations have modest effects that may be inversely correlated with their effect on affinity (Figure 1—figure supplement 5). Affinities obtained by Tite-Seq are reproducible across biological triplicates (Figure 1—figure supplement 2; average standard error of 0.047 -logKD units across antibody-antigen pairs) and are highly accurate as verified for select variants by isogenic flow cytometry (Figure 1—figure supplement 2) and by solution-based affinity measurements made by others (Throsby et al., 2008; Dreyfus et al., 2012; Lingwood et al., 2012; Pappas et al., 2014).

We begin by examining the distribution of binding affinities across antigens for each antibody library (Figure 1). We observe that most CR9114 variants have measurable affinity to H1 (97%), fewer to H3 (11%), and still fewer to influenza B (0.3%) (Figure 1C,D). For H1, only a few mutations are needed to improve from the germline affinity. In contrast, variants are not able to bind H3 unless they have several more mutations, and many more for influenza B. This hierarchical structure is in striking contrast to the CR6261 library, in which most variants can bind both antigens (92% for H1, 81% for H9), variants have a similar KD distribution, and many variants display intermediate affinity to both antigens (Figure 1E,F). To visualize how genotypes give rise to the hierarchical structure of CR9114 binding affinities, we represent the binding affinities for H1 as a force-directed graph. Here, each variant is a node connected to its 16 single-mutation neighbors, with edge weights inversely proportional to the change in H1 binding affinity, such that variants with similar genotype and KD tend to form clusters (Figure 1G, Figure 1—figure supplement 4). Coloring this genotype-to-phenotype map by the –logKD to each of the three antigens, we see that sequences that bind H3 and influenza B are highly localized and overlapping, meaning that they share specific mutations. Thus, while many CR9114 variants strongly bind H1, only a specific subset bind multiple antigens.

Mutational effects on binding to diverse antigens

To dissect how mutations drive the structure of these binding landscapes, we next infer specific mutational effects. We first log-transform binding affinities such that they are proportional to free energy changes (ΔGbinding), which should combine additively under the natural null expectation (Wells, 1990; Olson et al., 2014). We then define a linear model with single mutational effects and interaction terms up to a specified order (defined relative to the unmutated germline sequence, see Appendix 2 for alternatives), and fit coefficients by ordinary least squares regression. We use cross-validation to identify the maximal order of interaction for each antigen and report coefficients at each order from these best-fitting models (CR9114: fifth order for H1, fourth for H3, first for influenza B; CR6261: fourth order for H1 and H9; see Materials and methods). We note that the maximum order of interactions is affected by our inference power, particularly by the number of sequences with appreciable binding, and so we interpret these models as showing strong evidence of epistasis at least up to the order indicated. We explored the possibility of ‘global’ epistasis by inferring a nonlinear transformation of the -logKD values (Sailer and Harms, 2017a; Otwinowski et al., 2018), but found that this approach did not significantly reduce the order or number of specific interaction coefficients needed to explain the data (see Appendix 2). We also explored inferring epistasis up to full order using Walsh-Hadamard transformations; results are qualitatively similar but less conservative than cross-validated regression (see Appendix 2).

Examining the effect of individual mutations on the germline background (Figure 2A,B), we observe several mutations that enhance binding to all antigens (e.g. S83F for CR9114), and mutations that confer trade-offs for binding distinct antigens (e.g. F30S in CR9114 reduces affinity for H1 but enhances affinity for influenza B). Generally, large-effect mutations are at sites that contact HA (Figure 2C, Figure 2—figure supplement 1, Dreyfus et al., 2012; Ekiert et al., 2009). Consistent with prior biochemical and structural work, mutations essential for CR9114 breadth are spread throughout FW3 and the CDRs, forming hydrophobic contacts and hydrogen bonds with residues in the conserved HA stem epitope (Dreyfus et al., 2012; Avnir et al., 2014). We observe three specific mutations that are required for binding to H3 (present at over 90% frequency in the set of binding sequences), likely because they form hydrophobic contacts with HA (K82I and S83F) and reorient the CDR2 loop (I57S), which interacts with residues and a glycan in H3 that are distinct from those in H1 (Dreyfus et al., 2012). We also observe eight specific mutations that are required for binding to influenza B. Many of these breadth-conferring mutations are absent in CR6261, particularly those in CDR2 (Dreyfus et al., 2012; Ekiert et al., 2009). Notably, these sets of required mutations in CR9114 exhibit a nested structure: mutations beneficial for H1 are required for H3, and mutations required for H3 are required for influenza B, giving rise to the hierarchical structure of the binding landscape (Figure 1C).

Figure 2 with 1 supplement see all
First- and second-order effects.

(A, B) First-order effects inferred in best-fitting epistatic interaction models for (A) CR9114 and (B) CR6261. Mutations required for binding (present at over 90% frequency in binding sequences) have effect sizes denoted as ‘R’ and are removed from inference. Error bars indicate standard error. (C) First order effects for each site plotted against the contact surface area between the corresponding somatic residue and HA (left, CR9114; right, CR6261). Sites with notable contact area or effect size are labeled. Cocrystal structures are also shown; mutations are colored by first-order effect size. (D) Significant second-order epistatic interaction coefficients for CR9114 mutations (bottom left, H3; top right, H1). Interactions involving required mutations are shown in dark red. (E) Significant second-order coefficients for CR6261 mutations (bottom left, H9; top right, H1). Significance in (D), (E) indicates Bonferroni-corrected p-value < 0.05, see Materials and methods. (F) Second-order coefficients for H1 –logKD plotted against the distance between the respective α-carbons in the crystal structures.

Figure 2—source data 1

Interaction model coefficients for CR9114.

Coefficients are reported with standard errors, p-values, and confidence intervals (95% with Bonferroni correction by the number of parameters).

https://cdn.elifesciences.org/articles/71393/elife-71393-fig2-data1-v2.csv
Figure 2—source data 2

Interaction model coefficients for CR6261.

Coefficients are reported with standard errors, p-values, and confidence intervals (95% with Bonferroni correction by the number of parameters).

https://cdn.elifesciences.org/articles/71393/elife-71393-fig2-data2-v2.csv
Figure 2—source data 3

Tabulated contact surface area, number of HA contacts, and pairwise distances for mutations in CR9114 and CR6261.

For each mutated position, the contact surface area with HA (as plotted in Figure 2C and Figure 2—figure supplement 1) and the number of HA residues within six angstroms is tabulated for CR9114-H5 (4FQI), CR9114-H3 (4FQY), and CR6261-H1 (3GBN). Distances between alpha-carbons plotted in Figure 2F and Figure 2—figure supplement 1 are also tabulated here alongside the corresponding second order effects.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig2-data3-v2.xlsx

Beyond these exceptionally synergistic interactions between required mutations, we find that epistasis is widespread, accounting for 18–33% of explained variance depending on the antibody-antigen pair (except influenza B, see Materials and methods, Appendix 2). Pairwise interactions are dominated by a few mutations (e.g. F30S for CR9114 and S35R for CR6261) that exhibit many interactions, both positive and negative, with other mutations (Figure 2D,E). Overall, mutations with strong pairwise interactions tend to be close in the crystal structure, although there are long-range pairwise interactions that are likely mediated by interactions with the antigen or conformational rearrangements (Figure 2F, Figure 2—figure supplement 1, Dreyfus et al., 2012; Ekiert et al., 2009; Avnir et al., 2014).

High-order epistasis is dominated by a subset of mutations

Our dataset also allows us to resolve higher order epistasis. In addition to the required mutations, our models identify numerous strong third to fifth order interactions, with a subset of mutations participating in many mutual interactions at all orders. For CR9114 binding to H1, this subset consists of five mutations, distributed across three different regions of the heavy chain (Figure 3A,B). Some of these mutations likely generate (K82I, S83F) or abrogate (F30S) contacts to HA, and others (I57S, A65T) may indirectly impact HA binding by reorienting contact residues in CDR2 (Dreyfus et al., 2012; Avnir et al., 2014). Within this set of five residues, we first illustrate two examples of third-order epistasis by grouping sequences by their genotypes at these five sites (Figure 3C). Intriguingly, some mutations that are deleterious in the germline background ('–' annotations) are beneficial in doubly-mutated backgrounds ('+' annotations). For example, mutation F30S is significantly less deleterious in backgrounds with S83F than in the germline background, suggesting that new hydrophobic contacts in FW3 may be able to compensate for the potential loss of contacts in CDR1. Yet F30S unexpectedly becomes beneficial after an additional mutation I57S in CDR2, indicating more complex interactions between flexible CDR and FW loop regions (Figure 3B,C; Dreyfus et al., 2012).

Figure 3 with 3 supplements see all
High-order epistasis for CR9114.

(A) Total higher order epistatic contributions of CR9114 mutation pairs for binding H1. Color bar indicates the sum of absolute values of significant higher order interaction coefficients involving each pair of mutations; key epistatic residues indicated by arrows. Significance is given by Bonferroni-corrected p-value < 0.05, see Materials and methods. (B) Location of key epistatic residues in the CR9114 –HA co-crystal structure colored by region. (C) –logKD distributions for genotypes grouped by their identity at the five residues indicated in (A), (B), with means indicated as white dots (N=8,192 genotypes per violin). Annotations indicate notable deleterious ('−') and beneficial ('+') mutational effects. (D) CR9114 force-directed graph from Figure 1G, colored as in (C) by the genotype at the five sites indicated in (A), (B). Genotypes not shown in (C) are shown in light gray. Data are also available in an interactive data browser at https://yodabrowser.netlify.app/yoda_browser/. (E), Third-order interaction involving site 64 accounts for distinct clusters (teal) corresponding to genotypes with mutations 57 and 65 in (D). Colors correspond to mutation groups in (C), (D) (N=4,096 genotypes per violin). (F), Epistatic interaction coefficients among the five key sites from (A), (B). Colors for certain groups as in (C), (D); other groups denoted in gray, with notable terms labeled.

To see how these high-order interactions drive the overall structure of the binding affinity landscape, we return to the force-directed graph, now colored by genotype at these five key sites (Figure 3D; only points corresponding to genotypes shown in Figure 3C are colored). We see that these five sites largely determine the overall structure of the map: points of the same color tend to cluster together, despite varying in their genotypes at the other 11 sites. However, we observe that interactions with other mutations do exist, as evidenced by separate clusters with the same color (e.g. the two clusters in teal for 57,65 are distinguished by a positive third-order interaction with site 64, Figure 3E). These patterns are not confined to the genotypes shown in Figure 3C; if we color all 32 possible genotypes at the five key sites, we observe the same general patterns (Figure 3—figure supplement 1; an interactive data browser for exploring these patterns of epistasis in CR9114 is available at: https://yodabrowser.netlify.app/yoda_browser/). Interactions between these five sites are also enriched for significant epistatic coefficients (p<10-3; 26 of 31 possible terms are significant, compared to an average of 4 terms among all sets of five sites, Figure 3—figure supplement 1), including the fifth order interaction between all five residues (Figure 3F). Remarkably, these five mutations underlie significant high-order epistasis for other antigens as well: all five are either required for binding or participate extensively in interactions for H3 and influenza B (Figure 3—figure supplement 3).

Higher order epistasis in CR6261 is similarly dominated by a subset of mutations in CDR1 and FW3, at identical or neighboring positions as some key sites for CR9114 (Figure 4A). These mutations exhibit strong diminishing returns epistasis at third and fourth order, counteracting their synergistic pairwise effects, in a similar manner across both antigens (Figure 4B, Figure 4—figure supplement 1Figure 4—figure supplement 2). Many fourth-order combinations of these mutations display interaction coefficients of similar magnitude (Figure 4—figure supplement 1), though they may be signatures of even higher order interactions that we are underpowered to infer.

Figure 4 with 2 supplements see all
High-order epistasis for CR6261.

(A) Total significant epistatic contributions of CR6261 mutation pairs for binding H1, as in Figure 3A. Significance is given by Bonferroni-corrected p-value < 0.05, see Materials and methods. (B) Third-order interaction for CR6261 H1 binding between mutations T29P, S35R, and S83F (N=256 genotypes per violin).

A common approach to quantify how epistasis constrains mutational trajectories is to count ‘uphill’ paths (i.e. where affinity improves at every mutational step from the germline to the somatic sequence). We find that only a small fraction of potential paths are uphill (0.00005% +/- 0.00004% for CR9114 binding H1, and 0.2% +/- 0.04% for CR6261 binding H1, as estimated by bootstrap, see Materials and methods). However, we note that for all antibody-antigen combinations, the somatic sequence is not the global maximum of the landscape (the best-binding sequence) and some mutations have deleterious effects on average. Hence, strictly uphill paths are only possible due to sign epistasis, where normally deleterious mutations have beneficial effects in specific genetic backgrounds.

Overall, we see that mutational effects and interactions between them explain the affinity landscapes we observe. For CR9114, binding affinity to H1 can be achieved through different sets of few mutations with complex interactions. In contrast, a specific set of many mutations with strong synergistic interactions is required to bind H3, and to an even greater extent, influenza B (Figure 2A), giving rise to the landscape's hierarchical structure (Figure 1C). For CR6261, the higher order interactions are more similar between H1 and H9, which is consistent with the more correlated patterns of binding affinities between these two antigens (Figure 1E).

Affinity to diverse antigens was likely acquired sequentially

The hierarchical nature of the CR9114 landscape suggests that this lineage developed affinity to each antigen sequentially. Considering the maximum –logKD achieved by sequences with a given number of mutations (a proxy for time), we see that improvements in H1 binding can be realized early on, whereas improvements in H3 binding are not possible until later, and even later for influenza B (Figure 5A). In fact, the nested structure of affinity-enhancing mutations forces improvements in binding affinity to occur sequentially. If selection pressures were also experienced in this sequence, mutations that improve binding to the current antigen would lead to the genotypes required to begin improving binding to the next. Indeed, we find that for CR9114, there are more uphill paths leading to the somatic sequence if selection acts first on binding to H1 and later to H3 and influenza B (Figure 5C). In contrast, for CR6261, improvements in binding can occur early on for both antigens (Figure 5B) and the number of uphill paths is more similar across single-antigen and sequential selection pressures (Figure 5D).

Figure 5 with 3 supplements see all
Antigen selection scenarios and likely mutational pathways.

(A, B) Maximum binding affinity achievable for sequences with a given number of mutations. For each antigen for (A) CR9114 and (B) CR6261, the maximum observed (circles) and model-predicted (triangles) affinity for each number of somatic mutations is shown. (C, D) Total number of 'uphill' paths for select antigen selection scenarios (colored bars) for (C) CR9114 and (D) CR6261. Error bars indicate standard error obtained through bootstrap, see Materials and methods. (E, F) Total log probability (in arbitrary units) of mutational trajectories from germline to somatic sequence for (E) CR9114 and (F) CR6261 under different antigen selection scenarios, in a moderate selection model. Error bars indicate standard error obtained through bootstrap, see Materials and methods. (G, H) 25 most likely paths for (G) CR9114 and (H) CR6261, from select scenarios in (E, F); –logKD plotted for each antigen. For the random mixed scenario ('R'), a representative case is shown. ’A’ indicates the average mixed scenario; ’O’ indicates the optimal scenario. (I, J) Probability of mutation order under optimal antigen selection scenario ‘O' for CR9114 (I) and H1 for CR6261 (J). Selection scenarios are as in (E, F) and shown in colored bar at top; the total probability (through all possible paths) for each mutation to occur at each mutational step is shown as stacked colored bars.

Figure 5—source data 1

Total probability of mutational trajectories for CR9114 under different antigen selection scenarios.

Mean and standard error across 10 bootstrap samples are reported for moderate, weak, and strong selection strengths.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig5-data1-v2.csv
Figure 5—source data 2

Total probability of mutational trajectories for CR6261 under different antigen selection scenarios.

Mean and standard error across 10 bootstrap samples are reported for moderate, weak, and strong selection strengths.

https://cdn.elifesciences.org/articles/71393/elife-71393-fig5-data2-v2.csv

To compare antigen selection scenarios more generally, we developed a framework that evaluates the total probability of all possible mutational pathways from germline to somatic, under an array of antigen selection scenarios (individual, sequential, and mixed). Our framework assumes that the probability of any mutational step is higher if –logKD increases, but does not necessarily forbid neutral or deleterious steps; we evaluate a variety of specific forms of this step probability and find that our major results are consistent (Figure 5—figure supplement 1A, see Materials and methods). We assume that each amino acid substitution occurs in a single mutational step; though there are amino acid substitutions that must proceed by multiple nucleotide mutations that may occur in a single round, or over multiple rounds, of somatic hypermutation (Spisak et al., 2020; Unniraman and Schatz, 2007). Mixed antigen regimes approximate exposure to a cocktail of antigens. We model these with two approaches: (1) ‘average’, using the average –logKD across all antigens, and (2) 'random', using –logKD for a randomly selected antigen at each step (note that using the maximum –logKD across antigens would always be trivially favored) (Wang et al., 2015). While these models simplify the complexities of affinity maturation in vivo (Victora and Nussenzweig, 2012), especially how affinity relates to B cell lineage dynamics and the mutational bias at the nucleotide level (Spisak et al., 2020), they provide insight into the relative probabilities of mutational paths under distinct antigen selection scenarios.

Again we find that the vast majority of likely antigen selection scenarios for CR9114 involve first H1, followed by H3, followed by influenza B (Figure 5E, Figure 5—figure supplement 1B). These results are underscored by examining improvement in –logKD along the most likely mutational paths for each scenario (Figure 5G): in the optimal sequential scenario, –logKD can improve substantially for each antigen in turn, while in an H1-only scenario, the improvements in H1 binding at each step are much more gradual, reducing the likelihood. The average mixed scenario shows qualitatively similar paths to the optimal sequential scenario, although with lower overall probability. In the random mixed scenario, even the best pathways are often unable to improve affinity to the randomly selected antigen, and affinity to antigens not under selection often declines, making these scenarios much less likely.

Given the optimal sequential selection scenario, the vast majority of genotypes are unlikely evolutionary intermediates to the somatic sequence (Figure 5—figure supplement 2). We visualize the impact of epistasis on mutational order by considering the probability of each mutation to occur at each mutational step (Figure 5I; Figure 5—figure supplement 3). The three antigen exposure epochs exhibit clear differences in favored mutations. Mutations I57S, K82I, and S83F must occur early, due to their strong synergistic interactions for all three antigens. In addition, we see that F30S is unlikely to happen very early (due to its sign epistasis under H1 selection) as well as unlikely to happen very late (due to its strong benefit under influenza B selection).

In contrast, for CR6261, all selection scenarios have relatively similar likelihood (Figure 5F, Figure 5—figure supplement 1C). Among sequential scenarios, however, those beginning with H1 are more likely than those beginning with H9, as the first two mutational steps can improve affinity to H1 more than H9, and mutations late in maturation can improve affinity to H9 more than H1 (Figure 1F, Figure 5B). Still, unlike CR9114, in both single antigen and mixed scenarios, there are many likely paths that continually improve in binding to both antigens (Figure 5H). Initially the order of mutations is highly constrained due to strong synergistic epistasis, and differences between selection scenarios reflect differences in mutational effects between antigens (Figure 5J, Figure 5—figure supplement 3). We note that T29P is highly likely to occur first in scenarios that begin with H1, as this is the only single mutation that can improve H1 affinity, albeit rather modestly.

Discussion

Overall, we find that evolutionary pathways to bnAbs can be highly contingent on epistatic and pleiotropic effects of mutations. Specifically, the acquisition of breadth for CR9114 is extremely constrained and is likely to have occurred through exposure to diverse antigens in a specific order, due to the structure of correlations and interactions between mutational effects. In contrast, CR6261 could have acquired affinity to H1 and H9 in a continuous and simultaneous manner, perhaps because these antigens are more similar; since H9 is not a commonly circulating strain, this breadth was likely acquired by chance (Pappas et al., 2014).

We note that we cannot conclusively determine how CR9114 and CR6261 evolved in vivo. The isolation of these specific antibodies from phage display libraries (Throsby et al., 2008; Dreyfus et al., 2012) was likely biased by the HA subtypes used for screening, and although unlikely, may have introduced mutations during PCR amplification. Regardless, these antibody sequences occupy regions of sequence space that are useful for understanding the relationship between sequence, affinity, and breadth. By characterizing their binding landscapes, we find that epistasis and trade-offs constrain the mutational pathways to these specific somatic sequences and their associated breadth. Indeed, we find that not all of the observed mutations are required to confer broad affinity, and future work is needed to explore what alternative pathways to breadth might be accessible through other mutations. It is also worth noting that selection pressure to bind the HA stem epitope on virions may be different from pressure to bind soluble recombinant HA, although several studies have found anti-stem antibody affinity to recombinant HA to be indicative of viral neutralization (Dreyfus et al., 2012; Corti et al., 2011; Lingwood et al., 2012). Further, stem-targeting bnAbs and their germline precursors have been characterized as polyreactive (Bajic et al., 2019; Guthmiller et al., 2020) and thus likely experience additional selection pressures that are not captured by our measurements and models, such as negative selection against autoreactivity. Although we cannot determine which specific antigens were involved in the selection of these antibodies in vivo, the diverse HA subtypes we employ capture variation representative of circulating influenza strains and thus serve as useful probes of varying levels of breadth (Corti et al., 2017). Future work integrating these measurements of affinity and breadth with measurements of stability and polyreactivity will provide important insight into the molecular constraints of bnAb evolution.

Notably, the landscapes characterized here are among the largest combinatorially complete collections of mutations published to date. In some respects, our observations of high-order interactions are consistent with earlier work in other proteins. In particular, epistasis has been found to affect function and constrain evolutionarily accessible pathways across functionally and structurally distinct proteins (Weinreich et al., 2006; Starr et al., 2017; Ortlund et al., 2007; Podgornaia and Laub, 2015; Gong et al., 2013; Sailer and Harms, 2017b; Miton and Tokuriki, 2016; Poelwijk et al., 2019; Bank et al., 2015). Further, pairwise and high-order epistasis appear to be common features of binding interfaces, such as enzyme-substrate and receptor-ligand interactions (Weinreich et al., 2006; Starr et al., 2017; Ortlund et al., 2007; Podgornaia and Laub, 2015; Sailer and Harms, 2017b; Miton and Tokuriki, 2016), and interacting mutations are often spaced in both sequence and structure, underscoring the complexity of protein-protein interfaces (Podgornaia and Laub, 2015; Adams et al., 2019; Braden et al., 1998; Esmaielbeiki et al., 2016; Rotem et al., 2018). On the other hand, the strongly synergistic, nested mutations crucial for CR9114 breadth are unusual, perhaps due to the nature of antibody-antigen interfaces or to the unique dynamics of affinity maturation (Victora and Nussenzweig, 2012). Together, these observations suggest that interactions between multiple mutations, such as those we characterize here, could play a substantial role in affinity maturation and may contribute to the rarity of bnAbs in natural repertoires.

Our findings provide molecular insight into the emerging picture of how selection can elicit broad affinity, illustrated by a substantial recent body of work ranging from in vivo experimental approaches (Krammer et al., 2012; Wang et al., 2010) to quantitative modeling of immune system dynamics (Wang et al., 2015; Wang, 2017; Sachdeva et al., 2020; Molari et al., 2020; Sprenger et al., 2020). These diverse studies often find that mixed-antigen regimens are less effective than sequential regimens at eliciting bnAbs. Our results demonstrate that, at least in part, this may be due to the intrinsic structure of the mutational landscape, defined by the complex interactions of mutational effects across antigens. With more studies of binding landscapes for diverse antibodies, we could better understand how such features generalize between different germline sequences, somatic mutation profiles, and antigen molecules. These insights will be valuable for leveraging germline sequence data and antigen exposure information to predict, design, and elicit bnAbs for therapeutic and immunization applications.

Materials and methods

Key resources table
Reagent type
(species) or resource
DesignationSource or referenceIdentifiersAdditional information
Strain, strain background (Saccharomyces cerevisiae)EBY100ATCCCat#:MYA-4941
Cell line (Spodoptera frugiperda)Sf9ThermoFisherCat#:B82501Cell line for production of baculovirus
Cell line (Trichoplusia ni)High-FiveThermoFisherCat#:B85502Cell line for HA expression
AntibodyAnti-cMyc-FITC (Mouse monoclonal)Miltenyi BiotecCat#:130-116-485FACS (1:50)
Recombinant DNA reagentpCT302 (plasmid)AddgeneCat#:41845
Recombinant DNA reagentpCT302_CR9114 _germline (plasmid)This paperPlasmid map in Supplementary file 4
Recombinant DNA reagentpCT302_CR9114 _somatic (plasmid)This paperPlasmid map inSupplementary file 5
Recombinant DNA reagentpCT302_CR6261 _germline (plasmid)This paperPlasmid map inSupplementary file 6
Recombinant DNA reagentpCT302_CR6261 _somatic (plasmid)This paperPlasmid map inSupplementary file 7
Recombinant DNA reagentpET21a-BirA (plasmid)AddgeneCat#:20857
Sequence-based reagentCR9114 golden gate dsDNA fragmentsIDTSequences listed inSupplementary file 2
Sequence-based reagentCR6261 Golden Gate primersIDTSequences listed inSupplementary file 3
Sequence-based reagentIllumina sequencing primersIDTSequences listed inSupplementary file 1
Peptide, recombinant proteinStreptavidin-RPEThermoFisherCat#:S866FACS (1:100)
Peptide, recombinant proteinBiotinylated A/New Caledonia/99 (H1) ectodomainThis paperPlasmid sequence inSupplementary file 8
Peptide, recombinant proteinBiotinylated A/Hong Kong/99 (H9) ectodomainThis paperPlasmid sequence inSupplementary file 9
Peptide, recombinant proteinBiotinylated A/Wisconsin/05 (H3) ectodomainThis paperPlasmid sequence inSupplementary file 10
Peptide, recombinant proteinBiotinylated B/Ohio/05 (Flu B) ectodomainThis paperPlasmid sequence inSupplementary file 11
Commercial assay or kitBac-to-Bac KitThermoFisherCat#:10359016
Commercial assay or kitZymo Yeast Plasmid Miniprep IIZymo ResearchCat#:D2004
Software, algorithmCustom codeThis paperhttps://github.com/klawrence26/bnab-landscapes (copy archived at swh:1:rev:61c1673a101ea739d5b7e9b282f6bcfad41d7e90Phillips, 2021)
Software, algorithmInteractive CR9114 data browserThis paperhttps://yodabrowser.netlify.app/yoda_browser/

Antibody library production

Germline sequence reconstructions

Request a detailed protocol

For CR9114, we obtained the somatic heavy chain nucleotide sequence from Dreyfus et al., 2012 (GenBank JX213639.1) and reconstructed the germline nucleotide sequence using IMGT (Giudicelli et al., 2006) and IgBLAST (Ye et al., 2013). Both methods assigned the same V-gene and J-gene alleles (IGHV1-69*06 and IGHJ6*02), but there is ambiguity in the D-gene assignment and at the V-D junction, particularly at site 109. The preferred IMGT junction alignment assigns a mutation here, S109N, while a different junction alignment from IgBLAST does not. Because of the inherent difficulty of reconstructing mutations in the junction region, especially in antibodies with a short D region, we chose the alignment without the mutation at site 109. Our reconstructed germline nucleotide sequence is available in Supplementary file 12. We then took the resulting germline and somatic amino acid sequences, as shown in Figure 1A, and constructed new nucleotide sequences codon-optimized for yeast.

For CR6261, the somatic and reconstructed germline heavy chain amino acid sequences were published in Lingwood et al., 2012. We used these sequences, similarly constructing codon-optimized nucleotide sequences for expression in yeast. The original somatic nucleotide sequence is also available (GenBank HI919029.1).

We note that all antibody libraries and clonal strains were constructed using somatic forms of the light chain, as these antibodies were isolated by combinatorial phage display (Throsby et al., 2008; Dreyfus et al., 2012), and so it is not possible to infer the naturally paired germline light chain sequence. Additionally, the CR9114 and CR6261 light chains were previously determined not to impact binding (Lingwood et al., 2012; Dreyfus et al., 2012; Ekiert et al., 2009). The somatic light chain sequence for CR9114 was obtained from Dreyfus et al., 2012 (GenBank JX213640.1), and that for CR6261 was obtained from Throsby et al., 2008 (GenBank HI919031.1).

Mutation selection

Request a detailed protocol

CR9114 contains a total of 18 amino acid substitutions between the somatic variant and the reconstructed germline sequence. However, a library of 218=262,144 variants would be costly and time-consuming to produce and assay via our methods. We therefore identified two mutations that were distant from antigen contacts in the crystal structure: A25S and E51D (Dreyfus et al., 2012). We measured binding affinities for somatic sequences with and without these two mutations, and found that these variants had comparable affinities for both H1 and H3 (Figure 1—figure supplement 7). Although these mutations may have some small impact on binding, especially in combination with others, excluding them allowed for a simpler cloning strategy and a more manageable library size.

Similar to the CR9114 library design, we reduced the number of mutations present in the CR6261 library by excluding three mutations that were distant from antigen contacts in the crystal structure: 6QE, L50P, and V101M (Ekiert et al., 2009). We validated the marginal contribution of these mutations to binding by measuring the binding affinities for the somatic sequence with and without these mutations reverted to the respective germline residue (Figure 1—figure supplement 7).

Yeast display plasmid and strains

Request a detailed protocol

To generate clonal yeast display strains and libraries for CR9114, we cloned scFv constructs (VL -Ser(Gly4Ser)5-VH-Myc) into the pCT302 plasmid (Midelfort et al., 2004) (kind gift from Dane Wittrup; Addgene, Watertown, MA, #41845). For the clonal CR9114 somatic and germline strains, gene blocks corresponding to the somatic or inferred germline sequences were cloned into pCT302 by Gibson Assembly (Gibson et al., 2009) (plasmid maps in Supplementary files 45). For producing the plasmid backbone required for Golden Gate library generation (described below), we removed an existing Bsa-I site from the pCT302 plasmid by site-directed mutagenesis (Agilent, Santa Clara, CA, #200521) and replaced the VH domain with the ccdB gene. To generate clonal yeast strains, Gibson Assembly products were transformed into electrocompetent DH10B E. coli cells, and the resulting plasmids were mini-prepped and Sanger sequenced. Following sequence confirmation, plasmids were transformed into EBY100 yeast cells (ATCC #MYA-4941) as described in the high-efficiency yeast transformation protocol (Gietz and Schiestl, 2007). Transformants were plated on SDCAA-agar (1.71 g/L YNB without amino acids and ammonium sulfate [Sigma-Aldrich #Y1251], 5 g/L ammonium sulfate [Sigma-Aldrich #A4418], 2% dextrose [VWR #90000–904], 5 g/L Bacto casamino acids [VWR #223050], 100 g/L ampicillin [VWR #V0339], 2% Difco Noble Agar [VWR #90000–774]) and incubated at 30°C for 48 hr, single colonies were restruck on SDCAA-agar and again incubated at 30°C for 48 hr, and the resulting clonal yeast strains were verified to have the construct of interest by colony PCR. Construction of the yeast libraries is described below. All yeast strains were grown to saturation in SDCAA at 30°C, supplemented with 5% glycerol, and stored at −80°C.

CR6261 clonal yeast display strains and libraries were generated in an identical manner to that of CR9114, except where noted below (see Supplementary files 67 for plasmid maps corresponding to the germline and somatic sequences).

Golden gate assembly

Request a detailed protocol

For CR9114, due to the number of mutations required and their positions along the heavy chain coding sequence, we designed a library cloning strategy using Golden Gate combinatorial assembly (Engler et al., 2008). We divided the heavy chain coding region into five roughly equal fragments, ranging from 79 to 85 bp and each containing between 1 and 5 mutations. We added BsaI sites and additional overhangs to both ends of each fragment sequence, with cut sites carefully chosen so that the five fragments will assemble uniquely in their proper order within the plasmid backbone. For each fragment with n mutations, we then ordered 2n individual DNA duplexes with each possible combination of mutations (ranging from 2 to 32 versions for each fragment, a total of 66 fragments) from IDT (Coralville, IA) (see Supplementary file 2). By pooling the versions of each fragment in equal volumes, then pooling the five fragment pools in equimolar ratios, we obtained a randomized fragment mix containing all 216 sequences present at approximately equal frequencies.

In addition to the fragment mix, we prepared the plasmid backbone for the Golden Gate reaction. We created a version of the yeast display plasmid with the counter-selection marker ccdB in place of the heavy chain sequence, with flanking BsaI sites (see above). We performed Golden Gate cloning using BsaI-HFv2 (NEB, Ipswich, MA, #R3733) following the manufacturer recommended protocol, with a 5:1 molar ratio of the fragment insert pool to plasmid backbone.

We transformed the assembly mix into electrocompetent E. coli (DH10B) via electroporation in 10 x 50 μL cell aliquots. We recovered each transformation in 5 mL SOC (2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose) at 37°C for 1 hr, and then transferred each to 100 mL of molten LB (1% tryptone, 0.5% yeast extract, 1% NaCl) containing 0.3% SeaPrep agarose (VWR, Radnor, PA #12001–922) spread into a thin layer in a 1L baffled flask (about 1 cm deep). The mixture was allowed to set on ice for an hour, after which it was kept for 18 hr at 37C to allow for dispersed growth of colonies in 3D. We observed 3×105 colonies per aliquot, for a total of 3 million transformants. After mixing the flasks by shaking for 1 hr, we pelleted the cells and prepared plasmid by standard midiprep (Zymo Research, Irvine, CA, D4201), from which we obtained >120 μg of purified plasmid.

For CR6261, we designed a library cloning strategy also using Golden Gate combinatorial assembly, but with fragments created by PCR instead of purchased. We divided the heavy chain coding region into three roughly equal fragments, each containing between 2 and 5 mutations. We designed these fragments such that the mutations they contain are close to the 3’ or 5’ ends and can thus be easily incorporated by PCR. PCR primers included mutations, BsaI sites, and unique overhangs chosen so that the three fragments would assemble uniquely in their proper order within the plasmid backbone. For each version of the three fragments, we generated dsDNA by PCR (52 PCR reactions in total; see Supplementary file 3 for primer sequences). By pooling all versions of each fragment in equal volumes, then pooling the three fragment pools in equimolar ratios, we obtain a randomized fragment mix that, when ligated in the Golden Gate reaction, produces all of the 211 sequences present at approximately equal frequencies.

In addition to the fragment mix, we prepared the plasmid backbone for the Golden Gate reaction. We created a version of the yeast display plasmid with the counter-selection marker ccdB in place of the three-fragment sequence, with flanking BsaI sites. We performed Golden Gate cloning using BsaI-HFv2 (NEB #R3733) following the manufacturer recommended protocol, with a 7:1 molar ratio of fragment inserts to plasmid backbone.

The transformation of the CR6261 library into E. coli was conducted in a similar fashion to that of CR9114, except that 8x50 μL cell aliquots were transformed, and 600,000 colonies were pooled for plasmid midiprep.

Yeast library production

Request a detailed protocol

We then transformed the CR9114 plasmid library into EBY100 cells by standard high-efficiency protocols (Gietz and Schiestl, 2007). We recovered transformants in molten SDCAA (1.71 g/L YNB without amino acids and ammonium sulfate (Sigma-Aldrich #Y1251), 5 g/L ammonium sulfate (Sigma-Aldrich, St. Louis, MO, #A4418), 2% dextrose (VWR #90000–904), 5 g/L Bacto casamino acids (VWR #223050), 100 g/L ampicillin (VWR # V0339)) containing 0.35% SeaPrep agarose (VWR #12001–922) spread into a thin layer (about 1 cm deep). The mixture was allowed to set on ice for an hour, after which it was kept for 48 hr at 30°C to allow for dispersed growth of colonies in 3D. From five such flasks, we obtained ∼700,000 colonies (>10 times the library diversity). After mixing the flasks thoroughly by shaking for 1 hr, we grew cells in 5 mL tubes of liquid SDCAA for five generations and froze the saturated culture in 1 mL aliquots with 5% glycerol.

The CR6261 yeast library was generated in a manner identical to that of CR9114, except that ∼60,000 colonies were pooled due to the smaller library size.

Isogenic strain production

Request a detailed protocol

In addition to the full library, for both CR9114 and CR6261 we assayed a small number of variants by low-throughput flow cytometry for Tite-Seq validation. Any individual variant in the library can be produced in the same manner as described above: we simply selected the DNA duplex fragments corresponding to each desired variant and set up an individual Golden Gate reaction. The resulting assembled plasmid was transformed into E. coli, mini-prepped, and transformed into EBY100 in the same manner as described above. We verified the sequence identity of each variant by Sanger sequencing the entire scFv sequence.

We also constructed isogenic strains for validation experiments with genotypes that are not present in the full library. For CR9114, to test the impact of excluding mutations A24S and E46D, we constructed a strain containing the remaining 16 somatic mutations by cloning a gene block of the corresponding VH sequence into the germline CR9114 pCT302 plasmid via Gibson Assembly (Figure 1—figure supplement 7). For CR6261, we similarly constructed a strain with the Q6E, L50P, and V101M mutations reverted.

Antigen production

Choice of HA antigens

Request a detailed protocol

CR9114 was isolated from pooled PBMC from three donors who had received the trivalent 2006 influenza vaccine (Throsby et al., 2008; Dreyfus et al., 2012), which contained A/New Caledonia/20/1999 (H1N1), A/Wisconsin/67/2005 (H3N2), and B/Malaysia/2506/2004 (Victoria lineage) (Ekiert et al., 2011). CR6261 was isolated from pooled PBMC from the same three donors, plus an additional seven donors who did not receive the vaccine (Throsby et al., 2008). Because PBMC were isolated only 7 days after vaccination, although it is possible that CR6261 and CR9114 matured in response to these specific antigens, it is more likely that the vaccine elicited memory recall of these antibodies (Victora and Nussenzweig, 2012). Here, we chose to measure binding affinities to diverse antigens spanning the range of breadth for both CR9114 and CR6261. CR9114 neutralizes strains across influenza A (groups 1 and 2) and influenza B, so we measured affinities to one strain from each of these groups, and selected vaccine-like strains: A/New Caledonia/20/1999 (H1N1), A/Wisconsin/67/2005 (H3N2), and B/Ohio/1/2005 (Victoria lineage). CR6261 neutralizes strains across influenza A group 1, thus we measured affinities to two strains from distinct subtypes within group 1: A/New Caledonia/20/1999 (H1N1) and A/Hong Kong/1073/1999 (H9N2). We note that CR9114 indeed binds A/Hong Kong/1073/1999 (H9N2) (Dreyfus et al., 2012), but CR9114 variant affinities for this strain were not measured here, as we prioritized measurements to antigens that span the breadth of each antibody.

HA cloning, expression, and purification

Request a detailed protocol

Trimeric hemagglutinin (HA) antigen was produced as previously described (Ekiert et al., 2011; Dreyfus et al., 2012; Margine et al., 2013). Briefly, the HA ectodomain (Influenza A: residues 11–329 of HA1 and 1–176 of HA2 (H3 numbering); Influenza B: residues 1–523) of Influenza A/New Caledonia/1999 H1, Influenza A/Hong Kong/1999 H9, Influenza A/Wisconsin/2005 H3, and Influenza B/Ohio/2005, with N-terminal gp67 signal peptide and C-terminal biotinylation site (GGGLNDIFEAQKIEWHE), thrombin cleavage site, trimerization domain and His6 tag, were cloned into pFastbac (plasmid maps in Supplementary files 811). Recombinant bacmid was generated using the ThermoFisher Bac-to-Bac kit (ThermoFisher, Waltham, MA, #10359016). Sf9 cells (ThermoFisher #B82501, not authenticated but verified to be mycoplasma-negative) were then transfected (ThermoFisher #A38915, not authenticated but verified to be mycoplasma-negative) with the resulting bacmids, and P0 HA-baculovirus was harvested 7 days post-transfection by clarifying viral supernatant at 1000 x g for 10 min. HA-baculovirus was then amplified twice by successively infecting 187 million Sf9 cells with 100 μL of viral supernatant and incubating in a humidified incubator at 28°C for 12 days. To induce HA expression, 105 million High-Five cells (ThermoFisher #B85502) were resuspended with 15 mL P2 HA-baculovirus, incubated for 20 min at room temperature, and then transferred to a 1 L non-baffled flask with 200 mL Corning Express-Five media (ThermoFisher #10486025) supplemented with 18 mM L-glutamine (VWR #45000–676). Expression cultures were incubated in a shaking incubator at 28C and 110 rpm for 48 hr, after which HA-containing media was clarified by spinning first at 1000 x g for 5 min at 4°C, and then by spinning the resulting supernatant again at 4000 x g for 20 min at 4C. The clarified media was then dialyzed into PBS (VWR #45000–448) by performing 4 x 2 hr 10-fold buffer exchanges to remove metal chelators from culture media. Dialyzed media was then combined with 10 mL equilibrated NiNTA resin (ThermoFisher #R90101), gently shaken for 3 hr at 4°C, and loaded onto a column. The resin was washed first with 15 column volumes Wash Buffer 1 (50 mM Tris pH 8 at 4C, 300 mM KCl, 10 mM imidazole) and subsequently with 15 column volumes Wash Buffer 2 (50 mM Tris pH 8 at 4C, 300 mM KCl, 20 mM imidazole). HA was eluted from the resin after 10 min incubation with Elution Buffer (50 mM Tris pH 8 at 4°C, 300 mM KCl, 250 mM imidazole). HA was then buffer exchanged into PBS using 10 KDa Amicon Ultra Centrifugal Filters (Millipore Sigma, Burlington, MA #UFC901008) and concentrated to at least 1 mg/mL for downstream biotinylation.

BirA expression and purification

Request a detailed protocol

BirA was expressed and purified as previously described (Ekiert et al., 2011). Briefly, pET21a-BirA expression plasmid (Howarth et al., 2005) (kind gift from Alice Ting; Addgene #20857) was transformed into BL21 (DE3). Transformed BL21 cells were grown in 4 L baffled flasks with 1 L low-salt LB medium (5 g/L NaCl, 5 g/L yeast extract (VWR #90000–722), 10 g/L tryptone (VWR #90000–286)) at 37°C to an OD (600 nm) of ∼0.8. The culture was then moved into cold water to bring it to 23°C, IPTG was added to a final concentration of 1 mM, and the culture was incubated at 23C for ∼16 hr. The culture was then harvested by centrifugation (3000 x g, 10 min), resuspended in 30 mL lysis buffer (50 mM Tris pH 8 at 4C, 300 mM KCl, 10 mM imidazole, EDTA-free protease inhibitor cocktail tablet (Millipore Sigma #4693159001)), lysed by sonication (Branson Sonifier 450), and shaken at 4°C for 30 min. Lysate was clarified by spinning at 25,000 x g for 1 hr, and then the supernatant was incubated with 5 mL NiNTA resin at 4°C for 3 hr with gentle shaking. The resin was pelleted by spinning at 500 x g for 5 min and washed twice by gentle shaking with 35 mL lysis buffer at 4°C for 30 min. Protein was eluted with 20 mL Elution Buffer (50 mM Tris pH 8 at 4°C, 300 mM KCl, 250 mM imidazole), buffer exchanged into Storage Buffer (50 mM Tris pH 7.5 at 4°C, 200 mM KCl, 5% glycerol) using 10 KDa Amicon Ultra Centrifugal Filters (Millipore Sigma #UFC901008), flash frozen in liquid nitrogen, and stored in single-use aliquots at −80°C.

Biotinylation and HA-biotin quality control

Request a detailed protocol

Purified hemagglutinin was biotinylated as previously described (Fairhead and Howarth, 2015; Ekiert et al., 2011). Briefly, 100 μL HA (> 1 mg/mL) was incubated with 0.5 μL 1 M MgCl2, 2 μL 100 mM ATP, 0.5 μL 50 mM biotin, and 2.5 μL BirA (10 mg/mL). This was mixed by gentle pipetting and incubated at 30C with gentle rocking. After 1 hr incubation, equivalent amounts of ATP, BirA, and biotin were added to the reaction, which was incubated for an additional hour at 30°C. Following the 2 hr incubation, the 100 μL reaction was exchanged thrice into 15 mL PBS using a 50 KDa MWCO buffer exchange column (Millipore Sigma #UFC905008). The degree of biotinylation was then assessed by a streptavidin gel-shift assay, as previously described (Fairhead and Howarth, 2015). Briefly, 10-fold molar excess streptavidin (Millipore Sigma #189730) was added to 4 μg biotinylated HA and incubated at room temperature for 5 min prior to running on SDS-PAGE. Gels were transferred to nitrocellulose membranes and probed with mouse anti-His monoclonal antibodies (ThermoFisher #R930-25) and Goat-anti-mouse secondary antibodies (LiCor, Lincoln, NE, Cat#925–32210). HA was verified to be > 80% biotinylated by densitometry.

Tite-Seq assays

Tite-Seq was performed essentially as previously described (Adams et al., 2016), with some modifications as detailed below. For each antibody-antigen pair, three replicate Tite-Seq assays were performed on different days.

Induction of antibody expression

Request a detailed protocol

On day 1, yeast scFv libraries, as well as germline and somatic clonal strains, were thawed by inoculating 5 mL SDCAA (1.71 g/L YNB without amino acids and ammonium sulfate (Sigma-Aldrich #Y1251), 5 g/L ammonium sulfate (Sigma-Aldrich #A4418), 2% dextrose (VWR #90000–904), 5 g/L Bacto casamino acids (VWR #223050), 100 g/L ampicillin (VWR # V0339)) with 150 μL glycerol stock (saturated culture with 5% glycerol) and rotated at 30°C for 20 hr. On day 2, yeast cultures were back-diluted to OD600 = 0.2 in 5 mL SDCAA and rotated at 30°C for approximately 4 hr, or until reaching log phase (OD600 = 0.4–0.8). 1.5 mL log-phase cells were then pelleted, resuspended in 4 mL SGDCAA (1.71 g/L YNB without amino acids and ammonium sulfate (Sigma-Aldrich #Y1251), 5 g/L ammonium sulfate (Sigma-Aldrich #A4418), 0.2% dextrose (VWR #90000–904), 1.8% galactose (Sigma-Aldrich #G0625), 5 g/L Bacto casamino acids (VWR #223050), 100 g/L ampicillin (VWR #V0339)), and rotated at room temperature for 20–22 hr.

Primary antigen labeling

Request a detailed protocol

On day 3, 20–22 hr post-induction, yeast cultures were pelleted, washed twice with 0.1% PBSA (VWR #45001–130; GoldBio, St. Louis, MO, #A-420–50), and resuspended to an OD600 of 1. A total of 700 μL of OD1 yeast cells were labeled with biotinylated HA at each of eleven antigen concentrations (half-log increments spanning 1 pM – 100 nM for H1 and H9, and 10 pM – 1 μM for H3 and influenza B, as well as no HA), with volumes adjusted such that the number of antigen molecules was in 10-fold excess of antibody molecules (assuming 50,000 scFv/cell). Yeast-HA mixtures were rocked at 4°C for 24 hr.

Secondary labeling

Request a detailed protocol

On day 4, yeast-HA complexes were pelleted by spinning at 3000 x g for 10 min at 4°C, washed twice with 5% PBSA + 2 mM EDTA, and simultaneously labeled with Streptavidin-RPE (1:100, Thermo Fisher #S866) and anti-cMyc-FITC (1:50, Miltenyi Biotec, Somerville, MA, #130-116-485) at 4°C for 45 min. Following secondary labeling, yeast were washed twice with 5% PBSA + 2 mM EDTA, and left on ice in the dark until sorting.

Sorting and recovery

Request a detailed protocol

Yeast were sorted on a BD FACS Aria Illu, equipped with 405 nm, 440 nm, 488 nm, 561 nm, and 635 nm lasers, and an 85 micron fixed nozzle. Prior to sorting, single-color controls were used to compensate for the minimal FITC overlap with PE. Single cells were gated by FSC vs SSC, and then this population was sorted either by expression (FITC) or by expression and binding (PE). For all sorts, at least ten-fold excess of the library diversity was sorted (∼1.6 million cells for CR9114; ∼500,000 cells for CR6261). For the expression sorts, singlets were sorted into eight equivalent FITC log-spaced gates. For the binding sorts, FITC-positive cells were sorted into 4 PE bins (the PE-negative population comprised bin 1, and the PE-positive population was split into three equivalent log-spaced bins 2–4; see Figure 1—figure supplement 6). Polypropylene collection tubes were coated and filled with 1 mL YPD supplemented with 1% BSA and placed on ice until recovery. Sorted cells were pelleted by spinning at 3000 x g for 10 min, and supernatant was removed by pipette to avoid disturbing the pellets. Pellets were then resuspended in 4 mL SDCAA, a small amount was plated on SDCAA-agar to quantify recovery efficiency, and cultures were rocked at 30°C until reaching late-log phase (OD600 = 0.6–1.2).

Sequencing library preparation

Request a detailed protocol

A total of 1.5 mL of late-log yeast cultures were pelleted and scFv plasmid was extracted using Zymo Yeast Plasmid Miniprep II (Zymo Research # D2004), per the manufacturer’s instructions, and eluted in 10 μL elution buffer. Heavy-chain amplicon sequencing libraries were prepared by a two-step PCR as previously described (Nguyen Ba et al., 2019). In the first PCR, unique molecular identifiers (UMI), inline indices, and partial Illumina adapters were appended to the heavy chain through 3–5 amplification cycles to minimize PCR amplification bias. In the second PCR, the remainder of the Illumina adapter and sample-specific Illumina i5 and i7 indices were appended through 35 amplification cycles (see Supplementary file 1 for primer sequences). The first PCR used 5 μL plasmid DNA as template in a 25 μL reaction volume, with Q5 polymerase according to the manufactuer’s instructions (NEB # M0491L), and was incubated in a thermocycler with the following program: 1. 60 s at 98°C, 2. 10 s at 98°C, 3. 30 s at 66°C, 4. 30 s at 72°C, 5. GOTO 2, 2-4x, 6. 60 s at 72°C. PCR products were then combined with carrier RNA and purified by 1.1X Aline beads (Aline Biosciences #C-1003–5), and eluted in 35 μL elution buffer. 33 μL of the elution was used as input for the second PCR, in a total volume of 50 μL using Kapa polymerase (Kapa Biosystems #KK2502) according to the manufacturer’s instructions, and incubated in a thermocycler with the following program: 1. 30 s at 98°C, 2. 20 s at 98°C, 3. 30 s at 62°C, 4. 30 s at 72°C, 5. GOTO 2, 34x, 6. 300 s at 72°C. The resulting sequencing libraries were purified by 0.85X Aline beads, amplicon size was verified to be ∼500 bp by running on a 1% agarose gel, and amplicon concentration was quantified by a fluorescent DNA-binding dye (Biotium, Fremont, CA, #31068, per manufacturer’s instructions). Amplicons were then pooled for each gate according to the number of sorted cells to ensure even sequencing coverage. The pool was further size-selected by a two-sided Aline bead cleanup (0.55–0.85X), and the final pool size was verified by Tapestation 5000 HS and 1000 HS. Final sequencing library concentration was determined by Qubit fluorometer and sequenced on an Illumina NovaSeq S2 or Miseq v3 (2x150) with 5% PhiX.

Sequencing data processing

Request a detailed protocol

We first processed our raw sequencing reads to identify and extract the indexes and mutational sites, discarding priming regions and the constant regions between mutations. To do so, we developed custom Python scripts using the approximate regular expression library regex (Barnett, 2013), which allowed us to handle complications in sequence parsing that arise from the irregular lengths of the indices and from sequencing errors. We accept sequences that match the entire read (with no restrictions on bases at mutational sites) within the following mismatch tolerances: two mismatches in the multiplexing index, two mismatches in the priming site, and 15 substitution mismatches within the 170 bases of constant antibody sequence.

We then examine the mutational sites to call germline or somatic alleles, producing binary genotypes (‘0’ for germline or ‘1’ for somatic at each position). We require the exact germline or somatic sequence at every site: if there are any substitution errors in any of the mutation sites, the entire read is rejected. While it is possible to perform error correction based on Hamming distance to rescue reads with a few substitution errors, we find that on average only <8% of reads per sample contain any errors, and so we adopt the conservative approach of requiring perfect matching.

We next discarded sequencing reads with any mismatched indices (four total indices from the two PCR reactions), as well as reads with duplicate UMI sequences. Counts for each genotype were then tabulated, producing the final counts used for binding affinity inference (see below). On average, across all antigens and replicates, we obtain a mean coverage of ∼350 for CR9114 and ∼950 for CR6261, and a median coverage of ∼250 for CR9114 and ∼900 for CR6261.

Isogenic validation

Request a detailed protocol

Induction of scFv surface display, primary labeling, and secondary labeling of isogenic strains were performed identically to the Tite-Seq assay, except yeast cell and antigen volumes were scaled down by a factor of 10. Yeast cell FITC (scFv expression) and R-PE (HA binding) fluorescence intensity was assayed on a BD LSR Fortessa equipped with four lasers (440, 488, 561, and 633 nm). The equilibrium binding affinities (KD) for each variant are inferred by fitting the log of a Hill function to the mean log R-PE fluorescence of scFv-expressing (FITC+) singlet yeast cells:

(1) meanlogfluorescence=log10(Ascc+KD,s+Bs),

where c is the antigen concentration in molar units, As is the increase in fluorescence due to saturation with antigen, Bs is the background fluorescence, and KD,s is the equilibrium binding affinity. All isogenic measurements were performed in two to three biological replicates; see Figure 1—source data 3 for isogenic -log10KD.

Tite-Seq binding affinity inference

Mean-bin approach

Request a detailed protocol

To infer binding affinities using a simple mean-bin approach (Peterman and Levine, 2016), we incorporate sequencing data (the unique read counts of each genotype sequence s in bin b at concentration c, Rb,s,c) with flow cytometry data (the mean and standard deviation of log10-fluorescence of sorted cells in each bin b at concentration c, Fb,c and σFb,c respectively, and cell counts for each bin b at each concentration c, Cb,c).

The mean log-fluorescence of each genotype sequence at each of the 12 antigen concentrations is calculated as:

(2) F¯s,c=bFb,cpb,s|c,

where pb,s|c is the probability a cell with sequence s would be sorted into bin b at concentration c is estimated from the sequencing read counts as:

(3) pb,s|c=Rb,s,csRb,s,cCb,cb(Rb,s,csRb,s,cCb,c),

in other words, the fraction of total reads in the bin corresponding to sequence s, scaled by the number of sorted cells in that bin, normalized over the four bins for each concentration.

The uncertainty in the mean bin inference was propagated as:

(4) δF¯s,c=b(δFb,c2pb,s|c2+Fb,c2δpb,s|c2).

Here, δFb,c represents the spread in log-fluorescence values of cells sorted into the same bin b. While we could estimate this value using the bin width, in practice we find that the distribution of cell log-fluorescence values in a bin is far from uniform across the bin width. The distribution is often not normal either, but we find that approximating δFb,cσFb,c, or the standard deviation in log10-fluorescence of cells sorted into bin b at concentration c, adequately captures the typical variation. The error in pb,s|c arises largely from the sampling process of sequencing, which can be approximated as a Poisson process when read counts are relatively high. This gives

(5) δpb,s|c=pb,s|cRb,s,c.

Thus, δF¯s,c can be written as

(6) δF¯s,c=b(σFb,c2pb,s|c2+Fb,c2pb,s|c2Rb,s,c).

The equilibrium binding affinities (KD) for each variant are inferred by fitting the logarithm of a Hill function to the resulting mean log10-fluorescence across the twelve antigen concentrations:

(7) F¯s,c=log10(Ascc+KD,s+Bs),

where c is the antigen concentration in molar units, As is the increase in fluorescence due to saturation with antigen, Bs is the background fluorescence, and KD,s is the binding affinity. Fitting was performed with the curve_fit function of the Python package scipy.optimize. Reasonable bounds on the values of A(103-105), B(100-103) and KD(10-14-10-5) were imposed. Sequences leading to a failed optimization were deemed ‘non-binding’.

Inferred KD outside of the titration boundaries were then pinned to the boundaries (10-12 and 10-7 for H1 and H9; 10-11 and 10-6 for H3 and FluB). Inferred KD with high error (standard deviation of log10KD>1.0) or resulting from a poor fit (r2<0.8) were removed from the data set prior to averaging -log10KD values across biological replicates.

We also explored an alternative maximum-likelihood framework for inferring binding affinities (see Appendix 1), but found it to be less accurate than the mean-bin approach when compared to isogenic flow cytometry measurements. Thus, we restricted our analysis to the simpler and more robust mean-bin inference presented here.

Force-directed layouts

Request a detailed protocol

To represent the high-dimensional binding affinity landscape in two dimensions, we use a force-directed graph layout approach. Each sequence in the antibody library is a node, connected by edges to its single-mutation neighbors (sequences that can be reached by one additional somatic mutation). An edge between two sequences s and t is given the weight

(8) ws,t=10.01+|log10(KD,sag)log10(KD,tag)|,

where KDag represent binding affinities to a particular antigen, ag. In the layouts shown in the main text, we use binding affinities to H1 for both CR6261 and CR9114. In force-directed layouts, edge weights correspond to the effective spring constant that tends to pull nodes closer together. Thus, a mutation from sequence s to t that has little impact on binding will cause that edge weight to be large, and the nodes will be pulled strongly together. A mutation from sequence s to t that causes a large difference in binding affinity (positive or negative) to the antigen will reduce the edge weight, moving those nodes further apart. After assigning all edge weights, we use the layout function layout_drl from the Python package iGraph, with default settings, to obtain the layout coordinates for each variant.

Expression data

Request a detailed protocol

As noted above, antibody libraries were sorted into eight bins along the FITC-A fluorescence axis (where FITC-A fluorescence is proportional to expression), each comprising 12.5% of the total singlet population (Figure 1—figure supplement 6). The mean expression log-fluorescence was computed for each variant using the corresponding variant counts and fluorescence data, as described above for the mean-bin KD inference. These expression values were then averaged across all biological replicates for each antibody (nine replicates for CR9114, six replicates for CR6261), and correlation between biological replicates, as well as with -log10KD values, are illustrated in Figure 1—figure supplement 5. For the isogenic flow cytometry measurements, variant expression was computed as the mean log FITC-A fluorescence.

Epistasis analysis

Linear interaction models

Request a detailed protocol

To infer specific mutational effects, we begin with simple linear models where the effects of mutations (and mutation combinations) add to produce phenotypes. Our log-transformed phenotypes for each variant s, ys=-log10(KD,s), are proportional to free-energy changes, and thus a natural null expectation is that they combine additively (Wells, 1990; Olson et al., 2014) (although we also consider nonadditive epistatic interactions between individual loci here, and analyze the effects of an overall nonlinear transformation of this data in Appendix 2). Our additive-only model is

(9) ys=β0+i=1Lβixi,s+ε,

 where L is the number of mutations for a given antibody, β0 is an intercept term, βi is the effect of the mutation at site i, xi,s is the genotype of variant s at site i, and ε represents independently and identically distributed errors. Our general linear interaction models are

(10) ys=β0+iβixi,s+i<jLβijxi,sxj,s+i<j<kLβijkxi,sxj,sxk,s++ε

where βij represent second-order interaction coefficients between distinct sites i and j, βijk represent third-order interaction coefficients, and so on up to the desired maximum order of interaction.

There are multiple alternative coding systems for the binary genotypes xi,s that affect the values of inferred effects β as well as their interpretation. Two common choices are (1) xi,s{0,1}, often called ‘biochemical’ or ‘local’ epistasis, and (2) xi,s{-1,1}, often called ‘statistical’ or ‘ensemble’ epistasis (Poelwijk et al., 2016). These frameworks are equivalent and related by a simple linear transformation, but the values of the coefficients vary between frameworks and have different interpretations. For ease of interpretation, in the Main Text and Figures we always show results obtained from inference in the biochemical epistasis framework. In Appendix 2, we discuss the differences between these two frameworks, and present results from inference in the statistical epistasis framework.

For an antibody with L mutations, there are L possible orders of interactions, with a total of 2L epistatic coefficients β. From a measurement of y for all 2L possible sequences, there is a simple linear transformation to calculate the resulting 2Lβ parameters (Poelwijk et al., 2016). This is a simple and fast approach to the calculation of epistasis that is widely used (Sailer and Harms, 2017b; Poelwijk et al., 2019), and we explore this approach in Appendix 2. However, we may instead wish to restrict our model to a lower order and examine whether it can explain the data with far fewer than 2L parameters, as a conservative approach to detecting high-order epistasis.

Specifically, we truncate the model above at a maximum order n and then fit and evaluate the resulting model. We begin with n=1 and continue to increase n until the optimal model has been identified. There are multiple strategies for selecting between models with different numbers of parameters, such as AIC and BIC; here we take a cross-validation approach. For each fold, we hold out 10% of the dataset, train models at each maximum order on the remaining 90%, and evaluate the prediction performance (R2) of the model on the held-out test set. After averaging the performances across all 10 folds for each truncated model, we choose the order that maximizes the test set performance as the optimal maximal order of interaction. We then re-train the model truncated at this order on the full dataset to obtain the final coefficients. We find that the optimal model identified by cross-validation for each antibody-antigen pair satisfies p<N by ∼1 order of magnitude, where p is the total number of model coefficients and N the number of data points with measurable binding affinity. This gives confidence that our parameter estimates are well constrained by the data, even in the absence of other regularization (such as Lasso or Ridge regularization approaches).

To train a model of given order on a set of sequences, we use ordinary least squares (OLS) regression with the Python package statsmodels. From this, we obtain the coefficient values β with their standard errors and p-values. To define significance of coefficients, we use a p-value cutoff of 0.05 with Bonferroni correction by the total number of model parameters. Coefficients, standard errors, p-values, and Bonferroni-corrected 95% confidence intervals are reported in Figure 1—source data 1 and 2. We also predict phenotypes y^ for each sequence from the coefficients and use these values in Figure 5A,B.

For CR9114 binding to influenza B, the number of sequences used for inference is far fewer than other antibody-antigen pairs (N=256), due to the large number of required mutations. We therefore use a fivefold rather than 10-fold split to reduce the test set noise. Nevertheless, the cross-validation procedure identifies a first-order (additive) model as optimal, due to the smaller sample size.

Structural analysis of epistatic coefficients

Request a detailed protocol

To examine the structural context of linear and pairwise coefficients, we performed three simple analyses. (1) First, we used ChimeraX (Pettersen et al., 2021) to calculate the buried surface area between HA and each mutated residue in CR9114 and CR6261, using the measure buriedarea function and the default probeRadius of 1.4 angstroms to approximate a water molecule. We plot this ‘contact surface area’ vs the linear effect of the corresponding mutation on HA binding (Figure 2C; Figure 2—figure supplement 1A). (2) We used PyMol (Schrodinger, LLC, 2015) to count the number of HA residues within six angstroms of each antibody mutation site. Six angstroms was chosen as an upper limit to capture potential antibody-antigen interactions (Bondi, 1964; Baker and Hubbard, 1984; Israelachvili and Pashley, 1982; Ekiert et al., 2009; Dreyfus et al., 2012), although we note that this analysis is robust to other distance thresholds. (3) We also used PyMol to measure the distances between α-carbons for all mutation pairs, and plotted these distances against the corresponding pairwise epistatic terms (Figure 2F; Figure 2—figure supplement 1B). We note that each of these analyses were performed with co-crystal structures of the somatic antibodies with HA (PDB ID: 4FQI (CR9114 –H5; CR9114 –H1 crystal structure not available) Dreyfus et al., 2012; 4FQY (CR9114 –H3) Dreyfus et al., 2012; 3GBN (CR6261 –H1) Ekiert et al., 2009).

Pathway analysis

Selection models

Request a detailed protocol

To study the likelihood of various mutational pathways leading from the germline to the somatic sequence, we must assume a selection model. Selection in germinal centers is considerably more complex than in classical population genetics models, involving spatial structure, changing population sizes, and T-cell-mediated selection, among other factors (Mesin et al., 2016). Capturing these aspects in quantitative models is an active field of research (Amitai et al., 2017). However, here we wish to adopt an extremely simple model of selection as a first step in understanding the impacts of the binding affinity landscape on antibody selection, with the goal of understanding the implications of the expectation that mutational steps become more probable as their effect on binding affinity becomes more positive. Combining the more realistic models of immune selection with our detailed characterization of mutational effects on antigen binding affinity remains an interesting avenue for future work.

Here, we restrict to the weak-mutation regime where mutation fixation events occur independently of one another. Selection proceeds as a Markov process, where the population is characterized by a single sequence that acquires a single mutation at each discrete step (McCandlish, 2011). We choose a simple form for the fixation probability of a mutation from sequence s to sequence t, as discussed below. This then determines the transition probability for the population to move from s to t. We assume that sequences cannot back-mutate (i.e. a residue changing from the somatic allele to the germline allele), and do not acquire multiple mutations in the same step. The absence of back-mutation is justified by the relatively large number of possible mutation sites compared to the total number of mutation events.

We define the transition probability of a single mutational step from the classical fixation probability for a mutation with selection coefficient σ in a population of size N(Kimura, 1962):

(11) pstep(σ,N)=1eσ1eNσ.

Here, we define the selection coefficient σ to be proportional to the difference in log binding affinities to a particular antigen between the two sequences s and t:

(12) σ=γΔs,tag=γ(-log10KD,tag-(-log10KD,sag)).

This model has two tunable parameters: N represents the effective population size and γ represents how strongly differences in binding affinity impact fitness. We chose three parameter values to span a range of selection strengths (see Figure 5—figure supplement 1): moderate, with N=1000 and γ=1; weak, with N=20 and γ=0.5; and strong, with N and γ such that pstep reduces to a step function (one if Δ>0 and 0 otherwise). These three models all show similar results, with differences between selection scenarios becoming more exaggerated with stronger selection and less exaggerated with weaker selection, as expected (see Figure 5—figure supplement 1).

From the fixation probabilities for a given parameter regime, we have the transition probability (up to a constant factor) for all sequences s,t over all antigens ag,

(13) Ps,tag={pstep(Δs,tag,γ,N)if t has one more somatic mutation than s0otherwise,

which we use for all the calculations described below for results presented in Figure 5 and supplements.

Scenario, mutation, and variant probabilities

Request a detailed protocol

It is particularly useful to store the probabilities Ps,tag as (sparse) transition matrices Pag of dimension 2N×2N for each antigen, where entries are nonzero only where sequence t has one more somatic mutation than s.

First, we wish to obtain a measure of total probability for a particular antigen scenario, as shown in Figure 5E,F. We calculate this by computing the matrix product over all mutational steps i for a particular sequence of antigen contexts {ag1,,agL}:

(14) 𝒫tot=paths(stepsPstep)=[i=1LPagi]sg,ss,

where []s,s corresponds to taking the matrix element in the row corresponding to variant s and column corresponding to variant s. In the right-most term, the products are matrix operations and sg, ss are respectively the indices of the germline and somatic variants.

We note that the transition probabilities Pagi are not normalized at each step. In practice, this means that mutations are optional: many outcomes will not reach the somatic sequence and the likelihood encodes the probability of reaching the somatic state. This makes it possible to compare different scenarios, as some scenarios are more likely than others to reach the somatic state. However, because these values do not represent true probabilities — the units are arbitrary — they cannot be compared between antibodies or between selection models. The exception is for the strong scenario, where the total probability for each path is one if all steps are uphill (Δs,tag>0) and 0 otherwise. Thus, here 𝒫tot has a natural interpretation as the total number of uphill paths. When we present results from the strong model (Figure 5C,D, Figure 5—figure supplement 1, and numbers of uphill paths for H1-only scenarios as discussed in the text), we represent uphill path numbers on a linear scale without log-transforming.

Although there are many possible antigen exposure scenarios, we restrict our analysis to several classes. First, in single-antigen scenarios, all steps i use the same antigen. Second, for sequential scenarios, antigen exposures must occur in non-repeating segments (for example, H1 - H3 - H1 is not allowed), although we consider all possible lengths and orders of segments.

Mixed scenarios are more complicated, as we do not fully understand the nature of B cell interactions with multiple antigens in the same germinal center (Wang et al., 2015; Wang, 2017; Kuraoka et al., 2016). One option is to assume that the B cell engages the antigen for which it has the highest affinity and define Δ by the maximum binding affinity across all possible antigens at each step, but this definition would trivially imply that the mixed scenario has the highest probability. Instead, we choose two alternatives: first, ‘average’ mixed, where we assume the B cell engages all antigens and use the average binding affinity change over all three (for CR9114) or two (for CR6261) antigens, Δmixed=1NagagΔag; and second, ‘random’ mixed, where we assume the B cell randomly engages a single antigen and hence the antigen at each mutational step is chosen randomly. For the latter definition, we calculate 𝒫tot as described above for 1000 randomly drawn scenarios and average the resulting log probability. When we illustrate mutational paths and mutation orders, we choose a representative scenario (with close to median probability) from the 1000 random draws.

We estimate the error of these probabilities by bootstrapping. Specifically, for 10 bootstrap iterations, we resample each binding affinity -log10KD,sag from a normal distribution according to its value and standard deviation. We then recalculate the total probability 𝒫tot, average over the 10 values to obtain mean and s.e.m. values, and transform by the natural log for plotting, as shown in Figure 5 and Figure 5—figure supplement 1. We note that for the strong selection scenario (where probabilities represent total numbers of uphill paths), values are not log-transformed, and many scenarios have total path numbers of exactly zero. We refrain from studying the ‘average’ mixed scenario for strong selection because it is essentially equivalent to choosing the antigen with maximum improvement: the quantitative effect of averaging is undone when the transition probability is binarized. For CR6261, all mutations at the first mutational step are neutral (with the exception of one mutation that improves affinity for H1 only), and so we allow all mutations with equal probability for the first step in the strong selection model.

Next, to identify the most likely paths under a given exposure scenario, we reframe this Markov process as a directed weighted graph. Each sequence s is a node, and a directed edge exists toward all sequences t that can be reached by one additional somatic mutation. The edge weight is calculated from the transition probability, wst=-log(Ps,tag+ϵ), where ϵ is an extremely small value to ensure weights are finite. In this graph framework, we can use fast algorithms to obtain the ‘shortest’ paths from the germline to the somatic node (those for which the sum of weights is lowest, that is the total probability is highest). Specifically, we use the shortest_simple_paths function from the Python package networkx (Hagberg et al., 2008) to compute the k shortest paths, as shown in Figure 5G,H. This method is exact and uses the algorithm described in Yen, 1971.

Next, we wish to obtain the probability that a mutation at site m happened at a specific step j (Figure 5I,J). As we are focusing on one antigen context, we can normalize the transition matrices and define:

(15) P~s,tag=Ps,tag×(tPs,tag)-1,

if Ps,tag0 and 0 otherwise. We can further restrict the transition matrix at step j, P~agj, to have nonzero the mutation that occurs is at a particular residue α, P~αagj. The total relative probability for that site at that mutational step under an antigen exposure scenario is then

(16) j,α=[(i=1j1P~agi)P~αagj(i=j+1LP~agi)]sg,ss,

where, again, products are matrix operations. Because a sequence of L steps starting from the germline can only lead to the somatic state, P~ verifies [i=1LP~agi]sg,ss=1. With the relation αP~αagj=P~agj this implies that these probabilities are already normalized: α𝒫j,α=1.

Finally, we wish to determine the total probability of each variant (Figure 5—figure supplement 2), that is the sum of probabilities of all paths passing through that variant, for a given selection scenario. For a variant s that contains j somatic mutations, we calculate

(17) 𝒫s=([i=1jP~agi]sg,s)([i=j+1LP~agi]s,ss),

where the first term is the probability of reaching sequence s at mutational step j, and the second term is the probability of reaching the somatic sequence after passing through sequence s. When representing this number we add an additional normalisation factor, 𝒫s=𝒫s×nj, where nj=(Lj) is the number of sequences with j mutations, so that variants with different numbers of mutations have comparable values. 𝒫s thus represents the ratio of the probability in a selective model to the probability in a neutral model (which is 1/nj). Thus, sequences with log10(𝒫s)>0 are favored by the given selection scenario, and those with log10(𝒫s)<0 are disfavored, as shown in Figure 5—figure supplement 2 for moderate selection under the optimal sequential scenario.

Appendix 1

Maximum likelihood approach to binding affinity inference

In this approach we make the assumption that the fluorescence emitted by cells of a specific genotype is distributed log-normally, with parameters μs,c and σs,c (the mean and standard deviation of the associated normal distribution respectively). At concentration c, a cell with genotype s will fall into the bin b (log10-fluorescence values fs,c ranging from lb to hb) with probability:

(18) P[lb<fs,c<hb]=lbhb12πσs,c2e12(fs,cμs,cσs,c)2dfs,c
(19) =12(erf(hb-μs,cσs,c2)-erf(lb-μs,cσs,c2)).

Each cell sorted is an independent event, so the number of cells in each bin will be multinomially distributed, and thus the likelihood of sorting nb,s|c cells of sequence s into bin b at concentration c is given by

(20) =s,c(P[lb<fs,c<hb])nb,s|c,

and the log-likelihood is

(21) log=s,c,bnb,s|clogP[lb<fs,c<hb]s,c,bpb,s|clogP[lb<fs,c<hb].

The probability pb,s|c is estimated as in the mean-bin approach (see Methods) and the log-likelihood is then maximized as a function of μs,c and σs,c (BFGS method). The values of A, KD, and B are then estimated similarly as the mean-bin approach (see Methods), replacing F¯s,c by μs,c.

The -log10KD inferred by this maximum likelihood (ML) approach correlate well with isogenic flow cytometry -log10KD (see Appendix 1—figure 1), but not as well as those inferred by the mean-bin approach (Figure 1—figure supplement 2B). The ML approach is predicated on the assumption that the fluorescence distribution for each variant is log-normal, which is often not the case (see Appendix 1—figure 2). For these reasons, in addition to favoring a simple approach, we performed all analyses with -log10KD inferred by the mean-bin approach.

Appendix 1—figure 1
Correlation between -log10KD from ML inference on Tite-Seq data vs. -log10KD from isogenic flow cytometry.

-log10KD to H1 (salmon), H3 (green), and Flu B (blue) shown for select variants, identical to those shown in Figure 1—figure supplement 2B. Pearson’s r = 0.97.

Appendix 1—figure 2
Distributions of PE-A fluorescence (HA binding) for isogenic CR9114 strains incubated with H3.

PE-A fluorescence distributions from flow cytometry of isogenic CR9114 germline (left) and somatic (right) strains following incubation with 1 μM, 100 nM, and 10 nM H3, as described in Methods. Shape of distribution varies for different clones and is not strictly log-normal, hence deviating from assumptions made in the maximum-likelihood binding affinity inference.

Appendix 2

Alternative approaches to epistasis inference

Statistical epistasis and variance partitioning

The contrast between biochemical and statistical frameworks for epistasis is well described in Poelwijk et al., 2016. In particular, a biochemical epistasis approach highlights one particular sequence as the ‘‘wildtype’’ or reference sequence and measures effects relative to its phenotype, whereas a statistical epistasis approach measures effects relative to the average phenotype of all variants included. The biochemical approach benefits from easier interpretation of the coefficient values, particularly when there is a natural or relevant choice of reference sequence, but the coefficients at different orders are not statistically independent. The statistical approach allows for correct variance partitioning between interaction orders, but the interpretation of the coefficients can be sensitive to the set of sequences, particularly when not all possible sequences are represented or when a majority of sequences exhibit some uninteresting phenotype (e.g. lethal).

Here, we perform inference of statistical epistasis exactly as described above for biochemical epistasis (see Methods), except that genotypes xi,s are coded as {-1,1} instead of {0,1}. The results from this statistical epistasis inference are shown in Appendix 2—figure 1 for CR9114 and Appendix 2—figure 2 for CR6261, in plots analogous to those in Figure 3, Figure 4 and supplements. We find that the patterns of site participation in interactions are similar (although the coefficient magnitudes and signs are of course scaled differently). The group of five key sites discussed in Figure 3 (sites 30, 57, 65, 82, and 83 for CR9114 binding to H1) exhibit coefficients that are significant for all 31 mutation combinations, consistent with the result from biochemical epistasis. Overall, the numbers of significant coefficients inferred in statistical epistasis models tends to be somewhat higher than for biochemical epistasis models, perhaps due to the effect of background averaging in reducing coefficient standard errors, but neither framework is a substantially more compact representation of epistasis than the other.

In the statistical epistasis framework, we can also partition the variance explained by the model according to the interaction order. Here, we take the final inferred model at the optimal interaction order and evaluate the prediction performance (R2) of each order as a fraction of the total performance of the full model. As shown in Appendix 2—figure 3, we find that epistasis explains a substantial fraction of variance (18%–33%, depending on antibody-antigen pair). Variance explained tends to decline with increasing order, as is also observed in some other protein epistasis datasets (Sailer and Harms, 2017b). This indicates that interactions at higher order are more rare (compared to the total number of terms at each order, which scales combinatorially) and/or smaller in magnitude than those at lower order. However, this does not imply that rare, strong interactions of even higher order do not exist; for example, there may be some strong sixth-order interaction terms for CR9114 binding to H1, but not enough to compensate for the many nonsignificant sixth-order terms in our cross-validation framework.

Appendix 2—figure 1
Results from statistical epistasis models for CR9114.

(A) First-order effects, as in Figure 2A. ‘R’ indicates required mutations. (B) Second-order effects for H1 (top right) and H3 (lower left), as in Figure 2D. Interactions with required mutations for H3 are noted in dark red. (C) Cumulative higher-order effects for CR9114 binding to H1, as in Figure 3A. (D) Cumulative higher-order effects for CR9114 binding to H3, as in Figure 3—figure supplement 3. (E) Inferred interaction coefficients for the set of five key epistatic loci, as in Figure 3—figure supplement 1B with corresponding colors. Note the different y-axis scales for the two subplots. Different interaction orders are separated by dotted lines. (F) Number of significant coefficients at all orders for the biochemical and statistical epistasis models. The maximal order of interaction for each model is indicated in parentheses.

Appendix 2—figure 2
Results from statistical epistasis models for CR6261.

(A) First-order effects, as in Figure 2B. (B) Second-order effects for H1 (top right) and H9 (lower left), as in Figure 2E. (C) Cumulative higher-order effects for CR6261 binding to H1, as in Figure 4A. (D) Cumulative higher-order effects for CR9114 binding to H9, as in Figure 4—figure supplement 2A. (E) Number of significant coefficients at all orders for the biochemical and statistical epistasis models. The maximal order of interaction for each model is indicated in parentheses.

Appendix 2—figure 3
Variance partitioning of statistical epistasis models.

(A) Variance partitioning for CR9114 binding to H1 (left) and H3 (right). (B) Variance partitioning for CR9114 binding to H1 and H9, denoted by colors as indicated.

In particular, another alternative approach to the inference of epistasis is to infer a full Lth-order model rather than truncating to lower order. This approach calculates 2L epistatic coefficients, one for every datapoint, which allows for the detection of strong interactions at any order with the caveat that many coefficients may simply reflect experimental noise, especially for higher-order terms. We explore this approach by following Poelwijk et al., 2019: we calculate epistatic coefficients using a Walsh-Hadamard transform of the -log10KD values, and calculate standard errors on each coefficient via error propagation using the standard errors of the data. We define significant coefficients by a p-value cutoff of 0.05, with Bonferroni correction by the total number of parameters in the model (here 2L). We find that for all antibody-antigen combinations, this approach finds more significant coefficients than the optimal truncated models, many of which are at higher interaction orders than allowed in the truncated model (Appendix 2—figure 4). This analysis requires a measurement of -log10KD for every single variant, so we use data that has not been filtered for goodness-of-fit or error in the inference of binding affinity (see Methods), including some sequences that have substantial error. Therefore we prefer to use the more conservative regression approach for our in-depth analysis of epistasis; this inference at full order confirms the existence, strength, and identity of the high-order interactions we discuss from the regression approach, while also indicating that additional and even higher-order terms may yet exist.

Appendix 2—figure 4
Epistasis inference at full order.

(A,B) Numbers of significant coefficients for the full-order inference compared to optimal truncated regression models for (A) CR9114 and (B) CR6261. Significance for both model types is determined by p<0.05 with Bonferroni correction by the number of model parameters. (C,D) Distribution of interaction orders of significant coefficients for (C) CR9114 and (D) CR6261.

Nonlinear models

An alternative approach to understanding epistasis is to view nonlinearities in observed phenotype data as arising from a simple nonlinear transformation applied to an underlying, unobserved additive phenotype. In this view, a simple nonlinear ‘‘global epistasis’’ function with few parameters may describe the landscape as well or better than models of the sort described above, with their large number of ‘‘idiosyncratic epistasis’’ parameters. Many studies in other proteins have attempted to disentangle such global epistasis from idiosyncratic effects (Sailer and Harms, 2017a; Domingo et al., 2019; Sarkisyan et al., 2016; Otwinowski et al., 2018; Otwinowski, 2018; Adams et al., 2019).

We already implement one global nonlinear transformation, by log-transforming our binding affinity measurements so that they are proportional to free energy changes, as described above. However, it is possible that another nonlinear transformation would capture the effects of many specific interaction coefficients, if there is a single underlying additive scale. In this section, we explore this possibility following the approach taken by Sailer and Harms, 2017a: we infer a nonlinear transformation that fits the phenotype data, invert it to ‘‘linearize’’ the phenotypes, re-fit interaction models on the linearized phenotypes, and then compare those model coefficients to the original coefficients to evaluate the role of the nonlinear transformation.

Our new model is

(22) ys=Φ(ys,add;km)=Φ(β0+iLβixi,s;km),

where ys are the observed phenotypes (-log10KD values), Φ is a nonlinear function with a small number of associated parameters km, and ys,add are the underlying additive-scale phenotypes, parametrized as before by additive coefficients βi.

To specify Φ, we must choose a family of nonlinear functions. Typical choices include splines (Otwinowski et al., 2018) or power transforms (Sailer and Harms, 2017b). We found that logistic (sigmoid) functions fit our data better than power transforms or splines, and they are monotonic and invertible. Specifically, our logistic function with four parameters is

(23) Φ(y;A,B,μ,σ)=A1+e(yμ)σ+B.

Logistic functions capture two features that we observe: first, there is a saturation effect at low values of -log10KD, corresponding to nonspecific binding that our measurements are unable to distinguish (Batista and Neuberger, 1998); and second, for most antibody-antigen combinations we observe a saturation effect at moderately high values of -log10KD. This latter effect is not due to limits on our measurement capabilities, as illustrated by higher values of -log10KD measured for the CR6261 library to H9 compared to values of -log10KD measured for the CR9114 library to H1, but instead due to widespread ‘‘diminishing returns’’ epistasis.

After specifying the functional form of Φ, we must fit both the nonlinear parameters km and underlying linear parameters βi. In principle, one could fit all parameters jointly, using for example a maximum likelihood approach (Otwinowski et al., 2018). However, we take the simpler approach as implemented in the software package from Sailer and Harms, 2017a, which first infers the additive parameters βi from the observed phenotypes and then infers the nonlinear function parameters km. We show the resulting fit of Φ in Appendix 2—figure 5a for two representative examples, by plotting our estimate of the additive phenotypes ys,add on the x-axis and our observed phenotypes from data on the y-axis. We found that this simple procedure identified well-fitting Φ in a single step, and successive iterations did not significantly improve the fit.

After fitting the nonlinear transformation, we apply the inverse transformation to our observed phenotypes to obtain ‘‘linearized’’ phenotypes ys,lin:

(24) ys,lin=Φ-1(ys,km).

Because the fit of Φ is not perfect, the linearized phenotypes ys,lin are not exactly equal to the estimated additive phenotypes ys,add, although linear regression on both quantities produces extremely similar values of βi. For values that lie above the domain of Φ-1, we pin them to the largest estimated additive phenotype.

Finally, we can take our linearized phenotypes ys,lin and infer interaction model coefficients β of various orders, exactly as described above for the untransformed ‘‘raw’’ phenotypes:

(25) ys,lin=β0+iβixi,s+i<jLβijxi,sxj,s+i<j<kLβijkxi,sxj,sxk,s++ε.

We again perform this analysis in both the biochemical and statistical epistasis frameworks. If the inverse transformation has removed most or all of the nonlinearity, then the resulting optimal interaction models should be smaller (lower maximum order of interaction and/or fewer significant interaction coefficients).

Instead, we find that in all cases, the optimal order of interaction is unchanged or only decreased by one when inferring on linearized vs raw phenotypes. Specifically, the new (vs old) optimal orders are: 4th (vs 5th) for CR9114 binding to H1, 4th (vs 4th) for CR9114 binding to H3, 3rd (vs 4th) for CR6261 binding to H1, 3rd (vs 4th) for CR6261 binding to H9 in the biochemical epistasis framework, and 4th (vs 4th) for CR6261 binding to H9 in the statistical epistasis framework. We can compare the numbers of significant coefficients in these optimal models inferred on linearized phenotypes to the models with the same maximum order inferred on raw phenotypes (Appendix 2—figure 5d,e), where we see that the numbers are relatively comparable.

We next examine changes in the individual coefficients between these models. In Appendix 2—figure 5b, we show two representative scatterplots between the raw phenotype coefficients β and the linearized phenotype coefficients β, where only significant coefficients are shown for clarity. While some coefficients show dramatic changes, overall the two sets of coefficients are quite well correlated. To see which sites are involved in strong changes, we can also represent coefficient changes in a heatmap format (Appendix 2—figure 5c). Here, diagonal cells show the change in coefficient for single sites (βi-βi), while off-diagonal cells show the sum of coefficient changes over all pairwise and higher terms involving each pair of mutations. We observe that for some antibody-antigen pairs, such as CR9114 binding to H1, the strongest net changes are negative, though not negative enough to remove the many significant coefficients. For other antibody-antigen pairs such as CR6261 binding to H1, there are both positive and negative net changes, indicating that the nonlinear transformation is changing the epistatic landscape rather than correcting for it.

Appendix 2—figure 5
Results from epistasis models with nonlinear transformations.

(A), Fitting logistic functions to additive predicted phenotypes. Red lines indicate the optimized logistic function Φ, with R2 as indicated. (B), Scatterplot of coefficients β from the optimal order model inferred on linearized data (after inverting the best-fit nonlinear transformation) against original coefficients β for the model with the same maximum order. (C), Net changes of coefficients by site. Diagonal cells show changes in linear coefficients. Off-diagonal cells show the sum of changes over terms at all orders (2nd and above) in which the given pair of mutations is involved. For (A-C), we show two representative antibody-antigen combinations: CR9114 binding to H1, top, and CR6261 binding to H1, bottom. (D,E), Number of significant coefficients in optimal order models fit to phenotypes transformed by the inverse nonlinear function (light bars), compared to original coefficients from linear models with the same maximal order (dark bars), for (D) CR9114 and (E) CR6261. The epistasis type and model order are indicated on the x-axis.

In summary, we find that nonlinear logistic transformations can account for a portion of the nonlinearities observed in our data, sometimes reducing the maximal order of interaction by one. However, all antigen-antibody pairs still exhibit strong idiosyncratic epistasis up to at least third order after correcting for global epistasis, and the resulting numbers and magnitudes of significant coefficients are not drastically changed. Thus, it does not appear that global epistasis can explain our data much more simply than models with individual interactions, and so we confine our main analysis to idiosyncratic epistasis models.

Data availability

Data and code used for this study are available at https://github.com/klawrence26/bnab-landscapes (copy archived at https://archive.softwareheritage.org/swh:1:rev:61c1673a101ea739d5b7e9b282f6bcfad41d7e90). CR9114 data are also available in an interactive data browser at https://yodabrowser.netlify.app/yoda_browser/. FASTQ files from high-throughput sequencing have been deposited in the NCBI BioProject database with accession number PRJNA741613.

The following data sets were generated
    1. Phillips AM
    2. Lawrence KR
    3. Moulana A
    4. Dupic T
    5. Chang J
    6. Johnson MS
    7. Cvijovic I
    8. Mora T
    9. Walczak AM
    10. Desai MM
    (2021) NCBI BioProject
    ID PRJNA741613. Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies.

References

    1. Bondi A
    (1964) van der Waals Volumes and Radii
    The Journal of Physical Chemistry 68:441–451.
    https://doi.org/10.1021/j100785a001
  1. Book
    1. Fairhead M
    2. Howarth M
    (2015) Site-specific biotinylation of purified proteins using BirA
    In: Howarth M, editors. Site-Specific Protein Labeling. Germany: Springer. pp. 171–184.
    https://doi.org/10.1007/978-1-4939-2272-7
  2. Software
    1. Phillips A
    (2021)
    bnab-landscapes, version swh:1:rev:61c1673a101ea739d5b7e9b282f6bcfad41d7e90 https://archive.softwareheritage.org/swh:1:rev:61c1673a101ea739d5b7e9b282f6bcfad41d7e90
    Software Heritage.
  3. Software
    1. Schrodinger, LLC
    (2015)
    The PyMOL Molecular Graphics System, version 1.8
    The PyMOL Molecular Graphics System.

Decision letter

  1. Sarel Jacob Fleishman
    Reviewing Editor; Weizmann Institute of Science, Israel
  2. Satyajit Rath
    Senior Editor; Indian Institute of Science Education and Research (IISER), India
  3. Jesse D Bloom
    Reviewer; Fred Hutchinson Cancer Research Center, United States
  4. Nicholas C Wu
    Reviewer; University of Illinois at Urbana-Champaign, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

CR6261 and CR9114 are two antibodies that bind to the conserved stem of influenza hemagglutinin (HA) through their VH regions and differ by 14-18 mutations from their inferred germline sequences. The authors constructed large combinatorial libraries containing combinations of 11 and 16 mutations for CR6261 and CR9114, respectively. These were used in yeast surface display titrations to infer individual and epistatic contributions to binding diverse HAs and to infer possible evolutionary trajectories going from germline to the mature antibodies. The study provides a wealth of knowledge on amino acid contributions to binding affinity. The study informs our understanding of biochemical epistasis, and could potentially serve as a starting point for a more detailed understanding of antibody affinity maturation more generally.

Decision letter after peer review:

Thank you for submitting your article "Binding affinity landscapes constrain the evolution of broadly neutralizing anti-influenza antibodies" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Satyajit Rath as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jesse D Bloom (Reviewer #2); Nicholas C Wu (Reviewer #4).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. Following are a number of factors that may confound or limit the interpretation of the results. The authors should mention these factors and the limits they place on the analysis of evolutionary paths during antibody selection. They should also tone down the language regarding the relevance to vaccine design.

– Unlike other studies where antibodies were isolated from single-cell sorting of memory B-cells, the present bNAbs were isolated from phage display libraries. These libraries (Throsby et al., 2008; Dreyfus et al., 2012) were constructed from pooled IgM+ from 10 (not 3 as incorrectly stated in manuscript) (CR6261) or 3 (CR9114) healthy donors and scFv fragments were cloned and screened using phage display against H5HA for CR6261 (Throsby et al., 2008) and against sequential panning against Has from H1, H3 and both B lineages for CR9114 (Dreyfus et al., 2012) respectively. Use of phage display methodology which involves multiple rounds of PCR as well as panning means that mutations can be introduced during library construction. Hence the resulting sequences isolated need not accurately reflect antibody gene sequences present in the donors. Further, the multi-round panning process with diverse HAs (especially for CR9114) biases the resulting sequences selected, altering the order of the panning steps might result in a different selected sequence.

– Additionally, it is unclear what the natural selection pressures are. It is likely that they would be for increased neutralization breadth and potency which is not straightforwardly related to improved breadth of binding to soluble HA where the stem accessibility is much higher than it is in the viral context.

– The known polyreactivity of many bNAbs including stem-directed bNAbs (PMID: 33049994) is another confounding factor.

– Unlike studies with bNAbs isolated from elite neutralizers in the case of HIV-1, there is no evidence that the individuals from whom the pooled phage libaries were made had significant breadth of serum neutralizing activity and so it remains possible that the breadth of the final isolated antibodies arose from mutations introduced during the PCR amplifications and enriched through the panning steps.

2. CR6261 was selected by panning against H5 HA whereas the vaccinated individuals presumably primarily had experienced H1. Since the antibody binds to the conserved stem of HA what is relevant to interpret the mutational landscapes is not the overall antigenic diversity of HA (Figure 1B) but the diversity of the stem epitopic region. If these regions are similar in H1 and H9 HA that would explain the observation that multiple mutations in the antibody are tolerated in both cases. In the case of CR9114, the mature antibody binds best to H1, less well to H3 and weakest to B HA. Unsurprisingly the same pattern is reflected in the mutational data. Given the greater diversity in HA stem epitope sequence between H1, H3 and B relative to that between H1 and H9 it is also unsurprising that there should be the sequential acquisition of breadth in the latter CR9114 case (see also 1 above).

3. In lines 69-70, "bnAbs tend to have many more mutations than specific antibodies". This statement is mostly true for HIV bnAbs but not for influenza bnAbs. As described by Lingwood et al., (PMID: 22932267), "Influenza IGHV1-69-based broadly neutralizing antibodies undergo a relatively low degree of somatic mutation (an average of 14 amino acids in the heavy chain, n = 9)". In fact, many influenza antibodies have similar number of somatic mutations as CR9114 and CR6261 (PMID: 30795982).

4. It is surprising to see significant second-order effects at relatively large distances (Figure 2F). What is the suggested explanation?

5. It is never clearly stated in results whether it's single chain (scFv) antibody, and if the HA is trimeric. If so, is there a potential for avidity so that the measurements are Kd,apparent for the multivalent interaction rather than monomeric Kd?

Other comments:

In both the abstract and beginning of results, it would be helpful to describe the libraries a bit more clearly: all combinations of mutations separating the germline and mature antibody in the VH domain among sites contacting the epitope.

Avnir et al., PLoS Pathog 2014 (PMID: 24788925) has analyzed the somatic mutation pattern in IGHV1-69 influenza bnAbs. The authors should cite Avnir et al., to further substantiate the findings in this study. For example, Arnir et al., identified several commonly occurring somatic mutations in IGHV1-69 influenza bnAbs, including T29P, S35R, and S83F (i.e. T28P, S30R, S74F, respectively, in Kabat numbering), which have strong additive effect in CR6261 (Figure 2B).

Since evolution proceeds through single-nucleotide rather than amino acid exchanges, the germline and mature antibody nucleotide sequences should be presented (if known). The authors may consider adding a comment that evolution may not move directly through amino acid exchanges if the DNA sequence does not permit such a change through a single-nucleotide mutation.

I personally was very curious how the specific epistasis models the authors used compare to global epistasis models. This is well explained in the appendix, but might be helpful to mention in another sentence or two in main text as well.

In the non-global-epistasis model, is anything done to handle the censoring of the data at the high and low end of the affinity scale? I was confused by this because line 115 says values outside the range are pinned to the boundaries, but then in Appendix (line 1282) it seems to suggest censoring isn't an issue.

The interactive browser is cool! Would be nice if mutations were either labeled by the amino acids, or there was clearer explanation which amino acid identity is 0 and which is 1 for the violin plots. Also, might be worth adding text explaining the browser is only available for CR9114.

Line 41: I think there are actually now a lot of broadly neutralizing antibodies, so I'm not sure how accurate "a handful" is here.

Line 70: "less broad" might be better than "specific" because even bNAbs are specific (for flu HA for instance) rather than "sticky" to everything.

I had a hard time understanding the numbers in Figure 1A.

I found Figure 1G confusing, mostly because I didn't (and still don't) really understand all the boxes with solid and dotted black lines labeled with various HAs.

Line 288-289: I think we can be virtually certain that the person wasn't infected with H9 since it's not a human virus subtype.

It would be useful to list the surface area each residue in the paratope for both antibodies contributes to binding where such information is available.

Is it possible to convert the effect scores into a free energy of binding contribution? Are there error estimates for the effect scores that would allow one to assess whether apparent differences in effect scores are statistically significant?

Larger epistatic effect scores occur between residues with the largest contributions to binding. Is this expected?

The authors may consider changing "CR-9114" and "CR-6261" to "CR9114" and "CR6261", respectively, which are the more commonly used nomenclatures for these two antibodies in the field.

It is unclear to me which antibody numbering scheme (e.g. Kabat, Chothia, IMGT) is being used for numbering amino acid residues in this study. Also, naming a position as "112.1" in CR6261 seems a bit odd. Should that be "112a"?

In panel B of Figure 1—figure supplement 5, the correlation between Tite-Seq mean expression and isogenic expression fluorescence is quite high for CR9114 variants but low for CR6261 variants. Is there a reason for that discrepancy?

In lines 224-225, "In contrast, a specific set of many mutations with strong synergistic interactions is required to bind H3, and to an even greater extent, influenza B". It is not very clear to me which analysis supports this claim. Can the authors clarify a bit?

In caption of Figure 5, while the meaning of "R" is described (random mixed scenario), the meaning of "O" and "A" should also be explained.

Among the somatic mutations in CR9114, I57S, K82I, and S83F are particularly important for binding to H3 (lines 182-183 and Figure 2D). Can the author provide a plausible structural explanation?

Similar to the comment above, it is interesting to see that T29P almost exclusively occurs as the first mutation under various selection scenarios (Figure 5J and Figure 5—figure supplement 3C-D). Can the author provide a plausible structural explanation?

Reviewer #1 Recommendations for the authors:

The interactive browser is cool! It would be nice if mutations were either labeled by the amino acids, or if there was a clearer explanation which amino acid identity is 0 and which is 1 for the violin plots. Also, might be worth adding text explaining the browser is only available for CR9114.

Line 41: I think there are actually now a lot of broadly neutralizing antibodies, so I'm not sure how accurate "a handful" is here.

Line 70: "less broad" might be better than "specific" because even bNAbs are specific (for flu HA for instance) rather than "sticky" to everything.

I had a hard time understanding the numbers in Figure 1A.

I found Figure 1G confusing, mostly because I didn't (and still don't) really understand all the boxes with solid and dotted black lines labeled with various HAs.

Line 288-289: I think we can be virtually certain that the person wasn't infected with H9 since it's not a human virus subtype.

Reviewer #4 Recommendations for the authors:

1. The authors may consider changing "CR-9114" and "CR-6261" to "CR9114" and "CR6261", respectively, which are the more commonly used nomenclatures for these two antibodies in the field.

2. It is unclear to me which antibody numbering scheme (e.g. Kabat, Chothia, IMGT) is being used for numbering amino acid residues in this study. Also, naming a position as "112.1" in CR6261 seems a bit odd. Should that be "112a"?

3. In lines 69-70, "bnAbs tend to have many more mutations than specific antibodies". This statement is mostly true for HIV bnAbs but not for influenza bnAbs. As described by Lingwood et al., (PMID: 22932267), "Influenza IGHV1-69-based broadly neutralizing antibodies undergo a relatively low degree of somatic mutation (an average of 14 amino acids in the heavy chain, n = 9)". In fact, many influenza antibodies have similar number of somatic mutations as CR9114 and CR6261 (PMID: 30795982).

4. In panel B of Figure 1—figure supplement 5, the correlation between Tite-Seq mean expression and isogenic expression fluorescence is quite high for CR9114 variants but low for CR6261 variants. Is there a reason for that discrepancy?

5. In lines 224-225, "In contrast, a specific set of many mutations with strong synergistic interactions is required to bind H3, and to an even greater extent, influenza B". It is not very clear to me which analysis supports this claim. Can the authors clarify a bit?

6. In caption of Figure 5, while the meaning of "R" is described (random mixed scenario), the meaning of "O" and "A" should also be explained.

7. Among the somatic mutations in CR9114, I57S, K82I, and S83F are particularly important for binding to H3 (lines 182-183 and Figure 2D). Can the author provide a plausible structural explanation?

8. Similar to the comment above, it is interesting to see that T29P almost exclusively occurs as the first mutation under various selection scenarios (Figure 5J and Figure 5—figure supplement 3C-D). Can the author provide a plausible structural explanation?

9. Avnir et al., PLoS Pathog 2014 (PMID: 24788925) has analyzed the somatic mutation pattern in IGHV1-69 influenza bnAbs. The authors should cite Avnir et al., to further substantiate the findings in this study. For example, Arnir et al., identified several commonly occurring somatic mutations in IGHV1-69 influenza bnAbs, including T29P, S35R, and S83F (i.e. T28P, S30R, S74F, respectively, in Kabat numbering), which have strong additive effect in CR6261 (Figure 2B).

https://doi.org/10.7554/eLife.71393.sa1

Author response

Essential revisions:

1. Following are a number of factors that may confound or limit the interpretation of the results. The authors should mention these factors and the limits they place on the analysis of evolutionary paths during antibody selection. They should also tone down the language regarding the relevance to vaccine design.

We have toned down the language regarding the relevance to vaccine design. We have also addressed each of the specific caveats:

– Unlike other studies where antibodies were isolated from single-cell sorting of memory B-cells, the present bNAbs were isolated from phage display libraries. These libraries (Throsby et al., 2008; Dreyfus et al., 2012) were constructed from pooled IgM+ from 10 (not 3 as incorrectly stated in manuscript) (CR6261) or 3 (CR9114) healthy donors and scFv fragments were cloned and screened using phage display against H5HA for CR6261 (Throsby et al., 2008) and against sequential panning against Has from H1, H3 and both B lineages for CR9114 (Dreyfus et al., 2012) respectively. Use of phage display methodology which involves multiple rounds of PCR as well as panning means that mutations can be introduced during library construction. Hence the resulting sequences isolated need not accurately reflect antibody gene sequences present in the donors. Further, the multi-round panning process with diverse HAs (especially for CR9114) biases the resulting sequences selected, altering the order of the panning steps might result in a different selected sequence.

We have commented on the potentially confounding factors associated with phage display in the Discussion. We note that however biased the discovery of these sequences may have been, these antibodies occupy uniquely interesting regions of sequence space because of their associated breadth and are thus useful for understanding the relationship between sequence, affinity, and breadth. We have also made the noted correction to the Methods section (i.e. that CR6261 was isolated from pooled PBMC from 10 donors rather than 3), and thank the reviewer for bringing this to our attention.

– Additionally, it is unclear what the natural selection pressures are. It is likely that they would be for increased neutralization breadth and potency which is not straightforwardly related to improved breadth of binding to soluble HA where the stem accessibility is much higher than it is in the viral context.

We have commented on this potential caveat in the Discussion and have referred to several studies that find affinity to soluble recombinant HA to be indicative of viral neutralization.

– The known polyreactivity of many bNAbs including stem-directed bNAbs (PMID: 33049994) is another confounding factor.

Selection against polyreactivity is likely an important factor shaping bnAb maturation that is not measured in this study and thus is not captured by our models. We have commented on this potential caveat in the Discussion, referring to studies examining the polyreactivity of stem-targeting bnAbs.

– Unlike studies with bNAbs isolated from elite neutralizers in the case of HIV-1, there is no evidence that the individuals from whom the pooled phage libaries were made had significant breadth of serum neutralizing activity and so it remains possible that the breadth of the final isolated antibodies arose from mutations introduced during the PCR amplifications and enriched through the panning steps.

We have added a comment on the potential caveats associated with phage display PCR and panning to the Discussion.

2. CR6261 was selected by panning against H5 HA whereas the vaccinated individuals presumably primarily had experienced H1. Since the antibody binds to the conserved stem of HA what is relevant to interpret the mutational landscapes is not the overall antigenic diversity of HA (Figure 1B) but the diversity of the stem epitopic region. If these regions are similar in H1 and H9 HA that would explain the observation that multiple mutations in the antibody are tolerated in both cases. In the case of CR9114, the mature antibody binds best to H1, less well to H3 and weakest to B HA. Unsurprisingly the same pattern is reflected in the mutational data. Given the greater diversity in HA stem epitope sequence between H1, H3 and B relative to that between H1 and H9 it is also unsurprising that there should be the sequential acquisition of breadth in the latter CR9114 case (see also 1 above).

We agree that the diversity of the stem epitope is important for comparing these antigens and have included this information in Figure 1—Figure supplement 1 and have referenced this in main text. In Figure 1B we retained the antigenic diversity of HA to illustrate the varying levels of breadth of the two antibodies, to motivate the choice of HA antigen for screening each antibody library, and to orient readers unfamiliar with influenza groups and HA subtypes. While we agree that retrospectively the nested sets of mutations required for CR9114 breadth may seem unsurprising given CR9114 affinity for each of these antigens and the diversity of the corresponding stem epitopes, this finding is novel and provides new molecular insight into the constraints on mutational pathways to antibodies with varying levels of breadth.

3. In lines 69-70, "bnAbs tend to have many more mutations than specific antibodies". This statement is mostly true for HIV bnAbs but not for influenza bnAbs. As described by Lingwood et al., (PMID: 22932267), "Influenza IGHV1-69-based broadly neutralizing antibodies undergo a relatively low degree of somatic mutation (an average of 14 amino acids in the heavy chain, n = 9)". In fact, many influenza antibodies have similar number of somatic mutations as CR9114 and CR6261 (PMID: 30795982).

We have removed this statement.

4. It is surprising to see significant second-order effects at relatively large distances (Figure 2F). What is the suggested explanation?

Second-order effects could be caused by direct interactions between neighboring residues but also by indirect interactions between residues, either through their interactions with antigen or through conformational rearrangements. The second-order effects we observe at relatively large distances likely result from indirect interactions, and we have clarified this in the text.

5. It is never clearly stated in results whether it's single chain (scFv) antibody, and if the HA is trimeric. If so, is there a potential for avidity so that the measurements are Kd,apparent for the multivalent interaction rather than monomeric Kd?

We have clarified that the constructs are scFv antibodies and that the HA is trimeric in the results. In regard to potential avidity, due to the density of scFv antibodies on the yeast cell (104-105, Bodor and Wittrup 1997) and the surface area of a budding yeast cell (60 mm2, Klis et al., 2014), it is unlikely that two scFv bind a single HA trimer (distance between stem epitopes on adjacent monomers < 10 nm). This potential issue is further avoided by sorting single cells, such that scFv on different yeast cells that could potentially interact with a single HA trimer would be discarded during sorting. Thus, we expect that the KD measurements made here reflect monomeric binding interactions between trimeric HA and a single scFv. Additionally, the original development of the yeast display system used here established that this method enables measurement of equilibrium dissociation constants (KD) that are comparable to solution-based measurements (Bodor and Wittrup 1997; Bodor and Wittrup 2003).

Other comments:

In both the abstract and beginning of results, it would be helpful to describe the libraries a bit more clearly: all combinations of mutations separating the germline and mature antibody in the VH domain among sites contacting the epitope.

We have edited the abstract to specify that the mutations are located in the heavy-chain for both antibodies. In the results, we have clarified that we are examining mutations separating the germline and somatic sequences. Although we did exclude 2 (CR9114) and 3 (CR6261) distant framework mutations from our libraries, we did not limit our libraries to sites contacting the epitope.

Avnir et al., PLoS Pathog 2014 (PMID: 24788925) has analyzed the somatic mutation pattern in IGHV1-69 influenza bnAbs. The authors should cite Avnir et al., to further substantiate the findings in this study. For example, Arnir et al., identified several commonly occurring somatic mutations in IGHV1-69 influenza bnAbs, including T29P, S35R, and S83F (i.e. T28P, S30R, S74F, respectively, in Kabat numbering), which have strong additive effect in CR6261 (Figure 2B).

We thank reviewers for pointing out this study and have added this citation to our discussion of single mutational effects.

Since evolution proceeds through single-nucleotide rather than amino acid exchanges, the germline and mature antibody nucleotide sequences should be presented (if known). The authors may consider adding a comment that evolution may not move directly through amino acid exchanges if the DNA sequence does not permit such a change through a single-nucleotide mutation.

The mature nucleotide sequences are published and the associated GenBank IDs have been added to our Materials and methods section for additional clarity. The germline nucleotide sequences are not published but can be inferred from the mature sequences using tools such as IMGT or IgBlast (as we did here for CR9114; we have added a description of this process to our Materials and methods). We have included the inferred germline heavy chain nucleotide sequence for CR9114 as Supplementary file 12. For CR6261, we did not infer the germline nucleotide sequence but instead designed our libraries based on the published germline amino acid sequence (Lingwood et al., 2012). Further, all nucleotide sequences used here to generate the antibody libraries (available as plasmid maps in Supplementary files 4-7) were codon optimized for expression in yeast. Thus, the nucleotide sequences of our libraries are relevant for examining evolution at the amino acid level, not the nucleotide level. Still, based on the inferred germline sequence for CR9114, several (4/16) of the amino acid changes occurred via multiple nucleotide mutations, which may occur in a single round of somatic hypermutation (Spisak 2020, Unniraman 2007), or over two distinct rounds. A comment to reflect this point has been added to the ‘Affinity to diverse antigens was likely acquired sequentially’ section of the Results.

I personally was very curious how the specific epistasis models the authors used compare to global epistasis models. This is well explained in the appendix, but might be helpful to mention in another sentence or two in main text as well.

We have added a sentence summarizing the results from the global epistasis analysis, in the paragraph where we introduce the epistasis modeling approach.

In the non-global-epistasis model, is anything done to handle the censoring of the data at the high and low end of the affinity scale? I was confused by this because line 115 says values outside the range are pinned to the boundaries, but then in Appendix (line 1282) it seems to suggest censoring isn't an issue.

For all epistasis analysis approaches, we pin inferred –logKD values outside of the titration range to our titration boundaries, as other Tite-Seq experiments have done previously (Adams et al., 2016, 2019). For the high-affinity boundary (10-11 or 10-12 M), we never observe binding affinities within an order of magnitude of the boundary, so pinning does not occur. The Appendix sentence in question is referring to the “diminishing returns” pattern among high-affinity variants, which therefore is due to epistasis rather than data censoring. However, at the low-affinity boundary, we do observe many variants with no detectable binding, especially for CR9114 binding to H3 and influenza B. As these concentrations are high enough that nonspecific binding becomes appreciable (Batista 1998), we believe it is not meaningful to assign –logKD values lower than the boundary (6 or 7 depending on the antigen). Rather than discard these variants altogether, we choose to preserve some measure of their low binding affinity (albeit biased) by assigning them the boundary value and proceeding with inference as described.

The interactive browser is cool! Would be nice if mutations were either labeled by the amino acids, or there was clearer explanation which amino acid identity is 0 and which is 1 for the violin plots. Also, might be worth adding text explaining the browser is only available for CR9114.

We are clarifying the amino acid identity on the violin plots in the data browser (will be completed prior to publication). We have also referred to the data browser in the legend of Figure 3 and in the Data Availability statement we specify that the browser is only available for CR9114 (we find that the smaller size of the CR6261 dataset makes the data browser less helpful).

Line 41: I think there are actually now a lot of broadly neutralizing antibodies, so I'm not sure how accurate "a handful" is here.

This language was chosen to reflect the rare frequency of bnAbs in natural repertoires. Upon review we agree that this is confusing and have changed this to ‘numerous’.

Line 70: "less broad" might be better than "specific" because even bNAbs are specific (for flu HA for instance) rather than "sticky" to everything.

We have made this change.

I had a hard time understanding the numbers in Figure 1A.

To clarify the numbering in Figure 1A we added lines to connect the residue number to the corresponding amino acid. This numbering is IMGT unique numbering, which is defined such that different receptors and chains can be easily compared. We have added a note to the figure caption regarding the numbering system.

I found Figure 1G confusing, mostly because I didn't (and still don't) really understand all the boxes with solid and dotted black lines labeled with various HAs.

We agree that the caption does not clearly explain Figure 1G and have added more detail to explain the insets.

Line 288-289: I think we can be virtually certain that the person wasn't infected with H9 since it's not a human virus subtype.

Although we agree that infection with H9 is unlikely, we also don’t wish to imply that we can definitely rule it out. We have adjusted the language to 'this breadth was likely acquired by chance’.

It would be useful to list the surface area each residue in the paratope for both antibodies contributes to binding where such information is available.

This information is not available in the published structural characterizations of these antibodies (Ekiert et al., 2009, Dreyfus et al., 2012), but we have calculated this using ChimeraX (Pettersen 2021) and find that it correlates well with our previous analysis of the number of proximal HA residues (both metrics are now available in Figure 2–source data 3). Because it is a more standard calculation, we decided to plot this contact surface area in Figure 2C and Figure 2—figure supplement 1 instead of the number of proximal HA residues, and we thank reviewers for this suggestion.

Is it possible to convert the effect scores into a free energy of binding contribution? Are there error estimates for the effect scores that would allow one to assess whether apparent differences in effect scores are statistically significant?

The –logKD values are directly proportional to DGbinding, as DG = –RTlnK, and mutation coefficients are inferred from linear regression on these values. Thus effect scores (additive and epistatic) are directly proportional to changes in free energy of binding upon mutation. We have clarified this relationship in the main text. For all effect scores, the standard error, p-value, and 95% confidence interval resulting from ordinary least squares regression (with Bonferroni correction by the number of parameters) is provided in Figure 2–source data 1 (CR9114) and Figure 2–source data 2 (CR6261). In all figures, we either display effect score standard errors as error bars where possible (eg Figure 2A,B, Figure 3F) or we display only statistically significant effect scores (assessed as a Bonferroni-corrected p-value < 0.05, eg Figure 2D,E, Figure 3A), as indicated in the respective captions.

Larger epistatic effect scores occur between residues with the largest contributions to binding. Is this expected?

This is not necessarily expected. In our datasets, this is often true but is not always the case, as illustrated by the strong third-order interaction between CR9114 mutations 57, 64, and 65, which have only weak individual effects (Figure 3E). We also do not expect this effect to arise as an artifact of our statistical inference approach, as we jointly infer coefficients at all orders for all sites.

The authors may consider changing "CR-9114" and "CR-6261" to "CR9114" and "CR6261", respectively, which are the more commonly used nomenclatures for these two antibodies in the field.

We have made this change.

It is unclear to me which antibody numbering scheme (e.g. Kabat, Chothia, IMGT) is being used for numbering amino acid residues in this study. Also, naming a position as "112.1" in CR6261 seems a bit odd. Should that be "112a"?

This numbering is IMGT unique numbering, which is defined such that different receptors and chains can be easily compared. We have added a note to the figure caption regarding the numbering system.

In panel B of Figure 1—figure supplement 5, the correlation between Tite-Seq mean expression and isogenic expression fluorescence is quite high for CR9114 variants but low for CR6261 variants. Is there a reason for that discrepancy?

These specific variants were selected to span a wide range of –logKD to validate the Tite-Seq measurements with isogenic measurements. Across both CR9114 and CR6261 libraries, the variation in expression is quite modest and thus has relatively low correlation between replicate Tite-Seq measurements (Figure 1—figure supplement 5A) and with isogenic fluorescence measurements (Figure 1—figure supplement 5B). For this reason, and because the antibody sequences are displayed on yeast in an scFv format, we do not draw major conclusions from the expression data. We suspect that the higher correlation for CR9114 variants is related to the specific variants chosen, rather than a general feature of the CR9114 library (e.g. CR9114 S83F was one of the isogenic variants and has the strongest impact on expression; CR6261 S83F also has the strongest impact on expression but was not reconstructed for isogenic measurements). Still, we present these data to demonstrate that (1) scFv expression was not strongly impacted by sequence identity and that (2) some mutations have modest effects on expression that may be inversely correlated their impact on binding affinity (Figure 1—figure supplement 5E). The second point is statistically significant when examining all sequences in a given library (Figure 1—figure supplement 5E), and although we refrain from equating yeast scFv expression with antibody stability, this finding may be of interest to others in the protein evolution community and is generally consistent with other studies of protein evolution (Tokuriki and Tawfik 2009).

In lines 224-225, "In contrast, a specific set of many mutations with strong synergistic interactions is required to bind H3, and to an even greater extent, influenza B". It is not very clear to me which analysis supports this claim. Can the authors clarify a bit?

We have added a reference to Figure 2A, which shows the nested sets of required mutations for binding H3 and influenza B, to support this claim more clearly.

In caption of Figure 5, while the meaning of "R" is described (random mixed scenario), the meaning of "O" and "A" should also be explained.

This has been added to the caption of Figure 5.

Among the somatic mutations in CR9114, I57S, K82I, and S83F are particularly important for binding to H3 (lines 182-183 and Figure 2D). Can the author provide a plausible structural explanation?

CR9114 binds H3 through the flexibility of its combining site, which enables it to make favorable interactions with different conformations of Trp21 and accommodate Asn49 as well as the Asn38 glycan in group 2 HAs (Dreyfus et al., 2012). Here, we find that K82I and S83F are important for binding H1, H3 and influenza B HA, which is likely due to hydrophobic interactions between these mutated residues and a conserved hydrophobic patch in HA. Similarly, we find that I57S is important for binding each antigen, though for H1 it only improves affinity in specific sequence backgrounds. I57S is in CDR2 of CR9114, and although the contact surface area with H3 is quite modest (Figure 2—figure supplement 1), it may enhance binding of CR9114 to H3 by reorienting neighboring residues in the CDR2 loop, which interacts directly with Trp21 and Asn49, and likely accommodates the glycan at Asn38. Because we do not have the co-crystal structure with the germline antibody, we can only speculate on the impact of each mutation and thus have added a brief plausible structural explanation to the text.

Similar to the comment above, it is interesting to see that T29P almost exclusively occurs as the first mutation under various selection scenarios (Figure 5J and Figure 5—figure supplement 3C-D). Can the author provide a plausible structural explanation?

T29P is the only single mutation that confers any detectable H1 binding affinity and is thus the most likely mutation to occur first in all selection models that begin with H1. Importantly, the affinity of this single mutant is only modestly improved relative to the germline sequence (–logKD = 7.53 +/- 0.04, compared to 7.0 +/- 0.0 for the germline). Still, this improvement is significant and reflects prior structural characterization of CR6261, which demonstrates that CDR1 (where T29P is located) makes extensive hydrogen bonding interactions with the A helix of HA2 (Ekiert et al., 2009). The contact area between Pro29 and HA are annotated in Figure 2C, and in the pathways analysis section of the results, we have added a sentence to clarify the strong preference for this mutation to occur first.

https://doi.org/10.7554/eLife.71393.sa2

Article and author information

Author details

  1. Angela M Phillips

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Katherine R Lawrence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9806-7574
  2. Katherine R Lawrence

    1. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    2. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    3. Quantitative Biology Initiative, Harvard University, Cambridge, United States
    4. Department of Physics, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Angela M Phillips
    Competing interests
    No competing interests declared
  3. Alief Moulana

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    Contribution
    Software, Formal analysis, Validation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Thomas Dupic

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    Contribution
    Software, Formal analysis, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Jeffrey Chang

    Department of Physics, Harvard University, Cambridge, United States
    Contribution
    Formal analysis, Validation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Milo S Johnson

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    Contribution
    Software, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0169-2494
  7. Ivana Cvijovic

    Department of Applied Physics, Stanford University, Stanford, United States
    Contribution
    Conceptualization, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-6272-2979
  8. Thierry Mora

    Laboratoire de physique de ÍÉcole Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5456-9361
  9. Aleksandra M Walczak

    Laboratoire de physique de ÍÉcole Normale Supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    Senior editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2686-5702
  10. Michael M Desai

    1. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    2. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    3. Quantitative Biology Initiative, Harvard University, Cambridge, United States
    4. Department of Physics, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing - review and editing
    For correspondence
    mdesai@oeb.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9581-1150

Funding

Howard Hughes Medical Institute (Hanna H. Gray Postdoctoral Fellowship)

  • Angela M Phillips

Hertz Foundation (Graduate Fellowship Award)

  • Katherine R Lawrence

National Science Foundation (Graduate Research Fellowship Program)

  • Katherine R Lawrence
  • Jeffrey Chang
  • Milo S Johnson

European Research Council (COG 724208)

  • Thierry Mora
  • Aleksandra M Walczak

National Institutes of Health (GM104239)

  • Michael M Desai

National Science Foundation (PHY-1914916)

  • Michael M Desai

Stanford University (Stanford Science Fellowship)

  • Ivana Cvijovic

Human Frontier Science Program

  • Thomas Dupic

NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard (1764269)

  • Katherine R Lawrence

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Rhys Adams for helpful discussion of the Tite-Seq experiments, Zach Niziolek for assistance with flow cytometry, Kevin McCarthy for help with antigen production, Matt Melissa for help acquiring strains and protocols, and Tyler Starr and members of the Denic, Gaudet, and Wittrup labs for help with experimental protocols. We also thank Jesse Bloom, Andrew Murray, and Michael Laub for helpful discussion and members of the Desai lab for comments on the manuscript. The computations in this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.

Senior Editor

  1. Satyajit Rath, Indian Institute of Science Education and Research (IISER), India

Reviewing Editor

  1. Sarel Jacob Fleishman, Weizmann Institute of Science, Israel

Reviewers

  1. Jesse D Bloom, Fred Hutchinson Cancer Research Center, United States
  2. Nicholas C Wu, University of Illinois at Urbana-Champaign, United States

Publication history

  1. Preprint posted: May 25, 2021 (view preprint)
  2. Received: June 17, 2021
  3. Accepted: September 5, 2021
  4. Accepted Manuscript published: September 7, 2021 (version 1)
  5. Version of Record published: September 27, 2021 (version 2)

Copyright

© 2021, Phillips et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 209
    Page views
  • 44
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Microbiology and Infectious Disease
    Emily R Ebel et al.
    Research Article

    The replication of Plasmodium falciparum parasites within red blood cells (RBCs) causes severe disease in humans, especially in Africa. Deleterious alleles like hemoglobin S are well-known to confer strong resistance to malaria, but the effects of common RBC variation are largely undetermined. Here we collected fresh blood samples from 121 healthy donors, most with African ancestry, and performed exome sequencing, detailed RBC phenotyping, and parasite fitness assays. Over one third of healthy donors unknowingly carried alleles for G6PD deficiency or hemoglobinopathies, which were associated with characteristic RBC phenotypes. Among non-carriers alone, variation in RBC hydration, membrane deformability, and volume was strongly associated with P. falciparum growth rate. Common genetic variants in PIEZO1, SPTA1/SPTB, and several P. falciparum invasion receptors were also associated with parasite growth rate. Interestingly, we observed little or negative evidence for divergent selection on non-pathogenic RBC variation between Africans and Europeans. These findings suggest a model in which globally widespread variation in a moderate number of genes and phenotypes modulates P. falciparum fitness in RBCs.

    1. Evolutionary Biology
    2. Genetics and Genomics
    Milton Tan et al.
    Research Article Updated

    Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates) independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found chondrichthyan and giant vertebrate genomes had decreased substitution rates compared to other vertebrates, but gene family expansion rates varied among vertebrate giants, suggesting substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.