Figures and data

Beneficial mutations occur in clusters along the genome.
Inferred fitness effects of HIV-1 mutations in CH505 (A) and CH848 (B). The position along the radius of each circle specifies the strength of selection: mutations plotted closer to the center are more deleterious, while those closer to the edge are more beneficial. For both individuals, clusters of beneficial mutations are observed in the variable loops of Env, some of which are associated with antibody escape. For CH848, a group of strongly beneficial mutations also appears in Nef.

Visualization of the effects of HIV-1 mutations in CH505 on the Env trimer.
A, Side view of the Env trimer 53, with detail views of selection for mutations in the CH103 binding site (B), CH235 binding site (C), and sites associated with escape from autologous strain-specific antibodies (D). E, Top view of the trimer, with detail views of variable loops (F-I) and the CD4 binding site (J). Generally, beneficial mutations appear more frequently near exposed regions at the top of the Env trimer, and deleterious mutations appear in more protected regions. Mutations near the V1/V2 apex can affect the binding and neutralization of antibodies targeting the CD4 binding site 22.

Trajectories and inferred selection coefficients for immune escape mutations and mutations affecting Env glycosylation.
A, For CH505, early, strongly selected mutations include ones that escape from CD8+ T cells and autologous strain-specific antibodies (ssAbs). More moderately selected bnAb resistance mutations tend to arise later. Note that mutations that affect bnAb resistance can appear in the viral population before bnAbs are generated. Open circles and error bars reflect the mean and standard deviation of inferred selection coefficients in each category. B, For CH848, mutations affecting glycosylation dominate the early phase of evolution, followed later by mutations affecting bnAb resistance. For easier visualization, frequency trajectories are shown with exponential smoothing with a time scale of t = 50 days. See Supplementary Fig. 3 for a detailed view of mutation frequency trajectories by mutation type without smoothing.
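The smoothing mentioned above can be illustrated with a minimal sketch. The caption gives only the time scale (50 days), so the two-sided exponential kernel, the function name `exp_smooth`, and the parameter `tau` are assumptions for illustration, not the authors' implementation.

```python
import math


def exp_smooth(times, freqs, tau=50.0):
    """Smooth a frequency trajectory sampled at (possibly irregular) times.

    Assumption: each smoothed value is a weighted average of all observed
    frequencies, with weights decaying as exp(-|dt| / tau), tau in days.
    """
    smoothed = []
    for t in times:
        weights = [math.exp(-abs(t - u) / tau) for u in times]
        total = sum(weights)
        smoothed.append(sum(w * f for w, f in zip(weights, freqs)) / total)
    return smoothed
```

Because every smoothed value is a weighted average with positive weights, it stays within the range of the observed frequencies, and a constant trajectory is left unchanged.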

Example trajectories and selection coefficients for SHIV mutations that affect viral load, bnAb recognition, or glycosylation.
A, In RM5695, infected with SHIV.CH505, mutations known to increase viral load 60 and ones affecting glycosylation were rapidly selected. Mutations affecting resistance to broad antibodies 20 arose later under moderate selection. Open circles and error bars reflect the mean and standard deviation of inferred selection coefficients in each category. B, Slower but qualitatively similar evolutionary patterns were observed in RM6163, infected with SHIV.CH848. For easier visualization, frequency trajectories are shown with exponential smoothing with a time scale of t = 50 days.

Inferred fitness landscapes in HIV-1 and SHIV are highly similar.
A, Fitness of SHIV.CH505 sequences relative to the TF sequence across 7 RMs (2 that developed bnAbs, 1 that developed tier 2 nAbs that lacked a critical mutation for breadth, and 4 that did not develop broad antibody responses), evaluated using mutational fitness effects inferred from CH505 data and from RM data. The fitness values are strongly correlated, indicating the similarity of Env fitness landscapes inferred using HIV-1 or SHIV data. Values are normalized such that the fitness gain of the TF sequence is zero. B, Fitness values for SHIV.CH848 sequences also show strong agreement between CH848 and SHIV.CH848 landscapes.

Rapid SHIV fitness gains precede the development of broadly neutralizing antibodies.
For both SHIV.CH505 (A) and SHIV.CH848 (B), viral fitness gains over time display distinct patterns in RM hosts that developed bnAbs versus those that did not. Notably, the differences in SHIV fitness gains between hosts with and without broad antibody responses appear before the development of antibody breadth, and cannot be attributed to selection for bnAb resistance mutations. RM6072 is an unusual case, exhibiting antibody development that was highly similar to CH505. Although RM6072 developed tier 2 nAbs, they lacked key mutations critical for breadth 20. Points and error bars show the mean and standard deviation of fitness gains across SHIV samples in each RM at each time.

Weak correlation between sequence variability, as measured by entropy, and inferred selection.
To quantify sequence variability, we computed “site-dependent entropy” scores 

CH848 exhibits similar spatial patterns of selection coefficients: strongly selected mutations are located at the apex, particularly on the edge of the Env protein.
A and B, Side and top views of the Env protein with inferred selection coefficient values. The apex region is enriched with moderately and strongly beneficial mutations. C-F, Mutations presumed to confer resistance to bnAbs and autologous nAbs are highlighted. The selection values in the autologous nAb binding regions were slightly larger than those in the bnAb binding regions. G-K, Top views highlight the individual variable regions.

Frequencies of different types of HIV-1 mutations over time.
CH505 (A) and CH848 (B) mutation frequencies over time. Mutation types are the same as in Fig. 3 in the main text, but with all mutations affecting resistance to bnAb lineage antibodies grouped together.

Weak association between the number of RMs in which a SHIV mutation was observed and its inferred fitness effect.
Inferred selection coefficients for SHIV.CH505 (A) and SHIV.CH848 (B) mutations, sorted by the number of RMs in which the mutation was observed.

Broad similarity between HIV-1 and SHIV fitness landscapes with the same TF Env sequence.
A, Fitness estimates for a sample of artificial Env sequences, obtained by independently shuffling observed amino acids in SHIV.CH505 sequences at each residue. This random sequence ensemble conserves single-residue frequencies, but not correlations between mutations. Even on these artificial sequences, the fitness estimates using a model trained on HIV-1 data strongly agree with the SHIV model (Pearson’s r = 0.84, p < 10⁻²⁰). This implies that the similarity of the fitness landscapes is not confined to the specific genotypes observed in HIV-1 or SHIV evolution, but also extends to more distant sequences. B, Similar results also hold for fitness landscapes based on CH848 and SHIV.CH848 data (Pearson’s r = 0.74, p < 10⁻²⁰).
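The shuffling procedure described above can be sketched as follows. This is a minimal illustration, assuming aligned sequences of equal length; the function names `shuffle_columns` and `pearson_r` are hypothetical and not taken from the authors' code.

```python
import math
import random


def shuffle_columns(seqs, seed=0):
    """Shuffle residues independently at each alignment position.

    Single-site frequencies are preserved exactly, while correlations
    between sites are destroyed.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*seqs)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

Fitness values for the shuffled sequences, computed under two different landscapes, could then be compared by passing the two lists to `pearson_r`.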

Little correlation in fitness values estimated from evolutionarily distant sequences.
Comparison between estimated fitness values for HIV-1 sequences using fitness landscapes learned from CH505 and CH848 data. There is little correlation between the inferred landscapes. Furthermore, the CH505 landscape captures little variation in fitness for CH848 sequences (and vice versa). In total, 199 mutations are shared between CH505 and CH848, among 868 and 1,406 total Env mutations, respectively.

Contributions of different types of SHIV mutations to viral fitness gains over time.
SHIV.CH505 (A) and SHIV.CH848 (B) mutations were grouped into four categories to assess their contributions to SHIV fitness gains over time. For SHIV.CH505, the mutations N334S, H417R, K302N, Y330H, N279D, and N130D are classified as viral load (VL) enhancing mutations, following ref. 109. For both SHIV.CH505 and SHIV.CH848, we then separated out the contributions of known antibody resistance mutations, including mutations that affect N-linked glycosylation motifs. We then computed the collective fitness contributions from subsets of mutations that affect N-linked glycosylation motifs that were not known to affect resistance to specific antibodies, and reversions to the HIV-1 subtype consensus sequence. Each mutation appears in only one category in this figure, sorted in the order above. For example, a mutation that affects an N-linked glycosylation motif and which is a reversion to the subtype consensus sequence, but which has not been established to affect resistance to a specific antibody, would have its contribution to fitness counted in the glycan category.
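The priority rule described above, in which each mutation is counted in only the first category that applies, can be sketched as below. The category labels, the function names, and the use of selection coefficient times frequency as the fitness contribution (as in an additive model) are assumptions for illustration; the mutation names used with it are placeholders, not real data.

```python
# Categories in priority order, following the order given in the caption.
CATEGORY_ORDER = ("VL", "antibody", "glycan", "reversion")


def assign_category(properties):
    """Return the first category in priority order that applies, else None."""
    for category in CATEGORY_ORDER:
        if category in properties:
            return category
    return None


def fitness_by_category(mutations, frequencies):
    """Sum per-category fitness contributions at one time point.

    mutations:   {name: (selection_coefficient, set_of_properties)}
    frequencies: {name: frequency of the mutation in the population}
    Assumption: a mutation contributes selection_coefficient * frequency.
    """
    totals = {category: 0.0 for category in CATEGORY_ORDER}
    for name, (s, properties) in mutations.items():
        category = assign_category(properties)
        if category is not None:
            totals[category] += s * frequencies.get(name, 0.0)
    return totals
```

With this rule, a mutation tagged as both `glycan` and `reversion` but not `antibody` is counted in the glycan category, matching the example in the caption.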

Most of the inferred epistatic interactions concentrate around zero, with only a small fraction exhibiting moderately larger absolute values.
Distribution of inferred epistatic interactions for CH505 (A), CH848 (B), SHIV.CH505 (C), and SHIV.CH848 (D).

Effective selection coefficients obtained from the epistatic model align well with the selection coefficients in the additive model.
We compared selection coefficients from the additive model analyzed throughout the manuscript to the effective selection coefficients from the epistatic model for CH505 (A) and CH848 (B). Intuitively, the effective selection coefficient is defined as the average difference in fitness when a particular mutation is replaced by the TF nucleotide/amino acid. For definiteness, let g represent an arbitrary sequence and let gi* represent a sequence that is identical to g, except that the nucleotide/amino acid at site i has been replaced by the TF one. The effective selection coefficient is then defined as 
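As a sketch of the verbal definition above, the following averages the fitness difference between each sequence carrying a mutation and its copy with that site reverted to the TF state. The pairwise form of the fitness function, the choice to average uniformly over the supplied sequences, and all names (`fitness`, `effective_selection`, `s`, `J`) are assumptions for illustration, not the authors' definition.

```python
def fitness(seq, tf, s, J):
    """Fitness relative to the TF sequence: additive plus pairwise terms.

    s: {(site, allele): selection coefficient}
    J: {((site_i, allele_i), (site_j, allele_j)): epistatic term}, site_i < site_j
    """
    mutated = [(i, a) for i, (a, b) in enumerate(zip(seq, tf)) if a != b]
    total = sum(s.get(m, 0.0) for m in mutated)
    for x in range(len(mutated)):
        for y in range(x + 1, len(mutated)):
            total += J.get((mutated[x], mutated[y]), 0.0)
    return total


def effective_selection(site, allele, seqs, tf, s, J):
    """Average fitness change from reverting `allele` at `site` to the TF state."""
    carriers = [g for g in seqs if g[site] == allele]
    if not carriers:
        return None
    diffs = []
    for g in carriers:
        reverted = g[:site] + tf[site] + g[site + 1:]
        diffs.append(fitness(g, tf, s, J) - fitness(reverted, tf, s, J))
    return sum(diffs) / len(diffs)
```

When all pairwise terms are zero, the effective coefficient reduces to the additive selection coefficient, which is the sense in which the two quantities can be compared.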

Fitness values are consistent between the additive and epistatic models.
Comparison of fitness values obtained from the additive and epistatic models for CH505 (A), CH848 (B), SHIV.CH505 (C), and SHIV.CH848 (D).

Selection coefficients are robust to finite sampling noise.
Selection coefficients from the full and bootstrap-resampled data are compared for CH505 (A), CH848 (B), SHIV.CH505 (C), and SHIV.CH848 (D). Each point and error bar represents the mean and confidence interval, respectively, based on 10 independently inferred selection coefficients from bootstrap samples. The bootstrap samples are obtained by uniformly resampling the same number of sequences from the sequence ensemble for each subject at each time point.
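The resampling scheme described above can be sketched as follows. The inference step itself is not shown; `infer` is a hypothetical callable standing in for it, and reporting the standard deviation as the spread is an assumption, since the caption does not specify how the confidence interval is computed.

```python
import random
import statistics


def bootstrap_sample(samples_by_time, rng):
    """Resample sequences with replacement, keeping the count at each time point.

    samples_by_time: {time: list of sequences sampled at that time}
    """
    return {
        t: [rng.choice(seqs) for _ in range(len(seqs))]
        for t, seqs in samples_by_time.items()
    }


def bootstrap_estimates(samples_by_time, infer, n_boot=10, seed=0):
    """Apply `infer` to n_boot resampled data sets; return mean and std. dev."""
    rng = random.Random(seed)
    values = [infer(bootstrap_sample(samples_by_time, rng)) for _ in range(n_boot)]
    return statistics.mean(values), statistics.stdev(values)
```

In practice this would be applied separately to the sequence ensemble for each subject, with `infer` returning the quantity of interest for one mutation.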

CH103 and CH235 resistance mutations.
List of mutations considered to be CH103 or CH235 resistance mutations, and supporting references. Here “X” refers to any amino acid.

Strain-specific antibody resistance mutations in CH505.
List of mutations considered to be strain-specific antibody resistance mutations and supporting references. Here “X” refers to any amino acid.

DH272, DH475, and strain-specific antibody resistance mutations in CH848.
List of mutations considered to be DH272, DH475, and strain-specific antibody resistance mutations, and supporting references. Here “X” refers to any amino acid.

Biological effects of the HIV-1 mutations inferred to be the most beneficial in CH505.
The table lists the 25 mutations in CH505 with the highest (i.e., most beneficial) selection coefficients, among over 2500 mutations in total. All of the top mutations are nonsynonymous. Mutations associated with immune evasion, changes in glycosylation, and reversions to the subtype C consensus sequence are frequently observed among these top mutations.

Biological effects of the HIV-1 mutations inferred to be the most beneficial in CH848.
This table is analogous to Supplementary Table 4, but for CH848. Unlike for CH505, longer sequences were available for CH848, including genes beyond Env. We observed multiple beneficial mutations in Nef and Env.

CH505 and SHIV.CH505 sequence statistics.
The number of sequences and time points listed here are given after filtering. Variable sites are defined as sites where at least two different nucleotides (including deletions) were observed.
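The definition of variable sites above can be written as a short sketch, assuming aligned nucleotide sequences of equal length with deletions encoded as `-`. The function name `variable_sites` is hypothetical.

```python
def variable_sites(aligned_seqs):
    """Return 0-based positions where at least two different states occur.

    Deletions ('-') count as a state, following the definition in the caption.
    """
    return [
        i for i, column in enumerate(zip(*aligned_seqs))
        if len(set(column)) >= 2
    ]
```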

CH848 and SHIV.CH848 sequence statistics.
The format of this table is the same as in Supplementary Table 6.

Biological effects of strongly selected SHIV.CH505 mutations.
Five out of the top six mutations have been shown to increase viral load in RMs 109. Selection coefficients and ranks are based on the joint evolutionary model across SHIV.CH505-infected RMs. These analyses focus on common mutations observed in three or more RMs.

Biological effects of strongly selected SHIV.CH848 mutations.
Selection coefficients and ranks are based on the joint evolutionary model across SHIV.CH848-infected RMs.

Selective advantage of mutations that confer resistance to antibodies in SHIV.CH505.
Identification of resistance mutations was performed by Roark and colleagues 88.

Selective advantage of mutations that confer resistance to antibodies in SHIV.CH848.
Identification of resistance mutations was performed by Roark and colleagues 88.

List of sites identified as selected using LASSIE in SHIV.CH505.
This analysis was performed by Roark et al. 88.

List of sites identified as selected using LASSIE in SHIV.CH848.
This analysis was performed by Roark et al. 88.