(A) The evolutionary probability (EP) of each amino acid in the S protein sequence is calculated by mapping the multiple sequence alignment of the S proteins onto their evolutionary tree and using Bayesian inference to determine the likelihood of finding a particular residue at a particular position in a given sequence. Put simply, if a residue is found at position ‘x’ in closely related sequences, it will have a higher EP at position ‘x’ in the target sequence. Residues with EP < 0.05 in the target sequence are CAPs (red). (B) The distribution of EP scores of the wild-type residues in the S protein. Lower EP scores are shown in red and higher EP scores in blue. While the vast majority of the wild-type (reference) protein consists of high-EP residues, a few residues have low EP. (C) The CAPs are also highlighted as red spheres in the open configuration of the S protein, with the open chain in a darker shade. A majority of the CAP positions reside in the receptor binding domain (RBD) and the furin cleavage site (residues 676-689; Wrobel et al. 2020), which are shown as transparent light gray spheres.
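
The CAP criterion in panel (A) can be illustrated with a minimal sketch. It assumes the EP values are already available as one dictionary per position; the function name `find_caps`, the input layout, and the toy numbers are hypothetical and are not taken from the authors' pipeline. Only the 0.05 cutoff comes from the caption.

```python
# Minimal sketch: flag positions whose reference residue has EP below 0.05.
# The input layout and all numbers below are illustrative assumptions.
EP_CUTOFF = 0.05  # threshold stated in the caption


def find_caps(reference, ep_profile, cutoff=EP_CUTOFF):
    """Return 1-based positions whose reference residue has EP < cutoff."""
    caps = []
    for pos, (residue, probs) in enumerate(zip(reference, ep_profile), start=1):
        # Assumption: a residue missing from the profile is treated as EP = 0.
        if probs.get(residue, 0.0) < cutoff:
            caps.append(pos)
    return caps


# Toy three-residue example (not real S protein data).
reference = "MFV"
ep_profile = [
    {"M": 0.90, "L": 0.10},
    {"F": 0.02, "L": 0.98},  # reference residue F is improbable here
    {"V": 0.60, "I": 0.40},
]
caps = find_caps(reference, ep_profile)
print(caps)
```

In this toy input only position 2 is flagged, because its reference residue has an EP below the cutoff.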

(A) Schematic representation of DCI asymmetry. (B) DCI asymmetry of CAP residue positions with the binding interface of RBD in the open chain. Residues in the closed chains with a low EP amino acid in the reference sequence dominate the binding site interface of RBD in the open chain. There is a significant difference between the asymmetry profiles of the closed (M = -0.06, SD = 0.33) and open (M = -1.68, SD = 0.89) conformations (p < .001).

The average number of variants observed among residues of different flexibility. Residues were sorted into five bins based on flexibility, and the average number of variants was then calculated for the residues in each bin. Here, the number of variants is defined as the number of distinct amino acids found at that site. Mutational data were calculated across approximately 24,000 SARS-CoV-2 S protein sequences from the NCBI Datasets Project (Brister et al. 2015). Residue flexibility, reported here as %DFI, was computed using structure 6VSB from the Protein Data Bank (Berman et al. 2000). More rigid residues tend to have fewer variants (r = 0.94).
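
The binning procedure described in this caption can be sketched as follows. The sketch assumes %DFI is expressed as a fraction between 0 and 1 and that the five bins have equal width; the caption does not state how the bin edges were chosen, so both are assumptions. The function names and the toy alignment are hypothetical.

```python
from statistics import mean


def count_variants(column):
    """Number of distinct amino acids in one alignment column (gaps ignored)."""
    return len(set(column) - {"-"})


def mean_variants_by_flexibility(dfi_percent, variant_counts, n_bins=5):
    """Average variant count per equal-width %DFI bin over [0, 1].

    Assumption: equal-width bins; empty bins are reported as None.
    """
    bins = [[] for _ in range(n_bins)]
    for dfi, n_var in zip(dfi_percent, variant_counts):
        idx = min(int(dfi * n_bins), n_bins - 1)  # %DFI of 1.0 goes in the last bin
        bins[idx].append(n_var)
    return [mean(b) if b else None for b in bins]


# Toy alignment of four sequences with four sites (not real data).
sequences = ["ACDE", "ACDF", "AGDW", "ATEY"]
variant_counts = [count_variants(col) for col in zip(*sequences)]

# Toy %DFI value for each of the four sites.
dfi_percent = [0.05, 0.55, 0.30, 0.95]
bin_means = mean_variants_by_flexibility(dfi_percent, variant_counts)
print(variant_counts, bin_means)
```

Quantile-based bins would be an equally plausible reading of the caption; switching to them only changes how `idx` is computed.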

(A) DCI asymmetry of low EP characteristic mutation sites of the Delta or Omicron strains in the closed chains with the binding interface of RBD in the open chain. Delta displays a second peak closer to zero, suggesting that Delta mutation sites (M = -0.98, SD = 0.80) have less allosteric control over the hACE2 binding sites than Omicron mutation sites (M = -1.74, SD = 1.00) (p < .001). However, both sets of sites have far more control over hACE2 binding sites than expected from a random control group (M = 0.03, SD = 0.85) (p < .001). (B) S protein structure showing binding interface sites (transparent gray), Delta mutation sites (magenta), Omicron mutation sites (cyan), and sites mutated in both Omicron and Delta (blue).

(A) Schematic representation of the cross-chain EpiScore, showing positions i and j in chains B and C, respectively, and their impact on RBD binding position k in chain A, the open RBD conformer. (B) Colors indicate EpiScore values for given mutation pairs, averaged over hACE2 binding sites. Cross-chain residue pairs in the upper right tend to be highly epistatic, similar to the pairwise second-order interaction coefficients from Moulana et al. (Nat. Commun. 2022).
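
The averaging step in panel (B) can be sketched as below. The sketch assumes the EpiScores have already been computed and are stored as a nested list indexed `[i][j][k]`; it does not compute EpiScore itself. The function name and the toy values are hypothetical.

```python
def average_over_binding_sites(episcore):
    """Collapse episcore[i][j][k] to heatmap[i][j] by averaging over sites k."""
    return [[sum(ks) / len(ks) for ks in row] for row in episcore]


# Toy values for two i sites, two j sites, and two binding sites k (not real data).
episcore = [
    [[1.0, 1.5], [0.5, 1.0]],
    [[0.25, 0.75], [1.5, 1.5]],
]
heatmap = average_over_binding_sites(episcore)
print(heatmap)
```

Each cell of `heatmap` corresponds to one colored cell for a mutation pair (i, j) in the panel.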

EpiScores with i = low EP Delta mutation sites (magenta), low EP Omicron mutation sites (cyan), and a random selection of sites (gray), j = low EP sites in the Wuhan variant, and k = the binding interface of the open chain. EpiScores using sites of either variant (Delta: M = 0.70, SD = 0.50; Omicron: M = 0.86, SD = 0.46) are significantly different (p < .001) from a set of EpiScores using random sites (M = 0.84, SD = 0.41), but the distribution for Delta differs much more from the other two. EpiScores for other variants can be found in Supplementary Figure S1.

EpiScores with i = low EP Delta mutation sites (magenta), low EP Omicron mutation sites (cyan), and a random selection of sites (gray), j = site 486, and k = the binding interface of the open chain. Site 486, which is a CAP as well as an hACE2 and antibody binding site, displays epistasis with almost all XBB 1.5 variant sites at almost every hACE2 binding site (M = 1.40, SD = 0.46) and presents a significantly different profile from other variant sites (p < .01). EpiScores involving 486 for other variants can be found in Supplementary Figure S1.

The %DFI calculations for the Omicron, XBB, and XBB 1.5 variants. (A) The %DFI profiles of the variants are plotted in the same panel. The gray shaded areas and dashed lines indicate the ACE2 binding regions, whereas the red dashed lines show the antibody binding residues. (B) The sum of %DFI values over the RBD-ACE2 interface residues. The trend in total %DFI tracks the trend in the experimentally measured log Kd values (R = 0.97). (C) The sum of %DFI values over the RBD antibody binding residues. The ranking by total %DFI agrees with the ranking by the experimental log IC50 values.
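
The comparison in panel (B) can be sketched as below: %DFI is summed over a set of interface positions for each variant and the totals are correlated with log Kd. The sketch assumes a base-10 logarithm and a Pearson correlation, neither of which is specified in the caption. The variant labels, positions, %DFI values, and Kd values are placeholders chosen for illustration, not measurements.

```python
import math


def total_dfi(dfi_profile, interface_positions):
    """Sum %DFI over the given residue positions."""
    return sum(dfi_profile[pos] for pos in interface_positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder %DFI profiles keyed by residue position (not real data).
profiles = {
    "variant_a": {1: 0.2, 2: 0.4, 3: 0.9},
    "variant_b": {1: 0.3, 2: 0.5, 3: 0.1},
    "variant_c": {1: 0.6, 2: 0.6, 3: 0.5},
}
interface_positions = [1, 2]                                         # hypothetical interface
kd = {"variant_a": 1.0, "variant_b": 10.0, "variant_c": 1000.0}      # placeholder Kd values

names = sorted(profiles)
totals = [total_dfi(profiles[name], interface_positions) for name in names]
log_kd = [math.log10(kd[name]) for name in names]  # assumption: base-10 log
r = pearson_r(totals, log_kd)
print(totals, log_kd, r)
```

The same two functions would apply to panel (C) by swapping in the antibody binding positions and log IC50 values.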

EpiScores with i = characteristic mutation sites, j = low EP sites, and k = the binding interface of the open chain. EpiScores using variant sites are significantly different (p < .001) from a set of EpiScores using random sites.

EpiScores with i = characteristic mutation sites within the NTD, j = low EP sites, and k = the binding interface of the open chain. NTD mutation sites result in markedly lower EpiScores than mutation sites elsewhere.

EpiScores with i = characteristic mutation sites, j = site 486, and k = the binding interface of the open chain. Site 486, which is a CAP as well as an hACE2 and antibody binding site, displays epistasis with almost all XBB 1.5 variant sites at almost every hACE2 binding site. EpiScores using variant sites are significantly different (p < .001) from a set of EpiScores using random sites.