The amino acid composition of all positions (y-axis) was quantified as the frequency with which each amino acid appeared within the core 1a.3.V genes of the reference sequence, HIMB83, whereas the amino acid composition of SAAVs (x-axis) was quantified as the frequency with which each amino acid was one of the two dominant alleles across the 738,324 SAAVs. The black diagonal line marks a one-to-one correspondence between these two variables, and black vertical lines show the deviation of each amino acid from this null expectation. Vertical lines are labeled with the percent difference between the frequency of each amino acid in SAAVs relative to all positions. The linear correlation coefficient of panel A is 0.81, with an R-squared of 0.65, and a probability that no correlation exists of . Panel B compares the average solvent accessibility of amino acids (x-axis) with the percent difference between frequencies of amino acids in SAAVs relative to all positions (y-axis). Average solvent accessibilities for amino acids were taken from Table 2 of Bordo and Argos (1991). The linear correlation coefficient of panel B excluding isoleucine and valine (in red) is 0.64, with an R-squared of 0.41, and a probability that no correlation exists of 0.004. Including isoleucine and valine, these values are 0.40, 0.16, and 0.08, respectively. The blue line shows the line of best fit excluding isoleucine and valine, with shaded regions representing 95% confidence intervals.
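The statistics quoted in this caption can be illustrated with a minimal sketch. It assumes that the percent difference is computed as 100 × (SAAV frequency − all-position frequency) / all-position frequency, that each of the 20 standard amino acids is one data point (18 when isoleucine and valine are excluded), and that the "probability that no correlation exists" is the two-sided p-value of a Student's t-test on the Pearson correlation coefficient. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import math


def percent_difference(freq_saav, freq_all):
    """Assumed definition: percent change of the SAAV frequency relative to all positions."""
    return 100.0 * (freq_saav - freq_all) / freq_all


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: no linear correlation, given r and sample size n.

    Integrates the t density over [0, |t|] with Simpson's rule (steps must be even).
    Requires |r| < 1; not accurate for extremely small p-values.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # P(-|t| < T < |t|)
    return 1.0 - central_mass


# Rounded values reported for panel B; n is an assumption (see lead-in).
for label, r, n in [("excluding Ile and Val", 0.64, 18),
                    ("including Ile and Val", 0.40, 20)]:
    print(label, "R-squared:", round(r * r, 2), "p:", round(two_sided_p(r, n), 3))
```

Under these assumptions, the rounded panel B coefficients give R-squared values of 0.41 and 0.16 and p-values close to the reported 0.004 and 0.08. Because the inputs are rounded, small discrepancies from the values the authors computed on the full data are expected.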