Overview of the emergence of SIV into apes, ultimately giving rise to several strains of HIV-1.

The figure shows, in the green box, the SIV reservoir that exists in African monkeys. Chimpanzees became infected from this reservoir, giving rise to a new virus, SIVcpz (Bailes et al., 2003; Sharp et al., 2005). SIVcpz was then transmitted from chimpanzees to both gorillas and humans. The remaining two great ape species, orangutans and bonobos, are not known to harbor any form of SIV.

Receptor-mediated resistance to SIVcpz entry is a trait acquired during ape speciation.

(A) Cladogram of CD4 sequences among apes and the nodes for which ancestral sequences were reconstructed. Virion diagrams next to some ape species indicate apes that are infected by SIV/HIV. (B) An amino acid alignment of the CD4 D1 domain of human, chimpanzee, gorilla, and the inferred ancestral CD4 sequences. Dots represent residues identical to human; differing amino acids and their numerical positions are noted. Bolded residues on the human sequence represent sites known to directly interact with HIV-1 Envelope (Liu et al., 2017). (C) Cladogram of HIV-1 and SIVcpz, based on previously published work (Takehisa et al., 2007), highlighting genetic relationships of the envelope (Env) clones used in this study. (D, E) HIV-1ΔEnv-GFP viruses were pseudotyped with Envs (top of graphs) from diverse (D) SIVcpz or (E) HIV-1 strains. Cf2Th cells stably expressing human CCR5 and various CD4s (X-axis) were infected with various volumes of these pseudoviruses and then analyzed by flow cytometry 48 hours post-infection. GFP-positive cells were enumerated and virus titers (transducing units per milliliter; TDU/mL) were determined for those samples falling within the linear infection range (n = 2 titration points). The mean virus titers obtained from each of three independent experiments were plotted (dots), with error bars representing the standard error of the mean (SEM). Dotted lines represent the lower limit of detection for this assay. SIVcpz-Ptt and SIVcpz-Pts refer to SIVs derived from the chimpanzee subspecies Pan troglodytes troglodytes or Pan troglodytes schweinfurthii, respectively.
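The titer calculation described in panels D and E can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the formula (fraction of GFP-positive cells multiplied by the number of cells at infection, divided by the inoculum volume) is the conventional one and is assumed here, and the cell number, volumes, and GFP fractions are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def titer_tdu_per_ml(fraction_gfp_positive, cells_at_infection, virus_volume_ml):
    """Transducing units per mL from a single titration point.

    Assumes the point lies in the linear infection range, so each
    GFP-positive cell corresponds to one transduction event.
    """
    return fraction_gfp_positive * cells_at_infection / virus_volume_ml


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical data: three independent experiments, each with two
# titration points given as (fraction GFP-positive, virus volume in mL).
CELLS_AT_INFECTION = 50_000  # assumed value, for illustration only
experiments = [
    [(0.04, 0.01), (0.08, 0.02)],
    [(0.02, 0.01), (0.04, 0.02)],
    [(0.06, 0.01), (0.12, 0.02)],
]

# One mean titer per experiment (these are the plotted dots).
per_experiment = [
    mean(titer_tdu_per_ml(f, CELLS_AT_INFECTION, v) for f, v in points)
    for points in experiments
]

# Summary across experiments (the error bars are the SEM).
overall_mean, overall_sem = mean_and_sem(per_experiment)
```

Averaging the two titration points within each experiment first, and only then computing the SEM across experiments, matches the legend's description of three plotted means per condition.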

Identification of diverse gorilla CD4 alleles.

(A) Eight unique protein variants of gorilla CD4 were identified. The polymorphic sites (red arrows) are shown in the alignment, where dots indicate amino acid residues that are identical to human. (B) The frequencies of the eight CD4 protein haplotypes are shown for three gorilla subspecies, Gorilla gorilla gorilla (n = 28), Gorilla beringei graueri (n = 3), and Gorilla gorilla diehli (n = 1). (C) Cryo-EM structure of an HIV-1 Env trimer in complex with human CD4 (PDB 5U1F) was visualized in ChimeraX (Goddard et al., 2017). Individual gp120 and gp41 subunits are colored in light and dark blue, respectively. The CD4 D1 domain (red) and D2-D4 domains (gray) are shown, with gorilla SNPs shown on the human sequence as purple spheres.

Gorilla CD4 alleles differentially support entry of SIVcpz.

(A) HIV-1ΔEnv-GFP viruses were pseudotyped with Envs (top of graphs) from diverse SIVcpz strains. Cf2Th cells stably expressing human CCR5 and various CD4s (X-axis) were infected with various volumes of these pseudoviruses and then analyzed by flow cytometry 48 hours post-infection. GFP-positive cells were enumerated and virus titers (transducing units per milliliter; TDU/mL) were determined for those samples falling within the linear infection range (n = 2 titration points). The mean virus titers obtained from each of three independent experiments were plotted (dots), with error bars representing the standard error of the mean (SEM). (B) Data from each pseudotyped Env in A were used to calculate virus titer means normalized to human CD4-expressing cells and were plotted as a heat map, where red and white represent high susceptibility or resistance to viral entry, respectively. CD4 alleles and Envs were hierarchically clustered to depict similarities in phenotype.
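The normalization behind the heat map in panel B can be illustrated with a short sketch. The receptor names and titer values below are hypothetical, and the hierarchical clustering step is not reproduced here.

```python
def normalize_to_reference(mean_titers, reference="human"):
    """Express each mean titer as a fraction of the reference receptor's titer."""
    reference_titer = mean_titers[reference]
    return {
        receptor: titer / reference_titer
        for receptor, titer in mean_titers.items()
    }


# Hypothetical mean titers (TDU/mL) for one Env across three CD4 receptors.
mean_titers = {
    "human": 2.0e5,
    "gorilla_allele_1": 1.0e5,
    "gorilla_allele_2": 2.0e3,
}

relative_entry = normalize_to_reference(mean_titers)
```

A value near 1 indicates entry comparable to human CD4, and values approaching 0 indicate resistance; one such row per Env would make up the matrix that is clustered and drawn as the heat map.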

The CD4 SNPs found in gorilla populations are functionally significant.

(A-E) HIV-1ΔEnv-GFP viruses were pseudotyped with Envs from diverse SIVcpz isolates (MB897, blue; EK505, orange; MT145, green; TAN1.910, purple). Cf2Th cells stably expressing human CCR5 and wild-type (wt) or human or gorilla CD4s with point mutations (X-axis) were infected with these pseudoviruses, and the percentage of infected (GFP-positive) cells was determined by flow cytometry 48 hours post-infection. Data represent the mean +/- SEM from two independent experiments, each with two technical replicates. Stars above data sets signify that both independent experiments showed statistically significant differences (p < 0.05) when compared to wild-type by one-way ANOVA. (C) Lysates of Cf2Th cells stably expressing the indicated CD4 receptors in “A” and “B” were treated with PNGase F (to remove N-linked glycans) or left untreated and then probed for CD4 expression by western blotting. The number of N-linked glycosylation sites within the D1 domain of CD4 was determined computationally (Gupta and Brunak, 2002) and is shown under the blot. β-Actin served as a loading control.

Positive selection has shaped CD4 polymorphism in host species where SIV has long been endemic.

(A) Mean and standard error of the mean of synonymous and nonsynonymous nucleotide heterozygosity (pi) at CD4 and neighboring loci across species endemically infected with SIV (chimpanzee and gorilla) or recently/uninfected (human, bonobo and orangutans). Schematic along bottom of each graph depicts the relative location of each locus as follows 5’ to 3’: ZNF384, PIANP, COPS7A, MLF2, PTMS, CD4, GPR162, GNB3, CDCA3, TPI1, LRRC23 and ENO2. Mann-Whitney test indicates whether heterozygosity at CD4 is significantly different from that at neighboring loci. (B) Schematic of CD4 domain regions. Ticks above and below the CD4 box indicate the location of polymorphic sites for the infected and recently/uninfected species groups, respectively. One of the polymorphic residues in gorilla contains two nonsynonymous changes in a single codon, marked by a star above the tick. (C) 2x2 contingency table and test results comparing synonymous and nonsynonymous polymorphism location relative to domain 1 between infected and recently/uninfected species groups.
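A 2x2 contingency comparison of the kind shown in panel C can be sketched as below. The legend does not name the test that was used, so Fisher's exact test is an assumption made here for illustration, and the table layout and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Hypothetical counts of nonsynonymous polymorphisms (not from the paper):
# rows = species group, columns = inside D1 / outside D1.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```

With counts this small, an exact test is generally preferred over a chi-square approximation, which is why it is used in this sketch.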

Flow cytometry gating strategy.

(A) Collected events were selected for live cells and singlets based on forward and side scatter values. Singlets were gated for CD4 and CCR5 fluorescent signal and then the double-positive population (Q2) was further analyzed for viral infection based on a shift in GFP fluorescence compared to virus-exposed cells lacking CD4/CCR5 receptors (empty vector-transduced cells). (B) Expression levels for CD4 and CCR5 were compared amongst all stable cell lines under uninfected and infected conditions, demonstrating that viral infection does not impact receptor expression levels. For empty vector control cells, singlets were used for comparison. Data shown are representative of multiple independent experiments.

Gorilla CD4 alleles differentially support entry of HIV-1.

(A) HIV-1ΔEnv-GFP viruses were pseudotyped with Envs (top of graphs) from globally diverse HIV strains. Cf2Th cells stably expressing human CCR5 and various CD4s (X-axis) were infected with various volumes of these pseudoviruses and then analyzed by flow cytometry 48 hours post-infection. GFP-positive cells were enumerated and virus titers (transducing units per milliliter; TDU/mL) were determined for those samples falling within the linear infection range (n = 2 titration points). The mean virus titers obtained from each of three independent experiments were plotted (dots), with error bars representing the standard error of the mean (SEM). (B) Data from each pseudotyped Env in A were used to calculate virus titer means normalized to human CD4-expressing cells and were plotted as a heat map. CD4 alleles and Envs were hierarchically clustered to depict similarities in phenotype.

Single nucleotide polymorphisms in ape species.

Population nucleotide diversity at a locus is estimated either based on the number of single nucleotide polymorphisms (SNPs; Watterson’s θW (Watterson, 1975)) or mean pairwise difference between individuals (θπ (Tajima, 1983)). Level of variability based on the number of single nucleotide polymorphisms at a locus (θπ) is not significantly different between CD4 and neighboring loci. Y-axis shows the mean and standard error of θπ for synonymous and nonsynonymous nucleotide variants at CD4 and neighboring loci across species endemically infected with SIV (chimpanzee and gorilla) or recently/uninfected (human, bonobo and orangutans). Schematic along bottom of each graph depicts the relative location of each locus as follows 5’ to 3’: ZNF384, PIANP, COPS7A, MLF2, PTMS, CD4, GPR162, GNB3, CDCA3, TPI1, LRRC23 and ENO2. Mann-Whitney test indicates whether heterozygosity at CD4 is significantly different from that at neighboring loci. We observe no difference in the total number of nonsynonymous and synonymous SNPs, represented by θW, between CD4 and its genomic neighbors in the endemic and recently/uninfected species (see also Table S1).
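The two diversity estimators named above can be illustrated with a small sketch. It uses the textbook definitions (mean pairwise differences per site, and the number of segregating sites divided by the harmonic number a_n and the sequence length), operates on a toy alignment that is purely hypothetical, and does not split sites into synonymous and nonsynonymous classes as the figure does.

```python
from itertools import combinations


def pairwise_diversity(sequences):
    """Mean number of pairwise differences per site (Tajima's estimator)."""
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(x != y for x, y in zip(first, second)) for first, second in pairs
    )
    return differences / len(pairs) / len(sequences[0])


def watterson_theta(sequences):
    """Watterson's estimator per site: S / a_n / L.

    S is the number of segregating (polymorphic) sites and
    a_n = sum of 1/i for i = 1 .. n-1, where n is the sample size.
    """
    segregating_sites = sum(len(set(column)) > 1 for column in zip(*sequences))
    a_n = sum(1 / i for i in range(1, len(sequences)))
    return segregating_sites / a_n / len(sequences[0])


# Toy alignment of four equal-length sequences (hypothetical data).
alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]

pi_estimate = pairwise_diversity(alignment)
theta_w_estimate = watterson_theta(alignment)
```

The pairwise estimator is weighted by allele frequency, whereas Watterson's estimator counts every polymorphic site equally, so the two can diverge when variants are held at intermediate frequency.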

CD4 is under positive selection in primates.

Previous studies have identified CD4 as an HIV-1 cofactor that is evolving under positive (diversifying) selection (Meyerson et al., 2014; Zhang et al., 2008). However, these studies were limited in that they analyzed CD4 sequences from a narrow set of primate species (Zhang et al., 2008), or included a larger species panel but lacked the complete CD4 coding sequence (Meyerson et al., 2014). To extend these studies, we collected full-length CD4 sequences from 25 primate species (Fig. 1A) and tested for evidence of site-specific selective pressures using the codeml program in the Phylogenetic Analysis by Maximum Likelihood (PAML) package. (A) Cladogram of the primate species (n = 25) analyzed in this study. (B) The most amino-terminal extracellular domain of CD4 (domain 1, D1) is bound by the primate lentivirus (HIV/SIV) envelope glycoprotein (Env) during entry (Bour et al., 1995). We next sought to assess whether D1 alone is evolving under positive selection (presumably due to selective pressures exerted by SIVs), or if other regions of CD4 are also experiencing selective pressures for diversification. Site-specific selective pressures in primate CD4 (full gene; top), the CD4 D1 domain alone (amino acids 26-123; middle), and CD4 minus the signal peptide and the D1 domain (amino acids 123-458; bottom) were detected using PAML (Yang, 2007a). Positive selection among amino acid sites was tested using two model comparisons, M7 vs. M8 and M8a vs. M8. In each of these comparisons, the null models (M7, M8a) do not allow for sites under positive selection, while the alternative model (M8) does. Tables summarize the likelihood ratio test between the M7-M8 and M8a-M8 models. The 2ΔlnL value (twice the difference in the natural log of the likelihoods) is shown, along with the p-value with which the neutral models (M7 or M8a) are rejected in favor of the model of positive selection (M8).
(C) To further identify codon sites in CD4 under positive selection, we calculated the posterior probability of ω > 1 (where ω is the dN [nonsynonymous]/dS [synonymous] rate ratio, and values > 1 in the model M8 indicate sites under selection) using the Bayes empirical Bayes approach. Plot of posterior probabilities (ω > 1 under maximum likelihood random-sites model M8) for all CD4 sites. Sites under positive selection (posterior probability > 0.9) are shown in red. (D) The posterior mean of ω over a sliding window of 80 amino acids is shown (green line), along with the overall mean of ω across the entire gene (grey line). In both panels C and D, the amino acid positions are shown in relationship to human CD4, and the D1 domain of CD4 is highlighted in orange. (E) Cryo-EM structure of an HIV-1 Env trimer in complex with human CD4 (PDB 5U1F) was visualized in ChimeraX (Goddard et al., 2017). Individual gp120 and gp41 subunits are colored in light and dark blue, respectively. The CD4 D1 domain (red) and D2-D4 domains (gray) are shown, with sites under positive selection (posterior probability > 0.9) shown on the human sequence as red spheres. Nine of the 12 sites passing this stringent cutoff map to the Env-CD4 D1 domain interface.
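The likelihood ratio test in panel B and the sliding-window average in panel D can be sketched as follows. This is an illustration under stated assumptions, not the PAML implementation: the log-likelihood values are hypothetical, and the degrees of freedom (2 for M7 vs. M8, 1 for M8a vs. M8) follow common practice and are not given in the legend.

```python
from math import erfc, exp, sqrt


def lrt_statistic(lnl_null, lnl_alternative):
    """2ΔlnL: twice the difference in log-likelihood between nested models."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi_square_p_value(statistic, degrees_of_freedom):
    """Upper-tail chi-square probability, using closed forms for df = 1 or 2."""
    if statistic <= 0:
        return 1.0
    if degrees_of_freedom == 1:
        return erfc(sqrt(statistic / 2.0))
    if degrees_of_freedom == 2:
        return exp(-statistic / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled in this sketch")


def sliding_window_mean(values, window):
    """Mean of each run of `window` consecutive values (e.g. per-site ω)."""
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Hypothetical log-likelihoods for a null model (M7) and the alternative (M8).
statistic = lrt_statistic(lnl_null=-1000.0, lnl_alternative=-995.0)
p_m7_vs_m8 = chi_square_p_value(statistic, degrees_of_freedom=2)

# Toy per-site values; the figure uses a window of 80 amino acids.
window_means = sliding_window_mean([1.0, 2.0, 3.0, 4.0], window=2)
```

A small p-value means the null model without positively selected sites fits significantly worse than M8, which is the basis for the rejection reported in the tables.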