The MHC region in humans (HLA).

A) Each point at top represents the location of a gene. The different types of HLA genes are distinguished by different colors, shown in the key at left. The 19 functional HLA genes are labeled with their name (omitting their “HLA” prefix due to space constraints). Gray points represent non-HLA genes and pseudogenes in the region. The black line shows nucleotide diversity (Nei and Li’s π) across the region, while the pink horizontal line shows the genome-wide average nucleotide diversity (π ≈ 0.001) (Sachidanandam et al., 2001). B) Nucleotide diversity around classical Class I gene HLA-A, with exon structure shown. C) Nucleotide diversity around classical Class II gene HLA-DRB1, with exon structure shown. D) Species tree showing the phylogenetic relationships among selected primates from this study (Kuderna et al., 2023). The colors of the icons are consistent with colors used throughout the paper to distinguish species. The pink vertical dashed lines indicate split times of the NWM from the apes/OWM (39 MYA), OWM from the apes (31 MYA), and the lesser apes (gibbons) from the great apes (23 MYA).
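
As an illustration of the diversity statistic plotted in panel A, the following is a minimal sketch of Nei and Li's π, computed as the average proportion of pairwise differences among aligned sequences. The function name and the toy sequences are hypothetical, and gaps and missing data are ignored for simplicity; this is not the pipeline used to produce the figure.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average per-site pairwise difference (Nei and Li's pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    # Normalize by the number of pairs and by the alignment length.
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): four sequences of length 10.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
pi = nucleotide_diversity(toy)
```

Applying the same function to successive windows of an alignment would give a diversity profile along the region, which could then be compared against the genome-wide average of π ≈ 0.001.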

BEAST2 allele summary trees using sequences from exon 2.

A) MHC-C, B) MHC-DQB1, and C) MHC-DOA. Each tip represents an allele, with color and four-letter abbreviation representing the species. The species label is followed by the allele name. For simplicity, monophyletic groups of similar alleles are collapsed with a triangle and labeled with their one-field allele name. The color/abbreviation key (center) also depicts the species tree (Kuderna et al., 2023). Human alleles (HLA; red) are bolded for emphasis. Dashed outgroup branches are rescaled to clarify tree structure within the clade of interest. The smaller inset tree in panel B highlights the relationships between two human allele groups (red) and two OWM allele groups (green). The indicated human and OWM lineages coalesce more recently between groups than within each group. Pri., primate backbone sequences; Mam., mammal outgroup sequences.

Strong support for TSP at Class I genes MHC-B and -C.

Bayes factors computed over the set of BEAST trees indicate deep TSP. Different species comparisons are listed on the y-axis, and different gene regions are listed on the x-axis. High Bayes factors (orange) indicate support for TSP among the given species for that gene region, while low Bayes factors (teal) indicate that alleles assort according to the species tree, as expected. Bayes factors above 100 are considered decisive. Tan values show poor support for either hypothesis, while white boxes indicate that there are not enough alleles in that category with which to calculate Bayes factors. MHC-A is not present in the NWMs, and MHC-C was not present before the human-orangutan ancestor, so it is not possible to calculate Bayes factors for these species comparisons.
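
One way such a Bayes factor could be obtained from a posterior tree set is as the ratio of posterior odds to prior odds for non-monophyly of a species' alleles. The sketch below assumes that each posterior tree has already been reduced to a boolean (True if the alleles of interest are non-monophyletic, which is consistent with TSP). It also assumes a prior in which the three unrooted four-allele topologies are equally likely, giving prior odds of 2:1 in favor of non-monophyly. The function name, the input format, and the prior are illustrative assumptions; see Methods for the actual procedure.

```python
def tsp_bayes_factor(non_monophyletic_flags, prior_odds=2.0):
    """Bayes factor for non-monophyly (TSP) from per-tree boolean flags.

    prior_odds=2.0 is an assumption: two of the three unrooted
    four-allele topologies are non-monophyletic for a given allele pair.
    """
    n_total = len(non_monophyletic_flags)
    if n_total == 0:
        raise ValueError("need at least one posterior tree")
    n_tsp = sum(bool(flag) for flag in non_monophyletic_flags)
    n_mono = n_total - n_tsp
    if n_mono == 0:
        # No sampled tree supports monophyly: the estimate is unbounded.
        return float("inf")
    posterior_odds = n_tsp / n_mono
    return posterior_odds / prior_odds


# Hypothetical posterior sample: 996 of 1,000 trees are non-monophyletic.
flags = [True] * 996 + [False] * 4
bf = tsp_bayes_factor(flags)  # posterior odds 249, Bayes factor 124.5
```

Under the threshold stated in the caption, this hypothetical value would count as decisive support for TSP.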

Strong support for TSP at the classical Class II genes.

Bayes factors computed over the set of BEAST trees indicate deep TSP. Different species comparisons are listed on the y-axis, and different gene regions are listed on the x-axis. High Bayes factors (orange) indicate support for TSP among the given species for that gene region, while low Bayes factors (teal) indicate that alleles assort according to the species tree, as expected. Bayes factors above 100 are considered decisive. Tan values show poor support for either hypothesis, while white boxes indicate that there are not enough alleles in that category with which to calculate Bayes factors.

Rapidly-evolving sites in the Class I genes.

A) Rapidly-evolving sites are primarily located in exons 2 and 3. Here, the exons are concatenated such that the cumulative position along the coding region is on the x-axis. The dashed orange lines denote exon boundaries. The three genes are aligned such that the same vertical position indicates an evolutionarily equivalent site. The y-axis shows the substitution rate at each site, expressed as a fold-change (the base-2 logarithm of each site’s evolutionary rate divided by the mean rate among mostly-gap sites in each alignment; see Methods). B) Rapidly-evolving sites are located in each protein’s peptide-binding pocket. Structures are Protein Data Bank (Berman et al., 2000) 4BCE (Teze et al., 2014) for HLA-B, 4NT6 (Choo et al., 2014) for HLA-C, and 7P4B (Walters et al., 2022) for HLA-E, with images created in PyMOL (Schrödinger, 2021). Substitution rates for each amino acid are computed as the mean substitution rate of the three sites composing the codon. Orange indicates rapidly-evolving amino acids, while teal indicates conserved amino acids. C) Rapidly-evolving amino acids are significantly closer to the peptide than conserved amino acids. The y-axis shows the BEAST2 substitution rate and the x-axis shows the minimum distance to the bound peptide, measured in PyMOL (Schrödinger, 2021). Each point is an amino acid, and distances are averaged over several structures (see Methods). The orange line is a linear regression of substitution rate on minimum distance, with slope and p-value annotated on each panel.
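
The fold-change and per-amino-acid rates described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy rates, and the choice of which sites count as "mostly-gap" are hypothetical, and the codon average is taken over raw rates before the logarithm is applied, which the caption does not specify.

```python
import math


def log2_fold_change(rates, baseline_rate):
    """Base-2 log of each rate divided by a baseline rate."""
    return [math.log2(rate / baseline_rate) for rate in rates]


def codon_means(site_rates):
    """Mean rate of the three nucleotide sites that make up each codon."""
    if len(site_rates) % 3 != 0:
        raise ValueError("number of sites must be a multiple of 3")
    return [sum(site_rates[i:i + 3]) / 3 for i in range(0, len(site_rates), 3)]


# Toy per-site rates for two codons (hypothetical values).
site_rates = [0.5, 0.5, 0.5, 2.0, 4.0, 6.0]
# Assumption: the first three sites stand in for the mostly-gap baseline sites.
baseline_sites = [0, 1, 2]
baseline = sum(site_rates[i] for i in baseline_sites) / len(baseline_sites)

site_fc = log2_fold_change(site_rates, baseline)
codon_fc = log2_fold_change(codon_means(site_rates), baseline)
```

A fold-change of 0 means a site evolves at the baseline rate, and each unit increase corresponds to a doubling of the rate.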

Rapidly-evolving sites in the Class II genes.

A) Rapidly-evolving sites are primarily located in exon 2. Here, the exons are concatenated such that the cumulative position along the coding region is on the x-axis. The dashed orange lines denote exon boundaries. The α genes (top two plots) are aligned such that the same vertical position indicates an evolutionarily equivalent site; the same is true for the β genes (bottom two plots). The y-axis shows the substitution rate at each site, expressed as a fold-change (the base-2 logarithm of each site’s evolutionary rate divided by the mean rate among mostly-gap sites in each alignment; see Methods). B) Rapidly-evolving sites are located in each protein’s peptide-binding pocket. Structures are Protein Data Bank (Berman et al., 2000) 5JLZ (Gerstner et al., 2016) for HLA-DR and 2NNA (Henderson et al., 2007) for HLA-DQ, with images created in PyMOL (Schrödinger, 2021). Substitution rates for each amino acid are computed as the mean substitution rate of the three sites composing the codon. Orange indicates rapidly-evolving amino acids, while teal indicates conserved amino acids. C) Rapidly-evolving amino acids are significantly closer to the peptide than conserved amino acids. The y-axis shows the BEAST2 substitution rate and the x-axis shows the minimum distance to the bound peptide, measured in PyMOL (Schrödinger, 2021). Each point is an amino acid, and distances are averaged over several structures (see Methods). The orange line is a linear regression of substitution rate on minimum distance, with slope and p-value annotated on each panel.
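
The regression in panel C can be illustrated with a small least-squares sketch. The data below are hypothetical, and the p-value is obtained with a permutation test rather than the analytical test that a standard regression routine would report; this substitution keeps the example dependency-free and is not the method used for the figure.

```python
import random


def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the regression slope."""
    rng = random.Random(seed)
    observed = abs(ols_slope_intercept(x, y)[0])
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association between x and y.
        rng.shuffle(shuffled)
        if abs(ols_slope_intercept(x, shuffled)[0]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: minimum distance to the peptide and log2 fold-change
# in substitution rate for eight amino acids.
distance = [3.0, 4.0, 5.0, 8.0, 10.0, 12.0, 15.0, 20.0]
rate_fc = [3.0, 2.5, 2.8, 1.5, 1.0, 0.8, 0.2, -0.5]

slope, intercept = ols_slope_intercept(distance, rate_fc)
p_value = permutation_p_value(distance, rate_fc)
```

A negative slope in this setup corresponds to the pattern described in the caption: amino acids nearer the bound peptide evolve faster.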

Rapidly-evolving amino acids in MHC-B and their trait and disease associations.

Shown here are all amino acid positions in the MHC-B group evolving at at least twice the baseline rate (log2 fold-change ≥ 1). Many corresponding positions in human HLA-B have associations with autoimmune or infectious diseases, biomarkers, or TCR phenotypes. Disease associations were collected from a literature search of HLA fine-mapping studies with over 1,000 cases (see Methods).

Possible unrooted trees of 4 alleles.

There is one tree where the human alleles are monophyletic, and two trees where they are non-monophyletic.
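
The count above can be checked by enumerating the three unrooted topologies of four alleles. In this minimal sketch, each topology is represented by the split induced by its single internal branch, and a pair of alleles is treated as monophyletic when it forms one side of that split. The allele labels and function names are hypothetical.

```python
def quartet_topologies(taxa):
    """All three unrooted topologies of four taxa, each as a pair of cherries."""
    if len(taxa) != 4:
        raise ValueError("expected exactly four taxa")
    first, rest = taxa[0], taxa[1:]
    topologies = []
    for partner in rest:
        # Pair the first taxon with each possible partner in turn;
        # the two remaining taxa form the other side of the split.
        others = [t for t in rest if t != partner]
        topologies.append((frozenset({first, partner}), frozenset(others)))
    return topologies


def is_monophyletic(topology, group):
    """True if the group is one side of the topology's internal split."""
    return frozenset(group) in topology


# Two human alleles and two alleles from another species (hypothetical labels).
alleles = ["Human1", "Human2", "Other1", "Other2"]
trees = quartet_topologies(alleles)
human = {"Human1", "Human2"}
mono = [t for t in trees if is_monophyletic(t, human)]
non_mono = [t for t in trees if not is_monophyletic(t, human)]
```

With these labels, `mono` holds the single topology that groups the two human alleles together, and `non_mono` holds the two topologies in which each human allele pairs with an allele from the other species.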