The MHC region in humans (HLA).

A) Each point at the top represents the location of a gene. The different types of HLA genes are distinguished by different colors, shown in the key at left. The 19 functional HLA genes are labeled with their name (omitting their “HLA” prefix due to space constraints). Gray points represent non-HLA genes and pseudogenes in the region. The black line shows nucleotide diversity (Nei and Li’s π) across the region, while the pink horizontal line shows the genome-wide average nucleotide diversity (π ≈ 0.001) (Sachidanandam et al., 2001). B) Nucleotide diversity around the classical Class I gene HLA-A, with exon structure shown. C) Nucleotide diversity around the classical Class II gene HLA-DRB1, with exon structure shown. D) Species tree showing the phylogenetic relationships among selected primates from this study (Kuderna et al., 2023). The colors of the icons are consistent with colors used throughout the paper to distinguish species. The pink vertical dashed lines indicate split times of the new-world monkeys (NWM) from the apes/old-world monkeys (OWM) (39 MYA), OWM from the apes (31 MYA), and the lesser apes (gibbons) from the great apes (23 MYA).
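As a rough illustration of the quantity plotted in panels A to C, the following sketch computes nucleotide diversity (π) as the average proportion of differing sites over all pairs of aligned sequences. The function name and toy sequences are hypothetical, gaps and missing data are not treated specially, and this is not necessarily the windowed procedure used to draw the figure.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites among aligned sequences.

    Assumes all sequences have equal length; every mismatch, including a
    gap character against a base, counts as a difference.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length  # per-site difference for this pair
        n_pairs += 1
    return total / n_pairs


# Toy alignment (illustrative only): three sequences, four sites.
toy_pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```

Applying the function to successive windows of an alignment would give a diversity track comparable in spirit to the black line in panel A.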

Figure 1—figure supplement 1. Nucleotide diversity around the rest of the HLA genes.

Figure 1—figure supplement 2. Color and abbreviation key/tree for all species used in this study.

Rapidly-evolving amino acids in MHC-B and their trait and disease associations.

Shown here are all amino acid positions in the MHC-B group evolving at twice the baseline rate or more (log2 fold-change ≥ 1). Many corresponding positions in human HLA-B have associations with autoimmune or infectious diseases, biomarkers, or TCR phenotypes. Disease associations were collected from a literature search of HLA fine-mapping studies with over 1,000 cases (see Methods).

Data summary for Class I.

Each row represents a species, and each column represents a gene group. Each cell lists the number of alleles included for each gene represented by that gene group. Bolded entries are “backbone” sequences that are included in every group.

Data summary for Class IIA.

Each row represents a species, and each column represents a gene group. Each cell lists the number of alleles included for each gene represented by that gene group. Bolded entries are “backbone” sequences that are included in every group.

Data summary for Class IIB.

Each row represents a species, and each column represents a gene group. Each cell lists the number of alleles included for each gene represented by that gene group. Bolded entries are “backbone” sequences that are included in every group.

BEAST2 allele summary trees using sequences from exon 2.

A) MHC-C, B) MHC-DQB1, and C) MHC-DOA. Each tip represents an allele, with color and four-letter abbreviation representing the species (see Figure 1—figure supplement 2 for full species key). The species label is followed by the allele name (see Appendix 1 for more details on nomenclature) or RefSeq accession number. For simplicity, monophyletic groups of similar alleles are collapsed with a triangle and labeled with their one-field allele name. The color/abbreviation key (center) also depicts the species tree (Kuderna et al., 2023). Human alleles (HLA; red) are bolded for emphasis. Dashed outgroup branches are scaled by a factor of to clarify tree structure within the clade of interest. The smaller inset tree in panel B highlights the relationships between two human allele groups (red) and two OWM allele groups (green). The indicated human and OWM lineages coalesce more recently between groups than within each group. Pri., primate backbone sequences; Mam., mammal outgroup sequences.

Figure 2—figure supplement 1. MHC-A-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 2. MHC-B-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 3. MHC-C-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 4. MHC-E-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 5. MHC-F-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 6. MHC-G-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 7. MHC-DRA-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 8. MHC-DQA-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 9. MHC-DPA-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 10. MHC-DMA-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 11. MHC-DOA-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 12. MHC-DRB-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 13. MHC-DQB-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 14. MHC-DPB-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 15. MHC-DMB-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Figure 2—figure supplement 16. MHC-DOB-Related BEAST2 tree using sequences from exon 2, with top Bayes factors.

Strong support for TSP at Class I genes MHC-B and -C.

Bayes factors computed over the set of BEAST trees indicate deep TSP. Different species comparisons are listed on the y-axis, and different gene regions are listed on the x-axis. Each table entry is colored and labeled with the maximum Bayes factor among all tested quartets of alleles belonging to that category. High Bayes factors (orange) indicate support for TSP among the given species for that gene region, while low Bayes factors (teal) indicate that alleles assort according to the species tree, as expected. Bayes factors above 100 are considered decisive. Tan values show poor support for either hypothesis, while white boxes indicate that there are not enough alleles in that category with which to calculate Bayes factors. MHC-A is not present in the NWMs, and MHC-C was not present before the human-orangutan ancestor, so it is not possible to calculate Bayes factors for these species comparisons.

Figure 3—figure supplement 1. MHC-A-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 2. MHC-A-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 3. MHC-B-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 4. MHC-B-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 5. MHC-C-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 6. MHC-C-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 7. MHC-E-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 8. MHC-E-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 9. MHC-F-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 10. MHC-F-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 11. MHC-G-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 3—figure supplement 12. MHC-G-Related BEAST2 tree using sequences from exon 4, with top Bayes factors.

Figure 3—figure supplement 13. Support for TSP between OWM clades for the Class I genes.

Figure 3—figure supplement 14. Support for TSP between NWM clades for the Class I genes.

Strong support for TSP at the classical Class II genes.

Bayes factors computed over the set of BEAST trees indicate deep TSP. Different species comparisons are listed on the y-axis, and different gene regions are listed on the x-axis. Each table entry is colored and labeled with the maximum Bayes factor among all tested quartets of alleles belonging to that category. High Bayes factors (orange) indicate support for TSP among the given species for that gene region, while low Bayes factors (teal) indicate that alleles assort according to the species tree, as expected. Bayes factors above 100 are considered decisive. Tan values show poor support for either hypothesis, while white boxes indicate that there are not enough alleles in that category with which to calculate Bayes factors.

Figure 4—figure supplement 1. MHC-DRA-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 2. MHC-DQA-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 3. MHC-DPA-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 4. MHC-DMA-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 5. MHC-DOA-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 6. MHC-DRB-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 7. MHC-DQB-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 8. MHC-DPB-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 9. MHC-DMB-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 10. MHC-DOB-Related BEAST2 tree using sequences from exon 3, with top Bayes factors.

Figure 4—figure supplement 11. Support for TSP between OWM clades for the Class II genes.

Figure 4—figure supplement 12. Support for TSP between NWM clades for the Class II genes.

Rapidly-evolving sites in the Class I genes.

A) Rapidly-evolving sites are primarily located in exons 2 and 3. Here, the exons are concatenated such that the cumulative position along the coding region is on the x-axis. The dashed orange lines denote exon boundaries. The three genes are aligned such that the same vertical position indicates an evolutionarily equivalent site. The y-axis shows the substitution rate at each site, expressed as a fold-change (the base-2 logarithm of each site’s evolutionary rate divided by the mean rate among mostly-gap sites in each alignment; see Methods). B) Rapidly-evolving sites are located in each protein’s peptide-binding pocket. Structures are Protein Data Bank (Berman et al., 2000) entries 4BCE (Teze et al., 2014) for HLA-B, 4NT6 (Choo et al., 2014) for HLA-C, and 7P4B (Walters et al., 2022) for HLA-E, with images created in PyMOL (Sch, 2021). Substitution rates for each amino acid are computed as the mean substitution rate of the three sites composing the codon. Orange indicates rapidly-evolving amino acids, while teal indicates conserved amino acids. C) Rapidly-evolving amino acids are significantly closer to the peptide than conserved amino acids. The y-axis shows the BEAST2 substitution rate and the x-axis shows the minimum distance to the bound peptide, measured in PyMOL (Sch, 2021). Each point is an amino acid, and distances are averaged over several structures (see Table 5). The orange line is a linear regression of substitution rate on minimum distance, with slope and p-value annotated on each panel.
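The fold-change and per-amino-acid rate described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and numbers are hypothetical, the baseline (the mean rate among mostly-gap sites) is passed in rather than derived from an alignment, and the order of operations (average the three site rates first, then take the logarithm) is an assumption rather than something the caption specifies.

```python
import math


def codon_mean_rates(site_rates):
    """Mean substitution rate over the three nucleotide sites of each codon."""
    if len(site_rates) % 3 != 0:
        raise ValueError("number of sites must be a multiple of three")
    return [sum(site_rates[i:i + 3]) / 3 for i in range(0, len(site_rates), 3)]


def log2_fold_change(rates, baseline_rate):
    """Base-2 logarithm of each rate divided by the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return [math.log2(r / baseline_rate) for r in rates]


# Hypothetical per-site rates for two codons, with a hypothetical baseline of 2.0.
example_site_rates = [1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
example_aa_rates = codon_mean_rates(example_site_rates)
example_fold_changes = log2_fold_change(example_aa_rates, baseline_rate=2.0)

# A fold-change of 1 corresponds to twice the baseline rate.
example_rapid = [fc >= 1 for fc in example_fold_changes]
```

On this scale a value of 0 means a site evolves at the baseline rate, and each additional unit corresponds to a doubling of the rate.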

Figure 5—figure supplement 1. Rapidly-evolving sites along the coding region for all Class I genes.

Figure 5—figure supplement 2. Proportions of rapidly-evolving sites within each exon for Class I.

Figure 5—figure supplement 3. Rapidly-evolving sites mapped to Class I protein structures.

Figure 5—figure supplement 4. Evolutionary rate as a function of minimum distance to peptide for all Class I and Class II proteins.

Figure 5—figure supplement 5. Class I rapidly-evolving sites by binned distance to peptide.

Structures used to calculate distances to peptide.

This table lists the Protein Data Bank (Berman et al., 2000) structure codes and references for all structures used to calculate peptide distances.

Rapidly-evolving sites in the Class II genes.

A) Rapidly-evolving sites are primarily located in exon 2. Here, the exons are concatenated such that the cumulative position along the coding region is on the x-axis. The dashed orange lines denote exon boundaries. The α genes (top two plots) are aligned such that the same vertical position indicates an evolutionarily equivalent site; the same is true for the β genes (bottom two plots). The y-axis shows the substitution rate at each site, expressed as a fold-change (the base-2 logarithm of each site’s evolutionary rate divided by the mean rate among mostly-gap sites in each alignment; see Methods). B) Rapidly-evolving sites are located in each protein’s peptide-binding pocket. Structures are Protein Data Bank (Berman et al., 2000) entries 5JLZ (Gerstner et al., 2016) for HLA-DR and 2NNA (Henderson et al., 2007) for HLA-DQ, with images created in PyMOL (Sch, 2021). Substitution rates for each amino acid are computed as the mean substitution rate of the three sites composing the codon. Orange indicates rapidly-evolving amino acids, while teal indicates conserved amino acids. C) Rapidly-evolving amino acids are significantly closer to the peptide than conserved amino acids. The y-axis shows the BEAST2 substitution rate and the x-axis shows the minimum distance to the bound peptide, measured in PyMOL (Sch, 2021). Each point is an amino acid, and distances are averaged over several structures (see Table 5). The orange line is a linear regression of substitution rate on minimum distance, with slope and p-value annotated on each panel.

Figure 6—figure supplement 1. Rapidly-evolving sites along the coding region for all Class IIA genes.

Figure 6—figure supplement 2. Proportions of rapidly-evolving sites within each exon for Class IIA.

Figure 6—figure supplement 3. Rapidly-evolving sites along the coding region for all Class IIB genes.

Figure 6—figure supplement 4. Proportions of rapidly-evolving sites within each exon for Class IIB.

Figure 6—figure supplement 5. Rapidly-evolving sites mapped to Class II protein structures.

Figure 6—figure supplement 6. Class II rapidly-evolving sites by binned distance to peptide.

Figure 6—figure supplement 7. Number of associations per amino acid as a function of evolutionary rate.

Possible unrooted trees of 4 alleles.

There is one tree where the human alleles are monophyletic, and two trees where they are non-monophyletic.
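One way to turn these three topologies into a Bayes factor is sketched below. It uses the generic definition (posterior odds divided by prior odds) and assumes a uniform prior over the three unrooted quartet trees, so the prior odds of non-monophyly to monophyly are 2:1. The function name, the input encoding, and the counts are hypothetical, and the authors' exact procedure is described in their Methods rather than here.

```python
import math


def quartet_bayes_factor(h1_partners, monophyletic_partner="h2"):
    """Bayes factor for non-monophyly of two human alleles in a quartet.

    h1_partners holds one entry per sampled posterior tree: the allele that
    is sister to human allele h1 in the unrooted quartet (h1, h2, o1, o2).
    The entry "h2" means the human alleles are monophyletic in that tree.
    Assumes a uniform prior over the three unrooted topologies.
    """
    n_trees = len(h1_partners)
    if n_trees == 0:
        raise ValueError("need at least one sampled tree")
    n_mono = sum(1 for p in h1_partners if p == monophyletic_partner)
    n_non_mono = n_trees - n_mono
    if n_mono == 0:
        return math.inf  # no sampled tree supports monophyly

    posterior_odds = n_non_mono / n_mono
    prior_odds = 2.0  # two non-monophyletic trees versus one monophyletic tree
    return posterior_odds / prior_odds


# Hypothetical posterior sample: 90 trees pair h1 with o1, 10 pair it with h2.
example_bf = quartet_bayes_factor(["o1"] * 90 + ["h2"] * 10)
```

Under this convention, values well above 1 favor trans-species grouping of the alleles, and values well below 1 favor alleles assorting by species.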