1. Computational and Systems Biology
  2. Immunology and Inflammation
Download icon

Identifying the immune interactions underlying HLA class I disease associations

  1. Bisrat J Debebe
  2. Lies Boelen
  3. James C Lee
  4. IAVI Protocol C Investigators
  5. Chloe L Thio
  6. Jacquie Astemborski
  7. Gregory Kirk
  8. Salim I Khakoo
  9. Sharyne M Donfield
  10. James J Goedert
  11. Becca Asquith  Is a corresponding author
  1. Department of Infectious Disease, Imperial College London, United Kingdom
  2. Cambridge Institute for Therapeutic Immunology and Infectious Disease, University of Cambridge, United Kingdom
  3. Johns Hopkins University, United States
  4. Faculty of Medicine, University of Southampton, United Kingdom
  5. Rho, Chapel Hill, United States
  6. Division of Cancer Epidemiology and Genetics, National Cancer Institute, United States
Research Article
  • Cited 1
  • Views 1,355
  • Annotations
Cite this article as: eLife 2020;9:e54558 doi: 10.7554/eLife.54558

Abstract

Variation in the risk and severity of many autoimmune diseases, malignancies and infections is strongly associated with polymorphisms at the HLA class I loci. These genetic associations provide a powerful opportunity for understanding the etiology of human disease. HLA class I associations are often interpreted in the light of ‘protective’ or ‘detrimental’ CD8+ T cell responses which are restricted by the host HLA class I allotype. However, given the diverse receptors which are bound by HLA class I molecules, alternative interpretations are possible. As well as binding T cell receptors on CD8+ T cells, HLA class I molecules are important ligands for inhibitory and activating killer immunoglobulin-like receptors (KIRs) which are found on natural killer cells and some T cells; for the CD94:NKG2 family of receptors also expressed mainly by NK cells and for leukocyte immunoglobulin-like receptors (LILRs) on myeloid cells. The aim of this study is to develop an immunogenetic approach for identifying and quantifying the relative contribution of different receptor-ligand interactions to a given HLA class I disease association and then to use this approach to investigate the immune interactions underlying HLA class I disease associations in three viral infections: Human T cell Leukemia Virus type 1, Human Immunodeficiency Virus type 1 and Hepatitis C Virus as well as in the inflammatory condition Crohn’s disease.

eLife digest

When considering someone’s risk of disease, every person is different but some similarities can be found when looking across populations. Some people are more likely to develop a certain disease, while others are protected in some way. Part of this variation is explained by the individual’s genes, while their lifestyle and environment are other factors.

Numerous studies have looked for associations between different versions of genes, known as gene variants, and the occurrence of disease to identify who is at risk. There is one cluster of genes called the HLA genes that is a well-known hotspot for disease associations. The HLA cluster is named for the group of proteins it encodes, called the human leukocyte antigen (HLA) complex. These cell-surface proteins regulate the immune system in humans. These proteins are present on the surface of cells, and they help the immune system distinguish foreign invaders such as viruses and bacteria from the body’s own cells. Variants in the HLA genes are associated with more than 100 diseases, including infectious diseases like HIV, autoimmune conditions such as multiple sclerosis, and some cancers. However, while identifying which genetic variants are associated with an increased or decreased risk of disease is relatively simple, understanding why those genetic variants are associated with a particular disease is much harder.

Debebe et al. have developed a new method to find out why certain gene variants in the HLA cluster are associated with disease in humans. They used this method to investigate known genetic variants associated with three viral infections: HIV, hepatitis C, and human leukemia virus – and one inflammatory disease: Crohn’s disease. Critically, Debebe et al. looked at the interactions between different immune cells and the cell-surface proteins encoded by the HLA gene variants in different cases of these diseases. In doing so, the analysis was able to identify which cells of the immune system were responsible for the associations between gene variants and diseases.

In principle, this method could be applied to study any disease in any species. It could also be used in classic gene association studies to test for false positive results and “passenger” mutations, two common problems that beset sound interpretations from these studies.

Introduction

Genetic associations provide a powerful opportunity for understanding the etiology of human disease since, unlike most human observational studies, once linkage disequilibrium is corrected for, associated genes can be assumed to be causal rather than simply correlative.

The HLA region is a well-known hotspot for disease associations: it comprises just 0.3% of the genome yet contains 6.4% of the significant SNP associations in the EMBL-EBI genome wide association study (GWAS) catalog (MacArthur et al., 2017). Multiple candidate gene and GWAS studies have identified significant associations between polymorphisms in the classical HLA class I genes and the risk and/or severity of infectious disease, a range of autoimmune conditions and a number of forms of cancer (Matzaraki et al., 2017). A search of the literature yields over 2000 papers reporting HLA class I disease associations. Some of the most striking odds of disease are seen with autoimmune conditions such as the association between ankylosing spondylitis and possession of HLA-B*27 (odds ratio >100) or between Behçet’s Disease and HLA-B*51 (accounts for 32–50% of cases) (Matzaraki et al., 2017; de Menthon et al., 2009). Amongst infectious pathogens, HIV-1 has some of the best studied genetic associations: HLA-B*57 is strongly associated with reduced viral load and slow progression of disease in multiple cohorts whilst HLA-B*35Px is associated with high viral load and poor prognosis (Pereyra et al., 2010; Carrington and O'Brien, 2003; Martin et al., 2007; Carrington et al., 1999; Kiepiela et al., 2004).

However, interpretation of HLA class I disease associations is problematic since the classical HLA class I molecules (HLA-A, -B and –C), which bind cytosolic peptides (typically of length 8–11 amino acids) have multiple functions. HLA class I molecules are the ligands for several different receptors expressed by different immune cells including CD8+ T cells, NK cells and dendritic cells.

CD8+ T cells recognise HLA:peptide via their T cell receptor (TCR). TCR-HLA:peptide binding is exquisitely specific and depends both on the HLA allele and the sequence of the bound peptide. The affinity of an HLA class I molecule for a peptide is a significant determinant of the CD8+ T cell response elicited by that peptide (Chen et al., 2000; Müllbacher et al., 1999; Yewdell and Bennink, 1999; Deng et al., 1997; Boggiano et al., 2005); it has been shown that 85% of epitopes bind their HLA molecule with an affinity of 500 nM or stronger (Assarsson et al., 2007). However the relationship between HLA binding and immunogenicity is nontrivial, 50–66% of peptides that bind do not elicit a response (Lee et al., 2004; Hoof et al., 2010) and conversely cases where a peptide has undetectable binding but still elicits a response have also been described (Lee et al., 2004). A second crucial determinant of immunogenicity is binding of the HLA:peptide complex to the T cell receptor. It has been established that peptide positions P4-8 are most likely to be in close contact with the TCR (Rudolph et al., 2006; Garboczi et al., 1996; Calis et al., 2012). However, which of these peptide positions are critical for a T cell’s ability to bind have not been comprehensively mapped in humans; additionally non-contact positions can also impact on TCR specificity (Hausmann et al., 1999). Most studies only investigate one or two HLA molecules; the residues identified as important in these studies (for TCR recognition rather than HLA binding) include P4, 6 and 8 (Lee et al., 2004), P3, 5, 6 and 8 (Tynan et al., 2005), P3-5 (Wooldridge et al., 2010) and P3-6 and 8 (Hausmann et al., 1999). A comprehensive study of murine data by the Kesmir group convincingly identified P4-6 (Calis et al., 2013) but it is unclear how this translates to human HLA:peptide as sparsity of data prevents a similar analysis in humans.

NK cells bind HLA class I molecules via two distinct groups of receptors, killer immunoglobulin-like receptors (KIRs) and CD94:NKG2. KIRs are a family of inhibitory and activating receptors that are expressed mainly on the surface of NK cells and also some T cells. KIRs recognise broad groupings of HLA class I molecules sharing structural motifs. For instance KIR2DL1 binds HLA-C molecules with an asparagine at position 80 (designated the C2 group of alleles). Whereas KIR2DL3 binds HLA-C molecules with a lysine at position 80 (designated the C1 group) (Trowsdale, 2001; Moesta et al., 2008). Exceptions to these broad rules have been described (Sim et al., 2017). KIR binding also shows some dependence on the HLA-bound peptide, particularly positions 7 and 8, though this specificity is weak compared to that of the TCR (Fadda et al., 2010; Maenaka et al., 1999a; Maenaka et al., 1999b; Boyington et al., 2000).

The second way in which NK cells (and to a lesser extent, T cells) survey HLA class I expression is via the CD94:NKG2 family of receptors. Particularly interesting is the inhibitory CD94:NKG2A receptor which ligates the monomorphic non-classical HLA-E loaded with peptide from the leader sequence of HLA-A, -C and a subset of –B molecules (Braud et al., 1998; López-Botet et al., 2000) and which has been recently shown to play a key role in NK cell education (Horowitz et al., 2016).

Finally, HLA class I molecules are also the ligands for the leukocyte immunoglobulin-like receptors (LILR) of which LILRB1 and LILRB2 are the best characterised. Different HLA allotypes bind LILRB1 and LILRB2 with varying affinity, this is particularly true for LILRB2 which shows considerable variation across the HLA alleles (Jones et al., 2011). LILRB1 and LILRB2 are inhibitory receptors which are expressed mainly on myeloid cells including dendritic cells and macrophages; signalling via LILR affects the activation of these antigen presenting cells (Bashirova et al., 2014). To the best of our knowledge the impact of HLA-bound peptide on LILR signalling has not been investigated.

As a result of these diverse functions of HLA class I molecules the biological mechanisms underlying HLA associations with disease outcome are difficult to infer and contradictory interpretations of the same associations are common. Consider, for example, HLA-B*57-associated protection in the context of HIV-1 infection. It has been suggested that HLA-B*57 is protective because it preferentially presents CD8+ T cell epitopes from the highly conserved Gag p24 protein which is less susceptible to escape mutations (Borghans et al., 2007; Kiepiela et al., 2007; Miura et al., 2009) and that T cell responses to Gag are specifically associated with a reduced HIV-1 viral load (Kiepiela et al., 2007). Indeed, the association between B*57 and low viral load is widely cited as evidence that CD8+ T cells are important in controlling HIV-1 (Boppana and Goepfert, 2018; Walker and McMichael, 2012). However, it has also been argued (Flores-Villanueva et al., 2001) that HLA-B*57 is protective because of its role as a KIR ligand (binding both the inhibitory receptor KIR3DL1 and the activating receptor KIR3DS1); an argument which, though disputed (O'Brien et al., 2001), does appear to be supported by subsequent studies (Martin et al., 2007; Martin et al., 2002; Pelak et al., 2011). It has also been suggested that the unusually weak binding of HLA-B*57:01 for LILRB2 contributes to control of HIV-1 viral load due to its reduced inhibitory regulation of dendritic cells (Bashirova et al., 2014). Finally, recent papers have called for a re-evaluation of HLA class I clinical associations, including HLA-B*57, taking into account the fact that HLA-B*57, by virtue of having a methionine at position 21 provides peptides for HLA-E thus supplying CD94:NKG2A ligands (Horowitz et al., 2016; Yunis et al., 2007). In short, the HLA-B*57 protective effect may be attributable to CD8+ T cells, to inhibition of NK cells via KIR3DL1, to activation of NK cells via KIR3DS1, to inhibition of NK cells via CD94:NKG2A and/or to reduced inhibition of DCs via LILRB2. Observational and in vitro studies to investigate the mechanism underlying the B57-protective effect have yielded inconclusive results. The interpretation of observational studies are problematic since, for example, a preponderance of polyfunctional CD8+ T cells in HLA-B*57+ elite controllers of HIV-1 may be because polyfunctional CD8+ T cells are responsible for elite control. But it is hard to rule out the possibility that polyfunctionality is a consequence of low viral load. In vitro functional work is also difficult to interpret, with CD8+ T cells, natural killer cells and DCs all playing a role depending on the in vitro experimental conditions, with no obvious means to infer the relative importance of these different factors in vivo.

The aim of this study is to develop an immunogenetic approach for identifying and quantifying the relative contribution of different receptor-ligand interactions to a given HLA class I disease association. We applied this approach to investigate well-described associations between single HLA class I alleles and disease in 3 viral infections: Human T cell Leukemia Virus type 1 (HTLV-1), Human Immunodeficiency Virus type 1 (HIV-1) and Hepatitis C Virus (HCV) (Carrington and O'Brien, 2003; Kim et al., 2011; Jeffery et al., 1999). We then extended the scope of this work by using the method to investigate the association between the multi-gene ancestral MHC 8.1 haplotype and good prognosis in Crohn’s disease (Lee et al., 2017).

Results

Strategy

We focus on receptor-ligand pairs which are polymorphic and well-characterised. Specifically, we investigate TCR-HLA:peptide, inhibitory KIR-HLA:peptide, activating KIR-HLA:peptide, LILRB1-HLA and LILRB2-HLA. The strategy was first to develop a metric for quantifying the proximity or similarity of HLA class I alleles in terms of their TCR binding (i.e. a metric in ‘CD8+ T cell recognition space’), metrics for quantifying the proximity of HLA class I alleles in terms of their activating and inhibitory KIR binding (i.e. distance metrics in ‘NK cell recognition space’) and metrics for quantifying the proximity of HLA class I alleles in terms of their LILRB1 and LILRB2 binding (i.e. distance metrics in ‘DC recognition space’). Next, for the HLA class I allele with the disease association of interest (henceforth the ‘index’ allele), the similarity to all other HLA class I alleles in terms of TCR binding, inhibitory KIR binding (iKIR), activating KIR (aKIR), LILRB1 and LILRB2 binding was estimated. Multivariate regression was used to quantify the association between similar HLA class I alleles and clinical outcome. We hypothesised that, if an HLA class I disease association is attributable to CD8+ T cells then other HLA class I alleles with similar TCR-HLA:peptide binding to the index allele would have similar disease associations whereas HLA class I alleles with similar KIR-HLA:peptide and LILR-HLA binding would show no disease associations. Conversely, if the HLA class I allele disease association is attributable to NK cells then HLA class I alleles that are near in KIR-HLA:peptide binding space but not HLA class I alleles which are near in TCR-HLA:peptide binding space or LILR-HLA binding space would be associated with disease. And similarly for LILR binding. Inclusion of combinations of distance metrics as predictors in the regression also allows us to quantify the relative contribution of different receptor-ligand interactions and whether or not they behave independently in the case where more than one interaction was identified as playing a role. Details of the distance metrics are provided in the Methods, a brief, more intuitive, summary is provided below. Figure 1 illustrates the approach.

Schematic of method.

An overview of the method using the example given in Appendix 1 Supplementary Methods ‘Worked example of the Fraction Shared’. More details can be found in the Materials and methods and Appendix 1 Supplementary Methods. (i) Consider the example of investigating the mechanism underlying the association between HLA-A*02:07 and reduced risk of disease (HAM/TSP) in the context of HTLV-1 infection. (ii) Take the HTLV-I cohort and for each individual calculate their nearness to A*02:07 in ‘CD8 T cell recognition space’ (i.e. calculate TCR.FS). (iii) The graph shows the similarity to A*02:07 (TCR.FS) for each HLA allele in the cohort ranked from most similar on the left to least similar on the right. Individual 1 is homozygous at both their HLA-B and HLA-C loci so they have 4 unique alleles (A*11:01, A*24:02, B*07:02, C*07:02). We calculate the similarity of A*02:07 to each of these HLA alleles, they are marked on the graph as red dots. The most similar allele to A*02:07 in terms of CD8 recognition carried by individual 1 is C*07:02 with a TCR.FS = 0.273. Individual 1 thus has a TCR.FS with A*02:07 of 0.273. (iv) This is repeated for all individuals in the cohort and all 5 metrics and the cohort table completed. The table relates only to nearness to A*02:07 for the HTLV-1 proteome, for another index allele (or another proteome) the process will need to be repeated to complete the table (v) multiple regression is then performed on the cohort. Each metric is included in turn as a predictor variable. To investigate the independence or relative importance of different predictors they should be included in the regression model together (vi) Results of the regression are then interpreted. In this case the index A*02:07 is protective; other HLA class I alleles with similar TCR-HLA:peptide binding to the index allele are also protective that is, TCR.FS is a significant (protective) predictor of outcome; whereas HLA class I alleles with similar iKIR-HLA:peptide and LILRB1-HLA binding show no protective associations. Only three dimensions are sketched; similarity in aKIR and LILRB2 binding-space are also computed (and near alleles were not protective). We conclude that A*02:07-associated protection is most likely to be attributable to its TCR binding properties.

A TCR similarity metric

To quantify the similarity between two alleles in terms of TCR-HLA:peptide binding we first identify the 8-, 9-, 10- and 11-mer peptides from the proteome of interest predicted to bind each of the two HLA alleles using the prediction software netMHCpan v4.0 (Nielsen et al., 2007; Jurtz et al., 2017). For each of the predicted binders we then identify the amino acids at positions expected to impact on TCR binding. This was taken to be positions 3–6 as they have been repeatedly identified as important in TCR recognition (Lee et al., 2004; Hausmann et al., 1999; Tynan et al., 2005; Wooldridge et al., 2010; Calis et al., 2013); we also included the anchor positions 2 and the terminal position since different amino acid contacts with the HLA molecule may alter the conformation and/or orientation of the presented peptide. The method is flexible and different amino acid positions (other than 3–6 and/or excluding anchor residues) can easily be considered. The fraction of these 6 amino acid-long motifs (2, 3–6, terminal) that were shared between the alleles was then used as a measure of the similarity between the two alleles. We denote this continuous variable ‘TCR.FS’ (TCR fraction shared); it takes values between one (100% shared motifs) and zero (no shared motifs). As a basic test of the TCR.FS metric we investigated the hypothesis that the TCR.FS would be higher between alleles of the same supertype than between alleles of different supertypes. This hypothesis was strongly supported (p=3×10−7 Wilcoxon two tailed test, Appendix 2 Supplementary Results, Appendix 4—figure 1).

iKIR and aKIR similarity metrics

To quantify the similarity between two alleles in terms of inhibitory KIR-HLA:peptide and activating KIR-HLA:peptide binding we first established whether the alleles contain the same iKIR or aKIR binding motif (based on positions 77 and 80) (Trowsdale, 2001; Moesta et al., 2008). The similarity between two alleles that did not share a KIR binding motif was set to zero. For alleles that did share a KIR binding motif the fraction of motifs shared (similarity) was calculated as for the TCR metric but the peptide positions considered to affect KIR binding were taken to be 7 and 8 (Fadda et al., 2010; Maenaka et al., 1999a; Maenaka et al., 1999b; Boyington et al., 2000); as for TCR.FS we also included peptide anchor positions 2 and 9. We denote these continuous variables ‘iKIR.FS’ and ‘aKIR.FS’. As for TCR.FS, these metrics vary between one and zero.

LILRB1 and LILRB2 similarity metric

To quantify the similarity between two alleles in terms of LILRB1-HLA and LILRB2-HLA binding we compared the strength of binding between LILRB1 (LILRB2 resp.) and each HLA allele (Bashirova et al., 2014). We denote these continuous variables ‘LILRB1.S’ and ‘LILRB2.S’ (LILRB1 and LILRB2 similarity). Again, they take values between zero and one. Note, due to lack of data, the measures of LILRB1 and LILRB2 similarity are peptide-independent.

Regressions

The similarity between the index allele and the alleles of each individual in the relevant cohort was calculated using the five metrics. To ensure that near alleles were considered independent of the index, the similarity metrics were set to zero if the individual carried the index allele. The metrics were then included, along with the index allele, as an independent variable in a regression analysis to identify determinants of clinical outcome.

HTLV-1 infection: interactions underlying protective and detrimental alleles

We studied a case-control cohort of 392 HTLV-1-infected individuals from Kagoshima in Southern Japan. 178 subjects were asymptomatic carriers of the virus, and 214 subjects were diagnosed with HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) according to World Health Organisation criteria. There are well-documented associations between HLA-A*02, HLA-C*08 and HLA-B*54 and outcome (Jeffery et al., 1999; Jeffery et al., 2000). HLA-A*02 and HLA-C*08 are protective: they are associated with a reduced risk of HAM/TSP whilst HLA-B*54 is detrimental: it is associated with an increased risk of HAM/TSP. There are also reported associations with proviral load, but these associations suffer from poor robustness (next section) so we do not investigate them further. We first identified which HLA alleles at the 4 digit level were driving these disease associations. A*02:06 and A*02:07 are associated with a reduced risk of disease, C*08:01 was weakly associated with a reduced risk of disease and B*54:01 was associated with an increased risk of disease (Appendix 3—table 1). To assess which immune interactions were responsible for these associations with HAM/TSP, for each of these 4 HLA alleles, we performed 5 regressions, one for each of the similarity metrics (TCR.FS, aKIR.FS, iKIR.FS, LILRB1.S and LILRB2.S), in each case the index allele (together with age and gender) were included as covariates. Results are given in Table 1. We found that for each of the index HLA class I alleles considered, TCR.FS was strongly associated with risk of disease and in the same direction as the index allele i.e. possession of alleles with similar TCR binding to A*02:06, A*02:07 and C*08:01 are associated with a large decrease in the risk of disease whilst possession of alleles near B*54:01 in TCR binding space is associated with a significant increase in the risk of disease. In every case inclusion of TCR.FS in the multivariate analysis strengthened the effect of the index allele (i.e. increased the magnitude of the coefficient) indicating that removal of near alleles from the baseline made the ‘background’ alleles more dissimilar to the index. None of the other metrics were significant for any of the index alleles considered. We conclude that for all 4 HLA class I alleles studied in HTLV-1 infection the protection/susceptibility associated with those alleles is best explained by their TCR binding properties and therefore is most likely to be attributable to CD8+ T cells.

Table 1
Interactions underlying HLA class I disease associations in HTLV-1 infection.

Four HLA class I alleles are associated with disease (HAM/TSP) in HTLV-1 infection (model 1, index only). For each HLA allele we sought to determine the underlying mechanism by performing 5 multivariate logistic regressions (model 2–6), one for each of the distance metrics. The coefficient (Coeff) and P value for the index allele and the nearby alleles (similarity metric) are recorded below. For each of the index HLA alleles considered TCR.FS was associated with disease and in the same direction as the index allele; that is when the index was protective alleles with similar TCR binding (high TCR.FS) were protective and when the index was detrimental TCR.FS was detrimental (see row ‘TCR.FS’ in model 2, shaded). Furthermore, inclusion of TCR.FS in the multivariate analysis actually strengthened the effect of the index allele in every case (compare the magnitude of the coefficient for index in model 1 and index in model 2) indicating that removal of near alleles from the baseline made the ‘background’ alleles more dissimilar to the index. None of the other metrics were significant for any of the index alleles considered. Coeff <0 indicates reduced risk of HAM/TSP (i.e. a protective effect, ‘P’), Coeff >0 indicates increased risk of HAM/TSP (i.e. a detrimental effect, ‘D’). The odds ratio = exp(Coeff). The additional covariates age and gender were included in the regressions. Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.



Index allele
ModelCovariateA*02:06A*02:07C*08:01B*54:01
1.
Index only
IndexCoeff−0.55
P
−1.27
P
−0.52
P
0.96
D
P val0.086
·
0.0079
**
0.190.0056
**
2.
Index + TCR.FS
IndexCoeff−0.67−1.32−0.701.15
P val0.042
*
0.0057
**
0.086
·
0.0014
**
TCR.FSCoeff−5.48
P
−2.40
P
−1.66
P
1.57
D
P val0.00014
***
0.017
*
0.075
·
0.02
*
3.
Index + iKIR.FS
IndexCoeff−0.55−1.26−0.72+0.82
P val0.08
·
0.009
**
0.083
·
0.02
*
iKIR.FSCoeff−0.43
P
−0.36
P
−1.23
P
−0.87
P
P val0.510.630.120.11
4.
Index + aKIR.FS
IndexCoeff−0.55−1.25−0.51+0.83
P val0.08
·
0.009
**
0.200.019
*
aKIR.FSCoeff−0.47
P
−0.41
P
0.18
D
−0.64
P
P val0.480.600.700.13
5.
Index + LILRB1.S
IndexCoeff−0.48−1.36−0.601.06
P val0.150.008
**
0.160.005
**
LILRB1.SCoeff0.65
D
−0.47
P
−0.49
P
0.92
D
P val0.510.610.610.47
6.
Index + LILRB2.S
IndexCoeff−0.49−1.13−0.610.96
P val0.150.0290.140.009
**
LILRB2.SCoeff0.45
D
0.84
D
−0.73
P
−0.05
P
P val0.630.510.500.96

HTLV-1 infection: what determines the risk of disease across all alleles?

Having investigated the ‘extreme case’ alleles that are most strongly associated with protection or susceptibility in HTLV-1 infection we next sought to analyse the larger group of ‘average’ alleles associated with intermediate risk of HAM/TSP. Here we define an ‘average allele’ as one that is not significantly associated with outcome (p>0.05), is represented in the cohort at a sufficient frequency (N > 15) and has sufficient near alleles to permit an analysis (N > 15 with 50% or more similarity).

It is possible that there is no meaning in the rank order of protection associated with different average alleles. That is, it is possible that, other than the extremes, most HLA class I alleles confer very similar levels of protection and the order of protection that we see is simply a function of the particular cohort studied (and that analysis of another cohort would yield a different rank order). However, a subsampling strategy revealed that this was not the case and that rank order of intermediate alleles was robust and significantly more informative than random when considering the risk of disease but not when considering proviral load (Appendix 2 Supplementary Results ‘Are average HLA class I associations robust’, Appendix 4—figure 2). Consequently, it is meaningful to analyse the impact of average alleles on risk of HAM/TSP but not on proviral load.

For each HLA class I allele in the HTLV-1 cohort that was sufficiently frequent (N ≥ 15) we calculated the risk of HAM/TSP associated with that allele. We then calculated the risk of HAM/TSP associated with alleles with similar TCR binding, similar inhibitory KIR binding, similar activating KIR binding, similar LILRB1 binding and similar LILRB2 binding. Alleles which were underpowered (<15 alleles with greater than 50% similarity) were discarded. The results were striking (Figure 2). Across all alleles there was a very strong positive correlation between the protection conferred by an allele and the protection conferred by other alleles with similar TCR binding (Rs = + 0.76 p=5×10−6, Spearman Correlation two tailed). 29 alleles were considered, in every case if the index was protective then alleles with similar TCR binding were also protective and if the index allele was detrimental then alleles with similar TCR binding were also detrimental (p=4×10−9, Binomial test). No such association was seen for any of the other measures of similarity (Figure 2). However, if we restricted the NK analysis to KIR binding alleles (ie. alleles with a C1, C2 or Bw4 motif) than a weak association was also seen for iKIR (Rs = 0.6, p=0.07, Spearman Correlation two tailed) but not for aKIR. The protection/susceptibility associated with an allele’s nearest neighbours in CD8+ T cell recognition space was a significant determinant of protection/susceptibility (p=0.0006) even when all other metrics were included in the model.

Correlation between the risk of HAM/TSP associated with an allele and the risk associated with similar alleles.

The risk of HAM/TSP associated with an HLA class I allele (“Coefficient of Index Allele, x axis) was compared with the coefficient of HLA class I alleles with (A) similar TCR binding, (B) similar iKIR binding, (C) similar activating KIR binding, (D) similar LILRB1 binding and (E) similar LILRB2 binding. All alleles in the cohort of a sufficient frequency (N > 15) and with sufficient near alleles (N > 15 with 50% or more similarity) were considered. The Spearman correlation coefficient (Rs) and corresponding P value are reported in the title bar for each plot. There is a very striking positive correlation for TCR binding, i.e. the risk of HAM/TSP associated with an allele is strongly correlated with the risk associated with other alleles that share similar TCR binding properties. No such correlation was observed for any of the other metrics.

We conclude that, in HTLV-1 infection, the peptide:TCR binding properties of an allele is a significant determinant of the risk of disease associated with that allele; this is true not only for the extreme case alleles which are associated with significant protection or susceptibility but also for the intermediate ‘average’ alleles without significant associations.

HIV-1 infection: what determines the susceptibility associated with HLA class I alleles?

We studied a well-characterised cohort of HIV-1 seroconverters from sub-Saharan Africa who were identified when seronegative and followed under Protocol C of IAVI (International AIDS Vaccine Initiative, 2020; Kamali et al., 2015). Two allele groups were significantly associated with clinical outcome in this cohort, consistent with findings in several other cohorts (Carrington et al., 1999; Gao et al., 2001). HLA-B*35Px was associated with an increased viral load set point but did not have a significant impact on progression. HLA-B*57 was associated with a low early viral load set point and slow progression to low CD4 count. We first focussed on B*35Px and identified which HLA alleles at the 4 digit level were driving the detrimental association. The detrimental effect was entirely due to B*53:01 which was significantly associated with an increased early viral load set point, all other B*35Px alleles were infrequent in this cohort; (Appendix 3—table 2).

HLA-B*53:01 is unusual in that its nearest allele out of all the 151 alleles in the cohort, in terms of both TCR and KIR recognition was distant (only 46% similarity). We were therefore unable to perform the similarity analysis for TCR and KIR binding as there were no similar alleles. In contrast, we were powered to study LILRB1 and LILRB2 as there were a large number of similar alleles by both these metrics. However, alleles similar to B*53:01 in terms of LILRB1 and LILRB2 binding behaved very differently in terms of their impact on clinical outcome: near alleles had no significant impact either on early viral load set point nor on time to low CD4+ cell count; moreover, if anything, near alleles tended to be slightly protective rather than detrimental (Table 2). We conclude that the detrimental effects of B*53:01 are independent of its LILRB binding properties but that we were not powered to study whether TCR or KIR binding effects were determinants of susceptibility.

Table 2
Interactions underlying HLA class I disease associations in HIV-1 infection.

Four HLA class I alleles have been associated with early viral load set point in HIV-1 infection (model 1, index only). For each HLA allele we sought to determine the underlying mechanism by performing five multivariate linear regressions (model 2–6), one for each of the distance metrics. The coefficient (Coeff) and P value for the index allele and the similarity metric are recorded below. A Coeff >0 indicates an increase in viral load i.e. a detrimental effect (D), a Coeff <0 indicates a protective effect (P). In all cases gender was included as an additional covariate in the model. A slash (/) indicates that there were an insufficient number of alleles to perform the analysis. For B53:01 there were not enough similar alleles to perform the TCR or KIR analysis. Alleles with similar LILRB1 and LILRB2 binding to B53:01 were not similarly detrimental. For the protective B57 alleles, we found a clear picture that alleles with similar aKIR binding to B*57:01, B*57:02 and B*57:03 were significantly protective (model 4, ‘aKIR.FS’ row, shaded). There was also a trend for alleles with similar TCR binding and similar LILRB2 binding to be protective (model 2 TCR.FS row and model 6 LILRB2.S row). Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.



Index allele
ModelCOV.B*53:01B*57:01B*57:02B*57:03
1.
Index only
indexCoeff+0.23
D
/−0.63
P
−0.46
P
P val0.02
*
/0.005
**
0.002
**
2.
Index+TCR.FS
IndexCoeff//−0.65−0.48
P val//0.003
**
0.001
**
TCR.FSCoeff/−0.24
P
−0.33
P
−0.24
P
P val/0.069
·
0.082
·
0.15
3.
Index+iKIR.FS
IndexCoeff//−0.64−0.47
P val//0.004
**
0.002
**
iKIR.FSCoeff/−0.12
P
−0.11
P
−0.08
P
P val/0.190.320.47
4.
Index+aKIR.FS
IndexCoeff//−0.62−0.45
P val//0.005
**
0.002
**
aKIR.FSCoeff/−0.41
P
−0.43
P
−0.39
P
P val/0.006
**
0.017
*
0.025
*
5.
Index+LILRB1.S
IndexCoeff+0.21/−0.67−0.49
P val0.045
*
/0.003
**
0.001
**
LILRB1.SCoeff−0.11
P
−0.25
P
−0.59
P
−0.5
P
P val0.650.270.220.27
6.
Index+LILRB2.S
IndexCoeff+0.22/−0.67−0.5
P val0.03
*
/0.003
**
0.0007
***
LILRB2.SCoeff−0.06
P
−0.40
P
−0.56
P
−0.7
P
P val0.840.08
·
0.08
·
0.08
·

HIV-1 infection: why are HLA-B*57 alleles protective?

Moving on to the protective B*57 allele group, we found that both B*57:02 and B*57:03 were significantly associated with a reduced early viral set point (Appendix 3—table 2). We also studied alleles similar to B*57:01 since, although B*57:01 is very infrequent in this African cohort (N = 1 carrier), it is well described to be protective in other cohorts and whilst there was no power to study B*57:01 directly in this cohort there was power to study near alleles by all five metrics. We found a clear picture that alleles with similar aKIR binding to B*57:01, B*57:02 and B*57:03 were significantly protective (Table 2). There was also a trend for alleles with similar TCR binding and similar LILRB2 binding to also be protective when these metrics were considered in isolation. But when they were considered in a multivariate analysis with aKIR.FS only aKIR.FS retained significance. We conclude that the main reason for the protective effect of HLA-B*57 in this cohort is attributable to its activating KIR binding properties.

B*57:01, B*57:02 and B*57:03 all contain the Bw4-80I KIR binding motif and are thought to bind KIR3DS1. So, the aKIR.FS for the B*57 alleles will be 0 for individuals who are KIR3DS1 or who do not have an allele with a Bw4-80I motif and between 0 and 1 (depending on the degree of similarity of their alleles to B*57) for individuals who have the compound genotype KIR3DS1:Bw4-80I (i.e. possession of KIR3DS1 together with an HLA allele containing the Bw4-80I binding motif). So our finding that HLA-B*57 is protective because of its aKIR binding properties is consistent with the report, in an independent cohort, that KIR3DS1:Bw4-80I is protective (Martin et al., 2007). Interestingly, we found that aKIR.FS was more protective (Coeff = −0.41 p=0.006 **) than KIR3DS1:Bw4-80I (Coeff = −0.22 p=0.04 *). And, in a model including both terms, aKIR.FS remained protective (Coeff = −0.52 p=0.07 .) whilst the compound genotype KIR3DS1:Bw4-80I loses significance and becomes, if anything, detrimental (Coeff = +0.09, p=0.6). This indicates that though aKIR.FS and KIR3DS1:Bw4-80I are related, they are capturing slightly different features and that aKIR.FS is a stronger determinant of protection. To investigate the difference between aKIR.FS and KIR3DS1:Bw4-80I we plotted the two variables against each other for each individual in the cohort (Figure 3A). This identified three distinct groups of people that all had identical KIR3DS1:Bw4-80I status (all being KIR3DS1+ HLA-Bw4-80I+) but with high (0.6–1), medium (0.4–0.6) or low (0–0.4) aKIR.FS. Despite having the same KIR3DS1:Bw4-80I status these groupings are associated with very different effects on viral load (Figure 3B, Appendix 3—table 4). In particular people with KIR3DS1 and the group I alleles (aKIR.FS >0.6) are strongly protected; an effect that is entirely dependent on the presence of KIR3DS1 (group I allele with KIR3DS1 Coeff = −0.4 p=0.007 **; group I allele without KIR3DS1 Coeff = −0.05 p=0.5). Whereas KIR3DS1 with the group II or group III alleles offers no protection. Pooling the group II and group III individuals (to increase the numbers) did not change the finding that group I but not group II+III is associated with a significantly reduced early viral load set point (Figure 3B). Downsampling showed that this difference could not be explained simply by group size (Figure 3C). This finding supports the utility of the fraction shared approach. Here we find that aKIR.FS confirms what is known about KIR3DS1:Bw4-80I but extends it beyond a simple binary descriptor adding extra information that is clearly biologically relevant.

aKIR. FS reveals subtleties within the KIR3DS1:Bw4-80I grouping.

When considering alleles with similar activating KIR binding properties to HLA-B*57 we found that the fraction shared (aKIR.FS) was more informative than the traditional KIR3DS1:Bw4-80I compound genotype grouping. (A) On plotting individuals’ KIR3DS1:Bw4-80I content against their aKIR fraction shared it can be seen that whilst KIR3DS1:Bw4-80I is a simple binary 1 or 0 (people either have the compound genotype or they do not), aKIR.FS has more subtlety and people with KIR3DS1 and Bw4-80I can be segregated into three distinct groups (labelled group I, group II and group III in panel A). These three groupings have different impacts on viral load (i.e. different coefficients in the multivariate regression; B, left hand plot) and only group I is actually significantly protective (Coeff = −0.4 p=0.007 **) although all three groupings have identical KIR3DS1:Bw4-80I status. Group I is larger (N = 47 individuals) than group II (N = 19) or group III (N = 20). To check that lack of significance in group II and group III was not simply due to cohort size we pooled group II and group III (B) right hand plot), group II and III were still not significantly protective. Furthermore, when we downsampled the people in group I so that there were 39 people (i.e. exactly the same size as group II+group III) and calculated the impact on log[viral load] there was never an instance in 5000 runs when the estimated protective effect (decrease in log viral load) was as small as that seen in group II+III. This is illustrated in (C). The grey histogram is the distribution of coefficients seen upon repeated downsampling of group I and the vertical red line is the coefficient associated with group II+III. It can be seen that in 5000 runs the protective effect in the group I individuals far exceeds that in group II+III individuals despite (artificially) matched cohort size. Note in panel A jitter (a random number between 0 and 0.2) has been added to the KIR3DS1:Bw4-80I variable on the x axis since this number only takes value 0 or 1, plotting without jitter simply overlays the datapoints and information about the number of points in the clusters is obscured.

Finally, since there was a trend for alleles with similar TCR binding and similar LILRB2 binding to B*57 to also be protective, we investigated whether there was any evidence for a protective effect of B*57 independent of KIR3DS1. To this end we excluded everyone with KIR3DS1 from the cohort. HLA-B*57:02 and B*57:03 both remained significantly protective amongst KIR3DS1 individuals (Coeff = −0.58 p=0.027 *, Coeff = −0.39 p=0.019 *) indicating that not all of the B*57:02 and B*57:03 protective effect is attributable to their role as KIR3DS1 ligands. Amongst KIR3DS1 individuals the best model to explain the residual protective effect of B*57:02 and B*57:03 was provided by their TCR binding, i.e. alleles with similar TCR binding were significantly protective (Appendix 3—table 5).

We conclude that HLA-B*57 alleles are protective for two reasons: firstly (and most importantly) because of their activating KIR binding properties; secondly, their TCR binding properties.

HIV-1 infection: what determines the impact on viral load set point across all alleles?

As for HTLV-1 infection, we next investigated the interactions responsible for the early viral load associations of average alleles. As before, we first investigated whether the viral load associations of average alleles was meaningful. We found that rank order of intermediate alleles was highly robust and significantly more informative than random (p<10−16 two sample Kolmogorov-Smirnov Test two-tailed) Appendix 2 Supplementary Results ‘Are average HLA class I associations robust’, Appendix 4—figure 2).

Unlike HTLV-1 infection, and more in line with expectation, the picture was mixed with no single interaction able to explain all HLA associations (Appendix 4—figure 3). The only significant correlation was between the protection offered by an allele and the protection offered by alleles with similar LILRB2 binding. However, the correlation was weak (Rs = 0.29 p=0.048, Spearman two-tailed) and there were plenty of alleles which did not conform to the pattern (e.g. A*2902 is detrimental but alleles with similar LILRB2 binding are protective).

We conclude that, in HIV-1 infection, different interactions are responsible for the protection conferred by different HLA alleles. For the well-described protective effect of the B*57 alleles the dominant effect was explained by binding to activating KIR with a weaker effect attributable to TCR binding (revealed once individuals with KIR3DS1 were removed from the cohort). It is worth noting that in our cohort, unlike in white US cohorts, KIR3DL1:Bw4 was not associated with protection.

HCV infection: what determines the protective effect of HLA-B*57?

We studied a case-control cohort of 782 HCV-seropositive individuals. 257 subjects had spontaneously cleared the virus, and 525 subjects were chronically infected. A number of published studies have repeatedly found that the HLA-B*57 alleles are associated with increased odds of spontaneous clearance and reduced viral load amongst those chronically infected (Kuniholm et al., 2013; Kuniholm et al., 2010; Thio et al., 2002). Other alleles (including A*11:01, A*23:01, C*01:02, C*04:01) have also been associated with outcome in some studies but are not consistently replicated (and in particular these effects have not been reproduced in the studies with the largest cohorts) so we focus solely on the B*57 alleles. In the cohort we studied, the B*57 protective effect appeared to be attributable to B*57:02 (with B*57:03 and B*57:05 showing similar effects but were not statistically significant due to low carrier numbers); the more frequent B*57:01 allele did not appear to be protective (Appendix 3—table 6).

Unlike in HIV-1 infection, where the B*57 protective effect is mainly due to aKIR interactions, in HCV infection there was no evidence that the aKIR-binding properties of the B*57 alleles contributed to their protection (model 4 in Table 3 all the P values for aKIR.FS are very high, all the Coefficients are close to zero) despite adequate power. Instead the protective effect was explained by TCR.FS (model 2) indicating that CD8+ T cells are most likely responsible for the protective effect of HLA-B*57 in HCV infection. The models used contained a number of nominal covariates (4), it has been shown that under some circumstances, inclusion of such covariates in logistic regression can reduce power (Pirinen et al., 2012). We therefore repeated the analysis of models 1–6 omitting the covariates. This strengthened the finding that the protective effect was attributable to TCR.FS in every case leading to an increase in the size of the coefficient and a decrease in the P value (Coeff = 1.09, p=0.03 *, Coeff = 1.00 p=0.04 *, Coeff = 1.25 p=0.02* for TCR.FS with B*57:02, B*57:03 and B*57:05 respectively). In contrast aKIR.FS never became significant.

Table 3
Interactions underlying HLA class I disease associations in HCV infection HLA-B*57 is associated with increased odds of spontaneous clearance of HCV.

In this cohort the protective effect is attributable to B*57:02 with B*57:03 and B*57:05 apparently following the same trend (though due to low numbers it is impossible to be certain). For each B*57 allele of interest we sought to determine the underlying mechanism by performing five multivariate linear regressions (model 2–6), one for each of the distance metrics. The coefficient (Coeff) and P value for the index allele and the similarity metric are recorded below. HBV seropositivity, mode of infection, SNP rs1297986 and subcohort were included as additional covariates in all models. A coefficient >0 indicates a protective allele (increased odds of spontaneous clearance, P). Unlike the B*57 protective effect in HIV-1 infection, in HCV infection there appeared to be no contribution from activating KIR i.e. HLA with similar aKIR binding to B*57 were never significant protective despite sufficient power (model 4). The protective effect here appears to be entirely attributable to CD8+ T cells (model 2, shaded). In all cases the protective effect of alleles with similar TCR binding is actually more significant than the protective effect of the B*57 alleles themselves (model 1, index only). Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.



Index allele
ModelCOV.B*57:02B*57:03B*57:05
1.
Index only
indexCoeff2.02
P
0.57
P
15.5
P
P val0.08
·
0.160.99
2.
Index+TCR.FS
IndexCoeff2.100.6715.9
P val0.07
·
0.10
·
0.99
TCR.FSCoeff1.04
P
0.89
P
1.19
P
P val0.05
*
0.09
·
0.04
*
3.
Index+iKIR.FS
IndexCoeff2.000.6415.62
P val0.08
·
0.120.99
iKIR.FSCoeff0.35
P
0.40
P
0.50
P
P val0.290.240.17
4.
Index+aKIR.FS
IndexCoeff2.060.5815.49
P val0.07
·
0.150.99
aKIR.FSCoeff0.66
P
0.29
P
0.54
P
P val0.150.560.30
5.
Index+LILRB1.S
IndexCoeff2.000.5815.48
P val0.09
·
0.160.99
LILRB1.SCoeff−0.15
D
0.15
P
−0.37
D
P val0.900.910.77
6.
Index+LILRB2.S
IndexCoeff2.020.5215.53
P val0.08
·
0.200.99
LILRB2.SCoeff−0.02
D
−0.81
D
−0.02
D
P val0.980.400.98

Looking across all alleles, there was again substantial evidence that the rank order of protection conferred by average alleles was robust (p<10−16 two-sample Kolmogorov-Smirnov Test, 93.8% of runs significant, Appendix 2 Supplementary Results ‘Are average HLA class I associations robust’, Appendix 4—figure 2). As for HIV-1 infection, the picture was mixed with no single interaction able to explain all HLA associations (Appendix 4—figure 4). The strongest positive correlation was seen for alleles with similar TCR binding (Rs = 0.21 p=0.09, Spearman two-tailed) but the correlation is weak and not significant indicating that although the protection conferred by some alleles is attributable to TCR binding there are many alleles where the protection is better explained by another factor.

Crohn’s disease: can we understand the protective effect of the ancestral AH8.1 haplotype?

The ancestral haplotype AH8.1 (also known as MHC 8.1) is a multigene haplotype consisting of HLA-A*01:01 -B*08:01- C*07:01 -DRB1*03:01 -DQA1*05:01 -DQB1*02:01. Genes from AH8.1 or the complete haplotype have been associated with risk or severity of disease in a number of inflammatory and autoimmune conditions including autoimmune hepatitis, myasthenia gravis, systemic lupus erythematosus, type 1 diabetes, a range of myositis phenotypes and Crohn’s disease (Lee et al., 2017; Price et al., 1999; Manabe et al., 1993; Miller et al., 2015; Gorodezky et al., 2006). We investigated the association between good prognosis in Crohn’s disease and the HLA class I alleles of the AH8.1 haplotype. In a cohort of 2650 Crohn’s disease cases, HLA-B*08:01 and HLA-C*07:01 (which are in tight linkage disequilibrium with each other and with the class II genes of the haplotype) but not HLA-A*01:01 were significantly associated with good prognosis (Appendix 3—table 7).

Strikingly, despite good power in every case, we did not find that alleles with similar TCR binding, similar iKIR binding, similar aKIR binding or similar LILRB1 and LILRB2 binding to either B*08:01 or C*07:01 were associated with good prognosis (Table 4). The results were very clear, in every case the P values were very high and in some cases the direction of association was if, anything, reversed (nearby alleles were detrimental). This suggests that the protection marked by B*08:01 and C*07:01 is either attributable to other genes in this extended haplotype or that B*08:01 and C*07:01 confer protection by a mechanism other than immune receptor binding (Elahi et al., 2011; Candore et al., 1995).

Table 4
Interactions underlying HLA class I disease associations in Crohn’s disease cases.

AH8.1 is associated with increased odds of good prognosis amongst Crohn’s disease cases. In our cohort two classical HLA class I alleles from this haplotype, B*08:01 and C*07:01 are associated with good prognosis (Appendix 3—table 7). For both B*08:01 and C*07:01 we sought to determine the underlying mechanism by performing 5 multivariate linear regressions (model 2–6), one for each of the distance metrics. The coefficient (Coeff) and P value for the index allele and the similarity metric are recorded below. Gender was included as an additional covariate in all models. A coefficient <0 indicates a protective effect (P, decreased odds of poor prognosis), a coefficient >0 indicates a detrimental effect (D, increased odds of poor prognosis). Despite good power, similar alleles by all 5 metrics were never significantly protective (indeed in some cases tended towards being detrimental e.g. similar alleles by TCR or LILRB1 binding). Inclusion of 3 non-MHC SNPs which were significant in a GWAS as covariates did not alter these conclusions. Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.



Index allele
ModelCOV.B*08:01C*07:01
1.
Index only
indexCoeff−0.50
P
−0.34
P
P val3.56 × 10−7
****
0.0003
***
2.
Index+TCR.FS
IndexCoeff−0.49−0.34
P val9.96 × 10−7
****
0.0008
***
TCR.FSCoeff0.65
D
0.041
D
P val0.200.84
3.
Index+iKIR.FS
IndexCoeff−0.50−0.34
P val3.75 × 10−7
****
0.0007
***
iKIR.FSCoeff−0.08
P
0.02
D
P val0.890.86
4.
Index+aKIR.FS
IndexCoeff−0.50−0.34
P val3.56 × 10−7
****
0.0003
***
aKIR.FSCoeff00.001
D
P val10.99
5.
Index+LILRB1.S
IndexCoeff−0.50−0.28
P val1.46 × 10−6
****
0.007
**
LILRB1.SCoeff−0.001
P
0.28
D
P val10.09
.
6.
Index+LILRB2.S
IndexCoeff−0.50−0.28
P val2.28 × 10−6
****
0.007
**
LILRB2.SCoeff0.007
P
0.28
P
P val0.980.11

Discussion

We have developed an approach, based on biologically-plausible similarity metrics, to help identify the immune interactions responsible for the protection or susceptibility associated with a given HLA class I allele. First we applied this approach to investigate HLA disease associations in 3 viral infections. We studied a total of 11 HLA alleles from 6 allele groups. In every case, with the exception of B*53:01 in HIV-1 infection (where we had insufficient power to study TCR, iKIR and aKIR) we were able to successfully identify the most likely cause of the protective or detrimental effect. In HTLV-1 infection, the pattern was remarkably skewed. All 4 HLA disease associations were best explained by TCR binding. This pattern actually extended to all HLA alleles, with a very strong correlation between the protection conferred by an allele and the protection conferred by alleles with similar TCR binding. This is a wholly unexpected result that implies that, in HTLV-1 infection, the most important immune response in determining protection via all HLA alleles is overwhelmingly the CD8+ T cell response. In HIV-1 infection, the picture was more balanced. The protective effect of B*57:01, B*57:02 and B*57:03 was mainly due to aKIR with evidence for a weaker effect of TCR binding. Across all alleles no single interaction was responsible for the degree of HLA-mediated protection conferred. In HCV infection the B*57 alleles were also protective, but unexpectedly for a different reason to in HIV-1. In contrast to HIV-1 infection, there was no evidence that protection was attributable to aKIR instead TCR binding appeared to be the main determinant. We then applied the method to investigate the association between the classical HLA class I alleles of the AH8.1 haplotype and good prognosis amongst cases of Crohn’s disease. The resulting picture was very clear: none of the immune interactions investigated explained the protective effect. We conclude that either HLA-B*08:01 and C*07:01 mark a protective allele but are themselves not protective or that they protect via a mechanism independent of their receptor binding. A number of studies have reported that the AH8.1 haplotype is associated with impaired immune activation, perhaps due to a defect in the TCR signal transduction pathway (Candore et al., 1995; Lio et al., 1995; Egea et al., 1991) in agreement with our conclusion that the protection associated with HLA-B*08:01 and C*07:01 is not attributable to their receptor binding.

It is interesting to note that in HIV-1 infection all the B*57 alleles with high carrier frequency (B*57:01, B*57:02, B*57:03) are associated with protection both in the cohort we study and in the work of others (Carrington and O'Brien, 2003; Kiepiela et al., 2004; Costello et al., 1999; Fellay et al., 2007). However, in HCV infection, B*57:02 and B*57:03 are protective, but the evidence for B*57:01-mediated protection is less clear. B*57:01 is not significantly protective in the cohort we study despite a large number of carriers and this has also been reported by others, for example, Kuniholm et al. found both B*57:01 and B*57:03 were protective in a univariate analysis but in a multivariate analysis B*57:01 lost significance (presumably due to linkage with other HLA alleles) whilst B*57:03 retained significance (Kuniholm et al., 2013). Our metrics provide an explanation for this divergent behaviour. The fraction shared metrics all depend upon the proteome of interest. For the HIV-1 proteome, all the B*57 alleles are very similar (e.g. when the index is B*57:02 the TCR.FS with B*57:01 is 0.74 and only 2 alleles –B*58:02 and B*57:03- are more similar). In contrast for HCV, whilst B*57:02 and B*57:03 are similar, B*57:01 is more distant (e.g. when the index is B*57:02 the TCR.FS with B*57:01 drops to 0.58 and 10 alleles are more similar to B*57:02 than B*57:01). This provides a plausible explanation for why B*57:01 confers a similar degree of protection to B*57:02 and B*57:03 in the context of HIV-1 infection but appears to have less similarity in the context of HCV infection. Another interesting observation was that, for all three viral infections, the rank order of ‘average’ alleles was robust, that is, they could be robustly classified as protective or detrimental.

Although there are a large number of reported HLA class I associations, it is not known which immune interactions are responsible for any of these associations so there does not exist a ‘gold standard’ test data set with which we can formally validate our approach. However, a number of observations suggest that our approach is identifying biologically meaningful features. Firstly, the TCR.FS metric finds that alleles within the same supertype are more similar (have a higher TCR.FS) than alleles between supertypes (p=3×10−7). Secondly, for HTLV-1 infection we find a very strong positive correlation between the risk of HAM/TSP associated with an allele and the risk of HAM/TSP associated with similar alleles (by the TCR.FS metric). Such a strong correlation is unlikely to be generated by random. Thirdly, in HIV-1 infection we found that similarity in terms of activating KIR binding (aKIR.FS) revealed clinically relevant subtleties lost within the traditional KIR3DS1:Bw4-80I protective group, essentially splitting this group into 3 groups with decreasing levels of protection; again it is difficult to see how such a result could be generated other than by a method that was reflecting biological reality.

We stress that this method should not be used in isolation as the sole method for determining the mechanism underlying an HLA association. Rather it provides a line of evidence to be used alongside other lines of evidence to triangulate to the most likely answer. In common with classical disease association approaches (which use HLA alleles as predictors), this method should be used with care. Many of the problems arising in classical HLA disease association studies, such as linkage disequilibrium, unmeasured confounding variables or population stratification are less of a problem with these metrics. This is because many alleles will be similar and contribute to the similarity metric and it is unlikely that all similar alleles will all be in linkage disequilibrium with the same polymorphism or all correlated with the same confounder. Nevertheless linkage disequilibrium and other correlations can and will distort the results and should be investigated. In short, the method should not be used blindly, it should be used with care by someone with knowledge both of the biological problem and the structure of the dataset. Another limitation of this approach is the relatively simple definition of the KIR binding groups which does not take into account the KIR allele or variations in KIR-HLA binding that break the C1/C2 and Bw4/Bw6 rules. Nevertheless, these simplistic groupings have proved very powerful in other studies (Martin et al., 2007; Martin et al., 2002; Pelak et al., 2011; Vince et al., 2014; Khakoo et al., 2004; Ahlenstiel et al., 2008; Nakimuli et al., 2015), indicating that, to a first approximation, they are informative.

We do not study all known receptor-HLA interactions; instead we focus on receptor-ligand pairs which are polymorphic and well-characterised. Specifically we investigate TCR-HLA:peptide, inhibitory KIR-HLA:peptide, activating KIR-HLA:peptide, LILRB1-HLA and LILRB2-HLA. Inclusion of other receptors-ligand interactions would follow the same approach but requires more data and better characterisation of the receptors. In particular we did not investigate the CD94:NKG2A-HLA interaction since, with current knowledge, we could only split alleles in a binary fashion into binders or non-binders which would provide no power in subsequent analysis. The strength of binding is known to be more subtle than this and to depend on the peptide interaction with both CD94 and NKG2A (Petrie et al., 2008) but there is no comprehensive data to allow us to characterise this binding. We do not study atypical effects of the HLA molecules (Elahi et al., 2011). Monomorphic receptor-ligand pairs are not studied as these cannot explain polymorphic HLA associations.

In addition to identifying immune interactions underlying HLA class I disease associations this approach could also be used alongside classical GWAS or candidate gene approaches as a tool for investigating identified HLA associations. Lack of association between ‘near’ alleles and outcome may indicated that the identified allele is a passenger marking the causal variant or a false positive.

Our approach differs from another important attempt to move from HLA associations to understanding function by Raychaudhuri et al. (2012). Raychaudrhuri et al extended the usual approach of identifying alleles associated with disease traits and instead fine mapped associations down to the level of amino acids. This high resolution approach identified 5 amino acids, all in the peptide binding grove of HLA molecules, that were associated with seropositive rheumatoid arthritis. However, the authors assumed that variation in the CD8+ T cell response elicited must explain the observed associations.

Studies of human disease are typically observational and correlative in nature. An exception to this is genetic association studies which provide a rare and powerful opportunity to uncover causal factors. Here we provide an approach for interrogating some of the most important genes for disease associations: the HLA class I genes. Ultimately, knowing the causal factors will lead to a better understanding of the molecular pathways involved in disease pathogenesis and help to identify potential therapeutic targets.

Materials and methods

Key resources table
Reagent type
(species) or
resource
DesignationSource or
reference
IdentifiersAdditional information
Software, algorithmFraction sharedThis paperRRID:SCR_018250https://github.com/bjohnnyd/fs-tool
Software, algorithmNetMHCpan v4.0https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.0

KIR and HLA genotyping

Request a detailed protocol

The HTLV-1, HIV-1 and HCV cohorts were previously KIR and HLA genotyped (Martin et al., 2002; Jeffery et al., 1999; Thio et al., 2002; Khakoo et al., 2004; Seich Al Basatena et al., 2011; Prentice et al., 2014; Boelen et al., 2018). For the Crohn’s cohort, subjects were genotyped by imputation using the Immunochip (Illumina), GeneChip 500K array (Affymetrix) and the UK Axiom Biobank array (Affymetrix). HLA genotypes were imputed previously (Lee et al., 2017). For KIR imputation we first estimated haplotypes for all individuals using SHAPEIT with parameters states = 500, burn = 10, prune = 10, main = 50 and the HapMap b37 recombination map (Delaneau et al., 2013). Then, for 984 individuals who were genotyped with the Immunochip, KIR imputations was performed using 231 SNPs on chromosome 19. For 818 individuals typed using GeneChip 500K Array, KIR imputation was based on 141 SNPs on chromosome 19. For 410 individuals genotyped with the UK Axiom Biobank array, KIR imputations was based on 145 SNPs on chromosome 19. In all cases KIR imputation was performed using KIR*IMP (Vukcevic et al., 2015) with a probability threshold of 0.5. The KIR haplotype imputation accuracy has been reported as 92% at a 0.5 probability threshold with a call rate greater than 90%. However, this accuracy is for copy-number imputation, we used KIR*IMP for imputing presence or absence alone, suggesting we would achieve an accuracy greater than 92%.

Subjects

Ethics

This study was approved by the NHS Research Ethics Committee (13/WS/0064) and the Imperial College Research Ethics Committee (ICREC_11_1_2). Informed consent was obtained at the study sites from all individuals. The study was conducted according to the principles of the Declaration of Helsinki.

IAVI (HIV-1 seroconverters)

Request a detailed protocol

IAVI is a prospective cohort of HIV-1 seroconverters who were identified when seronegative and followed under Protocol C of IAVI (International AIDS Vaccine Initiative, 2020). All individuals were treatment naïve except for short-term prevention of mother-to-child transmission (these time points were excluded). Two outcome metrics were studied: early viral load set point and time to low CD4 count. An individual’s ‘Early viral load set point’ is the mean log10 of viral load measurements taken between 3–9 months post infection (84–252 dpi); ‘time to low CD4 count’, was defined as time from estimated date of infection to CD4 count <350 cells/mm3. 44 individuals were missing early viral load set point information but not time to low CD4; therefore for analysis where the outcome metric was early viral load these 44 individuals were removed. All individuals who were missing HLA class I or KIR genotype were excluded. The IAVI cohort consisted of N = 568 individuals with time to low CD4 count information and N = 524 individuals with early viral load information. Cohort information was obtained from the IAVI Protocol C investigators.

Kagoshima (HTLV-1 seropositives)

Request a detailed protocol

The HTLV-1 cohort (N = 392) consists of individuals of Japanese origin who resided in the Kagoshima Prefecture, Japan. The cohort consists of 214 HAM/TSP patients and 178 asymptomatic carriers. HAM/TSP was diagnosed according to World Health Organization criteria; asymptomatic controls were recruited from the same geographical region. Two outcome measures were considered: disease status (HAM/TSP or asymptomatic carrier) and log10(proviral load). Cohort information was obtained from the original investigators who established the cohort.

HCV (HCV seropositives)

Request a detailed protocol

The HCV cohort consists of four sub-cohorts of HCV-seropositive subjects: AIDS Link to Intravenous Experience (ALIVE, N = 226) (Vlahov et al., 1991), Multicenter Hemophilia Cohort Study (MHCS, N = 295) (Goedert et al., 1989), Hemophilia Growth and Development Study (HGDS, N = 100) (Hilgartner et al., 1993) and a UK cohort (N = 161) (Khakoo et al., 2004; Seich Al Basatena et al., 2011; Boelen et al., 2018). One outcome measure was considered: the odds of spontaneous viral clearance. Cohort information was obtained from the lead investigators representing each of the four cohorts following separate applications.

Crohn’s disease

Request a detailed protocol

The Crohn’s disease cohort consists of two, previously reported cohorts of Crohn’s disease cases (a total of 788 poor-prognosis cases and 1424 good-prognosis cases) (Lee et al., 2017; Wellcome Trust Case Control Consortium, 2007). One outcome measure was considered: the odds of poor prognosis. Cohort information was obtained from the original investigators, KIR gene content was imputed as described above.

The similarity metrics

Request a detailed protocol

We constructed 5 similarity metrics to quantify how similar two alleles are in terms of their TCR binding, iKIR binding, aKIR binding, LILRB1 binding and LILRB2 binding.

T cell receptor fraction shared (TCR.FS)

Request a detailed protocol

The T cell receptor fraction shared (TCR.FS) is defined as the fraction of peptides bound by the index allele whose motifs appear in the peptides bound by the query (non-index) allele. The motif coordinates for TCR.FS are flexible, in the results reported here it is determined by the anchor positions 2 and the C terminus and the TCR contact residues position 3–6. Peptide length is also a variable of the equation. In the results reported here a peptide length of 9 amino acids was used. In the GitHub script (see key resources table above) peptide lengths of 8–11 amino acids are allowed.

TCR.FS ranges from 0 to 1. TCR.FS of 0 means that none of the motifs in the index-bound peptides appear in the peptides bound by the non-index allele. If the TCR.FS is 1, then all the motifs in the index-bound peptides appear in the peptides bound by the non-index allele.

Specifically, TCR.FS for an index allele Ai and a query allele Aj is defined as

TCR.FS (AiAj)=| MiMj ||Mi|

Where Mi is the list of TCR recognition motifs of index allele Ai, Mj is the list of TCR recognition motifs of query allele Aj, MiMj is the motifs in the list Mi that are also in the list Mj and |Mk| denotes the number of motifs in the list Mk. Note the measure is asymmetric.

As a basic test of the TCR.FS metric we investigated the hypothesis that the TCR.FS would be higher between alleles of the same supertype than between alleles of different supertypes. This hypothesis was strongly supported (Supplementary Results, Appendix 4—figure 1).

Inhibitory KIR fraction shared (iKIR.FS) and activating KIR fraction shared (aKIR.FS)

Request a detailed protocol

The inhibitory KIR fraction shared (iKIR.FS) is based on the inhibitory KIRs: KIR2DL1, KIR2DL2, KIR2DL3, KIR3DL1 and KIR3DL2. The activating KIR fraction shared (aKIR.FS) is based on the activating KIRs: KIR2DS1, KIR2DS2, KIR2DS4 and KIR3DS1. If the index allele and query allele bind distinct inhibitory KIR (based on positions 77 and 80) (Trowsdale, 2001; Moesta et al., 2008) their iKIR.FS is set to zero; similarly if the index and query bind distinct activating KIRs, their aKIR.FS is set to 0. If the index and query allele are both not known to interact with any inhibitory KIRs, their iKIR.FS is assigned a value of 1 and the same being true for activating KIRs and aKIR.FS. Otherwise the iKIR.FS and aKIR.FS is set to a value to reflect the number of shared KIR recognition peptide motifs similar to the TCR.FS. KIR recognition motifs being defined by the KIR contact positions 7 and 8 and the peptide anchor positions 2 and C. Specifically, iKIR.FS for an index allele Ai and a query allele Aj is defined asA

iKIR.FS(Ai,Aj)={0AiandAjbinddifferentiKIR0AiandAjbindthesameiKIR,individualisiKIR1neitherAinorAjbindanyiKIRMiMj|Mi|AiandAjbindthesameiKIR,individualisiKIR+

Where Mi is the list of KIR recognition motifs of index allele Ai, Mj is the list of KIR recognition motifs of query allele Aj, MiMj is motifs in the list Mi that are in the list Mj and |Mk| denotes the number of motifs in the list Mk. aKIR.FS is defined similarly.

LILRB1 and LILRB2 similarity scores (LILRB1.S, LILRB2.S)

Request a detailed protocol

The LILRB1 and LILRB2 similarity scores are based on the LILR-HLA binding data reported by Bashirova et al. in their Supplementary Table S1 (Bashirova et al., 2014).

The LILRB1 similarity score (LILRB1.S) between an index allele Ai and a query allele Aj is defined as:

LILRB1.S(Ai,Aj)=1|B(LILRB1,Ai)B(LILRB1,Aj)|max{|B(LILRB1,Ak)B(LILRB1,Am)|}

where B(LILRB1,Ai) is the binding score between LILRB1 and the HLA allele Ai reported (Bashirova et al., 2014) and alleles Ak and Am are all HLA alleles whose LILRB1 binding score has been measured (Bashirova et al., 2014). The denominator (which is the same for all allele pairs Ai, Aj) normalises the score to the maximum difference in binding scores observed so that LILRB1.S is on the same scale as the other metrics; i.e. 1 corresponds to maximum similarity and zero corresponds to maximum disparity. The LILRB1 similarity scores are calculated on a locus by locus basis. If the LILRB1 binding score for a particular HLA allele has not been measured by Bashirova et al. (2014) then the similarity score was calculated by taking the mean of all similarity scores for alleles in the same 2-digit group. The LILRB2 similarity scores are defined in an analogous way.

Worked examples of the similarity score calculations are provided in Appendix 1 Supplementary Methods.

Prediction of peptide: HLA class I molecule binding

Request a detailed protocol

NetMHCpan version 4.0 was used to predict the peptides bound by a given HLA class I molecule. The stand-alone software package was used to perform binding affinity predictions (-BA) for peptides of length 8 to 11. A percentile rank of 2% was used as the threshold for bound peptides with peptides below this rank being considered to be bound (Nielsen et al., 2007; Jurtz et al., 2017).

Proteomes

Request a detailed protocol

Whole proteomes were obtained from Uniprot, 2018. For HTLV-1, the subjects were from Kagoshima and so the HTLV-1 subtype prevalent in Japan was used (Uniprot accession: J02029). For IAVI (subjects from East and South Africa), Zambian HIV-C subtype was used (Uniprot accession: AB254142). For HCV (subjects from US and UK) HCV subtype 1a was used (Uniprot Accession: M62321). For Crohn’s Disease 100 random human proteins were used. For small proteomes (e.g. the viral proteomes considered here which were all less than 5000 amino acids) then results differ depending on the choice of proteome (e.g. alleles which are similar in terms of binding to HCV are not necessarily similar in terms of binding to HIV-1) but once the proteome becomes large, then the exact choice of proteome does not impact the results.

Application of the metrics to a cohort

Request a detailed protocol

Each individual in each cohort was allocated 5 measures representing the distance of the nearest allele in their genotype to the index allele in TCR recognition space, iKIR recognition space etc (for all 5 similarity metrics). That is, an individual heterozygote at all 3 loci (HLA-A, HLA-B, HLA-C) would have 6 different TCR.FS values for a specific index allele; the maximum of these six TCR.FS values (i.e. the nearest allele in their genotype) would be allocated to the individual. This methodology was applied for all metrics.

Regression

Request a detailed protocol

Multivariate linear, logistic and Cox regression was used to quantify the impact of the index allele and near alleles by the different similarity metrics on the outcome of interest. Potentially confounding covariates were identified and included in the analysis (listed in Appendix 1 Supplementary Methods). The analysis was of the form:

OutcomeIndexallele+FSmetric+covariates

And was repeated for each of the 5 metrics. Combinations of metric were also considered to determine independence and relative sizes. In the scenario where the same index HLA allele is protective or detrimental via different mechanisms in different people, then this would be detectable as two independently protective (or detrimental) FS metrics (i.e. that did not lose significance when included in the regression together); as an example of this please see the results section”HIV-1 infection: why are HLA-B*57 alleles protective’ in which different mechanisms of protection are identified in different people. All reported P values are two tailed. Calculations were performed using R v3.4.1 (R Development Core Team, 2014).

Workflow app

Request a detailed protocol

An implementation of this method is available on Github at https://github.com/bjohnnyd/fs-tool (Debebe et al., 2020; copy archived at https://github.com/elifesciences-publications/fs-tool).

Appendix 1

Supplementary methods

Regression

Impact of index allele and near alleles on outcome was analysed by multivariate regression using R v3.4.1 (R Development Core Team, 2014). Continuous variables (log HIV viral load) were analysed by linear regression (using the function lm), dichotomous variables (HAM/TSP and spontaneous clearance of HCV) by logistic regression (using the function glm) and survival analysis by Cox regression (using the function coxph). All reported P values are two tailed.

Covariates

Covariates which were significant predictors of outcome were identified and included in the analysis. The following covariates were included:

InfectionOutcomeCovariates
 HIV-1Early viral load set pointgender
 HIV-1Time to CD4 count < 350 cells/mm3age at infection, clinic site*
 HTLV-1HAM/TSPgender, age
 HCVSpontaneous clearancehepatitis B virus seropositivity, mode of infection, SNP rs1297986, subcohort
  1. *HIV clinic site is essentially collinear with viral subtype but available for more subjects

An example

Consider the case when we wish to investigate the immune interactions underlying the protective effect of HLA-C*08:01 on the risk of HAM/TSP in the HTLV-1 infected cohort. In the first instance the following six logistic regressions would be conducted:

1. index onlyDisease status ~ C*08:01 + age +gender
2. Index + TCR.FSDisease status ~ C*08:01 + TCR.FS + age +gender
3. Index + iKIR.FSDisease status ~ C*08:01 + iKIR.FS + age +gender
4. Index + aKIR.FSDisease status ~ C*08:01 + aKIR.FS + age +gender
5. Index + LILRB1.SDisease status ~ C*08:01 + LILRB1.S + age +gender
6. Index + LILRB2.SDisease status ~ C*08:01 + LILRB2.S + age +gender

If the situation arose where more than one of the similarity metrics was significant then at the next stage both metrics would be included in a regression model (along with the index and covariates) to test if they were acting independently and if not which was the driving association.

Worked examples of the fraction shared calculations

Two worked examples are presented in Supplementary file 1.

Details of the similarity metric

In formulating the similarity metrics a number of decisions had to be made on their exact details. Here we describe some of those details and explain the reasoning behind them.

Duplicate peptides

It is possible that the same peptide will appear more than once in a proteome; either because different proteins are read from the same reading frame (e.g. p8 and p12 in HTLV-1) or simply because the length of the proteome and the finite number of peptides means that eventually peptides will be repeated. We decided to retain duplicate peptides rather than restricting the analysis to unique peptides. We reasoned that restriction to unique peptides would ignore the possibility that mutation in repeated (overlapping) peptides may carry a heavier fitness cost, that repeated peptides may be more exposed to presentation (e.g. accessibility to proteases may vary for different proteins) and that repeated targeting of the same peptide is part of an HLA molecule’s recognition profile.

Nearest single allele

When considering the proximity of an individual’s HLA to a given index HLA we could consider either the single nearest HLA they possess (i.e. take the maximum fraction shared across all alleles they code for) or try to account for all HLA alleles they possess (e.g. take the average). We decided to consider the single nearest allele for comparability with a standard presence/absence analysis (the identity of the other alleles is irrelevant provided the index is present).

KIR alleles

KIRs are polymorphic and these polymorphisms can affect the strength of HLA binding. We have not considered differences in KIR alleles for several reasons: firstly many cohorts are only typed to the level of KIR gene presence/absence and so requirement for allelic information would exclude many datasets, secondly we do not attempt to capture strength of HLA binding (either to peptide or to KIR) since the relationship between strength of binding and strength of signalling is unclear and thirdly the strength of KIR allele: HLA allele binding has only been measured for a very limited number of pairs and typically without considering the impact of peptide so it would not be possible to accurately include KIR allele information.

Appendix 2

Supplementary results

Analysis of TCR.FS within and between supertypes

An HLA class I supertype is a group of HLA class I alleles which have been identified to have similar peptide binding properties (Sette and Sidney, 1999; Sidney et al., 2008). We hypothesised that the TCR.FS metric would be higher between alleles within the same supertype than between alleles belonging to different supertypes. To test this hypothesis, we chose 15 alleles to represent each of the 10 supertypes as defined by Sidney et al. (2008). that is a total of 150 alleles. For each pair of supertypes, we calculated the inter-supertype TCR.FS by taking the median of the TCR.FS between each representative allele of the first supertype and each representative allele of the second supertype (i.e. a median of a total of 15 × 15 = 225 measurements yielding one number which represents the distance between that pair of supertypes). Similarly, for the intra-supertype distance, we take the median of the TCR.FS between each pair of representative alleles of the supertype (i.e. the median of a total of 15 × 14 = 210 measurements). So, for each supertype there is 1 intrasupertype distance and 9 intersupertype distances (the distance to each of the 9 other supertypes). The results are plotted in Appendix 4—figure 1. It can be seen that for every supertype the intra-supertype TCR similarity (TCR.FS) is always larger than the inter-supertype similarity (p=3×10−7) confirming our hypothesis.

Are ‘average’ HLA class I associations robust?

We investigated whether there was any meaning in the rank order of protection associated with different ‘average’ alleles; that is alleles not significantly associated with protection or susceptibility. For each of the cohorts (of size N) of interest we randomly picked two subcohorts (of size N/2) and identified alleles that were present in both subcohorts at sufficient frequency (N ≥ 15). We discarded ‘extreme’ alleles which are significantly associated with outcome in the cohort (i.e. HTLV-1: A*02:06, A*02:07, C*08:01, B*54:01. HIV-1: B*53:01, B*57:02, B*57:03. HCV: B*57:01) thus leaving only ‘average’ alleles. Then, for each of the remaining alleles, in each subcohort we calculated the coefficient of that allele’s protective or detrimental effect using multivariate regression. The covariates included were the covariates already shown to be significant for that measure of outcome in that cohort (see Methods) plus the significant protective/detrimental alleles as listed above. We then asked whether the coefficients calculated in each subcohort were correlated using Pearson’s correlation coefficient (i.e whether the protection associated with a given average allele in one subcohort was correlated with the protection associated with the same allele in the other subcohort). This process was repeated 500 times and the correlation coefficients and P values recorded. The results are plotted in Appendix 4—figure 2, ‘observed’, green shading. For comparison we also repeated the process but this time looking for a correlation between mismatched alleles; these results are plotted as ‘random’, red shading in the same figure.

In all cases the observed correlation was highly significantly different from random (p<10−16, comparing the red and green shaded areas in Appendix 4—figure 2). However, although clearly distinct from random, the observed correlations were not always significant (histogram of green observed P values, right hand column Appendix 4—figure 2, contains a number of P values > 0.05). The percentage of runs with P values < 0.05 are as follows: HTLV-1 disease 62.6%, HTLV-1 proviral load asymptomatic carriers 22%, HTLV-1 proviral load HAM/TSP patients 30.6%, IAVI early viral load set point 98.6%, IAVI time to low CD4 count 99.0%, HCV odds of spontaneous clearance 93.8%.

So, with the exception of the two HTLV-1 proviral load outcomes, we found that the rank order of ‘average’ HLA class I molecules was robust, with the majority of runs yielding a statistically significant correlation. This was particularly true for the alleles in HIV-1 and HCV infection. Henceforth, we did not consider the impact of average alleles on HTLV-1 proviral load as the results were not robust (possibly because of low cohort size). For all other outcome metrics in all other cohorts we consider the measured impact of ‘average’ alleles on outcome to be meaningful.

In contrast, for the random data (i.e. mismatched alleles), the percentage of runs with P values < 0.05 are as follows: HTLV-1 disease 6.0%, HTLV-1 proviral load asymptomatic carriers 5.5%, HTLV-1 proviral load HAM/TSP patients 4.8%, IAVI early viral load set point 6.2%, IAVI time to low CD4 count 4.8%, HCV odds of spontaneous clearance 4.8%; that is close to 5% in each case as expected.

Appendix 3

Supplementary tables

Appendix 3—table 1
Which HLA subtype alleles are responsible for the allele group associations in HTLV-1 infection?

Risk of HAM/TSP was determined by logistic regression; we report the coefficient [odds ratio of HAM/TSP = exp(Coeff)], P value and number of individuals in the cohort with and without the HLA allele of interest. In all models, age and gender were included as additional covariates. Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.

Risk of HAM/TSPN
CoeffP valueHLA+HLA-
A*02−0.850.0014
**
150242
A*02:01−0.330.4049343
A*02:06−0.550.086
·
74318
A*02:07−1.270.0079
**
30362
C*08−0.700.064
·
56336
C*08:01−0.520.1950342
B*540.960.0056
**
82310
B*54:010.960.0056
**
82310
Appendix 3—table 2
Which HLA subtype alleles are responsible for the allele group associations in HIV-1 infection?

Predictors of Log10(early viral load set point) were determined by linear regression. Other significant covariates (gender) were included in the model. A negative coefficient (Coeff) indicates a protective effect (decrease in log viral load associated with possession of the allele); a positive coefficient indicates a detrimental effect. Predictors of the rate of progression to CD4 cell count <350 cells/mm3 was determined by Cox regression; hazard ratio = exp(Coeff). Other significant covariates (age at infection, and HIV-1 clinic site which is essentially collinear with viral subtype but available for more subjects) were included in the model. A hazard ratio (HR) less than 1 indicates a protective effect (reduced risk of progression to low CD4 count associated with possession of the allele); a HR greater than 1 indicates a detrimental effect. N is the number of individuals with early viral load information (numbers with time to low CD4 cell count are slightly higher, total cohort size=568). Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.

Early viral load set pointTime to low CD4 cell countN
CoeffP valHRP valHLA+HLA-
B*57−0.510.00005
***
0.480.01
*
50474
B*57:01Insufficient numbers for analysis1523
B*57:02−0.630.005
**
0.400.0715509
B*57:03−0.460.002
**
0.500.04
*
36488
B*35Px+0.230.02
*
1.01.098426
B*35:02Insufficient numbers for analysis3521
B*53:01+0.230.02
*
+0.930.795429
Appendix 3—table 3
Interactions underlying HLA class I associations with disease progression in HIV-1 infection.

We investigated the interactions underlying HLA class I alleles which were significantly associated with disease progression in HIV-1 infection. HLA-B*53:01 was not included in this analysis as, in this cohort, it has no impact on progression to low CD4+ cell count (HR = 0.9, p=0.7).

Index allele
ModelCOV.B*57:01B*57:02B*57:03
1.
Index only
indexHR/0.400.50
P val/0.07
·
0.04
*
2.
Index+TCR.FS
IndexHR/0.400.49
P val/0.070.04
*
TCR.FSHR0.790.790.92
P val0.330.50.8
3.
Index+iKIR.FS
IndexHR/0.400.50
P val/0.07
·
0.04
*
iKIR.FSHR0.971.01
P val0.870.97
4.
Index+aKIR.FS
IndexHR/0.410.51
P val/0.08
·
0.05
*
aKIR.FSHR0.600.610.61
P val0.098
·
0.170.16
5.
Index+LILRB1.S
IndexHR/0.350.39
P val/0.04
*
0.006
**
LILRB1.SHR0.830.190.004
P val0.670.06
·
0.0001
***
6.
Index+LILRB2.S
IndexHR/0.400.49
P val/0.07
·
0.04
*
LILRB2.SHR0.950.920.79
P val0.900.900.76
Appendix 3—table 4
Impact on log10 viral load associated with different aKIR.FS groupings.

Investigation of individuals with similar aKIR binding to HLA*B57:01 revealed three distinct grouping (Figure 3A) which we named I, II and III. In a multivariate linear model to predict log10(viral load) these three different groupings (treated as three levels of a factor) had different coefficients, with only group I being significantly protective (top 3 rows of table below). Pooling groups II and III to increase the number of individuals in this grouping did not change the conclusion that only group I had a significant protective effect (bottom two rows of table below). Comparisons are with respect to the baseline (KIR3DS1:Bw4-80I- individuals, N = 438). In all cases gender was also included as a covariate in the model.

CoefficientP valueN in group
Group I−0.380.007 **47
Group II−0.160.4519
Group III+0.050.7820
Group I−0.380.007 **47
Group II+III−0.040.7639
Appendix 3—table 5
Interactions underlying HLA class I associations with early viral load set point in HIV-1 infection in a KIR3DS1- cohort.

We investigated the interactions underlying HLA class I alleles which were significantly associated with early viral load set point in the absence of KIR3DS1 in HIV-1 infection. For all three alleles the winning model was one in which both TSC.FS and iKIR.FS were covariates (model 7) but in this case only TCR.FS (i.e. alleles with similar TCR binding, shaded) were significantly protective, alleles with similar iKIR binding actually tended to be detrimental (though not significantly) so they cannot explain the protective effect of the B*57 alleles.

Index allele
ModelCOV.B*57:01B*57:02B*57:03
1.
Index only
indexCoeff/−0.58−0.39
P val/0.027
*
0.019
*
2.
Index+TCR.FS
IndexCoeff/−0.59−0.43
P val/0.023
*
0.011
*
TCR.FSCoeff−0.22−0.32−0.29
P val0.130.120.11
3.
Index+iKIR.FS
IndexCoeff/−0.59−0.40
P val/0.025
*
0.017
*
iKIR.FSCoeff−0.09−0.088−0.05
P val0.370.470.65
4.
Index+aKIR.FS
IndexCoeff/−0.58−0.39
P val/0.027
*
0.019
*
aKIR.FSCoeff///
P val///
5.
Index+LILRB1.S
IndexCoeff/−0.68−0.42
P val/0.01
*
0.013
*
LILRB1.SCoeff−0.46−1.22−0.51
P val0.058
·
0.02
*
0.3
6.
Index+LILRB2.S
IndexCoeff/−0.66−0.46
P val/0.012
*
0.006
**
LILRB2.SCoeff−0.56−0.80−1.04
P val0.023
*
0.02
*
0.02
*
7. Index+TCR.FS+iKIR.FSIndexCoeff/−0.57−0.42
P val/0.027
*
0.01
*
TCR.FSCoeff−0.86−0.62−0.81
P val0.046
*
0.087
·
0.022
*
iKIR.FSCoeff0.480.210.38
P val0.120.320.086.
Appendix 3—table 6
Which HLA subtype alleles are responsible for the protective effects of HLA-B*57 in HCV infection?

Odds of spontaneous clearance of HCV was determined by logistic regression; we report the coefficient (Coeff), P value and number of individuals in the cohort with and without the HLA allele of interest. In all models, HBV seropositivity, mode of infection, SNP rs1297986 and subcohort were included as additional covariates. A coefficient >0 indicates a protective allele (increased odds of spontaneous clearance). The odds ratio of spontaneous clearance = exp(Coeff). Significance codes: p<0.001 ***; p<0.01 **; p<0.05 *; p<0.1. ; P values are two tailed.

Odds of spontaneous clearanceN
CoeffP valueHLA+HLA-
B*570.620.01
*
84698
B*57:010.410.2049733
B*57:022.020.08
·
5777
B*57:030.570.1629753
B*57:0515.530.991781
Appendix 3—table 7
Which HLA class I alleles of the AH8.1 haplotype are associate with reduced odds of poor prognosis?

Odds of poor prognosis was determined by logistic regression; we report the Coefficient (Coeff), Odds ratio (OR), P value and number of individuals in the cohort with and without the HLA class I allele of interest. OR = exp(Coeff). An OR <1 (or equivalently a Coeff <0) indicates a protective allele (odds of poor prognosis reduced). To maximise power, only individuals missing information at loci of interest were removed hence there is some variation in cohort size depending on the analysis. 1 3 SNPs included: rs5929166, rs147856773 and rs75764599. See Lee et al., 2017 for details. In all cases gender was included as a covariate. Since inclusion of the 3 SNPs had little impact on the OR but reduced power (due to a loss of individuals) we do not include the 3 SNPs as covariates in subsequent analysis but we do check that results are robust to their inclusion.

Odds of poor prognosisN
HLACoeffORP valueHLA+HLA-Total
A*01:01−0.080.920.3385916432502
B*08:01−0.480.624.75 × 10−7
****
62018372457
C*07:01−0.290.759.8 × 10−4
***
78418662650
With inclusion of three non-MHC SNPs significant in GWAS as covariates
A*0101−0.050.950.5581315312344
B*08:01−0.430.651.57 × 10−5
****
58417142298
C*07:01−0.240.788.3 × 10−3
**
74017382478

Appendix 4

Supplementary figures

Appendix 4—figure 1
TCR. FS Intra-Supertype and Inter-Supertype.

(A) For each of the 10 supertypes (x axis) we calculated the similarity (median TCR.FS, y axis) to alleles within that supertype (red triangles) and to alleles within all other 9 supertypes (green circles). It can be seen that for every supertype the intra-supertype similarity is larger than all the inter-supertype similarities for that supertype. (B) Pooling the intra-supertype similarities and comparing them with pooled inter-supertype similarities we find that the intra-supertype similarities are significantly higher (p=3×10−7, Wilcoxon two tailed test), horizontal lines represent the medians of the two groups.

Appendix 4—figure 2
Is the rank order of ‘average’ HLA class I alleles robust?

For each cohort and each outcome measure (panels A-F) the cohort was split in half and the correlation between the protection/susceptibility for the average alleles was calculated using Pearson’s method; this process was repeated 500 times (resulting in 500 correlation coefficients and 500 P values). The distribution of the correlation coefficients (left hand plot in each panel) and the corresponding P values (right hand plot in each panel) was calculated and plotted (green shading). The distributions for randomly paired alleles were also calculated (red shading). In every case the observed distributions were significantly different from random (p<10−16). However, although distinct from random, the observed correlations were not always significant (histogram of green observed P values, right hand plots, contains a number of P values > 0.05). The percentage of runs with P values < 0.05 are as follows: HTLV-1 disease 62.6%, HTLV-1 proviral load asymptomatic carriers 22%, HTLV-1 proviral load HAM/TSP patients 30.6%, IAVI early viral load set point 98.6%, IAVI time to low CD4 count 99.0%, HCV odds of spontaneous clearance 93.8%.

Appendix 4—figure 3
Correlation between the HIV-1 viral load associated with an allele and the viral load associated with similar alleles.

The early viral load set point associated with an HLA class I allele ('Coefficient of Index Allele, x axis) was correlated with the coefficient of HLA class I alleles with similar TCR binding (A), similar iKIR binding (B), similar activating KIR binding (C), similar LILRB1 binding (D) and similar LILRB2 binding (E). All alleles in the cohort of a sufficient frequency (N > 15) and with sufficient near alleles (N > 15 with 50% or more similarity) were considered. The Spearman correlation coefficient (Rs) and corresponding P value are reported in the title bar for each plot. Unlike HTLV-1 infection, and more in line with expectation, the picture was mixed with no single interaction able to explain all HLA associations. The only significant correlation was between the protection offered by an alleles and the protection offered by alleles with similar LILRB2 binding (E). However, the correlation was weak and there were plenty of alleles which did not conform to the pattern.

Appendix 4—figure 4
Correlation between the odds of HCV clearance associated with an allele and the odds associated with similar alleles.

The odds of spontaneous clearance of HCV associated with an HLA class I allele (“Coefficient of Index Allele", x axis) was correlated with the coefficient of HLA class I alleles with similar TCR binding (A), similar iKIR binding (B), similar activating KIR binding (C), similar LILRB1 binding (D) and similar LILRB2 binding (E). All alleles in the cohort of a sufficient frequency (N > 15) and with sufficient near alleles (N > 15 with 50% or more similarity) were considered. The Spearman correlation coefficient (Rs) and corresponding P value are reported in the title bar for each plot. Unlike HTLV-1 infection, and more in line with expectation, the picture was mixed with no single interaction able to explain all HLA associations. The strongest positive correlation was seen for alleles with similar TCR binding but the correlation is weak and not significant indicating that although the protection conferred by some alleles is attributable to TCR binding there are many alleles where the protection is better explained by another interaction.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    MHC affinity, peptide liberation, T cell repertoire, and immunodominance all contribute to the paucity of MHC class I-restricted peptides recognized by antiviral CTL
    1. Y Deng
    2. JW Yewdell
    3. LC Eisenlohr
    4. JR Bennink
    (1997)
    Journal of Immunology 158:1507–1515.
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
    Peptide recognition by two HLA-A2/Tax11–19-Specific T Cell Clones in Relationship to Their MHC/Peptide/TCR Crystal Structures
    1. S Hausmann
    2. WE Biddison
    3. KJ Smith
    4. YH Ding
    5. DN Garboczi
    6. U Utz
    7. DC Wiley
    8. KW Wucherpfennig
    (1999)
    Journal of Immunology 162:5389–5397.
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
    The major genetic determinants of HIV-1 control affect HLA class I peptide presentation
    1. F Pereyra
    2. X Jia
    3. PJ McLaren
    4. A Telenti
    5. PI de Bakker
    6. BD Walker
    7. S Ripke
    8. CJ Brumme
    9. SL Pulit
    10. M Carrington
    11. CM Kadie
    12. JM Carlson
    13. D Heckerman
    14. RR Graham
    15. RM Plenge
    16. SG Deeks
    17. L Gianniny
    18. G Crawford
    19. J Sullivan
    20. E Gonzalez
    21. L Davies
    22. A Camargo
    23. JM Moore
    24. N Beattie
    25. S Gupta
    26. A Crenshaw
    27. NP Burtt
    28. C Guiducci
    29. N Gupta
    30. X Gao
    31. Y Qi
    32. Y Yuki
    33. A Piechocka-Trocha
    34. E Cutrell
    35. R Rosenberg
    36. KL Moss
    37. P Lemay
    38. J O'Leary
    39. T Schaefer
    40. P Verma
    41. I Toth
    42. B Block
    43. B Baker
    44. A Rothchild
    45. J Lian
    46. J Proudfoot
    47. DM Alvino
    48. S Vine
    49. MM Addo
    50. TM Allen
    51. M Altfeld
    52. MR Henn
    53. S Le Gall
    54. H Streeck
    55. DW Haas
    56. DR Kuritzkes
    57. GK Robbins
    58. RW Shafer
    59. RM Gulick
    60. CM Shikuma
    61. R Haubrich
    62. S Riddler
    63. PE Sax
    64. ES Daar
    65. HJ Ribaudo
    66. B Agan
    67. S Agarwal
    68. RL Ahern
    69. BL Allen
    70. S Altidor
    71. EL Altschuler
    72. S Ambardar
    73. K Anastos
    74. B Anderson
    75. V Anderson
    76. U Andrady
    77. D Antoniskis
    78. D Bangsberg
    79. D Barbaro
    80. W Barrie
    81. J Bartczak
    82. S Barton
    83. P Basden
    84. N Basgoz
    85. S Bazner
    86. NC Bellos
    87. AM Benson
    88. J Berger
    89. NF Bernard
    90. AM Bernard
    91. C Birch
    92. SJ Bodner
    93. RK Bolan
    94. ET Boudreaux
    95. M Bradley
    96. JF Braun
    97. JE Brndjar
    98. SJ Brown
    99. K Brown
    100. ST Brown
    101. J Burack
    102. LM Bush
    103. V Cafaro
    104. O Campbell
    105. J Campbell
    106. RH Carlson
    107. JK Carmichael
    108. KK Casey
    109. C Cavacuiti
    110. G Celestin
    111. ST Chambers
    112. N Chez
    113. LM Chirch
    114. PJ Cimoch
    115. D Cohen
    116. LE Cohn
    117. B Conway
    118. DA Cooper
    119. B Cornelson
    120. DT Cox
    121. MV Cristofano
    122. G Cuchural
    123. JL Czartoski
    124. JM Dahman
    125. JS Daly
    126. BT Davis
    127. K Davis
    128. SM Davod
    129. E DeJesus
    130. CA Dietz
    131. E Dunham
    132. ME Dunn
    133. TB Ellerin
    134. JJ Eron
    135. JJ Fangman
    136. CE Farel
    137. H Ferlazzo
    138. S Fidler
    139. A Fleenor-Ford
    140. R Frankel
    141. KA Freedberg
    142. NK French
    143. JD Fuchs
    144. JD Fuller
    145. J Gaberman
    146. JE Gallant
    147. RT Gandhi
    148. E Garcia
    149. D Garmon
    150. JC Gathe
    151. CR Gaultier
    152. W Gebre
    153. FD Gilman
    154. I Gilson
    155. PA Goepfert
    156. MS Gottlieb
    157. C Goulston
    158. RK Groger
    159. TD Gurley
    160. S Haber
    161. R Hardwicke
    162. WD Hardy
    163. PR Harrigan
    164. TN Hawkins
    165. S Heath
    166. FM Hecht
    167. WK Henry
    168. M Hladek
    169. RP Hoffman
    170. JM Horton
    171. RK Hsu
    172. GD Huhn
    173. P Hunt
    174. MJ Hupert
    175. ML Illeman
    176. H Jaeger
    177. RM Jellinger
    178. M John
    179. JA Johnson
    180. KL Johnson
    181. H Johnson
    182. K Johnson
    183. J Joly
    184. WC Jordan
    185. CA Kauffman
    186. H Khanlou
    187. RK Killian
    188. AY Kim
    189. DD Kim
    190. CA Kinder
    191. JT Kirchner
    192. L Kogelman
    193. EM Kojic
    194. PT Korthuis
    195. W Kurisu
    196. DS Kwon
    197. M LaMar
    198. H Lampiris
    199. M Lanzafame
    200. MM Lederman
    201. DM Lee
    202. JM Lee
    203. MJ Lee
    204. ET Lee
    205. J Lemoine
    206. JA Levy
    207. JM Llibre
    208. MA Liguori
    209. SJ Little
    210. AY Liu
    211. AJ Lopez
    212. MR Loutfy
    213. D Loy
    214. DY Mohammed
    215. A Man
    216. MK Mansour
    217. VC Marconi
    218. M Markowitz
    219. R Marques
    220. JN Martin
    221. HL Martin
    222. KH Mayer
    223. MJ McElrath
    224. TA McGhee
    225. BH McGovern
    226. K McGowan
    227. D McIntyre
    228. GX Mcleod
    229. P Menezes
    230. G Mesa
    231. CE Metroka
    232. D Meyer-Olson
    233. AO Miller
    234. K Montgomery
    235. KC Mounzer
    236. EH Nagami
    237. I Nagin
    238. RG Nahass
    239. MO Nelson
    240. C Nielsen
    241. DL Norene
    242. DH O'Connor
    243. BO Ojikutu
    244. J Okulicz
    245. OO Oladehin
    246. EC Oldfield
    247. SA Olender
    248. M Ostrowski
    249. WF Owen
    250. E Pae
    251. J Parsonnet
    252. AM Pavlatos
    253. AM Perlmutter
    254. MN Pierce
    255. JM Pincus
    256. L Pisani
    257. LJ Price
    258. L Proia
    259. RC Prokesch
    260. HC Pujet
    261. M Ramgopal
    262. A Rathod
    263. M Rausch
    264. J Ravishankar
    265. FS Rhame
    266. CS Richards
    267. DD Richman
    268. B Rodes
    269. M Rodriguez
    270. RC Rose
    271. ES Rosenberg
    272. D Rosenthal
    273. PE Ross
    274. DS Rubin
    275. E Rumbaugh
    276. L Saenz
    277. MR Salvaggio
    278. WC Sanchez
    279. VM Sanjana
    280. S Santiago
    281. W Schmidt
    282. H Schuitemaker
    283. PM Sestak
    284. P Shalit
    285. W Shay
    286. VN Shirvani
    287. VI Silebi
    288. JM Sizemore
    289. PR Skolnik
    290. M Sokol-Anderson
    291. JM Sosman
    292. P Stabile
    293. JT Stapleton
    294. S Starrett
    295. F Stein
    296. HJ Stellbrink
    297. FL Sterman
    298. VE Stone
    299. DR Stone
    300. G Tambussi
    301. RA Taplitz
    302. EM Tedaldi
    303. A Telenti
    304. W Theisen
    305. R Torres
    306. L Tosiello
    307. C Tremblay
    308. MA Tribble
    309. PD Trinh
    310. A Tsao
    311. P Ueda
    312. A Vaccaro
    313. E Valadas
    314. TJ Vanig
    315. I Vecino
    316. VM Vega
    317. W Veikley
    318. BH Wade
    319. C Walworth
    320. C Wanidworanun
    321. DJ Ward
    322. DA Warner
    323. RD Weber
    324. D Webster
    325. S Weis
    326. DA Wheeler
    327. DJ White
    328. E Wilkins
    329. A Winston
    330. CG Wlodaver
    331. A van't Wout
    332. DP Wright
    333. OO Yang
    334. DL Yurdin
    335. BW Zabukovic
    336. KC Zachary
    337. B Zeeman
    338. M Zhao
    339. International HIV Controllers Study
    (2010)
    Science 330:1551–1557.
    https://doi.org/10.1126/science.1195271
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
    R: A language and environment for statistical computing
    1. R Development Core Team
    (2014)
    R Foundation for Statistical Computing, Vienna, Austria.
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
    Natural killer cell receptor NKG2A/HLA-E interaction dependent differential thymopoiesis of hematopoietic progenitor cells influences the outcome of HIV infection
    1. EJ Yunis
    2. V Romero
    3. F Diaz-Giffero
    4. J Zuñiga
    5. P Koka
    (2007)
    Journal of Stem Cells 2:237–248.

Decision letter

  1. Miles P Davenport
    Reviewing Editor; University of New South Wales, Australia
  2. Aleksandra M Walczak
    Senior Editor; École Normale Supérieure, France
  3. Marc Lipsitch
    Reviewer; Harvard TH Chan School of Public Health, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This manuscript makes a significant contribution to understanding the mechanisms of HLA associations with infectious and autoimmune disease. It uses a novel immunogenetic approach to study and compare a number of known molecular associations and suggests that alternative molecular mechanisms can contribute to immune control of viral infection.

Decision letter after peer review:

Thank you for submitting your article "Identifying the immune interactions underlying HLA class I disease associations" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Shingo Iwami; Rob J. de Boer; Ruy Ribeiro.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study proposes a new approach to analyze the effect of known HLA class I association with disease outcome and to attribute that association to one of several functional processes of the immune system. That is, it intends to answer the question: "When an association between HLA class I and disease outcome is found, is it due to function of CD8+ T cells, NK cells (inhibitory or activation function), or myeloid cells (through leukocyte immunoglobulin-like receptors)?" To answer this question, the authors define a similarity metric between binding of that HLA molecule /the different receptors, and all other HLA molecule in the subject's genome.

Overall the reviewers found this to be an area of interest for immunogenetics, and a novel approach to the question.

Essential revisions:

1) There is no "image" figure for intuitively describing their approach and conclusions (especially for, strategy and hypothesis). They introduced TCR-HLA, KIR-HLA, LILR-HLA interactions based on metrics. It is hard to capture a mental image of their overall approach. The manuscript would be improved by introductory and conclusion figures.

2) For HIV-1 and HCV infection, their conclusions are very mild (not clear) compared with HTLV-1 and Crohn's disease. How might the authors validate their conclusion on HIV-1 and HCV infection? This should be discussed.

3) In the main text, it is not clear what "average alleles" are. From the supplementary information, it seems to mean all other alleles with p-values>0.05 in other studies (subject to being sufficiently represented in these cohorts).

4) In the section on HCV, you mention several alleles that have been associated with outcome, but not consistently. Probably some of these are included in your analyses of "average alleles", and it could be interesting to single out the results for those.

5) One limitation mentioned is that KIR alleles or variations are not taken into account. Could this be important in cases where no dominant association is found? That is, do you have an idea if more details would lead to more or less power?

6) In the Materials and methods, it would be important to specify whether the data was obtained from public repositories (and give corresponding citation), from previous study's authors or from the authors of the current study.

7) You use the maximum fraction shared for your similarity and justify this over using, say, the mean. Would it be difficult to see whether there are any important changes when a different summary instead of maximum is used, for example, the mean or the number of alleles with a similarity larger than a given percentage? Basically the question is whether having just one allele with similarity 53% and all others 0% in one person represents a larger similarity than having 6 alleles at 35% similarity in another person.

8) Their reasoning rests on the idea that the observed associations between HLA class I and disease outcome should have the same underlying functional cause in all or most people. Is it possible that the functional cause could be different in different people? The cases where the results are clear may indicate that this is not common, but what about cases where there is no definite answer by this method? This should be discussed.

https://doi.org/10.7554/eLife.54558.sa1

Author response

Essential revisions:

1) There is no "image" figure for intuitively describing their approach and conclusions (especially for, strategy and hypothesis). They introduced TCR-HLA, KIR-HLA, LILR-HLA interactions based on metrics. It is hard to capture a mental image of their overall approach. The manuscript would be improved by introductory and conclusion figures.

This is a good idea. We have replaced the previous Figure 1 with a new Figure 1 that aims to summarise the method. If you don’t find this new figure helpful please let us know and we will try again; we are very open to any suggestions on how to present this introductory figure. You also suggested a conclusion figure. We are not sure what such a conclusion figure should include (beyond the results that are already depicted in the various manuscript figures). Again, we would be very open to any ideas.

2) For HIV-1 and HCV infection, their conclusions are very mild (not clear) compared with HTLV-1 and Crohn's disease. How might the authors validate their conclusion on HIV-1 and HCV infection? This should be discussed.

We disagree with the reviewers that the conclusions are mild/unclear and difficult to validate for HIV-1. We presume you are referring to the conclusion that, in HIV-1 infection, not all alleles protect by the same mechanism (Results section: HIV-1 infection: what determines the impact on viral load set point across all alleles?). The expectation that all alleles would protect by only one mechanism is a very, very strong one. It is HTLV-1 that is the unusual outlier here (to us it seems remarkable that CD8+ T cells are such a dominant mode of protection). The result we obtain for HIV-1 is much more in line with our expectation. We have edited the Discussion to emphasise this point “In HTLV-1 infection, the pattern was remarkably skewed. …. In HIV-1 infection, the picture was more balanced.”

Our method does produce some strong and novel results on HIV-1 infection that support the use of this method. Specifically, it first identifies a feature (nearness to B57 in activating KIR space, aKIR.FS) that is a stronger determinant of protection than the reported protective compound genotype KIR3DS1:Bw4-80I (in a model including both terms aKIR.FS remains protective whilst KIR3DS1:Bw4-80I loses significance suggesting that aKIR.FS is the driver and better reflecting the protective effect). Moreover, on investigating aKIR.FS further it was found to identify three distinct sets of individuals who all had identical KIR3DS1:Bw4-80I status but had different aKIR.FS (see Figure 3A). These three groupings, identified by our method, were associated with different viral loads (see Figure 3B). A bootstrap analysis suggested that the difference in viral load observed between the groupings was highly significant (Figure 3C). We conclude that our method has identified a stratification of the KIR3DS1:Bw4-80I compound genotype that is biologically relevant: an interesting observation in itself but also a validation of the method. This is now emphasised in the Discussion.

Regarding HCV infection, we do agree with you that this is harder to validate. We find that different (average) HLA alleles are associated with different levels of protection for a range of reasons and that the HLA-B57 molecules are protective largely because of their TCR binding. These are both reasonable conclusions but as we do not know the true underlying mechanism we do not know whether these are the correct conclusions so whilst reasonable it does not serve as a validation. We note that the method does make some interesting non-trivial observations e.g. (i) the main protective effect of HLA-B57 is attributable to different mechanism in HIV-1 and HCV infections (activating KIR and TCR respectively) and (ii) it provides an explanation for the divergent behaviour of B57:01 (compared to other B57 alleles) in HCV infection that is not observed in HIV-1 infection. Specifically, in HIV-1 infection it appears that all frequent B57 alleles (B*57:01, B*57:02, B*57:03) are associated with protection but in HCV infection, whilst B*57:02 and B*57:03 are associated with protection B*57:01 is not. Our method provides an explanation for this divergence. The nearness metrics are proteome dependent. For the HIV-1 proteome all the B57 alleles are very close to each other but for the HCV proteome B57:01 is distanced from B57:02 and B57:03 (at least 10 frequent alleles are closer to B57:02 than B57:01). These are plausible and interesting results but it is true that it is difficult to validate this approach for HCV. We argue that i) a priori our method is reasonable being a natural extension of classical HLA immunogenetics ii) it passes a basic sanity test that the TCR.FS is significantly higher for alleles of the same supertype than between alleles from different supertypes iii) it identifies biologically relevant features in the context of both HTLV-1 and HIV-1 that are very difficult to explain if the method was not performing well and iv) that the results for HCV are plausible; these four factors support the utility of this method in general. We have edited the manuscript to clarify these points.

3) In the main text, it is not clear what "average alleles" are. From the supplementary information, it seems to mean all other alleles with p-values>0.05 in other studies (subject to being sufficiently represented in these cohorts).

Apologies for omitting this, your interpretation is correct, we have now clarified this point. We have added a definition to the main text at the start of the section “HTLV-1 infection: what determines the risk of disease across all alleles”

4) In the section on HCV, you mention several alleles that have been associated with outcome, but not consistently. Probably some of these are included in your analyses of "average alleles", and it could be interesting to single out the results for those.

We are reluctant to focus on HLA alleles whose association with protection/susceptibility is not confirmed in the larger cohorts. There is a large amount of noise/confusion in the HLA immunogenetics literature due to false positives. We are not saying that these associations are false positives and there may well be valid reasons why these HLA alleles have significant effects in some cohorts but not others but ideally, we would prefer to focus on validated alleles. However, we respect your question and in Author response image 1 we have reproduced the figure that you asked for. If you think that this offers a useful insight and you believe there is a reason for focusing on these alleles then we could include this figure in the manuscript. There are 4 HLA alleles significantly associated with outcome in the context of HCV infection that have been published in the literature but are not replicated in the largest cohorts: A*11:01, A*23:01, C*01:01 and C*04:01. In Author response image 1 A11:01 is represented by a red circle, A23:01 by a green circle and C04:01 by a black circle. Note there is no one with C*01:01 in our cohort hence it is not included and for the NK metrics A11:01 had insufficient near alleles and so this point is omitted from these graphs. If we had to draw a cursory preliminary conclusion from this we would say that the protective effect of A11:01 (in our cohort) may be most likely due to LILRB2, the detrimental effect of A23:01 is most likely due to TCR binding and the detrimental effect of C04:01 could be due either to iKIR or TCR (inclusion of both variables in the regression would help identify the driver). However this would all need a more thorough examination before making any firm conclusions.

Author response image 1

5) One limitation mentioned is that KIR alleles or variations are not taken into account. Could this be important in cases where no dominant association is found? That is, do you have an idea if more details would lead to more or less power?

When you say “no dominant association” we assume you mean the same mechanism of protection/susceptibility for every single allele in the cohort. For each individual allele focused on (significantly protective or significant detrimental allele) a dominant association was identified. We reiterate that the expectation that all alleles have the same mechanism is a very strong expectation. This is equivalent to believing that the effect of all alleles is due only to CD8s or only to NKs; i.e. it is saying that the immune response is massively skewed and really only one arm is functioning. We grant that this is (quite shockingly) what we observed in HTLV-1 infection but it is really not our expectation in general. So we think the finding that different alleles confer their protection because of different methods is most probably reflective of biological reality rather than a limitation of the method and would not be altered by details of KIR alleles.

Nevertheless, the question “what would be the effect of taking into account KIR alleles” is a valid one. To include KIR alleles there are two basic requirements: firstly, features such as strength of binding of all the individual KIR alleles to all the individual HLA alleles, strength of signalling of the KIR molecule when ligated, impact of peptide on binding for each KIR allele would all need to be measured; secondly, most importantly we would need to know how to integrate these features to get a measure of the KIR allele-HLA allele effect. If we knew all of this, and it mattered biologically, then we believe that adjusting our KIR.FS scores to reflect KIR alleles would be likely to add power since all alleles would be measured on a single scale there would be no need to stratify and reduce power. However if this information is unknown/ poorly known (very much the situation now) or only has a weak secondary effect compared to the KIR features we have already included then inclusion of allele level information would have no beneficial effect and is likely to introduce spurious results or noise.

6) In the Materials and methods, it would be important to specify whether the data was obtained from public repositories (and give corresponding citation), from previous study's authors or from the authors of the current study.

Important point. We have now added this information to the Materials and methods (in each case cohort data was obtained from investigators from previous studies).

7) You use the maximum fraction shared for your similarity and justify this over using, say, the mean. Would it be difficult to see whether there are any important changes when a different summary instead of maximum is used, for example, the mean or the number of alleles with a similarity larger than a given percentage? Basically the question is whether having just one allele with similarity 53% and all others 0% in one person represents a larger similarity than having 6 alleles at 35% similarity in another person.

How exactly the 6 classical HLA I alleles combine to give that individual’s HLA-mediated protection/susceptibility is a very interesting question. A priori there are a number of different ways of combining the information from multiple alleles (as you suggest mean or number of alleles that are above a certain threshold) are two of a multitude of ways. The difficulty of analysing the best metric is of course that one metric may do better than another simply by chance. If our cohorts were much larger then one approach would be to split them and then see if the same metric performed best in each subcohort but with our cohort sizes this would not be feasible. However, the definition of “best” is not entirely clear. We choose a metric based on the logic underlying classical single HLA association studies. In the classical approach an individual is given a 1 if they are positive for that allele (i.e. are “extremely near” to the allele), or a 0 if they are negative for the allele (i.e. are “extremely far”). An individual scores a “1” regardless of how many copies of the allele they carry and regardless of whether they have other alleles that are protective or detrimental (that might potentially contribute to or cancel out the effect of the first allele) i.e. the single “closest” allele is all that matters. Our principal is to extend this approach with a continuous scale between 1 and 0 to take into account how “near” nearby alleles are and also to extend this approach into multiple “dimensions” (near in terms of TCR binding, near in terms of iKIR binding etc); and so the maximum is the natural extension (the single closest allele is all that matters). Going beyond this and investigating how an individual’s HLA genotype combines to confer protection is a major question for the field, but unfortunately requires a much larger cohort than the one we have access to and is beyond the scope of the current study.

8) Their reasoning rests on the idea that the observed associations between HLA class I and disease outcome should have the same underlying functional cause in all or most people. Is it possible that the functional cause could be different in different people? The cases where the results are clear may indicate that this is not common, but what about cases where there is no definite answer by this method? This should be discussed.

In a classical HLA association analysis, the scenario you describe (same HLA allele conferring protection via one mechanism in some individuals and another mechanism in other individuals) could never be detected. However, with our fraction shared approach then the scenario could be detected. Provided the cohort is large enough that there were a sufficient number of individuals with the allele where “mechanism 1” was operative and individuals with the allele where “mechanism 2” was operative then our fraction shared approach would detect this as two independent predictors of outcome that did not lose significance when included in the model together. This has now been included in the manuscript. As a concrete example consider our analysis of the protective effect of the B57 alleles in HIV infection. Here our method did indeed identify two separate mechanisms operating in different individuals. So our method neither assumes a single mechanism and (given sufficient cohort size) is able to identify when multiple mechanisms are at play. As an aside, you mention here (as in points 2 and 5 above) that there are cases where “there is no definite answer by this method”. This is not true, we investigated 11 HLA alleles significantly associated with outcome in three viral infection, in every case our method was able to identify the underlying mechanism. Again, possibly you are referring to the observation that (with the strange exception of HTLV-1) not every single allele in the cohort confers protection via only one arm of the immune response.

https://doi.org/10.7554/eLife.54558.sa2

Article and author information

Author details

  1. Bisrat J Debebe

    Department of Infectious Disease, Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  2. Lies Boelen

    Department of Infectious Disease, Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  3. James C Lee

    Cambridge Institute for Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Writing - review and editing, provided cohort data
    Competing interests
    No competing interests declared
  4. IAVI Protocol C Investigators

    1. Johns Hopkins University, Baltimore, United States
    2. Faculty of Medicine, University of Southampton, Southampton, United Kingdom
    Contribution
    Writing - review and editing, provided cohort data
    Competing interests
    No competing interests declared
    1. Eduard J Sanders, Centre for Geographic Medicine-Coast/KEMRI, Kilifi, Kenya
    2. Omu Anzala, KAVI—Institute of Clinical Research, Nairobi, Kenya
    3. Anatoli Kamali, IAVI, New York, United States
    4. Pontiano Kaleebu, Medical Research Council/Uganda Virus Research Institute/London School of Hygiene and Tropical Medicine, Uganda Research Unit on AIDS, Entebbe, Uganda
    5. Etienne Karita, Project San Francisco, Kigali, Rwanda
    6. William Kilembe, Zambia Emory Research Project, Lusaka, Zambia
    7. Mubiana Inambao, Zambia Emory Research Project, Lusaka, Zambia
    8. Shabir Lakhi, Zambia Emory Research Project, Lusaka, Zambia
    9. Susan Allen, Zambia Emory Research Project, Lusaka, Zambia
    10. Eric Hunter, Zambia Emory Research Project, Lusaka, Zambia
    11. Vinodh A Edward, The Aurum Institute, Johannesburg, South Africa
    12. Pat E Fast, IAVI, New York, New York, United States
    13. Matt A Price, IAVI, New York, New York, United States
    14. Jill Gilmour, IAVI Human Immunology Laboratory, Imperial College, London, United Kingdom
    15. Jianming Tang, Ryals Public Health Building, University of Alabama, Birmingham, United States
  5. Chloe L Thio

    Johns Hopkins University, Baltimore, United States
    Contribution
    provision of cohort data
    Competing interests
    No competing interests declared
  6. Jacquie Astemborski

    Johns Hopkins University, Baltimore, United States
    Contribution
    provision of cohort data
    Competing interests
    No competing interests declared
  7. Gregory Kirk

    Johns Hopkins University, Baltimore, United States
    Contribution
    provision of cohort data
    Competing interests
    No competing interests declared
  8. Salim I Khakoo

    Faculty of Medicine, University of Southampton, Southampton, United Kingdom
    Contribution
    provision of cohort data
    Competing interests
    No competing interests declared
  9. Sharyne M Donfield

    Rho, Chapel Hill, Durham, United States
    Contribution
    provision of cohort data
    Competing interests
    No competing interests declared
  10. James J Goedert

    Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, United States
    Contribution
    Writing - review and editing, provided cohort data
    Competing interests
    No competing interests declared
  11. Becca Asquith

    Department of Infectious Disease, Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    b.asquith@imperial.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5911-3160

Funding

Wellcome (103865Z/14/Z)

  • Becca Asquith

Medical Research Council (J007439)

  • Becca Asquith

Medical Research Council (G1001052)

  • Becca Asquith

European Union Seventh Framework Programme (317040)

  • Becca Asquith

Bloodwise (15012)

  • Becca Asquith

Wellcome (105920/Z/14/Z)

  • James C Lee

National Institutes of Health (DA13324)

  • Chloe L Thio

National Institutes of Health (R01-HD-41224)

  • Sharyne M Donfield

National Institutes of Health (U01-DA-036297)

  • Gregory Kirk

National Institutes of Health (R01-DA-12568)

  • Gregory Kirk

National Institutes of Health (K24-AI118591)

  • Gregory Kirk

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

BA is a Wellcome Trust (WT) Investigator (103865Z/14/Z) and is also funded by the Medical Research Council (MRC) (J007439 and G1001052), the European Union Seventh Framework Programme (317040, QuanTI) and Leukemia and Lymphoma Research (15012). JCL is supported by a WT Intermediate Clinical Fellowship (105920/Z/14/Z). CLT is supported by NIH R01 DA13324. The Hemophilia Growth and Development Study is supported by the National Institute of Child Health and Human Development (R01-HD-41224). Data presented in this manuscript were collected by the ALIVE Study funded by the National Institute on Drug Abuse (U01-DA-036297, R01-DA-12568, K24-AI118591). This work was partially funded by IAVI with the generous support of USAID and other donors; a full list of IAVI donors is available at www.iavi.org. The content of this publication is the responsibility of the authors and does not necessarily reflect the views or policies of the Department of Health and Human Services, USAID or the US Government, nor does the mention of trade names, commercial products, or organizations imply endorsement by the US Government.

Ethics

Human subjects: This study was approved by the NHS Research Ethics Committee (13/WS/0064) and the Imperial College Research Ethics Committee (ICREC_11_1_2). Informed consent was obtained at the study sites from all individuals. The study was conducted according to the principles of the Declaration of Helsinki.

Senior Editor

  1. Aleksandra M Walczak, École Normale Supérieure, France

Reviewing Editor

  1. Miles P Davenport, University of New South Wales, Australia

Reviewer

  1. Marc Lipsitch, Harvard TH Chan School of Public Health, United States

Publication history

  1. Received: December 18, 2019
  2. Accepted: March 6, 2020
  3. Accepted Manuscript published: April 2, 2020 (version 1)
  4. Version of Record published: May 27, 2020 (version 2)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Metrics

  • 1,355
    Page views
  • 248
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Computational and Systems Biology
    Brittany Rosener et al.
    Research Article

    Metabolism of host-targeted drugs by the microbiome can substantially impact host treatment success. However, since many host-targeted drugs inadvertently hamper microbiome growth, repeated drug administration can lead to microbiome evolutionary adaptation. We tested if evolved bacterial resistance against host-targeted drugs alters their drug metabolism and impacts host treatment success. We used a model system of C. elegans, its bacterial diet, and two fluoropyrimidine chemotherapies. Genetic screens revealed that most of loss-of-function resistance mutations in Escherichia coli also reduced drug toxicity in the host. We found that resistance rapidly emerged in E. coli under natural selection and converged to a handful of resistance mechanisms. Surprisingly, we discovered that nutrient availability during bacterial evolution dictated the dietary effect on the host – only bacteria evolving in nutrient-poor media reduced host drug toxicity. Our work suggests that bacteria can rapidly adapt to host-targeted drugs and by doing so may also impact the host.

    1. Computational and Systems Biology
    2. Microbiology and Infectious Disease
    Thomas Stoeger, Luís A Nunes Amaral
    Feature Article

    It is known that research into human genes is heavily skewed towards genes that have been widely studied for decades, including many genes that were being studied before the productive phase of the Human Genome Project. This means that the genes most frequently investigated by the research community tend to be only marginally more important to human physiology and disease than a random selection of genes. Based on an analysis of 10,395 research publications about SARS-CoV-2 that mention at least one human gene, we report here that the COVID-19 literature up to mid-October 2020 follows a similar pattern. This means that a large number of host genes that have been implicated in SARS-CoV-2 infection by four genome-wide studies remain unstudied. While quantifying the consequences of this neglect is not possible, they could be significant.