Systematic screening of viral and human genetic variation identifies antiretroviral resistance and immune escape link
Abstract
Background:
Considering the remaining threat of drug-resistant mutations (DRMs) to antiretroviral treatment (ART) efficacy, we investigated how the selective pressure of human leukocyte antigen (HLA)-restricted cytotoxic T lymphocytes drives the emergence and retention of certain DRMs.
Methods:
We systematically screened DRM:HLA class I allele combinations in 3997 ART-naïve Swiss HIV Cohort Study (SHCS) patients. For each pair, a logistic regression model preliminarily tested for an association with the DRM as the outcome. The three HLA:DRM pairs remaining after multiple testing adjustment were analyzed in three ways: cross-sectional logistic regression models to determine any HLA/infection time interaction, survival analyses to examine if HLA type correlated with developing specific DRMs, and via NetMHCpan to find epitope binding evidence of immune escape.
Results:
Only one pair, RT-E138:HLA-B18, exhibited a significant interaction between infection duration and HLA. The survival analyses identified two pairs with an increased hazard of developing DRMs: RT-E138:HLA-B18 and RT-V179:HLA-B35. RT-E138:HLA-B18 exhibited the greatest significance in both analyses (interaction term odds ratio [OR] 1.169 [95% confidence interval (CI) 1.075–1.273]; p-value<0.001; survival hazard ratio 12.211 [95% CI 3.523–42.318]; p-value<0.001). The same two pairs were also predicted by NetMHCpan to have epitope binding.
Conclusions:
We identified DRM:HLA pairs where HLA presence is associated with the presence or emergence of the DRM, indicating that the selective pressure for these mutations alternates direction depending on the presence of these HLA alleles.
Funding:
Funded by the Swiss National Science Foundation within the framework of the SHCS, and the University of Zurich, University Research Priority Program: Evolution in Action: From Genomes to Ecosystems, in Switzerland.
Introduction
Antiretroviral resistance remains a major obstacle to the successful and lasting suppression of HIV (Gupta et al., 2012; Günthard et al., 2019). While in resource-rich settings the availability of novel drug classes and personalized HIV treatment have diminished the challenges associated with antiretroviral resistance, resource-limited settings have experienced a continuous increase in antiretroviral resistance, which is now threatening the unprecedented success of the global rollout of antiretroviral treatment (ART) (Fund, 2019; Hauser et al., 2019). In the context of this globalization of antiretroviral resistance, it is becoming increasingly important to understand how human and viral genetic variation are affecting the processes generating or limiting antiretroviral resistance (Aghokeng et al., 2011; Lataillade et al., 2010).
HIV drug-resistant mutations (DRMs) can either be selected in patients on ART experiencing treatment failure (acquired drug resistance, aDRM) or be transmitted from a patient carrying the resistance mutation to an uninfected individual (transmitted drug resistance, tDRM). As some DRMs have been shown to carry a cost to viral fitness and replication capacity, they can revert in the absence of ART. Once the selective pressure favoring those mutations is removed, their frequency within a host continuously decreases in favor of the wild-type variant, and eventually they become undetectable by standard resistance tests. The time scales on which reversion occurs vary widely, from several months to over 10 years, depending on the fitness cost, which in turn is governed by both the type of mutation and the genetic background in which it occurs (Kühnert et al., 2018; Yang et al., 2015). This canonical perspective, based on the evolutionary forces acting on aDRM and tDRM and on their disappearance from the replicating quasi-species, generally disregards the possibility that antiretroviral-resistant mutations are selected in untreated individuals.
One process that may act against the paradigm of DRMs emerging only in treated individuals and reverting in untreated individuals is accidental resistance evolution occurring as a collateral effect of viral immune escape. A well-understood instance of this process is evolutionary escape from binding to human leukocyte antigen (HLA), an extremely diverse gene complex encoding the major histocompatibility complex (MHC) proteins. MHC class I proteins (corresponding to HLA class I) are found on the surface of all nucleated cells, and by presenting antigens from the cell interior on the surface, they allow for binding by cytotoxic CD8 T cells (CTL); thus, MHC class I proteins tag the virally infected cell, which can subsequently be eliminated by CTL (Markov and Pybus, 2015; Zinkernagel and Doherty, 1979). The high mutation rate associated with replicating HIV predisposes to cellular and humoral immune escape, where the viral epitopes are no longer recognized by the mounted immune effectors. For CTL-mediated immune responses, this process of developing escape mutations remains a critical part of HIV pathogenesis (Leslie et al., 2004). Conversely, the high variability of encoded MHC alleles and their combinations comes into play, as the virus encounters a different set of host HLA alleles with each transmission (Markov and Pybus, 2015; Zinkernagel and Doherty, 1979; Borghans et al., 2004). If the viral epitope recognized by MHC-I maps to the same region of the viral genome as a DRM, escape from this epitope could confer increased viral fitness, leading to persistence of the mutation or even the emergence of a new DRM in an ART-naïve host (Gatanaga et al., 2013). While this phenomenon has been reported for individual HIV mutation:HLA pairs, a systematic assessment of the impact of epitope escape across HIV DRM:HLA pairs in a representative population has not yet been reported.
In this study, we analyzed viral and human genetic data from ART-naïve patients in the Swiss HIV Cohort Study (SHCS). This leverages the unique combination of viral and human genetic data in the SHCS, with over 20,000 genotypic resistance tests and over 5000 patients with information on HLA-I alleles. This allowed us to systematically screen the cohort for associations between DRM:HLA-I pairs and hence for pairs where escape from HLA-I binding might confer on the DRM an evolutionary advantage even in the absence of ART.
Materials and methods
Swiss HIV Cohort Study
The SHCS is a prospective multicenter study with continuing enrollment since 1988, aiming to include all people living with HIV (PLWH) in Switzerland. About half of all PLWH notified to the Swiss health authorities participate voluntarily in the SHCS, including three-quarters of all PLWH receiving ART in the country (Schoeni-Affolter et al., 2010). As of August 2019, the SHCS has a cumulative total of 20,741 patients. Demographic information, mode of HIV transmission, treatment, clinical, and other data are updated every 6 months per standard protocol.
Drug resistance mutation data
The SHCS Drug Resistance Database contains the HIV sequence data, primarily partial pol gene sequences, used to determine the presence of DRMs in the viral genome (von Wyl et al., 2007). These data, currently covering 13,798 patients, were obtained from both routine clinical testing and systematic retroactive sequencing of stored plasma samples (Kletenkov et al., 2017; von Wyl et al., 2016). To reduce the scope of our systematic screening to only HIV mutations relevant to drug resistance (thus reducing the risk of overtesting), we only considered the presence of DRMs as defined by the Stanford Drug Resistance Database (Rhee et al., 2003). To avoid confounding by the effect of ART, we only considered sequences from ART-naïve individuals (before ART initiation).
HLA data
Data on the HLA class I type was available for 6453 SHCS patients. This information was obtained from SNP genotype data, using SNP2HLA with the Type 1 Diabetes Genetics Consortium reference panel for HLA imputation on the exome/SNP data (Jia et al., 2013; Szolek et al., 2014; Dilthey et al., 2016). We limited our analyses to HLA class I (HLA-A, -B, and -C), considering the existing literature supporting the role of HLA class I peptides in HIV control (Leslie et al., 2010; Pereyra et al., 2010). Of these patients with HLA data, 3997 additionally had drug resistance testing data.
Screening candidate pairs of DRM:HLA-type
Our study aimed to retrieve all DRMs identified in the SHCS as well as the HLA-I types found, and to analyze whether a specific HLA-I type significantly alters the probability of finding a DRM. As there were 5561 possible combinations represented in our dataset, it was necessary to reduce these candidate pairs to only those for which our data provided sufficient statistical power to detect an association (Figure 1). To do this, we retained only the combinations where the numbers of SHCS patients with the given mutation and HLA type were sufficient to provide a statistical power of 0.8, assuming an odds ratio (OR) of 3. This resulted in 225 pairs, for each of which a logistic regression model was fitted. For each model, the duration of HIV infection and the presence/absence of the queried HLA-I type were used as predictors of the outcome – the presence of the resistance mutation in the last available resistance test from a given patient when they were ART-naïve. We then used a Benjamini–Hochberg adjustment to account for multiple testing, with a false discovery rate of 0.2. We purposefully used a liberal false discovery rate and OR in these steps to avoid erroneously discarding any mutation:HLA pair with a potentially valid association, with the intent of compensating for this with the following three analyses assessing the plausibility of the identified pairs:

Figure 1. Flowchart of the methodology for obtaining the candidate DRM:HLA pairs with a possible epitope relationship.
From the 3997 SHCS patients with both HLA-I data and drug resistance testing data, 5561 potential combinations of HLA-I type and DRMs were examinable, from which only 225 had sufficient power for testing. From these 225, three candidate pairs were found to have a significant HLA term in a logistic regression model predicting the resistance mutation in question. DRM, drug-resistant mutation; HLA, human leukocyte antigen.
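The power-based filtering step in the flowchart can be illustrated with a short sketch. The source does not specify how power was computed, so everything here is an assumption: the sketch uses a normal-approximation two-sided two-proportion z-test, a significance level of 0.05, and a hypothetical baseline DRM prevalence among non-carriers. Only the target power (0.8) and the assumed odds ratio (3) come from the text; the function names are illustrative.

```python
from statistics import NormalDist


def power_for_odds_ratio(n_carriers, n_noncarriers, p_noncarriers,
                         odds_ratio=3.0, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    p_noncarriers is the (assumed) DRM prevalence among patients without
    the HLA type; the prevalence among carriers follows from the odds ratio.
    """
    odds_carriers = odds_ratio * p_noncarriers / (1.0 - p_noncarriers)
    p_carriers = odds_carriers / (1.0 + odds_carriers)
    se = (p_carriers * (1.0 - p_carriers) / n_carriers
          + p_noncarriers * (1.0 - p_noncarriers) / n_noncarriers) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p_carriers - p_noncarriers) / se
    # Probability that the test statistic falls outside +/- z_crit.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


def is_testable(n_carriers, n_noncarriers, p_noncarriers, min_power=0.8):
    """Keep a DRM:HLA pair only if the approximate power reaches min_power."""
    return power_for_odds_ratio(n_carriers, n_noncarriers, p_noncarriers) >= min_power


# Hypothetical inputs: a common HLA type with a moderately frequent DRM,
# and a rare HLA type with a rare DRM.
common_pair_ok = is_testable(376, 3621, 0.03)
rare_pair_ok = is_testable(20, 3977, 0.005)
```

Under these assumptions the common pair passes the filter and the rare pair does not, which mirrors how most of the 5561 combinations would be dropped for lack of carriers or mutations.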
Testing if the impact of duration of HIV infection on the emergence of the DRM of interest depends on HLA type: For each candidate pair retained after the initial screening, we created a multivariable logistic regression model, where the outcome is the presence of the mutation in the ART-naïve patients in their earliest available sequence (before the start of ART), with the predictors being the presence of the queried HLA type, duration of HIV infection until time of ART initiation, and additionally, an interaction term between HLA type and infection time. The purpose of the interaction term is to measure whether the presence or absence of the queried HLA type affects the selection pressure on the resistance mutation – that is, a significant interaction term would imply that time since HIV infection has a different effect on the odds of observing the DRM depending on whether the HLA allele is present or not.
Longitudinal/survival analyses: In addition to the cross-sectional logistic regression models, we used Cox proportional hazards survival models to test whether patients initially free of the queried DRM developed it over time. We only considered resistance testing data and time at risk before ART initiation. Patients required at least two sequences before ART initiation to be included in this analysis. We observed which of the candidate DRM:HLA pairs yielded a survival model where the presence of the queried HLA type was significantly associated with a higher or lower hazard of developing/detecting the mutation over time.
Mechanistic plausibility/epitope binding: To examine whether there was any mechanistic plausibility to the associations found in the above analyses, we used the NetMHCpan 4.1 server to predict the binding affinity of the HLA allele to all 9-mer peptides including the mutation position, with either the wild-type amino acid at the position or one of the three most common mutated amino acids observed (Reynisson et al., 2020). For the candidate pairs where the mutation does cause immune escape, we would anticipate the binding to be stronger for the wild-type peptide compared to the mutated peptides. Additionally, we searched the Los Alamos HIV Molecular Immunology Database to corroborate the candidate pairs with prior experimental studies indicating the HLA–epitope match (Korber et al., 2021).
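The peptide set described for the binding predictions, every 9-mer that contains the mutation position in both wild-type and mutated form, can be enumerated as in the following sketch. The helper names and the toy sequence are hypothetical (the toy string is not an HIV reverse transcriptase sequence), and no NetMHCpan call is shown; the resulting peptides would simply be the input to such a predictor.

```python
def nine_mers_covering(sequence, position, length=9):
    """Return every window of `length` residues that contains `position`.

    `position` is 1-based; windows are clipped at the sequence ends.
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position outside sequence")
    first = max(0, idx - length + 1)
    last = min(idx, len(sequence) - length)
    return [sequence[start:start + length] for start in range(first, last + 1)]


def mutate(sequence, position, new_residue):
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    idx = position - 1
    return sequence[:idx] + new_residue + sequence[idx + 1:]


# Toy example: a made-up 20-residue sequence with the site of interest at position 10.
toy_wild_type = "ACDEFGHIKLMNPQRSTVWY"
toy_mutant = mutate(toy_wild_type, 10, "A")
wild_type_peptides = nine_mers_covering(toy_wild_type, 10)
mutant_peptides = nine_mers_covering(toy_mutant, 10)
# Paired wild-type/mutant peptides, ready to be compared by predicted binding.
peptide_pairs = list(zip(wild_type_peptides, mutant_peptides))
```

For a site far from either end of the protein this yields nine overlapping windows per variant, so each wild-type peptide has a mutant counterpart differing at exactly one residue.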
Software
All analyses (besides the epitope binding predictions performed with NetMHCpan) were done in R (version 3.6.1). The code can be found on GitHub (Nguyen, 2021).
Results
Obtaining candidate HLA–mutation pairs
From the 20,741 patients in the SHCS, 3997 had both HLA-I allele data and resistance testing data (Figure 1). Characteristics of these patients are shown in Table 1. Patients with HLA data were more likely to be Caucasian compared to the general SHCS population, as the HLA SNP imputation methods were validated on a Caucasian population. In the data set, there were 5561 different combinations of HLA-I types and DRMs represented. Only 225 of these pairs had sufficient diversity at the HLA and DRM positions to convey a power greater than 0.8 to detect a strong effect defined as OR = 3 (see 'Materials and methods'). Using logistic regression models, we found three DRM:HLA pairs after multiple testing adjustment (described in Supplementary file 1), with a significant impact of the queried HLA type on the odds of observing the DRM: RT-E138:HLA-B18 (OR 6.999, 95% CI 4.662–10.413), RT-E138:HLA-A24 (OR 2.444, 95% CI 1.602–3.658), and RT-V179:HLA-B35 (OR 2.431, 95% CI 1.398–4.108). All three combinations involved a DRM in the reverse transcriptase (RT) gene. Of the three combinations, two were with HLA-B, while one was with HLA-A. These three candidate pairs were further evaluated with three complementary methods: (1) a further cross-sectional analysis examining the presence of an interaction term between infection time and HLA type, (2) a longitudinal survival analysis examining time to DRM detection among treatment-naïve patients initially without the queried DRM detectable, and (3) NetMHCpan MHC binding prediction analysis to examine mechanistic plausibility.
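The multiple testing adjustment applied to the 225 screening models can be sketched as a Benjamini–Hochberg step-up procedure at a false discovery rate of 0.2. This is an illustrative pure-Python version, not the authors' R code, and the p-values below are invented for demonstration.

```python
def benjamini_hochberg(p_values, fdr=0.2):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= k / m * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Invented p-values standing in for the HLA terms of five screening models.
example_p = [0.001, 0.04, 0.03, 0.5, 0.9]
example_kept = benjamini_hochberg(example_p, fdr=0.2)
```

In this toy input the three smallest p-values pass and the other two do not. Because the rule is step-up, a p-value can be retained even when a smaller one misses its own threshold, as long as some larger-ranked p-value meets its threshold.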
Table 1. General characteristics of SHCS patients and those with resistance mutation and human leukocyte antigen (HLA) data.
Overview of general characteristics of SHCS patients and the subsets with sequencing resistance testing data, HLA-I data, and both. IQR: interquartile range; MSM: men who have sex with men; HET: heterosexual; IDU: intravenous drug use.
Characteristic | All SHCS participants | SHCS patients with resistance testing data | SHCS patients with HLA-I data | SHCS patients with HLA-I and resistance testing data |
---|---|---|---|---|
Number | 20,741 | 13,116 | 6450 | 3997 |
Median age (IQR) | 56 (48–62) | 54 (47–60) | 55 (49–62) | 54 (47–60) |
Male (%) | 15,064 (72.6%) | 9402 (71.2%) | 4836 (75.0%) | 3027 (75.7%) |
Risk group: MSM | 8100 (39.1%) | 5226 (39.8%) | 2777 (43.1%) | 1784 (44.6%) |
Risk group: HET | 6841 (33.0%) | 4731 (36.1%) | 2173 (33.7%) | 1439 (36.0%) |
Risk group: IDU | 4840 (23.3%) | 2568 (19.6%) | 1255 (19.5%) | 620 (15.5%) |
Risk group: Other | 960 (4.6%) | 591 (4.5%) | 245 (3.8%) | 154 (3.9%) |
White (%) | 14,044 (67.7%) | 9993 (76.2%) | 5661 (87.8%) | 3487 (87.2%) |
HLA-I types and DRMs in study population
The most commonly found HLA-I types are summarized in Table 2. Of note, 668 (16.7%) patients had an HLA-A24 allele, 376 (9.4%) an HLA-B18 allele, and 728 (18.2%) an HLA-B35 allele. Of the 3997 patients with both DRM and HLA-I data available, 719 (18.0%) had at least one DRM, of whom 209 (5.2%) had multiple DRMs. Overall, 2267 of all 5155 DRMs in the study population were found among treatment-naïve individuals, and the most frequent of the 1072 DRMs found in the first resistance test in treatment-naïve individuals are summarized in Table 3. As for the two DRM positions of interest, 145 patients had a DRM at RT-E138: 124 RT-E138A, 14 RT-E138G, 6 RT-E138K, and 1 RT-E138Q. Eighty-two had a DRM at RT-V179: 68 RT-V179D, 13 RT-V179E, and 1 RT-V179F.
Table 2. Distribution of most common HLA-I A, B, and C alleles in study population.
Ten most common HLA-A, -B, and -C types in study population individuals with both HLA-I and DRM information. Frequency and percentage of individuals with each allele are indicated. DRM, drug-resistant mutation; HLA, human leukocyte antigen.
HLA-A type | Frequency | Percentage |
---|---|---|
02 | 1838 | 46.0 |
03 | 964 | 24.1 |
01 | 857 | 21.4 |
24 | 668 | 16.7 |
11 | 493 | 12.3 |
68 | 340 | 8.5 |
32 | 302 | 7.6 |
30 | 300 | 7.5 |
26 | 272 | 6.8 |
29 | 261 | 6.5 |

HLA-B type | Frequency | Percentage |
---|---|---|
44 | 905 | 22.6 |
07 | 814 | 20.4 |
35 | 729 | 18.2 |
51 | 639 | 16.0 |
15 | 582 | 14.6 |
08 | 500 | 12.5 |
40 | 410 | 10.3 |
18 | 376 | 9.4 |
57 | 328 | 8.2 |
27 | 294 | 7.4 |

HLA-C type | Frequency | Percentage |
---|---|---|
07 | 1794 | 44.9 |
04 | 941 | 23.5 |
03 | 812 | 20.3 |
06 | 772 | 19.3 |
12 | 510 | 12.8 |
05 | 485 | 12.1 |
02 | 401 | 10.0 |
16 | 341 | 8.5 |
01 | 328 | 8.2 |
15 | 320 | 8.0 |
Table 3. Distribution of most common drug-resistant mutations (DRMs) in study population.
Ten most common DRMs from the earliest available resistance testing of the study population, with the frequency and percentage of each among the study population indicated. Specific amino acid mutations represented in the population are shown.
DRM position | Mutant amino acids observed | Frequency | Percentage |
---|---|---|---|
RT-E138 | AGKQ | 145 | 3.63 |
RT-T215 | ACDEFILNSVY | 132 | 3.30 |
RT-V106 | AIM | 95 | 2.38 |
RT-V179 | DEF | 82 | 2.05 |
RT-M41 | L | 72 | 1.80 |
PR-M46 | ILV | 47 | 1.18 |
RT-K103 | NS | 46 | 1.15 |
RT-K219 | ENQR | 34 | 0.85 |
RT-D67 | EGN | 34 | 0.85 |
RT-M184 | IV | 30 | 0.75 |
Cross-sectional analyses/logistic regression models
To examine the effect of having a given HLA-I allele on the presence of the DRM in question, we created for each candidate pair a logistic regression model predicting the presence of that specific DRM (at the earliest resistance testing) from the presence/absence of the queried HLA-I type, the duration of HIV infection, and their interaction. Of the three candidate pairs, one resultant logistic regression model had a significant interaction term between presence of the queried HLA type and duration of HIV infection (Figure 2). For RT-E138:HLA-B18, duration of HIV infection (OR 0.918, 95% CI 0.862–0.971 [p-value=0.004]) and the HLA:time-to-DRM interaction term (OR 1.169, 95% CI 1.075–1.273 [p-value<0.001]) were both significant predictors of an RT-E138 mutation. In individuals without HLA-B18, greater infection time was thus correlated with a smaller chance of having/detecting the RT-E138 mutation (plausibly due to the fitness cost of the mutation). However, in individuals with HLA-B18, the HLA:time-to-DRM interaction term reverses the direction of this association; hence, greater infection time is instead correlated with a greater probability of an RT-E138 mutation for HLA-B18 individuals.
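The reversal of direction follows from how odds ratios combine in a model with an interaction term: for HLA-B18 carriers, the odds ratio per unit of infection time is the product of the main-effect and interaction odds ratios. The short calculation below uses only the two point estimates reported above; the variable names are illustrative, and the time unit is not stated in this section.

```python
# Point estimates reported in the text for RT-E138:HLA-B18.
or_time_without_b18 = 0.918   # OR per unit of infection time, HLA-B18 absent
or_interaction = 1.169        # OR of the HLA x infection-time interaction term

# On the log-odds scale the coefficients add, so the odds ratios multiply.
or_time_with_b18 = or_time_without_b18 * or_interaction

print(f"Without HLA-B18: OR per unit time = {or_time_without_b18:.3f}")
print(f"With HLA-B18:    OR per unit time = {or_time_with_b18:.3f}")
```

The product is about 1.073, i.e. above 1, whereas the odds ratio without HLA-B18 is below 1. This ignores the uncertainty in both estimates, so it illustrates the direction of the effect rather than providing a confidence statement.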

Figure 2. Logistic regression models testing for interaction between the queried human leukocyte antigen (HLA) type and duration of infection in predicting the presence of drug-resistant mutation (DRM).
Of the three candidate DRM:HLA type pairs, one pair, RT-E138:HLA-B18, indicates a significant interaction term between the presence of the queried HLA type and the duration of HIV infection in a logistic regression model predicting the presence of a mutation at RT-E138 (A). (B) Details of all three candidates’ logistic regression models.
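For readers who want to see the structure of such a model, the following is a minimal pure-Python sketch of a logistic regression with an HLA-by-time interaction, fitted by Newton-Raphson. It is not the authors' R code, the function names are hypothetical, and the data are an invented toy set in which infection time is coded 0/1 so that the fitted interaction odds ratio can be checked by hand.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=25):
    """Maximum-likelihood logistic regression coefficients via Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def design_row(hla, time):
    """Intercept, HLA indicator, infection time, and their interaction."""
    return [1.0, float(hla), float(time), float(hla) * float(time)]


# Invented toy data: (HLA present, infection time, patients with DRM, patients in cell).
cells = [(0, 0, 20, 100), (0, 1, 10, 100), (1, 0, 20, 100), (1, 1, 40, 100)]
rows, outcome = [], []
for hla, time, with_drm, total in cells:
    for i in range(total):
        rows.append(design_row(hla, time))
        outcome.append(1 if i < with_drm else 0)

coefficients = fit_logistic(rows, outcome)
interaction_or = math.exp(coefficients[3])
```

In this toy set the odds of the DRM fall with time in non-carriers and rise with time in carriers, so the interaction odds ratio is well above 1 (it equals the ratio of the two time odds ratios, here 6). In real data infection time would be continuous and the model would be fitted with standard statistical software.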
Longitudinal/survival analyses
To examine the effect of having a given HLA-I allele on the development of the DRM in question, we performed for each pair a survival analysis to observe how many individuals initially without the DRM eventually develop it prior to initiation of ART. Two of the three candidate DRM:HLA pairs were shown to have a significant difference in the probability of the queried mutation arising in initially wild-type individuals. For RT-E138:HLA-B18, 63 (7.7%) of the 813 patients without an RT-E138 mutation were HLA-B18, among which 5 (7.9%) developed it before ART initiation, compared to 5 (0.7%) of the 750 with another HLA-B type (hazard ratio [HR] 12.211, 95% CI 3.523–42.318 [p-value<0.001]) (Figure 3). RT-V179:HLA-B35 showed a similarly increased hazard of developing the mutation. Of the 150 (18.3%) of the 821 patients with HLA-B35 (initially without an RT-V179 mutation), 3 (2.0%) developed a mutation at RT-V179, compared to only 1 (0.1%) of the 671 with another HLA-B type (HR 16.116, 95% CI 1.673–155.216 [p-value=0.016]).
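As a rough plausibility check, the counts reported above can be turned into crude risk ratios. These ignore follow-up time and censoring, so they are not the Cox hazard ratios reported in the text and are only expected to be of a similar magnitude; the function name is illustrative.

```python
def crude_risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of the proportions developing the DRM, ignoring follow-up time."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Counts taken from the text.
rr_e138_b18 = crude_risk_ratio(5, 63, 5, 750)    # HLA-B18 vs other HLA-B types
rr_v179_b35 = crude_risk_ratio(3, 150, 1, 671)   # HLA-B35 vs other HLA-B types

print(f"RT-E138:HLA-B18 crude risk ratio: {rr_e138_b18:.1f}")
print(f"RT-V179:HLA-B35 crude risk ratio: {rr_v179_b35:.1f}")
```

The crude ratios are about 11.9 and 13.4, in the same range as the reported hazard ratios of 12.211 and 16.116. The very small number of events, especially for RT-V179:HLA-B35, is what produces the wide confidence intervals.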

Figure 3. Hazard ratios and cumulative hazards of developing queried drug-resistant mutation over time in relation to the presence of human leukocyte antigen (HLA) type.
(A) Cox proportional hazard ratios for developing the queried drug-resistant mutation with the queried HLA-I type. (B, C) Cumulative hazard plots of the two pairs from (A) where the hazard ratios were significant, indicating cumulative hazards of developing the mutation among those initially wild type, with red lines indicating individuals with the queried HLA type and blue lines for those with another HLA type.
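The hazard ratios in panel (A) come from Cox proportional hazards models. As a minimal illustration of what such a model estimates, the sketch below maximizes the Cox partial likelihood for a single binary covariate by Newton-Raphson. It is a simplified pure-Python stand-in (the authors worked in R), it assumes no tied event times, and the four-subject data set is invented so that the result can be verified analytically.

```python
import math


def cox_log_hazard_ratio(times, events, covariate, n_iter=50):
    """Newton-Raphson on the Cox partial likelihood for one covariate.

    Assumes all follow-up times are distinct (no tie correction is applied).
    """
    beta = 0.0
    subjects = sorted(zip(times, events, covariate))
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i, (_, event, x) in enumerate(subjects):
            if not event:
                continue
            # Risk set: every subject still under observation at this event time.
            risk = subjects[i:]
            weights = [math.exp(beta * xj) for _, _, xj in risk]
            s0 = sum(weights)
            s1 = sum(w * xj for w, (_, _, xj) in zip(weights, risk))
            s2 = sum(w * xj * xj for w, (_, _, xj) in zip(weights, risk))
            score += x - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0.0:
            break
        beta += score / info
    return beta


# Invented toy data: carriers (1) tend to develop the mutation earlier.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 1, 1, 1]
toy_hla = [1, 0, 1, 0]
toy_hazard_ratio = math.exp(cox_log_hazard_ratio(toy_times, toy_events, toy_hla))
```

For this toy set the score equation reduces to u² − u − 4 = 0 with u the hazard ratio, so the estimate is (1 + √17)/2 ≈ 2.56. With real cohort data, censoring at ART initiation and ties would need to be handled, which established survival software does.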
Mechanistic plausibility/epitope binding
NetMHCpan predictions of HLA binding were performed to gauge the mechanistic plausibility of the effects observed in the first two analyses. These indicated weakened HLA binding to the DRM-containing peptide (i.e., supporting the putative association) for two of the three candidate pairs: RT-E138:HLA-B18 and RT-V179:HLA-B35 (Supplementary file 2). Thus, for these two DRM:HLA pairs, the predictions are consistent with the DRM enabling viral immune escape by reducing peptide binding to the HLA-I molecule. The two pairs supported by mechanistic plausibility are the same two pairs having a significant relationship between HLA type presence and survival in the longitudinal analyses (Table 4). Prior literature indicating experimentally verified epitope binding of the HIV proteome to HLA also exists for these two pairs (Gatanaga et al., 2013; Kopycinski et al., 2014; Liu et al., 2006; Li et al., 2011; Llano et al., 2019; Pereyra et al., 2014; Kiepiela et al., 2007; Peretz et al., 2011; Rowland-Jones et al., 1995; Tebit et al., 2009; Bond et al., 2001).
Table 4. DRM:HLA pairs corroborated by each analytical approach.
Summary of HLA–drug-resistant mutation pairs in all three approaches. Methods that corroborate the HLA–mutation relationship are indicated by ‘yes.’ DRM, drug-resistant mutation; HLA, human leukocyte antigen.
DRM:HLA pair | Interaction term in cross-sectional logistic regression | Longitudinal/survival analysis | Mechanistic plausibility |
---|---|---|---|
RT-E138:HLA-B18 | Yes | Yes | Yes |
RT-E138:HLA-A24 | No | No | No |
RT-V179:HLA-B35 | No | Yes | Yes |
Discussion
Our analyses provide strong evidence for an evolutionary intrapatient interaction between HIV DRMs and certain HLA-I alleles. Of the three candidate DRM:HLA pairs analyzed by three methods, two were supported by at least two of the analyses, and one of these, RT-E138:HLA-B18, was supported by all three (Table 4). This is notable as this pair has been specifically investigated by Gatanaga et al., 2013, who showed both experimentally and through structural modeling that HLA-B18-restricted CTLs select for a mutation at RT-E138. Our study independently demonstrates that this interaction is relevant at the population level, both in cross-sectional and in longitudinal cohort data. Of note, both DRMs are associated with the nonnucleoside analogue reverse transcriptase inhibitor class of ART drugs, with RT-V179D/F/T being associated with resistance to Etravirine and RT-V179L being associated with Rilpivirine. RT-E138A/G/K/Q is associated with resistance to Etravirine and Rilpivirine (International Antiviral Society, 2019). Estimates of virological failure for these two drugs are upwards of 5% and 11%, for Efavirenz and Rilpivirine, respectively (Sanford, 2012).
These results have major implications for our understanding of the evolutionary epidemiology of viral infections, as they demonstrate a considerable interaction between the processes of drug resistance evolution and immune escape observed for several drug classes and HLA alleles in a representative patient population. This extends the standard paradigm, in which resistance mutations are acquired in treated individuals, may become transmitted, but eventually disappear in untreated individuals, with the possibility that resistance mutations newly emerge in untreated individuals due to immune escape. While this mechanism obviously does not account for the majority of DRMs in patients with untreated HIV, it may not be a negligible phenomenon.
In fact, HLA type-driven viral evolution in DRM-relevant CTL epitopes may be particularly relevant given that an estimated 10% of ART-naïve European HIV-positive patients carry a DRM, with even higher figures in low-resource settings, where continuing issues with access to treatment and adherence exacerbate the risk of treatment failure (Günthard et al., 2019; Hofstra et al., 2016; Wittkop et al., 2011; Chimukangara et al., 2019; Pessôa and Sanabani, 2017). As HLA is extremely diverse in the human population, and exhibits high variation in allelic frequency in different geographic regions (Piazza et al., 1980), this DRM:HLA link may partially explain regional variations in pre-treatment drug resistance. Accordingly, we would expect the emergence of certain DRMs in a population that is ART-naïve, or specifically, naïve to Etravirine and Rilpivirine, if the local population has a higher prevalence of the HLA types indicated in our analyses.
As the SHCS primarily consists of individuals of white ethnicity from Switzerland and surrounding countries, our study is statistically best powered to detect DRM:HLA pairs amongst white patients, and may be underpowered to detect DRM:HLA pairs involving HLA-I alleles more prevalent in non-white, low-resource settings – precisely where DRMs are a more urgent issue. This is especially concerning considering the high number of pairs eliminated after filtering out those with insufficient numbers to power an analysis (Figure 1). This lack of power may explain, for example, why RT-V179:HLA-B35 shows a DRM:HLA association in the longitudinal analysis, but not in the cross-sectional analysis with the interaction term (Table 4). It is conceivable that with greater numbers of patients and more years of follow-up, more DRM:HLA pairs would be detected and these inter-analysis inconsistencies would be resolved, though we should not exclude the possibility of other sources for such discrepancies, for example, imprecise estimates of HIV infection time. The limitation of most sequences to the pol gene also made the analyses underpowered to find DRM:HLA relationships in other genes.
Despite these limitations, our study is strengthened by its methodological breadth and thoroughness. While other studies have examined the link between HLA-I and DRMs (Ahlenstiel et al., 2007; Bailey et al., 2007), this study is on a numerically larger scale, and is unique in systematically examining an entire HIV cohort population’s DRM profiles and HLA-I types to screen for potential DRM:HLA pairs. As the cross-sectional analysis took into account duration of infection, it effectively excluded from consideration tDRMs that were disadvantageous to viral fitness in ART-naïve patients and identified DRMs that remained over time despite the lack of selection pressure from ART, thus mitigating the possibility that these DRMs are merely tDRMs with no relevance to viral pathogenesis in the patient. Additionally, as it has been clinical practice for several years to initiate ART immediately in newly diagnosed patients, there is now hardly ever more than one ART-naïve sequence per patient, making our longitudinal analysis unique and difficult to replicate in the future (World Health Organization, 2016; Ryom et al., 2016).
By utilizing three different analytical approaches, especially by combining the longitudinal and cross-sectional approaches, we are able to identify and validate DRM:HLA pairs where there is this epitope–mutation interaction. The NetMHCpan analyses allowed us to connect the associations we statistically detected at a population level with predicted MHC binding, which was additionally supported by prior experimental findings. This screening process is also strengthened by the restriction to pairs where the HLA-I and DRM frequencies have sufficient power, thus reducing the number of performed tests and the magnitude of the Benjamini–Hochberg multiple testing adjustment.
Our findings not only have an impact on our understanding of why DRMs tend to be transmitted and maintained in certain individuals, but may also help inform ART in the future. While it would not be feasible to tailor ART treatment based on personal HLA genotyping in resource-limited settings, this information could be used to help anticipate a higher frequency of certain DRMs where a corresponding HLA-I type is more prevalent. As HIV sequencing progresses, more complete DRM:HLA data on other genes, particularly integrase, will become available at sufficiently powered frequencies, enabling us to detect potential DRM:HLA pairs that may affect the efficacy of integrase inhibitors, a newer and increasingly used ART drug class.
Data availability
The individual-level datasets generated and analyzed for the current study do not fulfill the requirements for open data access: (1) The SHCS informed consent states that sharing data outside the SHCS network is only permitted for specific studies on HIV infection and its related complications, and to researchers who have signed an agreement detailing the use of the data and biological samples; and (2) the data is too dense and comprehensive to preserve patient privacy in persons living with HIV. Per Swiss law, data cannot be shared if data subjects have not agreed or if data is too sensitive to share. Investigators with a request for the data that support the findings of this study should contact the corresponding author Roger Kouyos and the Scientific Board of the SHCS. The provision of data will be considered by the Scientific Board of the SHCS and the study team and is subject to Swiss legal and ethical regulations, and is outlined in a material and data transfer agreement. We have, however, provided the data files (with the rows anonymized and randomly re-assorted) with the bare minimum number of variables necessary to do the core analyses and to assemble the figures as shown in the manuscript. The code for the analysis can be found in the GitHub repository: https://github.com/hnyhnyhny/HNGUYEN_HLA_DRM (copy archived at https://archive.softwareheritage.org/swh:1:rev:4a03919f07748ff22c4bf529100505ecc78b57cd).
References
- Selective pressures of HLA genotypes and antiviral therapy on human immunodeficiency virus type 1 sequence mutation at a population level. Clinical and Vaccine Immunology 14:1266–1273. https://doi.org/10.1128/CVI.00169-07
- Evolution of HIV-1 in an HLA-B*57-positive patient during virologic escape. The Journal of Infectious Diseases 196:50–55. https://doi.org/10.1086/518515
- MHC polymorphism under host-pathogen coevolution. Immunogenetics 55:732–739. https://doi.org/10.1007/s00251-003-0630-5
- High-accuracy HLA type inference from whole-genome sequencing data using population reference graphs. PLOS Computational Biology 12:e1005151. https://doi.org/10.1371/journal.pcbi.1005151
- Naturally selected rilpivirine-resistant HIV-1 variants by host cellular immunity. Clinical Infectious Diseases 57:1051–1055. https://doi.org/10.1093/cid/cit430
- Human immunodeficiency virus drug resistance: 2018 recommendations of the International Antiviral Society-USA panel. Clinical Infectious Diseases 68:177–187. https://doi.org/10.1093/cid/ciy463
- Transmission of HIV drug resistance and the predicted effect on current first-line regimens in Europe. Clinical Infectious Diseases 62:655–663. https://doi.org/10.1093/cid/civ963
- Book: Update of the Drug Resistance Mutations in HIV-1. USA: International Antiviral Society.
- Swiss HIV Cohort Study. Role of gag mutations in PI resistance in the Swiss HIV Cohort Study: bystanders or contributors? The Journal of Antimicrobial Chemotherapy 72:866–875. https://doi.org/10.1093/jac/dkw493
- Report: HIV Molecular Immunology Database. Los Alamos National Laboratory, Theoretical Biology and Biophysics.
- HIV evolution: CTL escape mutation and reversion after transmission. Nature Medicine 10:282–289. https://doi.org/10.1038/nm992
- Additive contribution of HLA class I alleles in the immune control of HIV-1 infection. Journal of Virology 84:9879–9888. https://doi.org/10.1128/JVI.00320-10
- Selection on the human immunodeficiency virus type 1 proteome following primary infection. Journal of Virology 80:9519–9529. https://doi.org/10.1128/JVI.00575-06
- Book: Optimal HIV CTL epitopes update: Growing diversity in epitope length and HLA restriction. Los Alamos, NM, USA: HIV Immunology and HIV/SIV Vaccine Databases; Los Alamos National Laboratory, Theoretical Biology and Biophysics.
- Evolution and diversity of the human leukocyte antigen (HLA). Evolution, Medicine, and Public Health 2015:1. https://doi.org/10.1093/emph/eou033
- HIV control is mediated in part by CD8+ T-cell targeting of specific epitopes. Journal of Virology 88:12937–12948. https://doi.org/10.1128/JVI.01004-14
- Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Research 31:298–303. https://doi.org/10.1093/nar/gkg100
- Cohort profile: the Swiss HIV Cohort Study. International Journal of Epidemiology 39:1179–1189. https://doi.org/10.1093/ije/dyp321
- Analysis of the diversity of the HIV-1 pol gene and drug resistance associated changes among drug-naïve patients in Burkina Faso. Journal of Medical Virology 81:1691–1701. https://doi.org/10.1002/jmv.21600
- Emergence of acquired HIV-1 drug resistance almost stopped in Switzerland: a 15-year prospective cohort analysis. Clinical Infectious Diseases 62:1310–1317. https://doi.org/10.1093/cid/ciw128
- Book: Consolidated Guidelines on the Use of Antiretroviral Drugs for Treating and Preventing HIV Infection: Recommendations for a Public Health Approach. World Health Organization.
Decision letter
- Joshua T Schiffer, Reviewing Editor; Fred Hutchinson Cancer Research Center, United States
- Jos WM van der Meer, Senior Editor; Radboud University Medical Centre, Netherlands
- Joshua T Schiffer, Reviewer; Fred Hutchinson Cancer Research Center, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper uses a rigorous methodological approach to identify an interesting and novel mechanism of HIV drug resistance in addition to acquired and transmitted drug resistance: accidental resistance arising coincidentally due to immune escape. This finding is of potential clinical importance as such mutations can arise spontaneously during untreated HIV infection in the context of a specific HLA-type.
Decision letter after peer review:
Thank you for submitting your article "Systematic screening of viral and human genetic variation identifies antiretroviral resistance and immune escape link" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Joshua T Schiffer as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Jos van der Meer as the Senior Editor.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
1. Please expand and clarify the description of the DRM and genotyping data. Please specify the number of different HLA-types, the relative frequencies of the predominant HLA-types, the number of different drug resistance variants found in ART-naïve patients and their relative frequency in the population as well as frequencies of multiple DRMs.
2. In the results, please include a first paragraph that puts the subsequent analysis into context and describe the purpose of the cross-sectional and survival analyses before presenting the results.
3. Either eliminate the term "literature search" or specifically describe the search methodology that was used to perform this search.
4. Include viral load data if available as described by Reviewer 1.
Reviewer #1:
This paper by Nguyen and colleagues identifies 3 possible HIV mutations which are linked by HLA-association and drug resistance. The conclusions are justified based on the analysis. The results are unlikely to influence clinical practice as the identified mutations are of only occasional importance in the clinic. However, the general concept is of great relevance to HIV and other viral infections.
Strengths of the study are:
1. Novel conclusions: the paper is important because it identifies a new source of possible drug resistance in addition to selective pressure during failed therapy and transmitted drug resistance.
2. Methodology strength: the authors do a nice job of attempting to move towards causality rather than correlation. Specifically, they establish strength of correlation, temporality (the survival analysis showing accrual of the new mutation over time), assessment of possible confounding variables such as ruling out transmitted mutations with the survival analysis and looking at an interaction term with infection time, assessing mechanistic plausibility with literature review and utilizing a very large cohort.
3. Thoughtful and clearly written intro and discussion.
There were no major weaknesses with the study from my perspective.
This study was extremely clearly written and I believe the scientific conclusions are justified by the analysis. One area of interest would be viral loads among study participants with specific HLA/drug resistance pairs. If possible and if data is available, then it would be interesting to see whether onset of a new mutation as seen in the survival analysis is associated with a decrease in viral load due to a fitness cost. There would be a natural comparison to make versus viral load changes in all other participants, as well as those with relevant HLA types who do not develop new drug resistance mutations. A similar approach with CD4 T cell count trajectories would also be interesting and in theory easy to perform.
Reviewer #2:
Drug resistance mutations (DRMs) are known to emerge and be selected for in patients on ART experiencing treatment failure. Transmission of DRMs can lead to high levels of drug resistance in treatment-naïve individuals in resource-limited settings, which may compromise first-line treatment regimens. Here the authors investigated whether the emergence and persistence of DRMs in untreated individuals could also occur as a side effect of immune selection. To this end, they screened for associations between drug resistance mutations and different HLA-1 alleles in a large cohort of nearly 4,000 treatment-naïve patients.
The authors identified three DRM-HLA-1 pairs where the presence of the HLA-1 allele was predictive of the patient having acquired the DRM prior to starting ART. Similar analyses of other patient cohorts have also identified potential associations between DRMs and specific HLA-1 alleles, so in that sense, the novelty of these findings is limited. However, this study substantially extends previous work by incorporating additional analyses that account for the duration of infection before treatment and the time to emergence of drug resistance, leveraging their exceptionally detailed dataset. These analyses revealed that patients without the HLA-B18 allele who had acquired the DRM at transmission were likely to revert to wildtype over time due to the fitness cost associated with the mutation, whereas patients with the HLA-B18 allele either retained the transmitted DRM or generated it de novo to escape immune pressure. Longitudinal survival analysis further indicated that patients with the HLA-B18 and HLA-B35 alleles who did not initially have the associated DRMs were significantly more likely to acquire them than patients without the alleles.
Overall, the conclusions of this paper are well supported by data. The statistical methods used are rigorous and straightforward to apply to cohort data for other populations. However, the description of the DRM and genotyping data is minimal, and its presentation is somewhat confusing. While the three identified HLA-DRM pairs were analyzed extensively, it is not clear how frequent these genotypes and DRMs (independently of each other) are in the patient cohort. As currently written, the manuscript lacks context for assessing the contribution of immune-driven DRMs relative to transmission-acquired DRMs in the treatment-naïve HIV+ population. Overall, however, this paper provides a valuable contribution to the field as a proof-of-concept demonstration of the emergence of immune-derived DRMs.
The Results section would greatly benefit from a first paragraph that puts the subsequent analysis into context. Currently it is challenging to get any sense for the distribution of DRMs and HLA-types in the population. It would help to specify the number of different HLA-types, the relative frequencies of the predominant HLA-types, the number of different drug resistance variants found in ART-naïve patients and their relative frequency in the population. What proportion of the patients that were included in the study had any drug resistance mutations at all? Did any patients have multiple DRMs?
In my opinion, the Results section would flow more logically if the purpose of the cross-sectional and survival analyses were discussed before presenting the results. E.g. A cross-sectional analysis was performed to assess "…" and a survival analysis to assess "…".
Specifying "literature search" as an analysis methodology seems a little clunky. Why not simply state that the B18 HLA type has been shown to bind to the epitope containing the E138 DR mutation in several previous studies [listing citations]?
Would the cross-sectional analysis have more power if the two potential HLA alleles associated with the RT-E138 DRM were compared simultaneously against a reference group containing all other alleles? In the current analysis, the two potentially important alleles are essentially compared against each other (as part of the "other" category).
It would be helpful if the frequencies in Figure 3A were spelled out in text rather than just shown in the table. E.g. what proportion of patients had the DRM-associated HLA alleles, and what fraction of these patients subsequently acquired the DRMs?
https://doi.org/10.7554/eLife.67388.sa1
Author response
Reviewer #1:
[…] This study was extremely clearly written and I believe the scientific conclusions are justified by the analysis. One area of interest would be viral loads among study participants with specific HLA/drug resistance pairs. If possible and if data is available, then it would be interesting to see whether onset of a new mutation as seen in the survival analysis is associated with a decrease in viral load due to a fitness cost. There would be a natural comparison to make versus viral load changes in all other participants, as well as those with relevant HLA types who do not develop new drug resistance mutations. A similar approach with CD4 T cell count trajectories would also be interesting and in theory easy to perform.
While this is a very relevant question and would undoubtedly add to the study, we are unfortunately limited by power. For two of the pairs, RT-E138:HLA-A24 and RT-V179:HLA-B35, there are 2 and 3 mutation events, respectively, among those with the queried HLA type, and 8 and 1, respectively, among those with another HLA type. With so few events, a paired t-test comparing the mean HIV viral load or CD4 count before and after the mutation would be very unlikely to detect a difference, even if one exists.
To illustrate this, we did perform the comparison for RT-E138:HLA-B18:
Among those with HLA-B18, the mean viral load immediately after the mutation arose was 51 489, compared with a much higher mean of 97 098 before it arose. Although the mean viral load was roughly halved, this estimate is based on only the four of the five individuals who had both a pre- and a post-mutation viral load measurement, so the p value of the paired t-test (after log-transforming the viral load values) was only 0.122. While this suggests a relationship (particularly in contrast to the increase among the 5 non-HLA-B18 individuals who developed the mutation [mean viral load before: 23 036, after: 35 756, p value=0.137]), we would need more patients to demonstrate it definitively. No change was observed for CD4 count.
Because of this, we regretfully decided not to include this into the manuscript.
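For readers who want to see how such a comparison is set up, the following is a minimal sketch of a paired t-test on log10-transformed viral loads, written with the Python standard library only. The viral load values are hypothetical placeholders (the individual-level SHCS data are not public), the function names are illustrative, and the p-value is obtained by numerically integrating the Student's t density rather than by whatever software the authors used.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson's rule on the density."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = 2 * (h / 3) * total  # probability mass between -|t| and +|t|
    return max(0.0, 1.0 - central)


def paired_t_test_log10(before, after):
    """Paired t-test on log10(after) - log10(before); returns (t, df, p)."""
    diffs = [math.log10(a) - math.log10(b) for b, a in zip(before, after)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1, t_two_sided_p(t_stat, n - 1)


# Hypothetical viral loads for four individuals (NOT study data).
before = [120000, 60000, 150000, 58000]
after = [40000, 70000, 66000, 30000]

t_stat, df, p_value = paired_t_test_log10(before, after)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.3f}")
```

With only four pairs the test has three degrees of freedom, so even a sizeable mean drop can fail to reach conventional significance, which is the power limitation described above.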
Reviewer #2:
[…] The Results section would greatly benefit from a first paragraph that puts the subsequent analysis into context. Currently it is challenging to get any sense for the distribution of DRMs and HLA-types in the population. It would help to specify the number of different HLA-types, the relative frequencies of the predominant HLA-types, the number of different drug resistance variants found in ART-naïve patients and their relative frequency in the population. What proportion of the patients that were included in the study had any drug resistance mutations at all? Did any patients have multiple DRMs?
This is a very valid point. Two tables have now been added to indicate the most frequent HLA-I alleles and DRMs observed in the study population (Tables 2-3, pg 14-15). A new subsection (lines 270-280) has been added to the Results to summarize them, before moving on to the analyses themselves (the DRM breakdown originally in the Discussion has also been moved to this subsection):
“The most commonly found HLA-I types are summarized in Table 2. […] As for the two DRMs of interest, 145 had a DRM at RT-E138: 124 RT-E138A, 14 RT-E138G, 6 RT-E138K, 1 RT-E138Q. Eighty-two were found at RT-V179: 68 RT-V179D, 13 RT-V179E, 1 RT-V179F.”
In my opinion, the Results section would flow more logically if the purpose of the cross-sectional and survival analyses were discussed before presenting the results. E.g. A cross-sectional analysis was performed to assess "…" and a survival analysis to assess "…".
An introduction to each analysis has been added to the beginning of each analysis subsection (lines 294-297, 319-321, 345-346):
“To examine the effect of having a given HLA-I allele on the presence of the DRM in question, we created for each candidate pair a logistic regression model predicting the presence of that specific DRM (at the earliest resistance testing), given presence/absence of the queried HLA-I type. […] These also indicated weakened HLA binding to the DRM-peptide (i.e. supporting the putative association) for two of the three candidate pairs: RT-E138:HLA-B18 and RT-V179:HLA-B35 (Supplementary Table S2).”
Specifying "literature search" as an analysis methodology seems a little clunky. Why not simply state that the B18 HLA type has been shown to bind to the epitope containing the E138 DR mutation in several previous studies [listing citations]?
Methods:
“To examine if there was any mechanistic plausibility to the associations found in the above analyses, we utilized the program server NetMHCpan 4.1 to predict the binding affinity of the HLA allele to all 9-mer peptides including the mutation position, with either the wildtype amino acid at the position or one of the three most common mutated amino acids observed (24). […] Additionally, we searched the Los Alamos HIV Molecular Immunology Database to corroborate the candidate pairs with prior experimental studies indicating the HLA-epitope match (25).”
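The peptide-enumeration step described in this passage can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 20-residue sequence is a made-up toy string (not the HIV-1 reverse transcriptase sequence), the function names are hypothetical, and the code only builds the wildtype and mutant 9-mers that would then be submitted to NetMHCpan; it does not perform any binding prediction.

```python
def windows_covering(seq, pos, k=9):
    """Return (1-based start, peptide) for every k-mer of seq that includes
    the 1-based position pos."""
    idx = pos - 1
    first = max(0, idx - k + 1)      # earliest start that still covers idx
    last = min(idx, len(seq) - k)    # latest start that fits in the sequence
    return [(s + 1, seq[s:s + k]) for s in range(first, last + 1)]


def variant_peptides(seq, pos, mutants, k=9):
    """Map each residue (wildtype first, then mutants) to the list of k-mers
    covering pos when that residue is placed at pos."""
    wildtype = seq[pos - 1]
    residues = [wildtype] + [m for m in mutants if m != wildtype]
    peptides = {}
    for aa in residues:
        variant = seq[:pos - 1] + aa + seq[pos:]
        peptides[aa] = [pep for _, pep in windows_covering(variant, pos, k)]
    return peptides


# Toy sequence with a glutamate (E) at position 10; purely illustrative.
toy_seq = "MKTAYIAKQELSVDGHRWPN"
toy_pos = 10
toy_mutants = ["A", "G", "K"]  # placeholder substitutions for the example

peps = variant_peptides(toy_seq, toy_pos, toy_mutants)
for residue, plist in peps.items():
    print(residue, len(plist), plist[0], "...", plist[-1])
```

For a position far enough from either end of the protein, each residue yields nine overlapping 9-mers, one for each possible offset of the mutation within the peptide.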
Results:
“NetMHCpan predictions of HLA binding were performed to gauge the mechanistic plausibility of the effects observed in the first two analyses. […] Prior literature indicating experimentally verified epitope binding of the HIV proteome to HLA also exists for these two pairs (13, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36).”
Would the cross-sectional analysis have more power if the two potential HLA alleles associated with the RT-E138 DRM were compared simultaneously against a reference group containing all other alleles? In the current analysis, the two potentially important alleles are essentially compared against each other (as part of the "other" category).
While combining the two HLA alleles would provide more statistical power, the issue is that one pair (RT-E138 with HLA-B18) already demonstrates a very strong relationship, while the other (RT-E138 with HLA-A24) shows none in our analysis. A pooled analysis would very likely show a relationship, but it would almost certainly be driven by the strongly significant RT-E138:HLA-B18 association “overpowering” the null RT-E138:HLA-A24 association, misleadingly yielding a statistically significant association overall.
It would be helpful if the frequencies in Figure 3A were spelled out in text rather than just shown in the table. E.g. what proportion of patients had the DRM-associated HLA alleles, and what fraction of these patients subsequently acquired the DRMs?
The manuscript has now been updated to clearly explain these numbers (and percentages) within the text itself (lines 324-332):
“For RT-E138:HLA-B18, 63 (7.7%) of the 813 patients without an RT-E138 mutation were HLA-B18, among which 5 (7.9%) developed it before ART initiation, compared to 5 (0.7%) of the 750 with another HLA-B type (HR: 12.211, 95% CI: 3.523-42.318 [p value <0.001]) (Figure 3). […] Of the 821 patients initially without an RT-V179 mutation, 150 (18.3%) had HLA-B35, of whom 3 (2.0%) developed a mutation at RT-V179, compared to only 1 (0.1%) of the 671 with another HLA-B type (HR: 16.116, 95% CI: 1.673-155.216 [p value = 0.016]).”
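The percentages quoted above can be checked directly from the counts. The short sketch below recomputes them and also derives a crude risk ratio for each pair. The crude risk ratio is an added illustration, not a quantity reported by the authors: it ignores follow-up time and censoring, so it is not expected to equal the hazard ratios from the survival models. Function and variable names are illustrative.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def crude_risk_ratio(ev_exposed, n_exposed, ev_unexposed, n_unexposed):
    """Ratio of cumulative incidences; ignores follow-up time and censoring."""
    return (ev_exposed / n_exposed) / (ev_unexposed / n_unexposed)


# Counts taken from the quoted manuscript text.
pairs = {
    "RT-E138:HLA-B18": dict(n_total=813, n_hla=63, ev_hla=5, n_other=750, ev_other=5),
    "RT-V179:HLA-B35": dict(n_total=821, n_hla=150, ev_hla=3, n_other=671, ev_other=1),
}

summary = {}
for name, c in pairs.items():
    # The two HLA groups should partition the at-risk population.
    assert c["n_hla"] + c["n_other"] == c["n_total"]
    summary[name] = {
        "hla_share_pct": pct(c["n_hla"], c["n_total"]),
        "risk_hla_pct": pct(c["ev_hla"], c["n_hla"]),
        "risk_other_pct": pct(c["ev_other"], c["n_other"]),
        "crude_rr": crude_risk_ratio(c["ev_hla"], c["n_hla"], c["ev_other"], c["n_other"]),
    }
    print(name, summary[name])
```

The crude ratios come out in the same range as the reported hazard ratios, which is what one would expect when events are rare, but the wide confidence intervals in the text reflect how few events underlie each estimate.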
https://doi.org/10.7554/eLife.67388.sa2
Article and author information
Author details
Funding
University of Zurich (University Research Priority Program, “Evolution in Action: From Genomes to Ecosystems”: U-702-26-01)
- Huyen Nguyen
- Roger D Kouyos
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (BSSGI0_155851)
- Huldrych F Günthard
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (179571)
- Huldrych F Günthard
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (148522)
- Huldrych F Günthard
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors thank the patients who participated in the Swiss HIV Cohort Study; the physicians and study nurses, for the excellent patient care provided to participants; the resistance laboratories for high-quality genotypic drug resistance testing; SmartGene (Zug, Switzerland), for technical support; Alexandra Scherrer, Susanne Wild, and Anna Traytel from the SHCS data center for data management; and Marianne Amstutz, Danièle Perraudin, and Mirjam Minichiello for administration. The members of the Swiss HIV Cohort Study include the following: A Anagnostopoulos, MB, EB, JB, D L Braun, H C Bucher, A Calmy, MC, A Ciuffi, G Dollenmaier, M Egger, L Elzi, J Fehr, JF, H Furrer (chairman of the Clinical and Laboratory Committee), C A Fux, H F Günthard (president of the SHCS), D Haerry (deputy of ‘Positive Council’), B Hasse, HH Hirsch, M Hoffmann, I Hösli, M Huber, C Kahlert, L Kaiser, O Keiser, TK, RD Kouyos, H Kovari, B Ledergerber, G Martinetti, B Martinez de Tejada, C Marzolini, KJ Metzner, N Müller, D Nicca, PP, G Pantaleo, MP, A Rauch (chairman of the Scientific Board), C Rudin (chairman of the Mother and Child Substudy), K Kusejko (head of Data Center), P Schmid, R Speck, M Stöckle, P Tarr, A Trkola, PV, G Wandeler, R Weber, and SY.
Ethics
Human subjects: The SHCS has been approved by the participating institutions' ethics committees (Kantonale Ethikkommission Bern, Ethikkommission des Kantons St. Gallen, Comité Départemental d'Éthique des Spécialités Médicales et de Médecine Communautaire et de Premier Recours, Kantonale Ethikkommission Zurich, Repubblica e Cantone Ticino-Comitato Etico Cantonale, Commission Cantonale d'Éthique de la Recherche sur l'Être Humain, Ethikkommission beider Basel; all approvals are available at http://www.shcs.ch/206-ethic-committee-approval-and-informed-consent). Written informed consent was obtained from all participants.
Senior Editor
- Jos WM van der Meer, Radboud University Medical Centre, Netherlands
Reviewing Editor
- Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States
Reviewer
- Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States
Version history
- Received: February 9, 2021
- Accepted: May 18, 2021
- Version of Record published: June 1, 2021 (version 1)
Copyright
© 2021, Nguyen et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.