Transmission Histories: Traversing missing links in the spread of HIV

Combining clinical and genetic data can improve the effectiveness of virus tracking with the aim of reducing the number of HIV cases by 2030.
  1. Erin Brintnell
  2. Art Poon (corresponding author)
  1. Department of Pathology and Laboratory Medicine, Western University, Canada
  2. Department of Computer Science, Western University, Canada

The human immunodeficiency virus type 1 (HIV-1), which can lead to acquired immune deficiency syndrome (AIDS), remains a leading cause of death and a health threat worldwide, with over 38.4 million people currently living with the virus. Global health sector strategies strive to end HIV-1 epidemics by 2030 (Duncombe et al., 2019). This requires significant investment in resources to treat and prevent the disease, such as reducing the number of people who do not know they are carrying the virus and improving the availability and affordability of effective treatments.

In cities that have scaled up HIV-1 treatment and prevention, it will be crucial to establish whether new HIV-1 infections are due to ongoing local transmission or to infections acquired abroad. This means reconstructing the spread of HIV-1 between individuals through contact tracing: interviewing people recently diagnosed with HIV, and locating and notifying their intimate partners. However, contact tracing is both time-consuming and intrusive (El-Sadr et al., 2022).

A cost-effective alternative to contact tracing is to compare the genomic sequences of HIV-1 samples from different patients, which are often collected to screen for mutations that confer drug resistance. Infections that are genetically similar are more likely to be related through recent transmissions. This is especially true for HIV-1, a rapidly evolving virus that becomes genetically unique within months of an infection (Williamson, 2003).
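As a rough illustration of what 'genetically similar' can mean in practice, the Python sketch below computes the proportion of differing sites (the p-distance) between aligned sequences and flags pairs that fall at or below a cut-off. The function names, the toy sequences and the 5% cut-off are assumptions made for this example and are not taken from any of the studies cited here; real analyses use much longer alignments and carefully justified thresholds or tree-based methods.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def linked_pairs(sequences, threshold):
    """Return pairs of sample IDs whose p-distance is at or below the threshold."""
    pairs = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(sequences.items()), 2):
        if p_distance(seq_a, seq_b) <= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical toy alignment (20 sites); not real HIV-1 data.
toy_sequences = {
    "P1": "ACGTACGTACGTACGTACGT",
    "P2": "ACGTACGTACGTACGTACGA",
    "P3": "TCGAACGTTCGTACCTACGA",
}

# Illustrative cut-off only.
print(linked_pairs(toy_sequences, threshold=0.05))
```

In this toy data set only P1 and P2 are flagged as potentially related, because they differ at a single site while P3 differs from both at several sites.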

These genetic sequences can be used to build a tree that represents the shared evolutionary history of the infections and approximates the history of recent transmissions (De Maio et al., 2018; Romero-Severson et al., 2014). Furthermore, the spread of infections from one place to another can be extrapolated by reconstructing locations of ‘ancestral’ infections at deeper nodes of the tree from the known locations at the tips (Faria et al., 2011). The accuracy of these estimates, however, is impeded by the unknown number of people with undiagnosed infections, or with diagnosed infections that have not been sequenced (Didelot et al., 2017). In addition, reconstructing transmission patterns from HIV-1 sequences comes with its own ethical challenges because HIV-1 transmission is criminalized in many countries (Dawson et al., 2020).
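The idea of inferring 'ancestral' locations from the tips can be sketched with Fitch parsimony, a deliberately simple technique that stands in here for the statistical phylogeographic methods used in the cited studies. The tree shape, sample names and locations below are hypothetical, and the tree is assumed to be strictly binary.

```python
def fitch_sets(node, tip_locations):
    """Bottom-up pass: compute the set of candidate locations for each node.

    A tip is a string (sample name); an internal node is a pair of subtrees.
    """
    if isinstance(node, str):
        return {"name": node, "states": {tip_locations[node]}, "children": []}
    left, right = (fitch_sets(child, tip_locations) for child in node)
    common = left["states"] & right["states"]
    # Keep the shared locations if any exist, otherwise keep all candidates.
    states = common if common else left["states"] | right["states"]
    return {"name": None, "states": states, "children": [left, right]}


def assign_locations(node, parent_location=None):
    """Top-down pass: pick one location per node and count changes on branches."""
    if parent_location in node["states"]:
        location = parent_location
    else:
        location = min(node["states"])  # deterministic tie-break (alphabetical)
    node["location"] = location
    changes = int(parent_location is not None and location != parent_location)
    for child in node["children"]:
        changes += assign_locations(child, location)
    return changes


# Hypothetical four-tip tree and sampling locations.
toy_tree = (("A", "B"), ("C", "D"))
toy_locations = {"A": "Amsterdam", "B": "Amsterdam", "C": "Abroad", "D": "Amsterdam"}

root = fitch_sets(toy_tree, toy_locations)
n_changes = assign_locations(root)
print(root["location"], n_changes)
```

For this toy tree the root is placed in Amsterdam and a single location change is needed, on the branch leading to sample C. Parsimony ignores branch lengths and sampling bias, which is one reason why model-based methods are preferred in practice.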

Now, in eLife, Oliver Ratmann at the HIV Transmission Elimination Amsterdam Consortium and colleagues – including Alexandra Blenkinsop as first author – report an innovative approach to overcome the disadvantages of sequence analysis (Blenkinsop et al., 2022). Blenkinsop et al. combined different data sources to reconstruct the transmission histories of HIV-1 in Amsterdam, which has the highest concentration of HIV-1 cases in the Netherlands. Amsterdam is also part of the Fast-Track city network, which provides funds to expand effective HIV prevention, testing and treatment services.

Blenkinsop et al. extended the standard approach of reconstructing transmission histories from HIV-1 sequences by incorporating additional information from clinical biomarkers (biological indicators of disease progression or response to treatment) and other patient data (Figure 1). A statistical model was fitted to two biomarkers: the number of HIV-1 particles circulating in the blood (the viral load) and the number of white blood cells targeted by HIV-1 (the CD4 cell count). Based on how these biomarkers changed over time, it was possible to estimate the length of time between a person’s HIV-1 infection and diagnosis. These estimates were then used to infer how many cases were transmitted from people with unsequenced infections, adjusting for factors such as route of transmission (e.g., injection drug use), age group, and place of birth.
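To give a feel for how a biomarker can be turned into a time estimate, the sketch below back-calculates the time since infection from a single count of the white blood cells targeted by HIV-1 (CD4 cells). It assumes that the square root of the count declines linearly over time in untreated infection. This is a simplification and not the model fitted by Blenkinsop et al., and the starting count and rate of decline in the usage lines are made-up placeholders, not published estimates.

```python
import math


def years_since_infection(cd4_at_diagnosis, cd4_at_infection, sqrt_decline_per_year):
    """Back-calculate time since infection from one CD4 count (cells per microlitre).

    Assumes sqrt(CD4) falls by a constant amount each year without treatment.
    All three arguments must be supplied by the caller; none are defaults
    drawn from the literature.
    """
    if cd4_at_diagnosis < 0 or cd4_at_infection <= 0 or sqrt_decline_per_year <= 0:
        raise ValueError("counts must be non-negative and the decline rate positive")
    elapsed = (math.sqrt(cd4_at_infection) - math.sqrt(cd4_at_diagnosis)) / sqrt_decline_per_year
    # A count above the assumed starting value is treated as a very recent infection.
    return max(0.0, elapsed)


# Hypothetical placeholder values, chosen only to make the arithmetic easy to follow.
estimate = years_since_infection(
    cd4_at_diagnosis=400.0,
    cd4_at_infection=625.0,
    sqrt_decline_per_year=1.5,
)
print(round(estimate, 2))
```

A lower count at diagnosis gives a longer estimated delay under these assumptions. A full analysis would also use the viral load, repeated measurements and uncertainty in every parameter.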

Figure 1. Estimating the number of unsampled HIV-1 infections.

The top panel illustrates how a chain of HIV-1 infections may be partially sampled over time. The top dashed line shows an infection (represented by the virus particle symbol) that is transmitted (red arrow) before it is sequenced (DNA symbol), with the time between the infection occurring and sequencing taking place indicated by the two-headed arrow. The dashed line in the centre shows an infection resulting from transmission from the first infection, which is transmitted (red arrow) but never sequenced. The dashed line on the bottom represents a third infection, resulting from the second infection, which is sequenced (DNA symbol) more quickly than the original infection. The bottom panel depicts two phylogenetic trees. The first tree (green) is inferred from the available sequences (in this case, the two infections sequenced in the top panel). By fitting a statistical model to HIV-1 cases with estimated dates of infection and clinical data, the number of unsampled infections (‘missing links’) in the new tree (red) can be extrapolated for different populations.

Despite extensive measures to curb the transmission of HIV in Amsterdam, results from Blenkinsop et al. suggest that many HIV-1 infections have remained undiagnosed for a long time, especially among heterosexual residents and recent arrivals from sub-Saharan Africa. Further, they provide evidence of ongoing HIV-1 transmission within the city over the duration of the five-year study. These results suggest that, while Amsterdam has made significant progress in reducing the spread of HIV-1, closing the final gap to end the local epidemic by 2030 remains a challenge.

The study also highlights the importance of linking HIV-1 sequences to both clinical and demographic information to determine which groups have been neglected by the generalized scale-up of public health testing and treatment. This may also be a critical step for other cities in the Fast-Track city network. Furthermore, the work of Blenkinsop et al. mirrors ongoing challenges in tracking and controlling other infectious diseases like COVID-19, which is characterized by an abundance of viral genome sequences but a lack of linked contextual information, including clinical outcomes, travel histories and sampling strategies (Chiara et al., 2021; Chen et al., 2022).


Article and author information

Author details

  1. Erin Brintnell

    Erin Brintnell is in the Department of Pathology and Laboratory Medicine, Western University, London, Canada

    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5042-7799
  2. Art Poon

    Art Poon is in the Department of Pathology and Laboratory Medicine, the Department of Microbiology and Immunology, and the Department of Computer Science, Western University, London, Canada

    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3779-154X

Publication history

  1. Version of Record published: September 30, 2022 (version 1)


© 2022, Brintnell and Poon

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

  1. Erin Brintnell
  2. Art Poon
Transmission Histories: Traversing missing links in the spread of HIV
eLife 11:e82610.
