Figures and data

Demographics of the individuals in the 355 probable transmission pairs.
Age is calculated at the estimated time at which the source infected the recipient. Numbers are given as count and proportion (column-wise) for discrete variables and mean and standard deviation for continuous variables.

Phylogenetic analysis and the IBM model show similar age distributions for sources and recipients.
Histograms of the proportions of transmissions from and to different age groups in the IBM results (top) and the phylogenetics (bottom). Values from the IBM are the posterior mean proportion of transmissions with the 95% highest density interval displayed. As every transmission has a source and a recipient, proportions sum to 1 across the two source histograms (left, light colours) and also to 1 across the two recipient histograms (right, dark colours).
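The normalisation described in this caption can be illustrated with a minimal sketch. It assumes each transmission is recorded as a tuple of source sex, source age, recipient sex and recipient age, and that ages are grouped into five-year bands; the function names and the banding are hypothetical and not taken from the source.

```python
from collections import Counter

def band_of(age):
    """Hypothetical five-year banding, e.g. 27 -> '25-29'."""
    lower = int(age) // 5 * 5
    return f"{lower}-{lower + 4}"

def source_recipient_proportions(pairs):
    """pairs: list of (source_sex, source_age, recipient_sex, recipient_age).

    Returns two dicts keyed by (sex, age band). Each dict sums to 1 across
    both sexes, because every transmission has exactly one source and
    exactly one recipient.
    """
    total = len(pairs)
    sources = Counter((s_sex, band_of(s_age)) for s_sex, s_age, _, _ in pairs)
    recipients = Counter((r_sex, band_of(r_age)) for _, _, r_sex, r_age in pairs)
    source_props = {key: n / total for key, n in sources.items()}
    recipient_props = {key: n / total for key, n in recipients.items()}
    return source_props, recipient_props
```

Dividing by the total number of transmissions, rather than by the number within each sex, is what makes the male and female source histograms sum to 1 jointly.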

Summary of ages, and age gaps, in transmission pairs identified using both methods.
The IBM results are from the best-fitting ABC simulation.

Male partners are on average older in both male-to-female and female-to-male pairs.
Distributions of age gaps within transmission pairs from the phylogenetics (yellow, histogram) and the IBM (blue, kernel density estimate). For the IBM, the 95% highest posterior density intervals are given by the shaded area. Female-to-male transmission pairs are displayed on the left and male-to-female on the right. The dashed line indicates equal age.

Transmissions are most common in young couples.
Age-based who-acquires-infection-from-whom (WAIFW) matrix for modelled transmissions occurring in 2016. Values are the mean, over the entire year, of the per-year transmission rates from infected individuals in one age group to individuals in another (given by the number of transmissions divided by the size of the source population).
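As a rough illustration of the rate defined in this caption (transmissions divided by the size of the source population), the sketch below builds a small matrix from a list of transmissions. The age bands, the function names and the use of a single population snapshot instead of a mean over the year are all simplifying assumptions, not details from the source.

```python
from collections import Counter

# Hypothetical age bands as [lower, upper) in years; the source does not list them here.
AGE_BANDS = [(15, 25), (25, 35), (35, 45), (45, 120)]

def band_index(age):
    """Return the index of the age band containing `age`."""
    for i, (lower, upper) in enumerate(AGE_BANDS):
        if lower <= age < upper:
            return i
    raise ValueError(f"age {age} is outside the defined bands")

def waifw_matrix(transmissions, infected_ages):
    """Rows are source age bands, columns are recipient age bands.

    transmissions: list of (source_age, recipient_age) for one year.
    infected_ages: ages of all infected individuals (the potential sources),
                   taken here as one snapshot (an assumption).
    """
    n = len(AGE_BANDS)
    counts = [[0] * n for _ in range(n)]
    for source_age, recipient_age in transmissions:
        counts[band_index(source_age)][band_index(recipient_age)] += 1
    source_sizes = Counter(band_index(age) for age in infected_ages)
    # Rate = transmissions / size of source band; NaN where the band is empty.
    return [
        [counts[i][j] / source_sizes[i] if source_sizes[i] else float("nan")
         for j in range(n)]
        for i in range(n)
    ]
```

Because each row is divided by the size of its own source band, rows are comparable even when the bands contain very different numbers of infected individuals.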

Transmissions to a younger cohort take place predominantly in male-to-female transmissions.
Alluvial plots for the transmission history, derived from the IBM, of a cohort of males (panel a) and females (panel b) infected with HIV in 2016. That cohort is G0, shown on the right; to the left of it are the three previous generations of transmission (indicated as G1 to G3) that gave rise to the infections of G0. Male generations are blue and female red. The partitioning within each vertical bar indicates its age distribution. The age distributions of G1 to G3 are shown at both the time of their own infection (left) and the time at which they infected an individual in a subsequent generation (right). Movement to an older age group (by ageing or transmission) is shown by a green alluvium, to a younger group by an orange alluvium, and to the same age group by a grey alluvium. If an individual was responsible for two or more infections in the subsequent generation, then they still contribute a total of 1 individual to their “age at onwards transmission” bar, which is split proportionally between age bands depending on the number of infections they were responsible for while they were a member of each band. (For example, if individual A infected two individuals while A was aged 20–24 and A infected one while they were aged 25–29, A would contribute ⅔ of an individual to the 20–24 bar and ⅓ to the 25–29 bar.) Note that the members of G0 are all members of a particular cohort who were infected with HIV, but all other generations are composed solely of people who transmit HIV, a somewhat different group.
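The proportional split described in this caption can be checked with a short sketch. The function name and the string labels for age bands are hypothetical; only the weighting rule comes from the caption.

```python
from collections import Counter
from fractions import Fraction

def onward_contribution(bands_at_transmission):
    """Split one individual's unit contribution across age bands.

    bands_at_transmission: the age band the individual was in at each of
    the infections they caused in the subsequent generation.
    Returns a dict mapping band -> fraction of an individual, summing to 1.
    """
    counts = Counter(bands_at_transmission)
    total = sum(counts.values())
    return {band: Fraction(n, total) for band, n in counts.items()}

# Individual A from the caption: two infections at 20-24, one at 25-29.
print(onward_contribution(["20-24", "20-24", "25-29"]))
```

Exact fractions are used so that each individual's contributions sum to exactly 1, however many infections they caused.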

The age gap increases with recipient age in female-to-male pairs and decreases in male-to-female pairs.
Relationship between recipient age and source age (a, c) and age gap (b, d) from the IBM (a, b) and the phylogenetics (c, d). The dashed line represents equal ages. The IBM results are from the “best-fitting” simulation (that with the minimum ABC distance) and represent the median value (solid black line) and quantiles (orange shaded areas) of the y-axis variable in 1-year increments. The individual values from the phylogenetics pairs are plotted as green points, with a regression line fitted using quadratic polynomial regression and the 95% confidence region given by the pale green area.

Little change in age gaps over the period of ART rollout.
Time trends in age of sources (a), age gaps (b) and proportion of the population on ART (c). Estimates are from the best-fitting IBM simulation. The black line in (a) and (b) is the median as calculated per year, with the shaded area representing the interquartile range.


The impact of an intervention focussing on younger people.
Projected time trends in HIV incidence (A) and population proportion of people with unsuppressed HIV (B), 2018–2041, in the Zambian control communities of the PopART trial, under various scenarios of future PopART intervention rollout. Posterior median estimates are calculated in two-year time periods for the whole population (top), females (middle) and males (bottom). The four scenarios are: no CHiPS deployment at all (red); provision of the full PopART intervention to the whole population (purple); PopART intervention targeting only those aged under 35 (grey); and PopART intervention targeting only men aged under 35 (green). Vertical bars indicate the 95% highest posterior density interval.

The impact of ART interruption.
A: Posterior median new infections by year, by sex and age. B: The age distribution of new transmissions by year under a complete cessation of ART provision during 2025, with provision resuming in 2026. Lines chart the posterior median proportion of new infections for sources and recipients of each integer age (top and middle), and age gap rounded to the nearest integer (bottom). In both cases, lines are coloured by year.

Empirical distributions of mean (a) and variance (b) of ages of recipients at time of transmission from pairs where “recipients” had randomly chosen “sources” (histograms) compared to the true value from the phylogenetics data (green line).

Empirical distributions of mean estimated time from infection to sampling of recipients from randomised transmission pairs (histograms) compared to the true value from the phylogenetics data (green line).

Empirical distributions of mean (a) and variance (b) of ages of sources at time of transmission from randomised transmission pairs (histograms) compared to the true value from the phylogenetics data (green line).

Empirical distributions of mean estimated time from infection to sampling of sources from randomised transmission pairs (histograms) compared to the true value from the phylogenetics data (green line).

Empirical distributions of mean (a) and variance (b) in transmission age gap from randomised transmission pairs (histograms) compared to the true value from the phylogenetics data (green line).
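A comparison of this kind can be sketched as follows. The source does not state how the randomised pairs were constructed, so this example assumes that each randomised pair is two distinct individuals drawn at random from a pool of sampled individuals whose ages are already known; the function names and that pool are hypothetical.

```python
import random
import statistics

def randomised_gap_stats(pool_ages, n_pairs, n_replicates, seed=0):
    """Mean and variance of age gaps in sets of randomly formed pairs.

    pool_ages: ages of the individuals available for pairing (assumption).
    n_pairs: number of pairs per replicate (at least 2, for the variance).
    Returns (means, variances), one value per replicate.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_replicates):
        gaps = []
        for _ in range(n_pairs):
            # Draw two distinct individuals; the first plays the "source".
            source_age, recipient_age = rng.sample(pool_ages, 2)
            gaps.append(source_age - recipient_age)
        means.append(statistics.mean(gaps))
        variances.append(statistics.variance(gaps))
    return means, variances

def tail_fraction(null_values, observed):
    """Fraction of randomised values at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)
```

Plotting `means` and `variances` as histograms, with the observed value drawn as a vertical line, would give a figure of the same general form as the one described.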

Distributions of the estimated ages of sources at the time of transmission (top), estimated ages of recipients at the time of transmission (middle) and age gap amongst probable transmission pairs identified using phyloscanner when the read downsampling procedure is varied.
Downsampling to a maximum of n reads means that a random sampling of n reads is taken for each individual in each genomic window. For windows in which n reads do not exist, the two options are “blacklist” - to exclude that host from that window entirely - and “keep” - to use all the reads that are available. The boxplots display the median, 25% and 75% quantiles, and range.
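The two options for windows with too few reads can be illustrated with a minimal sketch. This is not phyloscanner's own code or interface; the function name, the parameter names and the use of `None` to mark an excluded host are assumptions made for illustration.

```python
import random

def downsample_reads(reads, n, short_window_policy, rng=None):
    """Downsample one host's reads in one genomic window to at most n.

    short_window_policy applies when fewer than n reads are available:
      "blacklist" -> exclude the host from this window (returns None)
      "keep"      -> use all the reads that are available
    """
    rng = rng or random.Random(0)
    if len(reads) >= n:
        # Enough reads: take a random sample of exactly n, without replacement.
        return rng.sample(reads, n)
    if short_window_policy == "blacklist":
        return None
    if short_window_policy == "keep":
        return list(reads)
    raise ValueError(f"unknown policy: {short_window_policy!r}")
```

The trade-off is that "blacklist" keeps read depth uniform across hosts at the cost of losing windows, whereas "keep" retains every window but lets depth vary.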

Distributions of the estimated ages of sources at the time of transmission (top), estimated ages of recipients at the time of transmission (middle) and age gap amongst probable transmission pairs identified using phyloscanner with different genomic window widths.
The boxplots display the median, 25% and 75% quantiles, and range.

Distributions of the estimated ages of sources at the time of transmission (top), estimated ages of recipients at the time of transmission (middle) and age gap amongst probable transmission pairs identified using phyloscanner.
Different window-related thresholds for identifying linkage and direction are indicated on the x axis in the format X/Y, where Y is the minimum fraction of genomic windows supporting two samples being from a transmission pair, and X is the minimum fraction of genomic windows supporting transmission having an identifiable direction (conditional on being a transmission pair). The boxplots display the median, 25% and 75% quantiles, and range.
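One way to read the X/Y thresholds is sketched below. It assumes that the linkage fraction is taken over all windows and that the direction fraction is taken over the windows supporting linkage, which is one interpretation of "conditional on being a transmission pair"; the function and its arguments are hypothetical and are not phyloscanner's interface.

```python
def classify_pair(n_windows, n_linked, n_directed, direction_threshold, linkage_threshold):
    """Classify a pair of samples under thresholds written as X/Y.

    n_windows: genomic windows in which both samples are present.
    n_linked: windows supporting the two samples being a transmission pair.
    n_directed: linked windows supporting one particular direction.
    direction_threshold: X, the minimum fraction for direction.
    linkage_threshold: Y, the minimum fraction for linkage.
    """
    if n_windows == 0 or n_linked / n_windows < linkage_threshold:
        return "unlinked"
    # Direction is assessed only among windows that support linkage (assumption).
    if n_directed / n_linked >= direction_threshold:
        return "linked, direction identified"
    return "linked, direction unresolved"
```

Raising Y reduces the number of pairs called, while raising X reduces the number of called pairs that are assigned a direction.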


Number of phylogenetic transmission pairs identified using different patristic distance thresholds.

Distributions of the estimated ages of sources at the time of transmission (top), estimated ages of recipients at the time of transmission (middle) and age gap amongst probable transmission pairs identified using phyloscanner with patristic distance thresholds used to identify transmission pairs in an individual window.
The boxplots display the median, 25% and 75% quantiles, and range.