
Assessing the danger of self-sustained HIV epidemics in heterosexuals by population based phylogenetic cluster analysis

  1. Teja Turk
  2. Nadine Bachmann
  3. Claus Kadelka
  4. Jürg Böni
  5. Sabine Yerly
  6. Vincent Aubert
  7. Thomas Klimkait
  8. Manuel Battegay
  9. Enos Bernasconi
  10. Alexandra Calmy
  11. Matthias Cavassini
  12. Hansjakob Furrer
  13. Matthias Hoffmann
  14. Huldrych F Günthard
  15. Roger D Kouyos (corresponding author)
  16. Swiss HIV Cohort Study
  1. University Hospital Zurich, Switzerland
  2. University of Zurich, Switzerland
  3. Geneva University Hospitals, Switzerland
  4. University Hospital Lausanne, Switzerland
  5. University of Basel, Switzerland
  6. University Hospital Basel, Switzerland
  7. Regional Hospital Lugano, Switzerland
  8. Lausanne University Hospital, Switzerland
  9. Bern University Hospital, University of Bern, Switzerland
  10. Cantonal Hospital St. Gallen, Switzerland
Research Article
Cite this article as: eLife 2017;6:e28721 doi: 10.7554/eLife.28721
16 figures, 5 tables and 1 additional file

Figures

Overall basic reproductive number R0 and R0 per subtype from stratified analysis.

The dark gray point indicates the overall estimate of the basic reproductive number R0 (obtained by ignoring the transmission chain subtypes); the dark gray line and the gray-shaded band show the corresponding 95%-confidence interval. The analogous results from the per-subtype stratified analysis are represented by colored points and lines, each color corresponding to one of the subtypes (B, C, CRF01_AE, CRF02_AG or A) or to the group of remaining subtypes (other).

https://doi.org/10.7554/eLife.28721.004
Time trends for R0.

The upper, smaller panels show the time trends for R0 from the subtype-stratified analyses, in which log(R0) was modeled as a linear function of the establishment date (i.e., the time trend rate was assumed to be constant for each subtype). The colored shaded bands correspond to the 95%-prediction bands. The best-fitting nonlinear time trend for R0 from the overall analysis is displayed in the lower panel (dark gray curve) together with its 95%-prediction band (gray-shaded area). The black points represent the R0 estimates from the analyses stratified by establishment year, and the gray vertical lines show the corresponding 95%-confidence intervals.

https://doi.org/10.7554/eLife.28721.005
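The log-linear trend described in this caption can be illustrated with a short sketch. It assumes the date covariate is measured in decades since 1.1.1996 and borrows the overall intercept (-0.839) and slope (-0.112) listed as parameters 8 and 9 in Appendix 1, table 1. The function name `r0_linear_trend` is ours and does not come from the study.

```python
import math
from datetime import date

REFERENCE_DATE = date(1996, 1, 1)  # reference establishment date used in the captions
LOG_R0_REF = -0.839                # parameter 8 (overall), Appendix 1, table 1
TREND_PER_DECADE = -0.112          # parameter 9 (overall), Appendix 1, table 1


def r0_linear_trend(establishment: date) -> float:
    """R0 under a log-linear trend: log(R0) = intercept + slope * decades since reference."""
    # Assumption: the covariate is (date - 1.1.1996) in days divided by 365 * 10.
    decades = (establishment - REFERENCE_DATE).days / (365 * 10)
    return math.exp(LOG_R0_REF + TREND_PER_DECADE * decades)


if __name__ == "__main__":
    for year in (1986, 1996, 2006):
        print(year, round(r0_linear_trend(date(year, 1, 1)), 3))
```

Because the slope is negative, the sketch returns a smaller R0 for later establishment dates, and at the reference date it reduces to exp(-0.839).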
Effect of different factors on the basic reproductive number R0 from the multivariate model with only linear factor terms.

The black square and the black line show the reference basic reproductive number R0 and its 95%-confidence interval (for a transmission chain of subtype B that started on 1.1.1996 and whose index case was diagnosed 3 years after infection, was 32 years old at infection, never reported sex with an occasional partner and had an earliest CD4 cell count of 350 cells per μL). The vertical gray line separates the factors associated with lower R0 (left; effect factor <1) from the factors contributing to higher R0 (right; effect factor >1). The black points on this line refer to the reference transmission chain. The colored and dark gray lines represent the 95%-confidence intervals of the effect sizes from the multivariate model for the different factors, with black circles depicting the estimates. The corresponding p-values are shown in the rightmost column. FUP, follow-up visit.

https://doi.org/10.7554/eLife.28721.007
Final multivariate model’s profile plots of factors associated with the basic reproductive number R0.

The vertical dotted lines depict the reference transmission chain (of subtype B, started on 1.1.1996, whose observed index case did not report sex with an occasional partner and was diagnosed 3 years after infection). The left y-axis represents the basic reproductive number, whereas the right y-axis shows R0 relative to the baseline R0. R0 as a function of each factor (with the other factors held fixed at their reference values) is displayed by the colored lines (HIV-1 subtype) and the dark gray lines (establishment date, sexual risk behavior and time to diagnosis). The vertical bars and the shaded bands correspond to the 95%-confidence intervals.

https://doi.org/10.7554/eLife.28721.008
Graphical representation of our phylogeny-based statistical approach.

(i): HIV transmission among heterosexuals in Switzerland (white arrow) has never led to a self-sustained epidemic. However, the unknown potential of imported infections (black arrows), either from abroad or from other transmission groups in Switzerland, remains a major concern. (ii): The HIV transmission chains corresponding to Swiss heterosexuals (depicted in red) were identified from the phylogenetic tree containing the SHCS and background viral sequences. (iii): Our mathematical model is based on a discrete-time branching process with nodes of three different types: sampled Swiss infection (red), unsampled Swiss infection (light red) and foreign infection infected by a Swiss index case before moving to Switzerland (green). (iv): Our method for inferring R0 accounts for both imperfect sampling and modified transmission potential of the index case. (v): Moreover, it includes the baseline transmission chain characteristics to assess the determinants of R0.

https://doi.org/10.7554/eLife.28721.009
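The ingredients named in this caption (a discrete-time branching process, imperfect sampling, and a modified transmission potential of the index case) can be sketched as a small simulation. This is a simplified illustration, not the study's inference method: it assumes Poisson offspring, independent sampling of each case with probability `sampling_prob`, and it omits the foreign-infection node type. All function and parameter names are hypothetical.

```python
import math
import random


def draw_poisson(rng: random.Random, mean: float) -> int:
    """Poisson variate via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_chain(rng, r0, rho_index, sampling_prob, max_size=100_000):
    """Simulate one chain generation by generation.

    Returns (true_size, sampled_size). The index case transmits with mean
    rho_index * r0, every later case with mean r0 (assumed Poisson offspring).
    """
    true_size = 1
    sampled_size = int(rng.random() < sampling_prob)
    generation = draw_poisson(rng, rho_index * r0)  # infections caused by the index case
    while generation > 0 and true_size < max_size:  # max_size only guards against r0 >= 1
        true_size += generation
        sampled_size += sum(rng.random() < sampling_prob for _ in range(generation))
        generation = sum(draw_poisson(rng, r0) for _ in range(generation))
    return true_size, sampled_size


def mean_true_size(r0, rho_index, sampling_prob, n_chains, seed=1):
    """Average true chain size over n_chains simulated chains."""
    rng = random.Random(seed)
    total = sum(simulate_chain(rng, r0, rho_index, sampling_prob)[0] for _ in range(n_chains))
    return total / n_chains


if __name__ == "__main__":
    # Illustrative inputs only: R0 near exp(-0.823), p = 0.35, rho_index = 0.35.
    print(mean_true_size(r0=0.44, rho_index=0.35, sampling_prob=0.35, n_chains=20_000))
```

For a subcritical process the expected true chain size under these assumptions is 1 + rho_index * R0 / (1 - R0), which gives a quick sanity check on the simulation. Chains whose `sampled_size` is 0 would never appear in a phylogeny, so any comparison with observed chain sizes would need to condition on at least one sampled case.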
Appendix 1—figure 1
Sensitivity analysis regarding the index case relative transmission potential.

Panel (i) shows the sensitivity of the R0 estimates from the baseline model and panel (ii) the sensitivity of the time trend factor. The colored lines represent the subtype-stratified analyses, while the results from the overall models are shown in gray. In the first sensitivity analysis, the ρindex of transmission chains of Swiss origin was held at 1 and the ρindex of chains of non-Swiss origin was varied (solid lines). In the second analysis, the same ρindex was used for chains of Swiss and non-Swiss origin (dashed lines). The dotted lines show the results from the sensitivity subanalysis including only the transmission chains of non-Swiss origin. The vertical and horizontal lines depict the parameters and estimates from the main analysis, respectively.

https://doi.org/10.7554/eLife.28721.012
Appendix 1—figure 2
Sensitivity analysis regarding the sampling density.

The index case relative transmission potential parameter ρindex was the same as used in the main analyses, while the sampling densities varied (x-axis). In the pooled analysis (larger plots) the sampling density was the same for all transmission chains. Panel (i) shows the corresponding estimates of the basic reproductive number R0 and the time trend factor estimates are displayed in panel (ii). The dotted vertical lines depict the sampling densities used for each subtype in our study (subtype-stratified plots) and the mean sampling density over all transmission chains (overall plots). The horizontal dotted lines represent the estimates from the main analysis.

https://doi.org/10.7554/eLife.28721.013
Appendix 1—figure 3
Conservative (with respect to ongoing transmission) maximum number of completed transmission degrees by a given date.

The red lines show the date (y-axis) by which at least a certain number (red numbers) of transmission degrees have been completed for a transmission chain with a specific establishment date (x-axis). The diagonal dotted gray lines depict the number of years since the establishment date, and the horizontal blue line represents the last sampling date.

https://doi.org/10.7554/eLife.28721.014
Appendix 1—figure 4
Relative bias due to ongoing transmission.

The upper panel shows the relative bias of the basic reproductive number R0 from the baseline model and the lower panel the relative bias of the linear time trend factor from the corresponding generalized linear model. The proportion of active transmission chains over time is represented by the black line. The relative bias associated with overestimation and underestimation is displayed with green and red bars and points, respectively. Absence of bias is depicted by the horizontal gray lines.

https://doi.org/10.7554/eLife.28721.015
Appendix 1—figure 5
Sensitivity analysis regarding the stuttering transmission chains assumption.

The Q-Q plots compare the hypothetical transmission chain size distributions (empirical permilles on the y-axis) with the transmission chain size distribution inferred from the phylogeny (empirical permilles on the x-axis). The upper left plot compares the distribution of transmission chain sizes simulated with the estimated R0 against the transmission chain sizes observed in the phylogeny and thus verifies the R0 estimate. The remaining plots compare the simulated transmission chain size distributions for R0 closer to 1 against the extracted transmission chain sizes to justify the subcritical transmission assumption. Each point represents a permille; darker points therefore indicate more overlapping permilles.

https://doi.org/10.7554/eLife.28721.016
Appendix 1—figure 6
Comparison of effect sizes in the multivariate model with linear terms only for different sexual risk behavior definitions of a transmission chain.

The thick lines with black circles show the original effect sizes (where the index case determined the sexual risk behavior of the transmission chain) and their 95%-confidence intervals. The shaded areas display the empirical distribution of the effect sizes when a random individual in a transmission chain determines its sexual risk behavior. The thinner horizontal double-sided arrows with filled circles correspond to the effect sizes and their 95%-confidence intervals when sexual risk behavior is defined as the chain-level fraction of follow-up visits (FUPs) at which any infected individual from the transmission chain reported sex with an occasional partner. The vertical dotted gray line depicts the reference R0 from the original model, i.e., using the index case to define the sexual risk behavior.

https://doi.org/10.7554/eLife.28721.017
Appendix 1—figure 7
Comparison between the Poisson and the negative binomial offspring distribution baseline model R0 estimates.

The dark gray and colored lines show the estimates from the model with the Poisson offspring distribution, while the black lines correspond to the negative binomial distribution. The index case relative transmission potential parameter ρindex was fixed to 1 and the sampling density (x-axis) was varied. In the overall analysis the sampling density was the same for all transmission chains regardless of their subtype. The vertical gray lines depict the sampling densities used for each subtype in our study (upper panels) and the mean sampling density in the overall analysis (bottom panel).

https://doi.org/10.7554/eLife.28721.018
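To make the contrast between the two offspring distributions concrete, the sketch below evaluates a Poisson and a negative binomial probability mass function with the same mean. The mean of 0.44 is roughly exp(-0.823) from the overall baseline estimate. The dispersion value `K_ASSUMED` is purely hypothetical, since this page does not report one.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """P(offspring = n) for a Poisson distribution with the given mean (> 0)."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def negbin_pmf(n: int, mean: float, k: float) -> float:
    """Negative binomial with the given mean and dispersion k (variance = mean + mean**2 / k)."""
    log_p = (math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
             + k * math.log(k / (k + mean)) + n * math.log(mean / (k + mean)))
    return math.exp(log_p)


R0 = 0.44        # roughly exp(-0.823); illustrative only
K_ASSUMED = 0.5  # hypothetical dispersion parameter, not taken from the study

if __name__ == "__main__":
    for n in range(4):
        print(n, round(poisson_pmf(n, R0), 4), round(negbin_pmf(n, R0, K_ASSUMED), 4))
```

With the same mean, a small k puts more probability on zero offspring and more on large offspring counts, which is the overdispersion that a negative binomial model allows; as k grows, the negative binomial approaches the Poisson.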
Appendix 1—figure 8
Sensitivity analysis regarding the transmission cluster definition.

The upper panel (i) compares the estimated R0 with the original cluster definition (brighter lines) with the R0 estimated based on the relaxed cluster definition (darker lines) from the overall analysis (in gray) and subtype-stratified analyses (in colors). Similarly, the bottom panel (ii) shows the comparison between the estimated time trend factors obtained from the transmission chain sizes based on different cluster definition thresholds.

https://doi.org/10.7554/eLife.28721.019
Appendix 1—figure 9
Subanalysis for the transmission chains with available follow-up information about sex with occasional partner of the index case compared to the main analysis with imputed data.

The effect sizes from the subanalysis are shown in brighter colors and those from the main analysis in darker colors. In the main analysis, the missing data were replaced by never reporting sex with an occasional partner.

https://doi.org/10.7554/eLife.28721.020
Appendix 1—figure 10
Empirical distribution of maximum likelihood (ML) estimator and the Wald-type confidence intervals (CI) coverage rates.

Each plot represents a single parameter from a single model (see Appendix 1—table 1 for an overview of the parameters and their values); the number in the lower left corner is the parameter number. The light gray-shaded area represents the proportion of the Wald-type 95%-CIs from the parametric bootstrap simulations that contained the true value (depicted by the vertical orange line), while the green-shaded area corresponds to the CIs from the simulations that missed the true value. The numbers in the upper left corners are the coverage rates from the parametric bootstrap. The original Wald 95%-CIs used in our study are displayed as the light orange area. The dark blue and gray lines show the empirical distribution of the ML estimators from the parametric bootstrap samples and the probability density function based on the normal approximation, respectively. The horizontal red lines depict the target coverage rate of 95%.

https://doi.org/10.7554/eLife.28721.022
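The coverage check described in this caption can be sketched on a much simpler model. The example below swaps the study's chain-size likelihood for plain Poisson counts: it simulates data from a known mean, builds a Wald-type 95%-CI for the log of the mean (delta-method standard error), and reports how often the interval contains the true value. All names and input values are illustrative assumptions.

```python
import math
import random


def draw_poisson(rng: random.Random, mean: float) -> int:
    """Poisson variate via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def wald_ci_log_mean(counts, z=1.96):
    """Wald-type CI for log(mean) of Poisson counts; SE = 1 / sqrt(n * sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    estimate = math.log(mean)
    std_err = 1.0 / math.sqrt(n * mean)
    return estimate - z * std_err, estimate + z * std_err


def bootstrap_coverage(true_mean, n_obs, n_sim, seed=1):
    """Fraction of simulated data sets whose Wald CI contains log(true_mean)."""
    rng = random.Random(seed)
    true_value = math.log(true_mean)
    hits = used = 0
    for _ in range(n_sim):
        counts = [draw_poisson(rng, true_mean) for _ in range(n_obs)]
        if sum(counts) == 0:
            continue  # log(0) is undefined; skip the degenerate sample
        lower, upper = wald_ci_log_mean(counts)
        hits += lower <= true_value <= upper
        used += 1
    return hits / used


if __name__ == "__main__":
    print(bootstrap_coverage(true_mean=0.44, n_obs=300, n_sim=1000))
```

A coverage rate close to 0.95 suggests that the normal approximation behind the Wald interval is adequate for that parameter; a clearly lower rate would argue for profile likelihood or bootstrap intervals instead.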
Appendix 1—figure 11
Comparison of different types of 95%-confidence intervals (CI) with the normal approximation based Wald-type 95%-CIs.

Each column corresponds to a different type of CI, namely the profile likelihood based CIs, the basic nonparametric bootstrap CIs and the basic parametric bootstrap CIs. Each row represents a single parameter (an overview of the parameters is provided in Appendix 1—table 1). The colored lines show each CI relative to the corresponding Wald-type CI, namely its relative width and position. The gray-shaded areas represent the Wald-type 95%-CIs.

https://doi.org/10.7554/eLife.28721.023

Tables

Table 1
Transmission chain size distribution and model parameters.
https://doi.org/10.7554/eLife.28721.003
| | B | C | 01_AE | 02_AG | A | Other | Overall |
|---|---|---|---|---|---|---|---|
| Total number of chains, n (%) | 1643 (53%) | 322 (10%) | 239 (7.7%) | 331 (11%) | 327 (11%) | 238 (7.7%) | 3100 (100%) |
| Chain size, n (%) | | | | | | | |
| 1 | 1437 (87%) | 280 (87%) | 206 (86%) | 272 (82%) | 269 (82%) | 195 (82%) | 2659 (86%) |
| 2 | 158 (9.6%) | 34 (11%) | 31 (13%) | 40 (12%) | 44 (13%) | 36 (15%) | 343 (11%) |
| 3 | 30 (1.8%) | 7 (2.2%) | 1 (0.42%) | 10 (3.0%) | 10 (3.1%) | 6 (2.5%) | 64 (2.1%) |
| 4 | 12 (0.73%) | - | 1 (0.42%) | 6 (1.8%) | 3 (0.92%) | 1 (0.42%) | 23 (0.74%) |
| 5 | 1 (0.06%) | 1 (0.31%) | - | 2 (0.6%) | 1 (0.31%) | - | 5 (0.16%) |
| 6 | 1 (0.06%) | - | - | 1 (0.3%) | - | - | 2 (0.06%) |
| 7 | 1 (0.06%) | - | - | - | - | - | 1 (0.03%) |
| 8 | 2 (0.12%) | - | - | - | - | - | 2 (0.06%) |
| 9 | 1 (0.06%) | - | - | - | - | - | 1 (0.03%) |
| Sampling probability, p (SD) | 0.39 | 0.29 | 0.34 | 0.26 | 0.33 | 0.29 | 0.35 (0.05) |
| Chain origin, n (%) | | | | | | | |
| Swiss (ρindex=1) | 948 (58%) | 36 (11%) | 36 (15%) | 36 (11%) | 47 (14%) | 30 (13%) | 1133 (37%) |
| non-Swiss (ρindex=0.35) | 695 (42%) | 286 (89%) | 203 (85%) | 295 (89%) | 280 (86%) | 208 (87%) | 1967 (63%) |
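As a quick consistency check on Table 1, the sketch below transcribes the chain counts by size and recomputes the totals. The number of chains should match the "Total number of chains" row, and the number of sampled cases should match the 3698 patients reported in Table 2. The dictionary layout and function names are ours.

```python
# Chain counts by observed size, transcribed from Table 1 (size -> number of chains).
CHAINS_BY_SIZE = {
    "B":     {1: 1437, 2: 158, 3: 30, 4: 12, 5: 1, 6: 1, 7: 1, 8: 2, 9: 1},
    "C":     {1: 280, 2: 34, 3: 7, 5: 1},
    "01_AE": {1: 206, 2: 31, 3: 1, 4: 1},
    "02_AG": {1: 272, 2: 40, 3: 10, 4: 6, 5: 2, 6: 1},
    "A":     {1: 269, 2: 44, 3: 10, 4: 3, 5: 1},
    "Other": {1: 195, 2: 36, 3: 6, 4: 1},
}


def n_chains(sizes: dict) -> int:
    """Number of transmission chains in one subtype."""
    return sum(sizes.values())


def n_cases(sizes: dict) -> int:
    """Number of sampled patients in one subtype (chain size times chain count)."""
    return sum(size * count for size, count in sizes.items())


total_chains = sum(n_chains(sizes) for sizes in CHAINS_BY_SIZE.values())
total_cases = sum(n_cases(sizes) for sizes in CHAINS_BY_SIZE.values())

if __name__ == "__main__":
    for subtype, sizes in CHAINS_BY_SIZE.items():
        print(subtype, n_chains(sizes), n_cases(sizes))
    print("overall", total_chains, total_cases, round(total_cases / total_chains, 3))
```

The ratio of cases to chains is the mean observed chain size. It reflects sampled cases only, so it should not be read as an estimate of the true chain size or of R0 without accounting for the sampling probability.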
Table 2
Patients’ demographic characteristics.
https://doi.org/10.7554/eLife.28721.006
| | Patients | Transmission chains (index case) |
|---|---|---|
| Total number, n | 3698 | 3100 |
| Age at estimated date of infection [in years], median (IQR) | 29.2 (23.1–37.8) | 28.8 (22.8–37.4) |
| Estimated date of infection, median (IQR) | Jun 1996 (Sep 1990–Nov 2001) | Nov 1995 (Sep 1989–May 2001) |
| Time to diagnosis [in years], median (IQR) | 3.40 (1.66–5.24) | 3.54 (1.78–5.43) |
| Reported sex with occasional partner [as fraction of FUPs*], median (IQR) | 0.53 (0.09–0.89) | 0.50 (0.07–0.88) |
| No available FUP, n (%) | 250 (6.8%) | 226 (7.3%) |
| Earliest CD4 count [per μL], median (IQR) | 310 (143–510) | 300 (134–507) |

  1. *Follow-up visit (FUP).

  2. No available FUP: patients without FUP questionnaire regarding the sexual risk behavior. See Sensitivity analyses.

  3. Earliest CD4 count: one patient did not have any available CD4 cell count. The missing value was imputed with the mean CD4 cell count.

Appendix 1—table 1
Overview of all the parameters, their estimates and the 95%-confidence intervals fitted in all the models presented in this study.
https://doi.org/10.7554/eLife.28721.021
| Subtypes | Parameter number | Parameter name | Parameter estimate | Wald-type 95%-CI | Profile likelihood 95%-CI |
|---|---|---|---|---|---|
| Overall | 1 | log(R0) | -0.823 | (-0.876,-0.770) | (-0.878,-0.772) |
| B | 2 | log(R0) | -1.037 | (-1.121,-0.952) | (-1.124,-0.955) |
| C | 3 | log(R0) | -0.719 | (-0.879,-0.559) | (-0.892,-0.571) |
| 01_AE | 4 | log(R0) | -0.826 | (-1.036,-0.615) | (-1.057,-0.632) |
| 02_AG | 5 | log(R0) | -0.483 | (-0.587,-0.378) | (-0.594,-0.384) |
| A | 6 | log(R0) | -0.618 | (-0.751,-0.485) | (-0.760,-0.492) |
| other | 7 | log(R0) | -0.605 | (-0.758,-0.451) | (-0.771,-0.461) |
| Overall | 8 | log(R0,ref) | -0.839 | (-0.894,-0.784) | (-0.895,-0.785) |
| | 9 | (Date_infection - 1.1.1996)/(365·10) | -0.112 | (-0.187,-0.037) | (-0.188,-0.037) |
| B | 10 | log(R0,ref) | -1.070 | (-1.165,-0.975) | (-1.169,-0.979) |
| | 11 | (Date_infection - 1.1.1996)/(365·10) | -0.112 | (-0.234,0.010) | (-0.236,0.008) |
| C | 12 | log(R0,ref) | -0.692 | (-0.851,-0.533) | (-0.864,-0.544) |
| | 13 | (Date_infection - 1.1.1996)/(365·10) | -0.209 | (-0.466,0.049) | (-0.473,0.046) |
| 01_AE | 14 | log(R0,ref) | -0.781 | (-0.991,-0.570) | (-1.013,-0.588) |
| | 15 | (Date_infection - 1.1.1996)/(365·10) | -0.255 | (-0.616,0.106) | (-0.629,0.101) |
| 02_AG | 16 | log(R0,ref) | -0.434 | (-0.539,-0.329) | (-0.545,-0.333) |
| | 17 | (Date_infection - 1.1.1996)/(365·10) | -0.415 | (-0.609,-0.222) | (-0.615,-0.226) |
| A | 18 | log(R0,ref) | -0.725 | (-0.892,-0.558) | (-0.907,-0.571) |
| | 19 | (Date_infection - 1.1.1996)/(365·10) | -0.430 | (-0.660,-0.199) | (-0.672,-0.209) |
| other | 20 | log(R0,ref) | -0.600 | (-0.754,-0.446) | (-0.767,-0.456) |
| | 21 | (Date_infection - 1.1.1996)/(365·10) | -0.162 | (-0.397,0.073) | (-0.403,0.072) |
| Overall | 22 | log(R0,ref) | -0.710 | (-0.780,-0.640) | (-0.782,-0.641) |
| | 23 | ((Date_infection - 1.1.1996)/(365·10))^2 | -0.313 | (-0.451,-0.176) | (-0.457,-0.182) |
| | 24 | ((Date_infection - 1.1.1996)/(365·10))^3 | -0.184 | (-0.283,-0.086) | (-0.288,-0.091) |
| Overall | 25 | log(R0,ref) | -1.252 | (-1.366,-1.137) | (-1.369,-1.140) |
| | 26 | Subtype_C | 0.352 | (0.167,0.538) | (0.158,0.531) |
| | 27 | Subtype_01_AE | 0.274 | (0.046,0.502) | (0.029,0.490) |
| | 28 | Subtype_02_AG | 0.575 | (0.428,0.721) | (0.426,0.720) |
| | 29 | Subtype_A | 0.430 | (0.271,0.588) | (0.266,0.584) |
| | 30 | Subtype_other | 0.426 | (0.247,0.606) | (0.238,0.600) |
| | 31 | (Date_infection - 1.1.1996)/(365·10) | -0.214 | (-0.301,-0.127) | (-0.301,-0.128) |
| | 32 | (Age - 32)/10 | 0.007 | (-0.045,0.058) | (-0.046,0.057) |
| | 33 | (CD4 - 350)/100 | 0.000 | (-0.018,0.019) | (-0.019,0.018) |
| | 34 | Rate_risk | 0.230 | (0.095,0.364) | (0.096,0.365) |
| | 35 | (Years_diagnosis - 3)/10 | 0.351 | (0.210,0.492) | (0.207,0.490) |
| Overall | 36 | log(R0,ref) | -1.173 | (-1.301,-1.045) | (-1.304,-1.048) |
| | 37 | (1/10)·log(Years_diagnosis/3) | 1.727 | (1.049,2.405) | (1.064,2.420) |
| | 38 | Subtype_C | 0.322 | (0.140,0.505) | (0.131,0.498) |
| | 39 | Subtype_01_AE | 0.246 | (0.020,0.472) | (0.004,0.460) |
| | 40 | Subtype_02_AG | 0.516 | (0.374,0.659) | (0.372,0.658) |
| | 41 | Subtype_A | 0.404 | (0.246,0.562) | (0.241,0.558) |
| | 42 | Subtype_other | 0.401 | (0.223,0.580) | (0.214,0.574) |
| | 43 | ((Date_infection - 1.1.1996)/(365·10))^3 | -0.231 | (-0.337,-0.124) | (-0.345,-0.131) |
| | 44 | Rate_risk | 0.230 | (0.094,0.366) | (0.096,0.368) |
| | 45 | ((Date_infection - 1.1.1996)/(365·10))^4 | -0.129 | (-0.227,-0.031) | (-0.235,-0.038) |
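All estimates in this table are on the log scale, so reading them as R0 values or as multiplicative effect factors requires exponentiating the estimate and both CI limits. The helper below is a minimal sketch of that conversion, with the function name `to_natural_scale` chosen here for illustration. It uses parameter 1 (overall log(R0)) and parameter 28 (subtype 02_AG relative to the reference subtype B) as examples.

```python
import math


def to_natural_scale(estimate: float, ci: tuple) -> tuple:
    """Exponentiate a log-scale estimate and its confidence limits."""
    lower, upper = ci
    return math.exp(estimate), (math.exp(lower), math.exp(upper))


# Parameter 1: overall log(R0) with its Wald-type 95%-CI.
r0_overall, r0_overall_ci = to_natural_scale(-0.823, (-0.876, -0.770))

# Parameter 28: subtype 02_AG coefficient, read as an effect factor on R0.
factor_02ag, factor_02ag_ci = to_natural_scale(0.575, (0.428, 0.721))

if __name__ == "__main__":
    print(round(r0_overall, 3), [round(x, 3) for x in r0_overall_ci])
    print(round(factor_02ag, 3), [round(x, 3) for x in factor_02ag_ci])
```

Because the exponential is monotone, the transformed limits still form a valid interval, although it is no longer symmetric around the point estimate. A coefficient whose log-scale CI excludes 0 corresponds to an effect factor whose CI excludes 1.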
Appendix 2—table 1
Establishment date models obtained with the AIC/BIC forward selection and backward elimination and their respective AIC and BIC values as well as the p-values from the likelihood ratio test compared to the null model without any covariates.

Terms that were part of the respective final model are marked by ×.

https://doi.org/10.7554/eLife.28721.025
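| Term | Marked (×) |
|---|---|
| (Date_infection - 1.1.1996)/(365·10) | |
| ((Date_infection - 1.1.1996)/(365·10))^2 | ×× |
| ((Date_infection - 1.1.1996)/(365·10))^3 | ×××× |
| ((Date_infection - 1.1.1996)/(365·10))^4 | ×× |

| | AIC, forward | AIC, backward | BIC, forward | BIC, backward |
|---|---|---|---|---|
| AIC | 3364.3 | 3364.2 | 3364.3 | 3364.2 |
| BIC | 3382.4 | 3382.3 | 3382.4 | 3382.3 |
| p-value from LR test | <0.0001 | <0.0001 | <0.0001 | <0.0001 |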
Appendix 2—table 2
Multivariate models obtained with the AIC/BIC forward selection and backward elimination algorithms.

The terms listed in the table are the terms identified from the single determinant model selections and the crosses indicate the terms entering the multivariate models. The null model from the likelihood ratio test refers to the baseline model without any covariates (not even the subtype).

https://doi.org/10.7554/eLife.28721.026
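| Term | Marked (×) |
|---|---|
| Subtype | ×××× |
| ((Date_infection - 1.1.1996)/(365·10))^2 | |
| ((Date_infection - 1.1.1996)/(365·10))^3 | ×××× |
| ((Date_infection - 1.1.1996)/(365·10))^4 | ××× |
| Rate_risk | ×××× |
| Rate_risk | ××× |
| (1/10)·log(Years_diagnosis/3) | ×× |
| (Years_diagnosis - 3)/10 | ×× |
| (Years_diagnosis - 3)/10 | ××× |
| ((Years_diagnosis - 3)/10)^3 | ×× |
| (CD4 - 350)/10 | |
| ((Age - 32)/10)^2 | |

| | AIC, forward | AIC, backward | BIC, forward | BIC, backward |
|---|---|---|---|---|
| AIC | 3254 | 3252 | 3254 | 3262 |
| BIC | 3314 | 3331 | 3314 | 3316 |
| p-value from LR test | <0.0001 | <0.0001 | <0.0001 | <0.0001 |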
