Human genetic ancestry, Mycobacterium tuberculosis diversity, and tuberculosis disease severity in Dar es Salaam, Tanzania

  1. Michaela Zwyer
  2. Zhi Ming Xu
  3. Amanda Ross
  4. Jerry Hella
  5. Mohamed Sasamalo
  6. Maxime Rotival
  7. Hellen Charles Hiza
  8. Liliana K Rutaihwa
  9. Sonia Borrell
  10. Klaus Reither
  11. Jacques Fellay
  12. Damien Portevin
  13. Lluis Quintana-Murci
  14. Sebastien Gagneux (corresponding author)
  15. Daniela Brites (corresponding author)
  1. Swiss Tropical and Public Health Institute, Switzerland
  2. University of Basel, Switzerland
  3. Swiss Institute of Bioinformatics, Switzerland
  4. School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
  5. Department of Intervention and Clinical Trials, Ifakara Health Institute, United Republic of Tanzania
  6. Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, France
  7. FIND, Foundation for Innovative New Diagnostics, Switzerland
  8. Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Switzerland
  9. Chair of Human Genomics and Evolution, Collège de France, France
4 figures, 3 tables and 3 additional files

Figures

Figure 1 with 5 supplements
Genetic ancestry analyses of Tanzanian TB patients.

(A) Genetic ancestry proportions of the 1444 Tanzanian TB patients and representative human populations who shared at least 1% of their most common genetic ancestry with the Tanzanians for K = 24 (ESN: Esan from Nigeria (1000G), LWK: Luhya from Kenya (1000G)). For all populations included in our study, see Figure 1—figure supplement 1 for their geographic distribution and Figure 1—figure supplement 5 for the ancestry composition of all African populations included in this study. (B) The geographical location of the representative populations shown in A is depicted with black circles, and the corresponding country is highlighted. The remaining African populations included in the analysis are represented by blue circles.

Figure 1—figure supplement 1
Populations included in the admixture analysis.
Figure 1—figure supplement 2
The first two principal component analysis (PCA) components for all African populations included in this study (n = 116).
Figure 1—figure supplement 3
Boxplots of cross-validation errors for values of K between 2 and 29 resulting from 15 runs.

The lowest cross-validation error was obtained for K = 24.
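
The selection rule described in this legend can be sketched as follows. This is a minimal illustration, assuming the cross-validation errors of every run are already collected per value of K. The error values and the names `cv_errors` and `best_k` are hypothetical, and the median is used here as one reasonable summary across runs; the source only states that K = 24 gave the lowest error.

```python
from statistics import median


def best_k(cv_errors):
    """Return the K whose runs have the lowest median cross-validation error."""
    return min(cv_errors, key=lambda k: median(cv_errors[k]))


# Hypothetical errors for three values of K (three runs each, not real data).
cv_errors = {
    23: [0.5312, 0.5310, 0.5315],
    24: [0.5301, 0.5304, 0.5299],
    25: [0.5308, 0.5306, 0.5311],
}

print(best_k(cv_errors))
```

Summarizing by the median rather than the single best run keeps one unusually good run from deciding the choice of K.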

Figure 1—figure supplement 4
Boxplots of the proportions of the 24 genetic ancestries among the Tanzanian TB patients.

The ancestries were named after the population(s) where they were most abundant.

Figure 1—figure supplement 5
Ancestry plots for all African populations included (K = 24).
Figure 2 with 1 supplement
Spatial visualizations of the Bantu-speaking (BS) genetic ancestries and the genetic ancestries of the different self-identified ethnic groups among the TB patients in Tanzania.

The genetic ancestry was inferred by admixture with K = 24, and the interpolation of the ancestries was performed by using the pykrige module in Python (see Methods). (A) eBS genetic ancestry, (B) seBS genetic ancestry, and (C) wBS genetic ancestry. The populations included for spatial interpolations are marked with a black dot on the maps. The maps were created using the basemap module in Python. (D) Geographical origin of the ethnic groups among our TB patient cohort. The Temeke District hospital in Dar es Salaam where the patients were recruited is marked with a red point. Note that for some ethnic groups, no geographical origin could be identified (Supplementary file 1). (E) Ancestry plots for the different ethnic groups with at least 10 patients from our TB patient cohort.
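
The idea of interpolating point-wise ancestry proportions onto a map grid can be illustrated with a small sketch. Note that this does not reproduce the kriging performed with pykrige in the study: it swaps in inverse-distance weighting, a simpler interpolation method, so that the example stays self-contained. The coordinates, ancestry values, and the names `idw`, `samples`, and `grid` are hypothetical.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) tuples."""
    num = den = 0.0
    for px, py, value in points:
        # Plain Euclidean distance in degrees; a simplification for illustration.
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # the query point is exactly a sampled location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical (longitude, latitude, ancestry proportion) triples.
samples = [(39.3, -6.8, 0.45), (32.9, -2.5, 0.20), (35.7, -6.2, 0.35)]

# Interpolate onto a coarse longitude/latitude grid.
grid = [[idw(samples, lon, lat) for lon in (33.0, 36.0, 39.0)]
        for lat in (-3.0, -6.0)]
```

Unlike kriging, inverse-distance weighting does not model spatial covariance or give an uncertainty estimate, so it should be read only as a stand-in for the general approach.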

Figure 2—figure supplement 1
Heatmap showing the correlations between the genetic ancestries and geographical location.

(A) Correlations, at the country level of Tanzania, between the genetic ancestries and the latitude and longitude of the ethnic group the patients belong to. (B) Correlations at the continental level, including all African populations. For latitude, a negative correlation indicates higher genetic ancestry proportions toward the South; for longitude, a negative correlation indicates lower genetic ancestry proportions toward the East.
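
The sign convention in this legend can be checked with a small example. The sketch below assumes a Pearson correlation, which the legend does not specify, and uses hypothetical latitudes and ancestry proportions; `pearson` is an illustrative helper, not the authors' code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: latitude in decimal degrees (more negative = further south)
# and an ancestry proportion that increases toward the South.
latitudes = [-1.0, -4.0, -7.0, -10.0]
ancestry = [0.10, 0.20, 0.35, 0.50]

r = pearson(latitudes, ancestry)
print(r < 0)  # negative: the proportion rises as latitude decreases
```

Because latitude decreases toward the South, an ancestry that becomes more abundant southward yields a negative coefficient, matching the reading given in the legend.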

Figure 3 with 1 supplement
Manhattan plot for genome-wide association study (GWAS) conducted using (A–C) TB-score, X-ray score, and Ct-value.

The red line indicates the genome-wide significance threshold of 5e−8.
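
For orientation, the threshold of 5e−8 is the conventional genome-wide level, equivalent to a Bonferroni correction of 0.05 for roughly one million independent tests. On the −log10 scale of a Manhattan plot it sits at about 7.3. The sketch below shows this conversion and a simple filter; the variant names and p-values are hypothetical.

```python
import math

GWAS_THRESHOLD = 5e-8  # threshold stated in the legend


def neg_log10(p):
    """Convert a p-value to the -log10 scale used on a Manhattan plot."""
    return -math.log10(p)


def significant(hits, threshold=GWAS_THRESHOLD):
    """Keep (variant, p-value) pairs that pass the genome-wide threshold."""
    return [(snp, p) for snp, p in hits if p < threshold]


# Hypothetical association results, not taken from the study.
hits = [("rs_hypothetical_1", 3.2e-9), ("rs_hypothetical_2", 4.1e-6)]

print(round(neg_log10(GWAS_THRESHOLD), 2))
print(significant(hits))
```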

Figure 3—figure supplement 1
QQ plot and genomic inflation factor for genome-wide association study (GWAS) conducted using (A–C) TB-score, X-ray score, and Ct-value.
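
The genomic inflation factor reported alongside a QQ plot is commonly computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.455). The sketch below follows that common definition, which is an assumption here since the legend does not give the formula; the p-values are synthetic.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda GC: median observed 1-df chi-square over its expected median.

    Each two-sided p-value in (0, 1] is mapped back to a chi-square
    statistic as the square of the corresponding normal quantile.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, perfectly uniform p-values: no inflation expected.
uniform = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(genomic_inflation(uniform))
```

Values close to 1 suggest that the test statistics are not systematically inflated, for example by unaccounted population structure.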
Appendix 1—figure 1
Association between the Bantu genetic ancestries and TB-score and X-ray score for each of the most successful introductions.

(A, C, E) Southeastern, Eastern, and Western Bantu genetic ancestry by mild or severe TB-score. (B, D, F) Southeastern, Eastern, and Western Bantu genetic ancestry by lung damage assessed by lung X-ray.

Tables

Table 1
Human and bacterial genotypes by the severity measures.
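| Variable | Levels | TB-score: Total N (%) | TB-score: Missing N (%) | TB-score: Severe N (%) | TB-score: Mild N (%) | Lung damage (X-ray score): Total N | Lung damage (X-ray score): Missing N | Lung damage (X-ray score): Severe | Lung damage (X-ray score): Mild | Lung damage (X-ray score): Not available | Bacterial load (Ct-value): Total N | Bacterial load (Ct-value): Missing N | Bacterial load (Ct-value): Mean (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total N (%) | | | | 624 (33) | 1280 (67) | | | 177 (9) | 849 (45) | 878 (46) | | | |
| MTBC genotype | Other | 1471 (77.3) | 433 | 207 (42) | 406 (41) | 764 (74.5) | 262 | 51 (39) | 269 (43) | 293 (41) | 863 (78.3) | 239 | 19.4 (4.9) |
| | L2.2.1 – Intro 1 | | | 22 (4) | 49 (5) | | | 7 (5) | 28 (4) | 36 (5) | | | 20.5 (5.1) |
| | L3.1.1 – Intro 10 | | | 184 (38) | 388 (40) | | | 59 (45) | 239 (38) | 274 (39) | | | 19.1 (4.7) |
| | L4.3.4 – Intro 5 | | | 26 (5) | 60 (6) | | | 4 (3) | 38 (6) | 44 (6) | | | 19.5 (4.1) |
| | L1.1.2 – Intro 9 | | | 51 (10) | 78 (8) | | | 11 (8) | 58 (9) | 60 (8) | | | 19.1 (4.7) |
| seBS Bantu | Mean (SD) | 1442 (75.7) | 462 | 0.44 (0.2) | 0.44 (0.2) | 840 (81.9) | 186 | 0.45 (0.1) | 0.43 (0.2) | 0.45 (0.2) | 810 (73.5) | 292 | 19.9 (5.2) |
| eBS | Mean (SD) | 1442 (75.7) | 462 | 0.23 (0.1) | 0.22 (0.1) | 840 (81.9) | 186 | 0.22 (0.1) | 0.23 (0.1) | 0.22 (0.1) | 810 (73.5) | 292 | 19.9 (5.2) |
| wBS | Mean (SD) | 1442 (75.7) | 462 | 0.08 (0.1) | 0.09 (0.1) | 840 (81.9) | 186 | 0.08 (0.1) | 0.09 (0.1) | 0.08 (0.1) | 810 (73.5) | 292 | 19.9 (5.2) |
| Other ancestry | Mean (SD) | 1442 (75.7) | 462 | 0.25 (0.1) | 0.25 (0.1) | 840 (81.9) | 186 | 0.25 (0.1) | 0.25 (0.1) | 0.25 (0.1) | 810 (73.5) | 292 | 19.9 (5.2) |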
Table 2
Characteristics of MTBC genotypes for all patients with either human or bacterial genetic data available.
| Variable | N (%)* | Missing N | Levels | Other genotype | L2.2.1 – Intro 1 | L3.1.1 – Intro 10 | L4.3.4 – Intro 5 | L1.1.2 – Intro 9 | No bacterial data available |
|---|---|---|---|---|---|---|---|---|---|
| Total N (%) | | | | 613 (32) | 71 (4) | 572 (30) | 86 (5) | 129 (7) | 433 (23) |
| Sex | 1471 (100.0) | 0 | Male (%) | 424 (69) | 52 (73) | 425 (74) | 55 (64) | 87 (67) | 302 (70) |
| | | | Female (%) | 189 (31) | 19 (27) | 147 (26) | 31 (36) | 42 (33) | 131 (30) |
| Age in years | 1471 (100.0) | 0 | Median (IQR) | 33.0 (28.0–40.0) | 31.0 (24.5–38.0) | 33.0 (26.0–41.0) | 31.5 (27.0–38.8) | 35.0 (26.0–45.0) | 35.0 (27.0–43.0) |
| HIV status | 1452 (98.7) | 19 | Infected (%) | 90 (15) | 10 (14) | 103 (18) | 19 (22) | 31 (25) | 97 (23) |
| Smoker | 1443 (98.1) | 28 | Yes (%) | 127 (21) | 18 (26) | 149 (27) | 11 (13) | 26 (20) | 97 (23) |
| Cough duration (weeks) | 1454 (98.8) | 17 | Median (IQR) | 4.0 (3.0–4.0) | 4.0 (3.0–4.0) | 4.0 (3.0–4.0) | 3.5 (2.2–4.0) | 4.0 (2.0–5.0) | 3.0 (2.0–4.0) |
| Socioeconomic status | 1452 (98.7) | 19 | Median (IQR) | 80.0 (50.0–125.0) | 75.0 (50.0–125.0) | 87.5 (50.0–126.2) | 85.4 (50.0–122.9) | 75.0 (50.0–125.0) | 83.3 (50.0–133.3) |
| Education | 1471 (100.0) | 0 | No education (%) | 90 (14.7) | 5 (7.0) | 76 (13.3) | 10 (11.6) | 15 (11.6) | 76 (17.6) |
| | | | Primary (%) | 391 (63.8) | 54 (76.1) | 390 (68.2) | 61 (70.9) | 92 (71.3) | 273 (63.0) |
| | | | Secondary (%) | 105 (17.1) | 12 (16.9) | 90 (15.7) | 13 (15.1) | 18 (14.0) | 64 (14.8) |
| | | | University (%) | 27 (4.4) | 0 (0.0) | 16 (2.8) | 2 (2.3) | 4 (3.1) | 20 (4.6) |
| Malnutrition | 1471 (100.0) | 0 | Yes (%) | 83 (13.5) | 9 (12.7) | 71 (12.4) | 11 (12.8) | 13 (10.1) | 56 (12.9) |
| Relapse/reinfection | 1447 (98.4) | 24 | Yes (%) | 9 (1.5) | 4 (5.6) | 18 (3.2) | 2 (2.4) | 3 (2.3) | 12 (2.8) |
| Drug resistance status | 1471 (100.0) | 0 | Susceptible (%) | 574 (93.6) | 70 (98.6) | 556 (97.2) | 80 (93.0) | 119 (92.2) | 0 |
| | | | INH-Mono (%) | 37 (6.0) | 1 (1.4) | 16 (2.8) | 6 (7.0) | 10 (7.8) | 0 |
| | | | MDR (%) | 2 (0.3) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 |
| seBS | 1009 (68.6) | 462 | Median (IQR) | 0.5 (0.3–0.6) | 0.5 (0.4–0.5) | 0.5 (0.4–0.6) | 0.5 (0.4–0.6) | 0.5 (0.4–0.6) | 0.5 (0.3–0.6) |
| eBS | 1009 (68.6) | 462 | Median (IQR) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) |
| wBS | 1009 (68.6) | 462 | Median (IQR) | 0.1 (0.0–0.1) | 0.1 (0.1–0.1) | 0.1 (0.0–0.1) | 0.1 (0.0–0.1) | 0.1 (0.0–0.1) | 0.1 (0.0–0.1) |
| Other ancestry | 1009 (68.6) | 462 | Median (IQR) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.3) | 0.2 (0.2–0.2) | 0.2 (0.2–0.2) | 0.2 (0.2–0.3) |
* The column N (%) indicates the total number of patients with bacterial genetic data available that contained a value for the respective variable.

Percentages in the genotype columns give the share of patients infected with that MTBC genotype who have the respective characteristic (e.g. percentage of males among patients infected with an Intro 1 MTBC genotype).

Table 3
Estimated associations between disease severity, human genetic ancestry, and bacterial genotype.

Three variables were included as proxies for disease severity: lung damage (mild versus severe), TB-score (mild versus severe), and bacterial load (continuous, log10 transformed). Binomial logistic regressions were performed on the data of HIV-negative patients, adjusting for age, sex, smoking, socioeconomic status, level of education, malnutrition, TB type (relapse or new infection), and drug resistance status by including these variables in the model. For the ancestries and the interactions, the p-values were obtained from a likelihood ratio test comparing a model including the ancestries and interactions to a model without them. This table combines the results of two logistic regressions per disease severity measure, one including an interaction and one without. The ancestries were transformed and categorized (see Methods), with category 1 comprising the lowest amount of the respective ancestry and category 3 (4 in the case of wBS) the highest amount.

Disease severity measure (column groups): Lung damage, TB-score, Bacterial load (Ct-value)

| Term | Level | Lung damage: OR adjusted | Lung damage: p-value adjusted | TB-score: OR adjusted | TB-score: p-value adjusted | Bacterial load (Ct-value): OR adjusted | Bacterial load (Ct-value): p-value adjusted |
|---|---|---|---|---|---|---|---|
| MTBC genotype* | L3.1.1 – Introduction 10 | 1.60 (0.95–2.67) | 0.07 | 1.00 (0.72–1.40) | 0.98 | 0.99 (0.97–1.00) | 0.13 |
| | Other genotypes | 1 | | 1 | | 1.00 | |
| Human ancestry | seBS category 3 | 2.29 (1.04–5.42) | 0.19 | 1.06 (0.66–1.69) | 0.17 | 1.00 (0.98–1.03) | 0.13 |
| | seBS category 2 | 2.40 (1.15–5.41) | | 0.62 (0.40–0.94) | | 1.02 (0.99–1.04) | |
| | seBS category 1 | 1 | | 1 | | 1.00 | |
| | eBS category 3 | 1.30 (0.51–3.79) | | 1.09 (0.64–1.84) | | 1.02 (0.99–1.05) | |
| | eBS category 2 | 1.50 (0.61–4.28) | | 1.17 (0.69–1.94) | | 1.00 (0.97–1.03) | |
| | eBS category 1 | 1 | | 1 | | 1.00 | |
| | wBS category 4 | 0.58 (0.23–1.53) | | 0.94 (0.49–1.76) | | 1.02 (0.98–1.06) | |
| | wBS category 3 | 0.54 (0.21–1.46) | | 1.09 (0.57–2.04) | | 1.03 (0.99–1.07) | |
| | wBS category 2 | 0.73 (0.26–2.15) | | 1.03 (0.50–2.09) | | 1.03 (0.99–1.08) | |
| | wBS category 1 | 1 | | 1 | | 1.00 | |
| Interaction | seBS category 3 * L3.1.1 – Intro 10 | 1.05 (0.20–5.43) | 0.06(1) | 1.05 (0.40–2.71) | 0.92 | 0.99 (0.94–1.04) | 0.83 |
| | seBS category 2 * L3.1.1 – Intro 10 | 0.39 (0.08–1.94) | | 1.29 (0.52–3.15) | | 0.98 (0.93–1.03) | |
| | seBS category 1 * L3.1.1 – Intro 10 | 1 | | 1 | | 1.00 | |
| | eBS category 3 * L3.1.1 – Intro 10 | 8.32 (0.91–193.14) | | 1.45 (0.48–4.34) | | 1.01 (0.94–1.07) | |
| | eBS category 2 * L3.1.1 – Intro 10 | 6.38 (0.71–146.86) | | 1.25 (0.42–3.70) | | 1.02 (0.96–1.09) | |
| | eBS category 1 * L3.1.1 – Intro 10 | 1 | | 1 | | 1.00 | |
| | wBS category 4 * L3.1.1 – Intro 10 | 0.17 (0.02–1.23) | | 1.79 (0.49–6.56) | | 0.97 (0.90–1.05) | |
| | wBS category 3 * L3.1.1 – Intro 10 | 0.33 (0.04–2.51) | | 1.48 (0.40–5.51) | | 0.99 (0.92–1.07) | |
| | wBS category 2 * L3.1.1 – Intro 10 | 0.87 (0.09–8.06) | | 2.40 (0.53–11.05) | | 0.98 (0.90–1.07) | |
| | wBS category 1 * L3.1.1 – Intro 10 | 1 | | 1 | | 1.00 | |
* The odds ratio for genotype represents the odds of severe disease for Introduction 10 compared to the odds for other genotypes.

The odds ratios represent the estimated multiple in the odds of severe disease for a one-unit increase in the additive log-transformed ancestry variable.
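
The likelihood ratio test mentioned in the Table 3 legend compares two nested models through their maximized log-likelihoods. The sketch below shows only the arithmetic of that comparison, assuming the two log-likelihoods have already been obtained from fitted models. The numbers, the degrees of freedom, and the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical and are not taken from the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2:  # odd df: start from the closed form for df = 1
        a, q = 0.5, math.erfc(math.sqrt(half))
    else:       # even df: start from the closed form for df = 2
        a, q = 1.0, math.exp(-half)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return the LRT statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods: the full model adds three parameters.
stat, p = likelihood_ratio_test(-412.6, -408.1, df_diff=3)
print(stat, p)
```

The degrees of freedom equal the number of extra parameters in the larger model, which is why a single p-value is reported for a whole block of ancestry or interaction terms.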

Additional files

Supplementary file 1

The different ethnic groups with at least 10 members in our cohort and the region and broad geographic location of the original area of the ethnic group.

The latitude and longitude are given in decimal degrees. In the case of two associated regions, a location close to the border of the two regions was selected.

https://cdn.elifesciences.org/articles/103533/elife-103533-supp1-v1.docx
Supplementary file 2

Phylogenetic markers selected to identify the introductions.

The position is based on the reconstructed reference of the ancestor (Chang et al., 2015) and the derived base indicates the base present in the respective introduction. Intro 1 refers to Introduction 1 within L2.2.1, Intro 5 to Introduction 5 within L4.3.4, Intro 9 to Introduction 9 within L1.1.2, and Intro 10 to Introduction 10 within L3.1.1.

https://cdn.elifesciences.org/articles/103533/elife-103533-supp2-v1.docx
MDAR checklist
https://cdn.elifesciences.org/articles/103533/elife-103533-mdarchecklist1-v1.pdf

Cite this article

Michaela Zwyer, Zhi Ming Xu, Amanda Ross, Jerry Hella, Mohamed Sasamalo, Maxime Rotival, Hellen Charles Hiza, Liliana K Rutaihwa, Sonia Borrell, Klaus Reither, Jacques Fellay, Damien Portevin, Lluis Quintana-Murci, Sebastien Gagneux, Daniela Brites (2026) Human genetic ancestry, Mycobacterium tuberculosis diversity, and tuberculosis disease severity in Dar es Salaam, Tanzania. eLife 14:RP103533. https://doi.org/10.7554/eLife.103533.3