Research Article

Human genetic ancestry, Mycobacterium tuberculosis diversity, and tuberculosis disease severity in Dar es Salaam, Tanzania

Swiss Tropical and Public Health Institute, Switzerland
University of Basel, Switzerland
Swiss Institute of Bioinformatics, Switzerland
School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
Department of Intervention and Clinical Trials, Ifakara Health Institute, United Republic of Tanzania
Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, France
FIND, Foundation for Innovative New Diagnostics, Switzerland
Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Switzerland
Chair of Human Genomics and Evolution, Collège de France, France

Mar 24, 2026

https://doi.org/10.7554/eLife.103533.3

Open access

eLife Assessment

This valuable observational study was conducted in Dar es Salaam, Tanzania, to investigate potential associations of genetic variation in Mycobacterium tuberculosis and in the human host with disease severity. The authors conclude that human genetic ancestry did not contribute to tuberculosis severity, and the evidence supporting this is generally convincing. The findings have significance for the understanding of the influence of host/bacillary genetics on tuberculosis disease.

https://doi.org/10.7554/eLife.103533.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence:

Convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Infectious diseases have affected humanity for millennia and are among the strongest selective forces. Tuberculosis (TB) is an ancient disease, caused by the human-adapted members of the Mycobacterium tuberculosis complex (MTBC). The outcome of TB infection and disease is highly variable, and co-evolution between human populations and MTBC strains may account for some of this variability. Particular human genetic ancestries have been associated with higher susceptibility to TB, but sociodemographic aspects of the disease can confound such associations. Here, we studied 1000 TB patients from Dar es Salaam, Tanzania, together with their respective MTBC isolates, by combining human and bacterial genomics with clinical data. We found that the genetic background of the TB patient population was strongly influenced by migrations of Bantu-speaking populations from West Africa, which contrasts with the corresponding MTBC genotypes that were mainly introduced from outside Africa. These findings suggest a recent evolutionary history of co-existence between the human and MTBC populations in Dar es Salaam. We detected no evidence of an effect of human genetic ancestry, or MTBC phylogenetic diversity alone, nor their interaction, on TB disease severity. There was also no evidence of an association between human variation genome-wide and TB disease severity. Treatment-seeking, social, and environmental factors are likely to be the main determinants of disease severity at the point of care in this patient population.

Introduction

Africa harbors the largest human genetic diversity worldwide (Tishkoff and Verrelli, 2003). This continent is also inhabited by numerous ethnic and linguistic groups (Reed and Tishkoff, 2006). While the long evolutionary history of modern humans in Africa and their large effective population sizes (Garrigan et al., 2007) explain this high genetic diversity, more recent migration events within and from outside Africa during the last 5000 years, as well as admixture between historically separated populations, have resulted in some degree of homogenization (Patin et al., 2017). Hence, African populations nowadays are often composed of a mixture of different genetic ancestries (Bird et al., 2023; Pfennig et al., 2023). One human migration that had a major influence on the population structure of present-day Africans is the so-called ‘Bantu expansion’, where Bantu-speaking (BS) groups migrated from central Western Africa southwards and eastwards, spreading farming technologies across sub-Saharan Africa, and admixing with local groups of hunter–gatherers and pastoralists (Patin et al., 2017; Wang et al., 2020). Recent human genetic studies have identified moderate population structuring among BS populations (Semo et al., 2020; Sengupta et al., 2021), yet admixture with local populations has impacted immune responses to infectious diseases (Uren et al., 2017). Since pathogens have been one of the strongest selective forces driving human evolution (Fumagalli et al., 2011), disease susceptibility and clinical outcomes can differ markedly between human populations. Genome-wide association studies (GWAS) have identified mutations altering the susceptibility to various infectious diseases (Chapman and Hill, 2012; Newport and Finan, 2011; Lee et al., 2013; Akcay et al., 2018), including tuberculosis (TB), which remains the main cause of human death due to a single infectious agent (WHO, 2023).

The bacteria that cause TB belong to the Mycobacterium tuberculosis complex (MTBC) and can be classified into ten human-adapted phylogenetic lineages: Lineage 1 (L1) to L10, plus several lineages adapted to different wild and domestic animal host species (Zwyer et al., 2021; Guyeux et al., 2024). While TB is a global problem, the human-adapted MTBC lineages differ in their geographical distribution. L2 and L4 occur worldwide, and other lineages are restricted to specific regions. Specifically, L1 and L3 mainly occur around the Indian Ocean (Gagneux, 2018), L5 and L6 are limited to West Africa (de Jong et al., 2010), and L7 only occurs in Ethiopia (Wiens et al., 2018). This phylogeographic population structure of the human-adapted MTBC led to the hypothesis that certain MTBC genotypes are locally adapted to their sympatric human populations (Gagneux, 2018). This hypothesis is supported by findings from cosmopolitan settings, where sympatric associations between the geographical origin of TB patients and their infecting MTBC strains were observed (Reed et al., 2009; Gagneux et al., 2006; Baker et al., 2004; Gröschel et al., 2024). Moreover, in immune-compromised individuals with HIV co-infection, these sympatric associations were disrupted (Fenner et al., 2013). The various MTBC genotypes also differ phenotypically (Du et al., 2023), regarding disease progression (de Jong et al., 2008), transmission (Asare et al., 2018; Holt et al., 2018), disease presentation (Click et al., 2012) and immune activation in vitro (Manca et al., 2004; Arbués et al., 2025) and ex vivo (Coussens et al., 2013).

Human genetic diversity has also been linked to differences in TB susceptibility. While, for example, TYK2 has been associated with TB disease worldwide (Boisson-Dupuis et al., 2018), several human genetic loci were not consistently associated with TB in populations from different geographical regions but were specific to certain populations (Stein et al., 2017; Phelan et al., 2023; Bai et al., 2023; Png et al., 2012; Zheng et al., 2018). Particular human genetic ancestries have also been found to play a role in the context of TB. People with higher proportions of native Peruvian genetic ancestry showed a higher risk of progressing to active TB (Asgari et al., 2022), and a higher proportion of San genetic ancestry was associated with an increased risk for TB among South African Coloured individuals (Daya et al., 2014). Differences in ethnicity have also been associated with different inflammatory profiles of TB patients at the time of presentation (Coussens et al., 2013). In addition to the effects of human and bacterial genetic diversity on TB, many social and environmental factors, as well as co-morbidities, are known drivers of TB. These include malnutrition (Macallan, 1999), HIV infection, diabetes (Selwyn et al., 1989; Dooley and Chaisson, 2009), poverty (Spence et al., 1993), smoking (Bates et al., 2007), and alcohol consumption (Imtiaz et al., 2017). While associations between TB and individual host, pathogen, or environmental factors have been found (Stein et al., 2017; Phelan et al., 2023; Bai et al., 2023; Png et al., 2012; Zheng et al., 2018; Asgari et al., 2022; Macallan, 1999), studies considering all these components simultaneously remain scarce (McHenry et al., 2021; Xu et al., 2025; Ogarkov et al., 2012).

Symptomatic TB has a wide spectrum of severity, and studies conducted before anti-TB drugs became available suggest that mild disease is associated with higher odds of natural recovery (Leavitt et al., 2024; Alling and Bosworth, 1960). If mortality and natural recovery from symptomatic TB vary depending on the spectrum of disease, conceivably human populations exhibit genetic variation underlying disease severity. Furthermore, given that humans and the MTBC have coexisted for millennia, some of this variation could be linked to human ancestry. Here, we characterized the genetic variation, including the genetic ancestry, of a cohort of TB patients from Dar es Salaam, Tanzania, and the phylogenetic lineage of their corresponding MTBC isolates, and investigated the association of both with TB disease severity while accounting for demographic, socioeconomic, and environmental variables.

Results

Genetic ancestry of Tanzanian TB patients

Genetic ancestries were estimated for 7479 individuals from 249 populations (Figure 1—figure supplement 1) including 1444 Tanzanian TB patients, using the software Admixture (Alexander et al., 2009) with 53,255 SNPs (Figure 1—figure supplement 2).

The optimal number of source populations to describe our dataset was 24, based on the lowest cross-validation error (Alexander and Lange, 2011; Figure 1—figure supplement 3). For this study, we named the genetic ancestries according to the geographical distribution and/or ethnicity of the reference populations that they are most prevalent in. The genetic ancestry with the highest contribution among Tanzanians with a mean of 44% (maximum 68%, minimum 0%) was also the most abundant in BS people from Southern and southeastern Africa (e.g. the Ronga population in Figure 1), and hence hereafter, we will refer to this ancestry as ‘Southeastern BS’ (Figure 1). The second most common genetic ancestry with a mean of 22% (maximum 42%, minimum 6%) in the Tanzanian TB patients was most common among Kenyans (e.g. the Luhya population in Figure 1, 1000G and HGDP, see methods), and will be referred to as the ‘Eastern BS’ genetic ancestry. Additionally, the Tanzanian TB patients had a mean of 9% (maximum 53%, minimum 0%) of a genetic ancestry that was most common among BS populations from western Central Africa (e.g. the Eviya population in Figure 1); we will refer to it as the ‘Western BS’ genetic ancestry. Furthermore, the Tanzanian TB patients contained on average 4% of a genetic ancestry most abundant among Nigerians represented by the Esan population (‘Nigerian’ genetic ancestry in Figure 1), and 4% of a genetic ancestry most abundant in people from Chad and Sudan (represented by the Nuba population in Figure 1). In addition, the genetic ancestry of the Tanzanian TB patients was composed of 3% of a genetic ancestry most prevalent in people from Western Africa (Gambia and Senegal represented by the Senegal Bedik population, in Figure 1), as well as 3% of a genetic ancestry most prevalent in individuals from western Africa rainforest hunter–gatherer populations (Bezan population in Figure 1). 
A mean of 2% belonged to a genetic ancestry most common among Bedouin individuals (represented by the Yemenite Jew population in Figure 1). The proportions of the remaining genetic ancestries were all smaller than 2% (Figure 1—figure supplement 4, admixture plots for all African populations included can be found in Figure 1—figure supplement 5). Finally, most Tanzanian TB patients had little non-African genetic ancestry (mean 5%, maximum 65%), with only 14 patients (~1%) showing more than 30% non-African genetic ancestry. In summary, the ancestry of Tanzanian TB patients was, for the most part, a mixture of three different Bantu components. Thus, for the remaining sections, we will focus on the ancestries termed Eastern BS (eBS), Southeastern BS (seBS), and Western BS (wBS).
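As a quick arithmetic check on the means reported above, the share left over for all remaining ancestries follows directly from the three Bantu-speaking components. The sketch below uses only the rounded means quoted in the text; the variable names are illustrative and not taken from the authors' analysis code.

```python
# Mean ancestry proportions of the Tanzanian TB patients, as reported in the text.
bs_means = {"seBS": 0.44, "eBS": 0.22, "wBS": 0.09}

# Ancestry proportions sum to 1 for each individual, so the mean of everything
# else is the complement of the three Bantu-speaking components.
bs_total = round(sum(bs_means.values()), 2)
other_mean = round(1.0 - bs_total, 2)

print(f"BS ancestries combined: {bs_total:.2f}")
print(f"All other ancestries:   {other_mean:.2f}")
```

The complement agrees with the mean of about 0.25 listed for 'Other ancestry' in Table 1.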

Figure 1 with 5 supplements

Genetic ancestry analyses of Tanzanian TB patients.

(A) Genetic ancestry proportions of the 1444 Tanzanian TB patients and representative human populations who shared at least 1% of their most common genetic ancestry with the Tanzanians for K = 24 (ESN: Esan from Nigeria (1000G), LWK: Luhya from Kenya (1000G)). For all populations included in our study, see Figure 1—figure supplement 1 for their geographic distribution and Figure 1—figure supplement 5 for the ancestry composition of all African populations included in this study. (B) The geographical location of the representative populations shown in A is depicted with black circles, and the corresponding country is highlighted. The remaining African populations included in the analysis are represented by blue circles.

Insights into the BS genetic ancestries in Africa and Tanzania

Compiling several datasets, including many different BS populations, allowed for a closer look at the distribution of BS ancestries across African populations. Like Tanzanians, the populations from neighboring Kenya and Mozambique showed strong contributions of BS ancestries resulting from different admixture events (Figure 1). While the eBS genetic ancestry was highest in Kenya and Tanzania, decreasing from there to the south and to the west of the continent (Figure 2A), the seBS genetic ancestry generally increased toward the south and decreased toward the west, as observed by others (Figure 2B; Semo et al., 2020). The wBS genetic ancestry was mainly seen in BS populations from Gabon and Cameroon, as well as in South African populations (Figure 2C). At the continent-wide level, the geographical and genetic distances between the human populations analyzed were significantly correlated (Mantel test: veridical correlation = 0.18, p-value <0.001, Figure 2—figure supplement 1).
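The logic of a Mantel test like the one reported above can be sketched as follows. This is a generic permutation implementation in plain Python, not the authors' pipeline: the function names, the number of permutations, and the toy distance matrices are all assumptions made for illustration.

```python
import math
import random


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Permutation-based Mantel test between two symmetric distance matrices.

    Returns the observed (veridical) correlation and a two-sided p-value.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))

    extreme = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns together so the result is still a distance matrix.
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(flat_a, upper_triangle(permuted))) >= abs(observed):
            extreme += 1
    return observed, extreme / (permutations + 1)


# Toy data: five points on a line, with "genetic" distance proportional to geography.
positions = [0.0, 1.0, 2.5, 4.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2.0 * d for d in row] for row in geo]

r, p = mantel_test(geo, gen)
print(f"veridical correlation = {r:.2f}, p-value = {p:.3f}")
```

Rows and columns are shuffled jointly rather than shuffling individual matrix entries, because pairwise distances involving the same population are not independent of one another.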

Figure 2 with 1 supplement

Spatial visualizations of the Bantu-speaking (BS) genetic ancestries and the genetic ancestries of the different self-identified ethnic groups among the TB patients in Tanzania.

The genetic ancestry was inferred with Admixture at K = 24, and the ancestries were interpolated using the pykrige module in Python (see Methods). (A) eBS genetic ancestry, (B) seBS genetic ancestry, and (C) wBS genetic ancestry. The populations included for spatial interpolations are marked with a black dot on the maps. The maps were created using the basemap module in Python. (D) Geographical origin of the ethnic groups among our TB patient cohort. The Temeke District hospital in Dar es Salaam where the patients were recruited is marked with a red point. Note that for some ethnic groups, no geographical origin could be identified (Supplementary file 1). (E) Ancestry plots for the different ethnic groups with at least 10 patients from our TB patient cohort.

Even though all our TB patients were recruited in Dar es Salaam, we found them to belong to a variety of self-defined ethnic groups linked to different geographical regions within Tanzania (Figure 2D). Even at this smaller scale, we found a significant correlation between the geographic distance of the self-defined ethnic groups of our TB patients and their genetic distances (Mantel test: veridical correlation = 0.12, p-value <0.001). The geographical structuring of human genetic diversity within Tanzania was further supported by our finding that the seBS genetic ancestry increased from West to East, and from North to South. The eBS genetic ancestry increased from South to North and decreased from West to East. The wBS genetic ancestry increased to the North and decreased to the East (Figure 2—figure supplement 1A).

The MTBC genotypes circulating in Tanzania and their association with TB disease severity

In previous work (Zwyer et al., 2023), we investigated the MTBC genotypes causing TB in the patient cohort analyzed here. We found a high diversity of MTBC genotypes, with approximately half of the TB cases caused by four main MTBC genotypes estimated to have been introduced into Tanzania starting 320 years ago. After adding the genomic information of an additional 389 MTBC isolates to our dataset (new total N = 1471), the prevalence of the four dominant MTBC genotypes was very similar to our previous findings (Zwyer et al., 2023). The most frequent genotype within L3.1.1 (referred to as ‘Introduction 10’) contributed 39% of the current TB cases, followed by a genotype within L1.1.2 (‘Introduction 9’) with 9%, a genotype within L4.3.4 (‘Introduction 5’) with 6%, and a genotype within L2.2.1 (‘Introduction 1’) with 5%. The remaining TB cases were caused by a variety of other genotypes within L1–L4, each occurring at a frequency of 1% or less (Zwyer et al., 2023). Despite a 36% increase in sample size compared to our previous analysis (Zwyer et al., 2023), we still found no evidence of an association between the different MTBC genotypes circulating in Dar es Salaam and TB disease severity, using three proxies: X-ray score (mild or severe), bacterial load (inferred from the GeneXpert cycle threshold), and TB-score (mild or severe) (Table 1, Appendix 1—figure 1, Table 3; see Methods).
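The sample-size increase and the genotype prevalences quoted above can be reproduced from the counts given in the text and in Table 2. The sketch below is a plain arithmetic check; the variable names are illustrative.

```python
# Counts taken from the text and from Table 2 (patients with bacterial data).
added_isolates = 389
new_total = 1471
previous_total = new_total - added_isolates

# Relative increase in sample size over the previous analysis.
increase_pct = 100 * added_isolates / previous_total

genotype_counts = {
    "L3.1.1 - Introduction 10": 572,
    "L1.1.2 - Introduction 9": 129,
    "L4.3.4 - Introduction 5": 86,
    "L2.2.1 - Introduction 1": 71,
}
# Prevalence among the 1471 patients with an MTBC genome, rounded to whole percent.
prevalence_pct = {name: round(100 * n / new_total) for name, n in genotype_counts.items()}

print(f"previous N = {previous_total}, increase = {increase_pct:.0f}%")
for name, pct in prevalence_pct.items():
    print(f"{name}: {pct}%")
```

Note that the percentages in the 'Total N (%)' row of Table 2 are lower because their denominator also includes the patients without bacterial data.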

Table 1

Human and bacterial genotypes by the severity measures.

		TB-score				Lung damage (X-ray score)					Bacterial load (Ct-value)
	Levels	Total N (%)	Missing N (%)	Severe N (%)	Mild N (%)	Total N	Missing N	Severe	Mild	Not available	Total N	Missing N	Mean (SD)
Total N (%)				624 (33)	1280 (67)			177 (9)	849 (45)	878 (46)
MTBC genotype	Other	1471 (77.3)	433	207 (42)	406 (41)	764 (74.5)	262	51 (39)	269 (43)	293 (41)	863 (78.3)	239	19.4 (4.9)
	L2.2.1 – Intro 1			22 (4)	49 (5)			7 (5)	28 (4)	36 (5)			20.5 (5.1)
	L3.1.1 – Intro 10			184 (38)	388 (40)			59 (45)	239 (38)	274 (39)			19.1 (4.7)
	L4.3.4 – Intro 5			26 (5)	60 (6)			4 (3)	38 (6)	44 (6)			19.5 (4.1)
	L1.1.2 – Intro 9			51 (10)	78 (8)			11 (8)	58 (9)	60 (8)			19.1 (4.7)
seBS Bantu	Mean (SD)	1442 (75.7)	462	0.44 (0.2)	0.44 (0.2)	840 (81.9)	186	0.45 (0.1)	0.43 (0.2)	0.45 (0.2)	810 (73.5)	292	19.9 (5.2)
eBS	Mean (SD)	1442 (75.7)	462	0.23 (0.1)	0.22 (0.1)	840 (81.9)	186	0.22 (0.1)	0.23 (0.1)	0.22 (0.1)	810 (73.5)	292	19.9 (5.2)
wBS	Mean (SD)	1442 (75.7)	462	0.08 (0.1)	0.09 (0.1)	840 (81.9)	186	0.08 (0.1)	0.09 (0.1)	0.08 (0.1)	810 (73.5)	292	19.9 (5.2)
Other ancestry	Mean (SD)	1442 (75.7)	462	0.25 (0.1)	0.25 (0.1)	840 (81.9)	186	0.25 (0.1)	0.25 (0.1)	0.25 (0.1)	810 (73.5)	292	19.9 (5.2)

Relationship between the human and MTBC population structures

We previously found that the four dominant MTBC genotypes in Dar es Salaam differed in their transmission rate and in the duration of the infectious period (Zwyer et al., 2023). Here, we assessed whether there might be an additional host genetic contribution to these differences. We first compared the genetic ancestry proportions between patients infected with the four dominant genotypes and then tested whether patients who were genetically more closely related were infected with MTBC genotypes that were also more closely related as would be expected from a co-evolutionary process (Brites and Gagneux, 2015). However, the human genetic ancestry proportions differed only marginally between the TB patients infected by the four main MTBC genotypes (Table 2). Moreover, there was no correlation between the human and bacterial genetic distances (Mantel test: veridical correlation = –0.02, p-value = 0.85). Taken together, we found no statistically significant relationship between the human and bacterial genetic population structure in Dar es Salaam. These results also suggest that the genetic composition of this human population is unlikely to have a measurable effect on the differences in bacterial transmission rate and duration of the infectious period reported previously (Zwyer et al., 2023).

Table 2

Characteristics of MTBC genotypes for all patients with either human or bacterial genetic data available.

	N (%)*	Missing N	Levels	Other genotype	L2.2.1 – Intro 1	L3.1.1 – Intro 10	L4.3.4 – Intro 5	L1.1.2 – Intro 9	No bacterial data available
Total N (%)				613 (32)	71 (4)	572 (30)	86 (5)	129 (7)	433 (23)
Sex^†	1471 (100.0)	0	Male (%)	424 (69)	52 (73)	425 (74)	55 (64)	87 (67)	302 (70)
			Female (%)	189 (31)	19 (27)	147 (26)	31 (36)	42 (33)	131 (30)
Age in years	1471 (100.0)	0	Median (IQR)	33.0 (28.0–40.0)	31.0 (24.5–38.0)	33.0 (26.0–41.0)	31.5 (27.0–38.8)	35.0 (26.0–45.0)	35.0 (27.0–43.0)
HIV status^†	1452 (98.7)	19	Infected (%)	90 (15)	10 (14)	103 (18)	19 (22)	31 (25)	97 (23)
Smoker^†	1443 (98.1)	28	Yes (%)	127 (21)	18 (26)	149 (27)	11 (13)	26 (20)	97 (23)
Cough duration (weeks)	1454 (98.8)	17	Median (IQR)	4.0 (3.0–4.0)	4.0 (3.0–4.0)	4.0 (3.0–4.0)	3.5 (2.2–4.0)	4.0 (2.0–5.0)	3.0 (2.0–4.0)
Socioeconomic status	1452 (98.7)	19	Median (IQR)	80.0 (50.0–125.0)	75.0 (50.0–125.0)	87.5 (50.0–126.2)	85.4 (50.0–122.9)	75.0 (50.0–125.0)	83.3 (50.0–133.3)
Education^†	1471 (100.0)	0	No education (%)	90 (14.7)	5 (7.0)	76 (13.3)	10 (11.6)	15 (11.6)	76 (17.6)
			Primary (%)	391 (63.8)	54 (76.1)	390 (68.2)	61 (70.9)	92 (71.3)	273 (63.0)
			Secondary (%)	105 (17.1)	12 (16.9)	90 (15.7)	13 (15.1)	18 (14.0)	64 (14.8)
			University (%)	27 (4.4)	0 (0.0)	16 (2.8)	2 (2.3)	4 (3.1)	20 (4.6)
Malnutrition^†	1471 (100.0)	0	Yes (%)	83 (13.5)	9 (12.7)	71 (12.4)	11 (12.8)	13 (10.1)	56 (12.9)
Relapse/reinfection^†	1447 (98.4)	24	Yes (%)	9 (1.5)	4 (5.6)	18 (3.2)	2 (2.4)	3 (2.3)	12 (2.8)
Drug resistance status^†	1471 (100.0)	0	Susceptible (%)	574 (93.6)	70 (98.6)	556 (97.2)	80 (93.0)	119 (92.2)	0
			INH-Mono (%)	37 (6.0)	1 (1.4)	16 (2.8)	6 (7.0)	10 (7.8)	0
			MDR (%)	2 (0.3)	0 (0.0)	0 (0.0)	0 (0.0)	0 (0.0)	0
seBS	1009 (68.6)	462	Median (IQR)	0.5 (0.3–0.6)	0.5 (0.4–0.5)	0.5 (0.4–0.6)	0.5 (0.4–0.6)	0.5 (0.4–0.6)	0.5 (0.3–0.6)
eBS	1009 (68.6)	462	Median (IQR)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.3)
wBS	1009 (68.6)	462	Median (IQR)	0.1 (0.0–0.1)	0.1 (0.1–0.1)	0.1 (0.0–0.1)	0.1 (0.0–0.1)	0.1 (0.0–0.1)	0.1 (0.0–0.1)
Other ancestry	1009 (68.6)	462	Median (IQR)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.3)	0.2 (0.2–0.2)	0.2 (0.2–0.2)	0.2 (0.2–0.3)

* The column N (%) indicates the total number of patients with bacterial genetic data available that contained a value for the respective variable.
† The percentage of an MTBC genotype that has the respective characteristic (e.g. percentage of males among patients infected with an Intro 1 MTBC genotype).

Association of human genetic ancestry with TB disease severity

In a previous publication on the same cohort, we found that the bacterial genetic background could not explain the differences observed in TB disease severity (Zwyer et al., 2023). Since TB disease is shaped by the bacterial and the human genetic background as well as environmental factors (Comas and Gagneux, 2009), we next investigated whether human genetic ancestry could have contributed to the differences in disease severity observed between our TB patients. We assessed the associations between ancestry and the three proxies of TB severity in HIV-negative patients using logistic regression models. We included the three human genetic ancestries with the highest proportions among the Tanzanian TB patients (seBS, eBS, and wBS) as covariates, together with age, sex, smoking, socioeconomic status, TB category (relapse or not), malnutrition, education level, and drug resistance status to control for potential confounding. We found no evidence of an association between human genetic ancestry and any of these three proxies of TB disease severity (Table 1). We repeated the analysis by adding cough duration as a covariate to possibly account for the disease duration before recruitment, but the results remained unchanged. Additionally, we conducted a GWAS to test for associations between specific human genetic variants and the three proxies of TB disease severity. We included the top three human genetic principal components (PCs), HIV status, and the same covariates as for the regression described above. No evidence of an association was found (Figure 3).

Figure 3 with 1 supplement

Manhattan plot for genome-wide association study (GWAS) conducted using (A–C) TB-score, X-ray score, and Ct-value.

Red line indicates GWAS significance threshold of 5e−8.
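The threshold of 5e−8 is the conventional genome-wide significance level, usually motivated as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent common-variant tests. The sketch below shows that arithmetic and how such a threshold would be applied; the number of tests is the usual assumption behind the convention rather than a figure from this study, and the variant IDs and p-values are hypothetical.

```python
import math

# Conventional genome-wide threshold: family-wise alpha of 0.05 divided by an
# assumed one million independent common-variant tests (Bonferroni correction).
ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000
GWAS_THRESHOLD = ALPHA / ASSUMED_INDEPENDENT_TESTS


def genome_wide_hits(p_values):
    """Return the variant IDs whose p-value falls below the genome-wide threshold."""
    return [variant for variant, p in p_values.items() if p < GWAS_THRESHOLD]


def manhattan_height(p):
    """Height of a point on a Manhattan plot, i.e. -log10 of the p-value."""
    return -math.log10(p)


# Hypothetical variant IDs and p-values, for illustration only.
example = {"variant_a": 3.2e-6, "variant_b": 4.0e-9, "variant_c": 0.41}
print(genome_wide_hits(example))
print(round(manhattan_height(GWAS_THRESHOLD), 2))
```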

The combined effect of human and MTBC genetic diversity on TB disease severity

For a subset of 1000 TB patients, we had both an MTBC genome and a human genome or genotype available. Genetic interactions between the host and the pathogen have been shown to affect TB severity in other settings (Asgari et al., 2022; McHenry et al., 2020a). To test for potential interactions between human and bacterial diversity on TB severity, we extended the ancestry logistic regression models described in the previous section with the most common MTBC genotype (L3.1.1 – Introduction 10) as an additional explanatory variable, together with the interaction between human ancestry and MTBC genotype. We tested only L3.1.1 – Introduction 10, since too few patients carried the other genotypes for meaningful testing. However, we found no evidence for any interaction between the MTBC genotype and human ancestry influencing TB disease severity in this patient population (Table 3).
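How an ancestry-by-genotype interaction enters such a model can be illustrated with a small encoding sketch. It assumes reference (dummy) coding with ancestry category 1 and 'other genotypes' as baselines, which matches the reference rows of Table 3 but is otherwise an assumption. It covers a single ancestry and omits the adjustment covariates, and the function name is hypothetical.

```python
def design_row(ancestry_category, n_categories, is_intro10):
    """Encode one patient for a model with an ancestry-by-genotype interaction.

    ancestry_category: integer from 1 (reference) to n_categories.
    is_intro10: True if the patient carries L3.1.1 - Introduction 10.
    Returns [intercept, ancestry dummies..., genotype, interaction dummies...].
    """
    # Dummy-code the ancestry category, with category 1 as the reference level.
    ancestry = [1 if ancestry_category == level else 0
                for level in range(2, n_categories + 1)]
    genotype = 1 if is_intro10 else 0
    # Each interaction term is the product of an ancestry dummy and the genotype indicator.
    interaction = [a * genotype for a in ancestry]
    return [1] + ancestry + [genotype] + interaction


# seBS has three categories in Table 3; a category-3 patient infected with Introduction 10:
print(design_row(3, 3, True))
# The same ancestry category with any other genotype has no active interaction term:
print(design_row(3, 3, False))
```

Under this coding, the interaction coefficients capture how much the effect of an ancestry category differs between patients infected with Introduction 10 and all other patients.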

Table 3

Estimated associations between disease severity, human genetic ancestry, and bacterial genotype.

Three variables were included as proxies for disease severity: lung damage (mild versus severe), TB-score (mild versus severe), and bacterial load (continuous, log₁₀ transformed). Binomial logistic regressions were performed on the data of HIV-negative patients, adjusting for age, sex, smoking, socioeconomic status, level of education, malnutrition, TB type (relapse or new infection), and drug resistance status by including these variables in the model. For the ancestries and the interactions, the p-values were retrieved by performing a likelihood ratio test comparing a model including the ancestries and interactions to a model without them. This table combines the results of two logistic regressions per disease severity measure, one including an interaction and one without. The ancestries were transformed and categorized (see Methods), with category 1 comprising the lowest amount of the respective ancestry and category 3 (4 in the case of wBS) the highest amount.

				Disease severity measure
		Lung damage		TB-score		Bacterial load (Ct-value)
		OR adjusted	p-value adjusted	OR adjusted	p-value adjusted	OR adjusted	p-value adjusted
MTBC genotype^*	L3.1.1 – Introduction 10	1.60 (0.95–2.67)	0.07	1.00 (0.72–1.40)	0.98	0.99 (0.97–1.00)	0.13
MTBC genotype^*	Other genotypes	1	0.07	1	0.98	1.00	0.13
Human ancestry^†	seBS category 3	2.29 (1.04–5.42)	0.19	1.06 (0.66–1.69)	0.17	1.00 (0.98–1.03)	0.13
	seBS category 2	2.40 (1.15–5.41)		0.62 (0.40–0.94)		1.02 (0.99–1.04)
	seBS category 1	1		1		1.00
	eBS category 3	1.30 (0.51–3.79)		1.09 (0.64–1.84)		1.02 (0.99–1.05)
	eBS category 2	1.50 (0.61–4.28)		1.17 (0.69–1.94)		1.00 (0.97–1.03)
	eBS category 1	1		1		1.00
	wBS category 4	0.58 (0.23–1.53)		0.94 (0.49–1.76)		1.02 (0.98–1.06)
	wBS category 3	0.54 (0.21–1.46)		1.09 (0.57–2.04)		1.03 (0.99–1.07)
	wBS category 2	0.73 (0.26–2.15)		1.03 (0.50–2.09)		1.03 (0.99–1.08)
	wBS category 1	1		1		1.00
Interaction^†	seBS category 3* L3.1.1 – Intro 10	1.05 (0.20–5.43)	0.06⁽¹⁾	1.05 (0.40–2.71)	0.92	0.99 (0.94–1.04)	0.83
	seBS category 2* L3.1.1 – Intro 10	0.39 (0.08–1.94)		1.29 (0.52–3.15)		0.98 (0.93–1.03)
	seBS category 1* L3.1.1 – Intro 10	1		1		1.00
	eBS category 3* L3.1.1 – Intro 10	8.32 (0.91–193.14)		1.45 (0.48–4.34)		1.01 (0.94–1.07)
	eBS category 2* L3.1.1 – Intro 10	6.38 (0.71–146.86)		1.25 (0.42–3.70)		1.02 (0.96–1.09)
	eBS category 1* L3.1.1 – Intro 10	1		1		1.00
	wBS category 4* L3.1.1 – Intro 10	0.17 (0.02–1.23)		1.79 (0.49–6.56)		0.97 (0.90–1.05)
	wBS category 3* L3.1.1 – Intro 10	0.33 (0.04–2.51)		1.48 (0.40–5.51)		0.99 (0.92–1.07)
	wBS category 2* L3.1.1 – Intro 10	0.87 (0.09–8.06)		2.40 (0.53–11.05)		0.98 (0.90–1.07)
	wBS category 1* L3.1.1 – Intro 10	1		1		1.00

* The odds ratio for genotype represents the odds of severe disease for Introduction 10 compared to the odds for other genotypes.
† The odds ratios represent the estimated multiplicative change in the odds of severe disease for a one-unit increase in the additive log-transformed ancestry variable.
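The likelihood ratio test described in the legend of Table 3 compares two nested models through their log-likelihoods. The sketch below is a minimal standard-library version under stated assumptions: the log-likelihood values are hypothetical, and the degrees of freedom (7) simply count the non-reference ancestry categories shown in the table (2 for seBS, 2 for eBS, 3 for wBS). The chi-square tail probability is computed with a closed-form recurrence that is valid for integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Compare nested models; returns the test statistic and its p-value."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods, for illustration only (not values from the study).
# extra_params = 7: 2 seBS + 2 eBS + 3 wBS non-reference categories in Table 3.
stat, p = likelihood_ratio_test(loglik_reduced=-310.4, loglik_full=-305.1, extra_params=7)
print(f"LR statistic = {stat:.2f}, p-value = {p:.3f}")
```

Because a single test covers all ancestry terms (or all interaction terms) at once, it yields one p-value per block of rows, which is consistent with the way Table 3 reports one p-value for the ancestry block and one for the interaction block.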

Discussion

In this study, we analyzed the genetic ancestry of TB patients, the MTBC diversity underlying their TB infection, and estimated the associations of both with disease severity in Dar es Salaam, Tanzania. We found a strong component of BS genetic ancestries among the Tanzanian TB patients, similar to those of neighboring populations from Mozambique and Kenya, and little non-African genetic ancestry. Genetic ancestry proportions did not differ between patients infected with different MTBC genotypes. There was no evidence that the patient genetic ancestry or the MTBC genotype on their own, nor their interaction, had any effect on TB disease severity.

Although Tanzania is one of the few countries in sub-Saharan Africa where all four main African linguistic groups co-exist (Tishkoff et al., 2009), and its largest city and economic capital, Dar es Salaam, is strongly influenced by different human populations from within and outside Africa, our cohort of TB patients mostly comprised BS ethnicities. Comparing this TB patient cohort to a large number of modern human populations revealed major components of eBS and seBS genetic ancestries. This genetic population structure probably resulted from several admixture events estimated to have happened between 1500 and 150 years ago, between local populations and BS populations who migrated from West Africa to the East and South of the continent (Patin et al., 2017). The TB patients investigated here were recruited in one district hospital of Dar es Salaam. Yet, we found the genetic distances between the patients to be correlated with the original geographic range of their self-identified ethnicities, suggesting that the corresponding human populations are not fully admixed. The population of Dar es Salaam has increased by several million in the last 40 years, mainly as a result of immigration from rural parts of Tanzania (Llc, 2026). Thus, our findings suggest that our TB patient population mostly represents recent migrants to Dar es Salaam from other regions of Tanzania. Moreover, we found little evidence of Eurasian genomic influence in the TB patient population (on average 5% genetic ancestry). This is in strong contrast to coastal Swahili populations, as recent findings comparing modern and medieval Swahili people revealed large components of genetic ancestry derived from exchanges between local East African BS populations and people from India, Persia, and Arabia, starting as early as 1000 AD (Brielle et al., 2023). We conclude that our TB patient population does not represent the full spectrum of human genetic diversity in Tanzania.

In contrast to the genetic ancestry of the TB patients, we found that the MTBC genotypes infecting these patients descend from multiple historical introductions, which mainly resulted from the human exchanges that took place across the Indian Ocean during the last few centuries (Zwyer et al., 2023). Some of these recently introduced MTBC genotypes became dominant, in particular the MTBC genotype L3.1.1 – Introduction 10, which caused TB in approximately 40% of our patients. These strains descended from an introduction that occurred approximately 300 years ago from South or Central Asia to East Africa (Zwyer et al., 2023). Hence, while our TB patient population reflects little historical gene flow from non-African populations, the underlying MTBC diversity indicates that the MTBC genotypes introduced from outside successfully spread in this newly encountered host population, eventually outcompeting native MTBC genotypes (Comas et al., 2015).

We previously reported that in this TB patient cohort, some of the dominant MTBC genotypes had a higher transmission rate than others, while some other MTBC genotypes induced patients to remain infectious for longer (Zwyer et al., 2023). Based on the similar proportions of MTBC genotypes among self-reported ethnic groups we observed at the time, we had already hypothesized that human genetic heterogeneity of the host population is unlikely to be responsible for those differences (Zwyer et al., 2023). Here we formally addressed this hypothesis and found that there was no evidence that the human genetic ancestry proportions differed between patients infected with different MTBC genotypes in our cohort. This finding is consistent with the notion that the differences in epidemiological parameters we reported previously are mainly determined by the pathogen genotype.

Disease severity is one aspect of the clinical presentation of TB with a direct impact on patient mortality and morbidity, as well as on pathogen transmission, as it influences patient infectiousness. It is thus likely that both host and pathogen genetic characteristics can modulate TB disease severity (Brites and Gagneux, 2015). Experimental infections in various animal models suggest that different MTBC strains vary in virulence (López et al., 2003; Aguilar L et al., 2010; Dormans et al., 2004). However, in clinico evidence for differences in disease severity caused by different MTBC genotypes is inconsistent (Coscolla and Gagneux, 2010; Stanley et al., 2024). We found no evidence of differences in disease severity at the point of care caused by the different MTBC genotypes in our study. Moreover, we did not observe any association between human genetic ancestry and disease severity, which is in contrast to a recent study from Peru, where human genetic ancestry was found to influence progression to active TB (Asgari et al., 2022).

Our previous work found evidence of such genetic interactions when considering the complete genomes of the TB patients and their infecting MTBC strain (Xu et al., 2025) and identified associations between human and pathogen variants. Such associations reflect host–pathogen genetic interactions that determine susceptibility to symptomatic TB or intra-host selection during mycobacterial replication. Here, in the context of co-evolutionary history between humans and MTBC, we specifically tested whether an interaction between the main human ancestry components and being infected with the most dominant MTBC strain in Dar es Salaam could explain the variability in TB disease severity. However, we did not find evidence of such an effect. Others have reported an association between TB disease severity, a particular bacterial genotype, and a particular human SNP in Uganda but did not explicitly link this human SNP to a particular human genetic ancestry (McHenry et al., 2020a; McHenry et al., 2020b). Several factors could account for the lack of effect we observed. First, our patient population was relatively genetically homogeneous, given that the different BS components represent populations with only moderate levels of genetic differentiation (Cavalli-Sforza et al., 1994). Second, there is likely to be selection bias in our cohort since only patients presenting at the clinic were recruited. The disease severity measures included in this study mainly reflect disease stages at which patients felt ill enough to go to the hospital, that is we did not consider intermediate, more contrasting disease states that are known to occur between infection and the development of symptoms (Frascella et al., 2021). To at least partially account for that, we included the number of weeks a patient was coughing as a covariate in our analyses. 
Third, the lack of a measurable interaction between host genetic ancestry and MTBC genotype could reflect the relatively recent presence of these MTBC genotypes in Tanzania, and the distinct (i.e. allopatric) geographical origins of the host and pathogen populations. This indicates that none of the ancestral human populations that compose modern Tanzanians has lived in sympatry with the ancestors of the modern MTBC genotypes that circulate in Dar es Salaam today. With the exception of West Africa, where the geographically restricted West-African MTBC lineages L5 and L6 remain an important cause of human TB (Silva et al., 2022), the situation in Tanzania might be representative of the TB epidemics in many African countries, as evidence suggests that many MTBC genotypes dominating the continent today have been introduced from outside Africa recently in the history of human populations (Comas et al., 2015; Rutaihwa et al., 2019; Menardo et al., 2021; Chihota et al., 2018).

In conclusion, our study shows that the TB patients from Dar es Salaam were mainly of BS genetic ancestry reflecting limited Eurasian genetic influx. Neither the human genetic ancestry nor the MTBC genotype alone, nor their interaction, was associated with TB disease severity. Our results highlight the dominant role of social and environmental factors in human TB in Tanzania.

Methods

Study population

This study is based on a previously described dataset (Xu et al., 2025; Zwyer et al., 2023). Briefly, adult active TB patients with pulmonary disease (sputum smear-positive and GeneXpert-positive) were recruited between November 2013 and June 2022 at the Temeke District Hospital in Dar es Salaam, Tanzania, when they first presented for care. Sputum and blood samples were collected from each patient to extract DNA for sequencing of the MTBC strain and genotyping or whole-genome sequencing (WGS) of the patient. Additionally, clinical and sociodemographic information was obtained from every patient. In total, there was either human or bacterial data available for 1904 patients. Of those, 1444 patients had human genetic data and 1471 had bacterial genetic data available, respectively. A total of 1000 patients had both types of data available after quality-based filtering. The geographical locations of the self-indicated ethnic group of each patient were retrieved by searching for the original region of the respective ethnic group, and if they originated from a single region, the geographic coordinates according to Wikipedia were taken. If two neighboring regions were among the origins, then a random location between the two regions was taken as surrogate (Supplementary file 1).

Bacterial and human sample processing

The MTBC bacteria were cultured on solid Löwenstein-Jensen media at the TB laboratory of the Ifakara Health Institute in Bagamoyo, Tanzania. Before 2018, the MTBC isolates were transferred to Switzerland for DNA extraction and later, DNA extraction was carried out in Bagamoyo. Bacterial WGS was done using the Illumina short-read technology at the Department of Biosystems Science and Engineering of ETH Zurich in Basel (DBSSE). Human WGS was done at the Health 2030 Genome Center in Geneva, Switzerland, using an Illumina NovaSeq 6000 sequencer. The human genotyping was done at the iGE3 Genomics platform at the University of Geneva in Switzerland using the Illumina Infinium H3Africa genotyping microarray (Version 2; https://chipinfo.h3abionet.org) plus custom Tanzanian-specific SNP add-ons (Xu et al., 2022).

Human genetic data

The processing of the human genetic data has been described in detail by Xu et al., 2025. Briefly, we used GRCh38 as the human reference genome to map the WGS reads of 118 patients using the BWA aligner (v0.7.17) (Li and Durbin, 2009). Duplicate reads were then marked with the MarkDuplicates module of Picard (v2.8.14) (broadinstitute, 2019). Variant calling was first done for each sample individually following GATK best practices for germline short variant discovery. Samples with a coverage below 5 were excluded, followed by a joint calling of the variants. A Variant Quality Score Recalibration (VQSR)-based filter was applied (truth sensitivity of 99.7, excess heterozygosity of 54.69) and samples with more than 50% missing genotype calls were removed.

For the genotyping data, we used the Illumina GenomeStudio software (v2.0.5, https://support.illumina.com/array/array_software/genomestudio/downloads.html) to analyze the raw microarray data. Samples of low quality, samples that clustered poorly, and samples with a call rate below 0.97 were excluded. The PLINK format was then converted to VCF format using PLINK (v1.9) (Chang et al., 2015). The first round of imputation was performed with the African Genome Resources (AFGR, https://www.apcdr.org/) reference panel on the Sanger Imputation Server with EAGLE (Loh et al., 2016) for phasing and the positional Burrows–Wheeler transform (PBWT) (Durbin, 2014) for imputation. The second round was performed with a reference panel created in-house that was based on 118 patients with available whole-genome sequences (Xu et al., 2022), with SHAPEIT4 for phasing and Minimac3 for imputation. For each SNP, the genotype call from the reference panel with the highest imputation quality score was used as the final call. SNPs with an INFO score below 0.8 were discarded.
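The per-SNP selection rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the SNP identifiers, the INFO values, and the function name `select_best_panel` are hypothetical, and ties are resolved in favor of the first panel purely for convenience.

```python
# Illustrative only: SNP names and INFO scores below are made up.
INFO_CUTOFF = 0.8  # SNPs with an INFO score below 0.8 are discarded


def select_best_panel(info_afgr, info_inhouse, cutoff=INFO_CUTOFF):
    """Return {snp: (panel, info)} using the better-imputed call per SNP."""
    selected = {}
    for snp in sorted(set(info_afgr) | set(info_inhouse)):
        candidates = []
        if snp in info_afgr:
            candidates.append(("AFGR", info_afgr[snp]))
        if snp in info_inhouse:
            candidates.append(("in-house", info_inhouse[snp]))
        # Keep the panel with the highest imputation quality score.
        panel, info = max(candidates, key=lambda c: c[1])
        if info >= cutoff:
            selected[snp] = (panel, info)
    return selected


afgr = {"snp1": 0.95, "snp2": 0.70, "snp3": 0.60}
inhouse = {"snp1": 0.90, "snp2": 0.85, "snp3": 0.75}
best = select_best_panel(afgr, inhouse)
print(best)
```

In this toy input, `snp3` is dropped because neither panel reaches the 0.8 cutoff.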

Bcftools (v1.15) was used to merge the WGS and genotyping samples after identifying SNPs shared between the two methods that were missing in fewer than 10 samples.

Whole-genome sequence analysis of the MTBC bacteria

We analyzed all FASTQ files using the WGS analysis pipeline described previously (Menardo et al., 2018). In summary, Trimmomatic (Bolger et al., 2014) v. 0.33 (SLIDINGWINDOW:5:20) was used to remove the Illumina adaptors and to trim low-quality reads. Only reads with a length of at least 20 bp were kept for further analysis. Overlapping paired-end reads were merged using SeqPrep v. 1.2 (John, 2011) (overlap size = 15). We then mapped the resulting reads to a reconstructed ancestral sequence of the MTBC (Comas et al., 2010) with BWA v. 0.7.13 (mem algorithm) (Li and Durbin, 2009). Picard v. 2.9.1 (broadinstitute, 2025) was then applied to mark and exclude duplicated reads. Furthermore, the RealignerTargetCreator and IndelRealigner modules of GATK v. 3.4.0 (McKenna et al., 2010) were used to perform local realignment of reads around INDELs. Reads having an alignment score lower than $(0.93 \times \text{read length}) - (\text{read length} \times 4 \times 0.07)$, corresponding to more than 7 mismatches per 100 bp, were excluded using Pysam v. 0.9.0 (pysam-developers, 2026). SNP calling was then performed with SAMtools v. 1.2 mpileup (Li, 2011) and VarScan v. 2.4.1 (Koboldt et al., 2012) with the following thresholds: a minimum mapping quality of 20, a minimum base quality at a position of 20, minimum read depth at a position of 7 and no strand bias. Positions in repetitive regions such as PE, PPE, and PGRS genes or phages were excluded, as described in Stucki et al., 2016. A whole-genome FASTA file was created from the resulting VCF file. We applied some additional filters; genomes were excluded from downstream analysis if they had a sequencing coverage of lower than 15 or if they contained SNPs suggestive of different MTBC lineages (i.e. mixed infections). We identified lineages and sublineages using the SNP-based classification by Steiner et al., 2014 and Coll et al., 2014, respectively.
In addition, we identified drug resistance mutations published in the WHO catalogue (World Health Organization, 2021) and determined the respective drug resistance profile. The majority of strains were susceptible (95%), while 70 (5%) were isoniazid mono-resistant and two contained isoniazid and rifampicin resistance and were considered as multi-drug resistant.
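As a worked example of the alignment score threshold given above, the following Python sketch evaluates the expression for a given read length. It assumes that the factor 4 is the mismatch penalty and that each matching base scores 1 (these are the BWA-MEM defaults, but whether the pipeline relied on defaults is an assumption here); the function names are hypothetical.

```python
def min_alignment_score(read_length):
    """Threshold from the text: (0.93 x length) - (length x 4 x 0.07)."""
    return 0.93 * read_length - read_length * 4 * 0.07


def passes_filter(alignment_score, read_length):
    """Reads scoring below the threshold are excluded."""
    return alignment_score >= min_alignment_score(read_length)


# For a 100 bp read: 93 matching bases (+1 each) and 7 mismatches
# (-4 each) give a score of about 93 - 28 = 65.
threshold_100 = min_alignment_score(100)
print(round(threshold_100, 6))

# A 100 bp read with 5 mismatches scores 95 - 20 = 75 and is kept;
# one with 10 mismatches scores 90 - 40 = 50 and is excluded.
print(passes_filter(75, 100), passes_filter(50, 100))
```

Under these assumptions the threshold simplifies to 0.65 times the read length.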

Identification of bacterial SNPs diagnostic for the successful MTBC introductions

We previously identified several successful MTBC introductions into Dar es Salaam (Zwyer et al., 2023). For these, we aimed to obtain a set of diagnostic SNPs that would allow assigning MTBC strains not included in our previous study to these genotypes. For that, we merged the VCF files from the 1,082 MTBC genomes included in that previous dataset by using BCFtools (v1.9). We then used VCFtools (v0.1.16) to remove indels and positions that were variable in fewer than 12 genomes (12 was the minimal threshold selected when identifying the successful introductions in our previous publication; Zwyer et al., 2023). By using the R package VariantAnnotation (Obenchain et al., 2014) and a customized Python script, SNPs specific to one of the most successful introductions were extracted. To ensure the SNPs identified as markers for the introductions were specific, we also identified phylogenetic SNPs in a larger, global dataset representing the human-adapted MTBC diversity (Menardo et al., 2018) and tested whether any of the phylogenetic SNPs identified for any of the successful introductions was present in any of the MTBC lineages or sublineages. We compiled a subset of 25 SNPs (Supplementary file 2) that we used as phylogenetic markers for the different MTBC introductions and identified strains belonging to one of the four most successful MTBC introductions (Introduction 1, Introduction 5, Introduction 9, and Introduction 10) in the expanded MTBC dataset based on these SNPs.
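The idea of a diagnostic SNP can be sketched in a few lines of Python: a position qualifies if it is variable in every genome of an introduction and in no genome outside it. This is only an illustration of the concept, not the customized script used in the study; the genome identifiers, positions, and the function name `diagnostic_snps` are hypothetical.

```python
def diagnostic_snps(genotypes, clade_members):
    """Positions variable in all clade genomes and in none of the others.

    genotypes: {genome_id: set of variant positions}
    clade_members: set of genome ids belonging to one introduction
    """
    inside = [genotypes[g] for g in genotypes if g in clade_members]
    outside = [genotypes[g] for g in genotypes if g not in clade_members]
    if not inside:
        return set()
    shared = set.intersection(*inside)
    seen_elsewhere = set().union(*outside)
    return shared - seen_elsewhere


# Toy data: g1 and g2 form the introduction of interest.
toy_genotypes = {
    "g1": {101, 250, 777},
    "g2": {101, 250, 903},
    "g3": {101, 512},
    "g4": {512, 903},
}
markers = diagnostic_snps(toy_genotypes, {"g1", "g2"})
print(markers)
```

Position 101 is shared by the introduction but also occurs in `g3`, so only position 250 remains as a marker in this toy example.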

Measures of TB disease severity

We used three different proxies for TB disease severity. The first one was the TB-score, which is a clinical score adapted from Wejse et al., 2008 that consists of several signs and symptoms including the presence of fever and the body mass index (BMI). A point was given for each of the following symptoms or clinical measures if present: BMI below 18, BMI below 16, mid-upper arm circumference (MUAC) below 220 mm, MUAC below 200 mm, body temperature higher than 37°C, cough, hemoptysis, dyspnea, chest pain, night sweats, abnormal auscultation, and anemia. A maximum of 12 points could be achieved, and a TB-score below 6 was considered as mild and everything above as severe. As a second proxy, we assessed the amount of lung damage. Two independent radiologists assessed chest X-ray pictures of the patients and gave a Ralph score (Ralph et al., 2010). X-ray scores above 71 were considered severe, and everything below was considered mild. The Ralph score has been validated to grade chest X-ray severity in adult pulmonary TB patients, and 71 was the optimal cutoff point for predicting unfavorable outcome (Ralph et al., 2010). Furthermore, Ralph scores higher than 71 have been associated with a longer duration of symptoms, a lower BMI, and higher clinical scores (Chakraborthy et al., 2018). As a third proxy, we used the bacterial load in the sputum represented by the difference between the first (early cycle) and the last (late cycle) threshold during quantitative PCR (Ct-value) as determined by GeneXpert MTB/RIF assays. For each sputum sample, we took five probes, ran a quantitative PCR each, and reported the lowest Ct-value.
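The scoring rules above can be summarized in a short Python sketch. It is a simplified illustration rather than the study code: the function and field names are hypothetical, MUAC is taken to be in millimetres, and the handling of boundary values (a TB-score of exactly 6, a Ralph score of exactly 71) is not specified in the text, so the choices below are assumptions.

```python
# Binary signs and symptoms that each contribute one point when present.
SIGNS = ("cough", "hemoptysis", "dyspnea", "chest_pain",
         "night_sweat", "abnormal_auscultation", "anemia")


def tb_score(bmi, muac_mm, temperature_c, signs):
    """Sum the 12 possible points of the adapted TB-score."""
    points = 0
    points += bmi < 18             # first BMI point
    points += bmi < 16             # second BMI point for very low BMI
    points += muac_mm < 220        # first MUAC point
    points += muac_mm < 200        # second MUAC point
    points += temperature_c > 37   # fever
    points += sum(1 for name in SIGNS if signs.get(name, False))
    return points


def tb_score_category(score):
    # Assumption: a score of exactly 6 is treated as severe.
    return "mild" if score < 6 else "severe"


def xray_category(ralph_score):
    # Assumption: a score of exactly 71 is treated as mild.
    return "severe" if ralph_score > 71 else "mild"


def reported_ct(probe_ct_values):
    # The lowest Ct-value across the probes is reported.
    return min(probe_ct_values)


patient_a = tb_score(15.5, 195, 38.2,
                     {"cough": True, "night_sweat": True, "anemia": True})
patient_b = tb_score(21.0, 260, 36.8, {"cough": True, "chest_pain": True})
print(patient_a, tb_score_category(patient_a))
print(patient_b, tb_score_category(patient_b))
print(xray_category(85), reported_ct([22.1, 19.4, 20.3, 21.0, 23.7]))
```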

Genetic ancestry analysis of TB patients

To investigate the genetic ancestry of the TB patients, we combined our dataset with the data from ten other projects: the Gambian Genome Variation Project (GGVP) (Band et al., 2019), the 1000 Genomes Project (1000G) (Altshuler et al., 2015), the Human Genome Diversity Project (HGDP) (Bergström et al., 2020), the Simons Genome Diversity Project (SGDP) (Mallick et al., 2016), as well as data generated by Patin et al., 2017; Patin et al., 2014, Hollfelder et al., 2017, Semo et al., 2020, Schlebusch et al., 2012, and Fortes-Lima et al., 2022. We used the GRCh37 version of all datasets. The HGDP dataset was in GRCh38, and we therefore lifted it over to GRCh37 using the Picard (v2.26.10) (broadinstitute, 2019) tool LiftoverVcf. For all the datasets including only populations from a single continent, we excluded variants with a missingness of more than 10% and variants that deviated from Hardy–Weinberg equilibrium (p < 1e−5) using PLINK (version 1.9b, https://www.cog-genomics.org/plink/1.9/) (Chang et al., 2015). For the 1000G, SGDP, and HGDP data, we first identified variants that deviated from Hardy–Weinberg equilibrium (p < 1e−5) in each superpopulation using PLINK (version 2.0a, https://www.cog-genomics.org/plink/2.0/) and removed them from the whole dataset. We additionally removed variants with a high missingness (>10%) from the full datasets using PLINK (version 1.9b). After extraction of 103,262 nucleotide positions common to all datasets, we merged the datasets using PLINK (version 1.9b). From the merged dataset, we removed individuals related up to the second degree using PLINK (version 2.0a, KING cutoff of 0.088) (N = 369), as well as patients from our cohort whose sex according to the genetic data did not correspond to the sex indicated in the clinical data, who were genetic outliers based on principal component analysis (PCA), or who did not cluster with any other African samples (N = 83).
In addition, we removed regions of high linkage disequilibrium (https://genome.sph.umich.edu/wiki/Regions_of_high_linkage_disequilibrium_(LD)) and applied additional filters to the merged dataset (removal of variants with missingness >10%, minor allele frequency of at least 5%, removal of sex chromosomes, variant pruning with --indep-pairwise 50 10 0.1, only biallelic positions), ending up with 53,255 positions and 7479 individuals from 249 populations.
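The KING cutoff of 0.088 used for relatedness filtering corresponds to the conventional lower kinship bound for second-degree relatives, which can be checked with a small Python sketch. The function names and the toy kinship values are hypothetical, and the sketch only flags related pairs; it does not reproduce how PLINK decides which member of a pair to remove.

```python
def king_lower_bound(degree):
    """Conventional lower kinship bound for relatives of a given degree."""
    return 2 ** -(degree + 1.5)


def related_pairs(kinship, cutoff=0.088):
    """Return the sample pairs whose kinship coefficient exceeds the cutoff."""
    return sorted(pair for pair, phi in kinship.items() if phi > cutoff)


# Bounds for first-, second-, and third-degree relatives.
for degree in (1, 2, 3):
    print(degree, round(king_lower_bound(degree), 4))

# Toy kinship coefficients for three hypothetical samples.
toy_kinship = {("s1", "s2"): 0.25, ("s1", "s3"): 0.10, ("s2", "s3"): 0.03}
flagged = related_pairs(toy_kinship)
print(flagged)
```

With this cutoff, pairs related at the first or second degree are flagged, whereas more distant pairs such as `("s2", "s3")` are retained.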

To infer the ancestry proportions of the Tanzanian TB patients, we used ADMIXTURE (version 1.3.0) (Alexander et al., 2009). We estimated the number of ancestral populations (K) by running ADMIXTURE 15 times for each value of K from K = 2 to K = 29 with the option --cv. The --cv option performs fivefold cross-validation and allows identifying the value for K resulting in the lowest cross-validation error (Figure 1—figure supplement 3). The cross-validation error was lowest for K = 24. From the 15 runs performed with K = 24, we selected a representative output of the solution that was supported by the largest number of runs (6/15) to extract the ancestry proportions of each individual. A PCA was performed using PLINK (version 1.9b) on all African populations included.
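Selecting K from the cross-validation output can be sketched as follows. ADMIXTURE writes a line of the form `CV error (K=24): 0.50010` to its log when run with --cv; the error values below are made up, the function name `best_k` is hypothetical, and averaging the error over repeated runs per K is an assumption, since the text does not state how the 15 runs were summarized.

```python
import re
from collections import defaultdict
from statistics import mean

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_lines):
    """Return (K with the lowest mean CV error, {K: mean CV error})."""
    errors = defaultdict(list)
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))].append(float(match.group(2)))
    mean_error = {k: mean(values) for k, values in errors.items()}
    return min(mean_error, key=mean_error.get), mean_error


# Made-up log lines from two runs each of K = 23, 24, and 25.
toy_log = [
    "CV error (K=23): 0.50120",
    "CV error (K=23): 0.50140",
    "CV error (K=24): 0.50010",
    "CV error (K=24): 0.50030",
    "CV error (K=25): 0.50090",
    "CV error (K=25): 0.50110",
]
chosen_k, mean_cv = best_k(toy_log)
print(chosen_k)
```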

Spatial interpolation of human genetic ancestry proportions

To visualize the distributions of the different patient ancestries, we performed spatial interpolation using the OrdinaryKriging function from the pykrige module in Python (variogram_model = ‘linear’, grid space of 500). For the eBS and seBS ancestries, we included all African populations for the interpolation. For the wBS ancestry, the interpolation failed with an insufficient slope when using all African populations, suggesting little spatial variability. Because many populations were sampled in the region with the highest proportions of wBS genetic ancestry, and many of them were hunter–gatherer populations with little to no wBS genetic ancestry, we repeated the interpolation with only a subset of the non-hunter–gatherer populations.

Correlation between distance matrices

To assess whether MTBC genotypes that are more closely related tend to infect people that are also more closely related genetically, which would be compatible with co-evolution, we investigated the correlation between the human and bacterial distance matrices. To calculate the pairwise bacterial genetic distances, alignments of variable positions where data was missing in less than 10% of the genomes were generated and used to create SNP distance matrices according to the Hamming distance (https://git.scicore.unibas.ch/TBRU/tacos). Insertions and deletions were considered as missing data. To get the human pairwise genetic distances, we calculated the Euclidean distance based on the first two PCs. When only looking at the Tanzanian TB patients, we calculated the PCs for the Tanzanians only, while for the continental dataset, we included all available African populations.
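The two distance measures can be illustrated with a minimal Python sketch. The toy alignment and PC coordinates are invented, and skipping positions with missing data in either genome is an assumption about how missing data enter the Hamming distance; the study used the tool referenced above rather than this code.

```python
from math import dist

MISSING = {"N", "-"}  # gaps from indels are treated as missing data


def hamming(seq_a, seq_b):
    """Number of differing positions, ignoring positions with missing data."""
    return sum(1 for a, b in zip(seq_a, seq_b)
               if a not in MISSING and b not in MISSING and a != b)


def pairwise(items, distance):
    """Full symmetric distance matrix as a list of lists."""
    return [[distance(a, b) for b in items] for a in items]


# Toy alignment of variable positions for three bacterial genomes.
alignment = ["ACGTA", "ACTTA", "AC-TC"]
snp_matrix = pairwise(alignment, hamming)

# Toy coordinates of three patients on the first two PCs.
pcs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
pc_matrix = pairwise(pcs, dist)

print(snp_matrix)
print(pc_matrix)
```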

To investigate whether human populations that are geographically more distant are also genetically more distant, we calculated the correlation between the geographical and the genetic distance on an African level as well as on the level of Tanzania. For the geographical distance matrix, we calculated the Euclidean distance based on the latitude and the longitude. At the level of Tanzania, the broad geographic location of the original area of the ethnic group was considered (Supplementary file 1), and for the continental level, the coordinates of the hospital in Temeke were taken for the Tanzanian TB patients, considering that for the other studies only the sampling locations were known. The human and bacterial genetic distance matrices as well as the human genetic and geographic distance matrices were tested for correlations using the mantel.test() function from the mantel module in Python (options: perms = 10,000, method = ‘pearson’, tail = ‘upper’).
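The logic of an upper-tailed Mantel test with Pearson correlation can be sketched in plain Python. This is a simplified stand-in written for illustration, not the code of the mantel module used in the study; the function names, the seed, and the toy matrices are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after permuting rows/columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel_upper(dist_a, dist_b, perms=10000, seed=0):
    """Return (observed r, upper-tailed permutation p-value)."""
    rng = random.Random(seed)
    x = upper_triangle(dist_a)
    observed = pearson(x, upper_triangle(dist_b))
    n = len(dist_b)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(perms - 1):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of one matrix together.
        if pearson(x, upper_triangle(dist_b, order)) >= observed - 1e-12:
            hits += 1
    return observed, hits / perms


# Toy example: five points on a line; the second matrix is a rescaled copy.
coords = [0, 1, 2, 3, 4]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[2 * abs(a - b) for b in coords] for a in coords]
r_obs, p_upper = mantel_upper(geo, gen, perms=1000)
print(round(r_obs, 3), p_upper)
```

Because rows and columns are permuted jointly, the test respects the non-independence of the entries in a distance matrix, which a naive correlation test on the flattened matrices would ignore.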

Statistical analyses

Clinical and sociodemographic characteristics, MTBC genotype as well as human genetic ancestry were summarized by the different disease severity measures using proportions or means and standard deviations. Similarly, the human genetic ancestries were summarized by MTBC genotype.

To estimate the associations between disease severity and the explanatory variables bacterial genotype (binary, belonging to Introduction 10 or not) and human ancestries (seBS, eBS, and wBS), we used a logistic regression model. We included three variables as proxies for disease severity: lung damage based on X-ray score (mild versus severe), TB-score (mild versus severe), and bacterial load (continuous, log₁₀ transformed). Categorization or transformation of the disease severity measures was performed because the distributions of the residuals were not normal, violating the assumption of linear regression. We tested only for Introduction 10 because there were few observations of the other MTBC genotypes. To account for the compositional nature of the human ancestries (i.e. that they sum up to 1), we used the additive log-ratio transformation from the R package ‘compositions’ (van den Boogaart et al., 2022). The ancestry proportions were transformed and categorized with category 1 comprising the lowest amount of the respective ancestry and category 3 (4 in the case of wBS) the highest. The categories were chosen to have roughly equal numbers of patients in each. We used categories to allow a non-linear relationship without specifying polynomials and to avoid difficulty in interpretation, but we recognize that a small amount of information is lost. Similar results were also obtained with other parameterizations. We assessed whether there was an interaction between ancestry and genotype on TB severity using the likelihood ratio test. For that, we compared a model including the interaction between the MTBC genotype and human genetic ancestry to a model without the interaction using the ‘lmtest’ package (Zeileis and Hothorn, 2002).
The estimates were adjusted for age, sex, smoking, the number of weeks with cough, the socioeconomic status (ratio between household income and number of household members), education (no, primary, secondary, university), drug resistance profile, TB category (new infection or relapse/reinfection), and malnutrition. Only HIV-negative patients were included in the analysis. All statistical analyses were carried out in R (version 4.1.2). Code for the statistical analysis can be found at https://github.com/mzwyer/TB-Dar_Mtb (copy archived at Zwyer, 2026).
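The transformation and categorization of the ancestry proportions can be illustrated with a short Python sketch. The analysis itself was carried out in R; the code below only shows the general form of an additive log-ratio transformation and of rank-based categories of roughly equal size. The choice of the last component as reference, the requirement of strictly positive proportions, and the function names are assumptions.

```python
from math import log


def alr(proportions, reference=-1):
    """Additive log-ratio: log of each part relative to a reference part."""
    ref = proportions[reference]
    if ref <= 0 or any(p <= 0 for p in proportions):
        raise ValueError("all proportions must be strictly positive")
    ref_index = reference % len(proportions)
    return [log(p / ref) for i, p in enumerate(proportions) if i != ref_index]


def equal_size_categories(values, n_categories=3):
    """Assign categories 1..n by rank so that groups have similar sizes."""
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    categories = [0] * len(values)
    for rank, i in enumerate(ranked):
        categories[i] = rank * n_categories // len(values) + 1
    return categories


# Toy ancestry proportions (three components summing to 1).
transformed = alr((0.5, 0.3, 0.2))
print([round(v, 3) for v in transformed])

# Toy transformed values for six patients, split into three categories.
groups = equal_size_categories([0.1, 0.5, 0.3, 0.9, 0.7, 0.2])
print(groups)
```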

To conduct the GWAS, we used logistic regression (TB-score and X-ray score) and a linear model (Ct-value) implemented by PLINK (version 1.9). A total of 6,665,541 common (MAF >0.05) human genetic variants were included. Genetic PCs were calculated using GCTA (version 1.93.2). As covariates, we included the top three human genetic PCs to correct for population stratification, along with sex, HIV status, age, cough duration, smoking, socioeconomic status, TB category, malnutrition, education, and drug resistance profile. Genomic inflation factors were below 1 for all tested outcomes (Figure 3—figure supplement 1).
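For reference, the genomic inflation factor is commonly computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.455). The Python sketch below shows this standard definition using only the standard library; it is not the study code, the p-values are synthetic, and the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (~0.455).
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda = median observed chi-square / expected median under the null."""
    # A two-sided p-value maps to a 1-df chi-square via the squared z-score.
    chi2 = [_STD_NORMAL.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / EXPECTED_MEDIAN_CHI2


# Synthetic, evenly spaced p-values mimic a null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)
print(round(lambda_null, 2))
```

Values close to 1 indicate little inflation of test statistics, for example from residual population stratification.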

Appendix 1

Appendix 1—figure 1

Association between the Bantu genetic ancestries and TB-score and X-ray score for each of the most successful introductions.

(A, C, E) Southeastern, Eastern, and Western Bantu genetic ancestry and mild or severe TB-score; (B, D, F) Southeastern, Eastern, and Western Bantu genetic ancestry and lung damage assessed by chest X-ray.

Data availability

The bacterial WGS data can be found under the bioprojects PRJEB49562 and PRJNA670836 on ENA and the human WGS and genotyping data under EGAS00001005850 and EGAS00001007216, respectively. Clinical data, statistical analysis, and customized Python scripts are available on https://github.com/mzwyer/TB-Dar_Mtb, copy archived at Zwyer, 2026.

The following data sets were generated

1. Zwyer M
2. Hella J
3. Sasamalo M
4. Reither K
5. Fellay J
6. Portevin D
7. Gagneux S
8. Brites D
(2022) EBI European Nucleotide Archive
ID PRJEB49562. Tuberculosis epidemic in Dar es Salaam, Tanzania.

https://www.ebi.ac.uk/ena/browser/view/PRJEB49562
1. Zwyer M
2. Hella J
3. Sasamalo M
4. Reither K
5. Fellay J
6. Portevin D
7. Gagneux S
8. Brites D
(2021) European Genome-phenome Archive
ID EGAD00001008400. TB-DAR Whole Genome Sequencing Study.

https://ega-archive.org/datasets/EGAD00001008400
1. Zwyer M
2. Hella J
3. Sasamalo M
4. Reither K
5. Fellay J
6. Portevin D
7. Gagneux S
8. Brites D
(2023) European Genome-phenome Archive
ID EGAD00010002507. TB-DAR Genotyping Study.

https://ega-archive.org/datasets/EGAD00010002507
1. Zwyer M
2. Hella J
3. Sasamalo M
4. Reither K
5. Fellay J
6. Portevin D
7. Gagneux S
8. Brites D
(2023) European Genome-phenome Archive
ID EGAS00001007216. TB-DAR Genotyping Study.

https://ega-archive.org/studies/EGAS00001007216
1. Zwyer M
2. Hella J
3. Sasamalo M
4. Reither K
5. Fellay J
6. Portevin D
7. Gagneux S
8. Brites D
(2021) European Genome-phenome Archive
ID EGAS00001005850. TB-DAR Whole Genome Sequencing Study.

https://ega-archive.org/studies/EGAS00001005850

The following previously published data sets were used

(2021) EBI European Nucleotide Archive
ID PRJNA670836. Biogeography of MTB lineage 1 and 3.

https://www.ebi.ac.uk/ena/browser/view/PRJNA670836

References

(2010) Mycobacterium tuberculosis strains with the Beijing genotype demonstrate variability in virulence associated with transmission
Tuberculosis 90:319–325.

https://doi.org/10.1016/j.tube.2010.08.004
1. Akcay IM
2. Katrinli S
3. Ozdil K
4. Doganay GD
5. Doganay L
(2018) Host genetic factors affecting hepatitis B infection outcomes: Insights from genome-wide association studies
World Journal of Gastroenterology 24:3347–3360.

https://doi.org/10.3748/wjg.v24.i30.3347
(2009) Fast model-based estimation of ancestry in unrelated individuals
Genome Research 19:1655–1664.

https://doi.org/10.1101/gr.094052.109
1. Alexander DH
2. Lange K
(2011) Enhancements to the ADMIXTURE algorithm for individual ancestry estimation
BMC Bioinformatics 12:246.

https://doi.org/10.1186/1471-2105-12-246
1. Alling DW
2. Bosworth EB
(1960) The after-history of pulmonary tuberculosis: VI. the first fifteen years following diagnosis
The American Review of Respiratory Disease 81:839–849.

https://doi.org/10.1164/arrd.1960.81.6.839
(2015) A global reference for human genetic variation
Nature 526:68–74.

https://doi.org/10.1038/nature15393
(2025) Soluble immune mediators orchestrate protective in vitro granulomatous responses across Mycobacterium tuberculosis complex lineages
eLife 13:RP99062.

https://doi.org/10.7554/eLife.99062
1. Asare P
2. Asante-Poku A
3. Prah DA
4. Borrell S
5. Osei-Wusu S
6. Otchere ID
7. Forson A
8. Adjapong G
9. Koram KA
10. Gagneux S
11. Yeboah-Manu D
(2018) Reduced transmission of Mycobacterium africanum compared to Mycobacterium tuberculosis in urban West Africa
International Journal of Infectious Diseases 73:30–42.

https://doi.org/10.1016/j.ijid.2018.05.014
1. Asgari S
2. Luo Y
3. Huang C-C
4. Zhang Z
5. Calderon R
6. Jimenez J
7. Yataco R
8. Contreras C
9. Galea JT
10. Lecca L
11. Jones D
12. Moody DB
13. Murray MB
14. Raychaudhuri S
(2022) Higher native Peruvian genetic ancestry proportion is associated with tuberculosis progression risk
Cell Genomics 2:100151.

https://doi.org/10.1016/j.xgen.2022.100151
1. Bai H
2. Song M
3. Lei S
4. Jiao L
5. Hu X
6. Wu T
7. Song J
8. Liu T
9. Peng W
10. Zhao Z
11. Meng Z
12. Ying B
(2023) Genome‐wide association study of tuberculosis in the western Chinese Han and Tibetan population
MedComm 4:e250.

https://doi.org/10.1002/mco2.250
(2004) Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis
Emerging Infectious Diseases 10:1568–1577.

https://doi.org/10.3201/eid1009.040046
1. Band G
2. Le QS
3. Clarke GM
4. Kivinen K
5. Hubbart C
6. Jeffreys AE
7. Rowlands K
8. Leffler EM
9. Jallow M
10. Conway DJ
11. Sisay-Joof F
12. Sirugo G
13. d’Alessandro U
14. Toure OB
15. Thera MA
16. Konate S
17. Sissoko S
18. Mangano VD
19. Bougouma EC
20. Sirima SB
21. Amenga-Etego LN
22. Ghansah AK
23. Hodgson AVO
24. Wilson MD
25. Enimil A
26. Ansong D
27. Evans J
28. Ademola SA
29. Apinjoh TO
30. Ndila CM
31. Manjurano A
32. Drakeley C
33. Reyburn H
34. Phu NH
35. Quyen NTN
36. Thai CQ
37. Hien TT
38. Teo YY
39. Manning L
40. Laman M
41. Michon P
42. Karunajeewa H
43. Siba P
44. Allen S
45. Allen A
46. Bahlo M
47. Davis TME
48. Simpson V
49. Shelton J
50. Spencer CCA
51. Busby GBJ
52. Kerasidou A
53. Drury E
54. Stalker J
55. Dilthey A
56. Mentzer AJ
57. McVean G
58. Bojang KA
59. Doumbo O
60. Modiano D
61. Koram KA
62. Agbenyega T
63. Amodu OK
64. Achidi E
65. Williams TN
66. Marsh K
67. Riley EM
68. Molyneux M
69. Taylor T
70. Dunstan SJ
71. Farrar J
72. Mueller I
73. Rockett KA
74. Kwiatkowski DP
75. Malaria Genomic Epidemiology Network
(2019) Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania
Nature Communications 10:5732.

https://doi.org/10.1038/s41467-019-13480-z
1. Bates MN
2. Khalakdina A
3. Pai M
4. Chang L
5. Lessa F
6. Smith KR
(2007) Risk of tuberculosis from exposure to tobacco smoke: a systematic review and meta-analysis
Archives of Internal Medicine 167:335–342.

https://doi.org/10.1001/archinte.167.4.335
1. Bergström A
2. McCarthy SA
3. Hui R
4. Almarri MA
5. Ayub Q
6. Danecek P
7. Chen Y
8. Felkel S
9. Hallast P
10. Kamm J
11. Blanché H
12. Deleuze JF
13. Cann H
14. Mallick S
15. Reich D
16. Sandhu MS
17. Skoglund P
18. Scally A
19. Xue Y
20. Durbin R
21. Tyler-Smith C
(2020) Insights into human genetic variation and population history from 929 diverse genomes
Science 367:eaay5012.

https://doi.org/10.1126/science.aay5012
1. Bird N
2. Ormond L
3. Awah P
4. Caldwell EF
5. Connell B
6. Elamin M
7. Fadlelmola FM
8. Matthew Fomine FL
9. López S
10. MacEachern S
11. Moñino Y
12. Morris S
13. Näsänen-Gilmore P
14. Nketsia V NK
15. Veeramah K
16. Weale ME
17. Zeitlyn D
18. Thomas MG
19. Bradman N
20. Hellenthal G
(2023) Dense sampling of ethnic groups within African countries reveals fine-scale genetic structure and extensive historical admixture
Science Advances 9:eabq2616.

https://doi.org/10.1126/sciadv.abq2616
- PubMed
- Google Scholar
1. Boisson-Dupuis S
2. Ramirez-Alejo N
3. Li Z
4. Patin E
5. Rao G
6. Kerner G
7. Lim CK
8. Krementsov DN
9. Hernandez N
10. Ma CS
11. Zhang Q
12. Markle J
13. Martinez-Barricarte R
14. Payne K
15. Fisch R
16. Deswarte C
17. Halpern J
18. Bouaziz M
19. Mulwa J
20. Sivanesan D
21. Lazarov T
22. Naves R
23. Garcia P
24. Itan Y
25. Boisson B
26. Checchi A
27. Jabot-Hanin F
28. Cobat A
29. Guennoun A
30. Jackson CC
31. Pekcan S
32. Caliskaner Z
33. Inostroza J
34. Costa-Carvalho BT
35. de Albuquerque JAT
36. Garcia-Ortiz H
37. Orozco L
38. Ozcelik T
39. Abid A
40. Rhorfi IA
41. Souhi H
42. Amrani HN
43. Zegmout A
44. Geissmann F
45. Michnick SW
46. Muller-Fleckenstein I
47. Fleckenstein B
48. Puel A
49. Ciancanelli MJ
50. Marr N
51. Abolhassani H
52. Balcells ME
53. Condino-Neto A
54. Strickler A
55. Abarca K
56. Teuscher C
57. Ochs HD
58. Reisli I
59. Sayar EH
60. El-Baghdadi J
61. Bustamante J
62. Hammarström L
63. Tangye SG
64. Pellegrini S
65. Quintana-Murci L
66. Abel L
67. Casanova J-L
(2018) Tuberculosis and impaired IL-23-dependent IFN-γ immunity in humans homozygous for a common TYK2 missense variant
Science Immunology 3:eaau8714.

https://doi.org/10.1126/sciimmunol.aau8714
- PubMed
- Google Scholar
(2014) Trimmomatic: a flexible trimmer for Illumina sequence data
Bioinformatics 30:2114–2120.

https://doi.org/10.1093/bioinformatics/btu170
- PubMed
- Google Scholar
1. Brielle ES
2. Fleisher J
3. Wynne-Jones S
4. Sirak K
5. Broomandkhoshbacht N
6. Callan K
7. Curtis E
8. Iliev L
9. Lawson AM
10. Oppenheimer J
11. Qiu L
12. Stewardson K
13. Workman JN
14. Zalzala F
15. Ayodo G
16. Gidna AO
17. Kabiru A
18. Kwekason A
19. Mabulla AZP
20. Manthi FK
21. Ndiema E
22. Ogola C
23. Sawchuk E
24. Al-Gazali L
25. Ali BR
26. Ben-Salem S
27. Letellier T
28. Pierron D
29. Radimilahy C
30. Rakotoarisoa JA
31. Raaum RL
32. Culleton BJ
33. Mallick S
34. Rohland N
35. Patterson N
36. Mwenje MA
37. Ahmed KB
38. Mohamed MM
39. Williams SR
40. Monge J
41. Kusimba S
42. Prendergast ME
43. Reich D
44. Kusimba CM
(2023) Entwined African and Asian genetic roots of medieval peoples of the Swahili coast
Nature 615:866–873.

https://doi.org/10.1038/s41586-023-05754-w
- PubMed
- Google Scholar
1. Brites D
2. Gagneux S
(2015) Co-evolution of Mycobacterium tuberculosis and Homo sapiens
Immunological Reviews 264:6–24.

https://doi.org/10.1111/imr.12264
- PubMed
- Google Scholar
Software
1. broadinstitute
(2019) Picard toolkit, version 1.8.x
Github.

https://broadinstitute.github.io/picard/
Software
1. broadinstitute
(2025) Picard, version fc0b084
Github.

https://github.com/broadinstitute/picard
Book
(1994)
The History and Geography of Human Genes

Princeton university press.
- Google Scholar
(2018) Chest X ray score (Timika score): an useful adjunct to predict treatment outcome in tuberculosis
Advances in Respiratory Medicine 86:205–210.

https://doi.org/10.5603/ARM.2018.0032
- PubMed
- Google Scholar
1. Chang CC
2. Chow CC
3. Tellier LC
4. Vattikuti S
5. Purcell SM
6. Lee JJ
(2015) Second-generation PLINK: rising to the challenge of larger and richer datasets
GigaScience 4:7.

https://doi.org/10.1186/s13742-015-0047-8
- PubMed
- Google Scholar
1. Chapman SJ
2. Hill AVS
(2012) Human genetic susceptibility to infectious disease
Nature Reviews Genetics 13:175–188.

https://doi.org/10.1038/nrg3114
- Google Scholar
1. Chihota VN
2. Niehaus A
3. Streicher EM
4. Wang X
5. Sampson SL
6. Mason P
7. Källenius G
8. Mfinanga SG
9. Pillay M
10. Klopper M
11. Kasongo W
12. Behr MA
13. Gey van Pittius NC
14. van Helden PD
15. Couvin D
16. Rastogi N
17. Warren RM
(2018) Geospatial distribution of Mycobacterium tuberculosis genotypes in Africa
PLOS ONE 13:e0200632.

https://doi.org/10.1371/journal.pone.0200632
- PubMed
- Google Scholar
(2012) Relationship between Mycobacterium tuberculosis phylogenetic lineage and clinical site of tuberculosis
Clinical Infectious Diseases 54:211–219.

https://doi.org/10.1093/cid/cir788
- PubMed
- Google Scholar
1. Coll F
2. McNerney R
3. Guerra-Assunção JA
4. Glynn JR
5. Perdigão J
6. Viveiros M
7. Portugal I
8. Pain A
9. Martin N
10. Clark TG
(2014) A robust SNP barcode for typing Mycobacterium tuberculosis complex strains
Nature Communications 5:4812.

https://doi.org/10.1038/ncomms5812
- PubMed
- Google Scholar
1. Comas I
2. Gagneux S
(2009) The past and future of tuberculosis research
PLOS Pathogens 5:e1000600.

https://doi.org/10.1371/journal.ppat.1000600
- PubMed
- Google Scholar
1. Comas I
2. Chakravartti J
3. Small PM
4. Galagan J
5. Niemann S
6. Kremer K
7. Ernst JD
8. Gagneux S
(2010) Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved
Nature Genetics 42:498–503.

https://doi.org/10.1038/ng.590
- PubMed
- Google Scholar
1. Comas I
2. Hailu E
3. Kiros T
4. Bekele S
5. Mekonnen W
6. Gumi B
7. Tschopp R
8. Ameni G
9. Hewinson RG
10. Robertson BD
11. Goig GA
12. Stucki D
13. Gagneux S
14. Aseffa A
15. Young D
16. Berg S
(2015) Population genomics of Mycobacterium tuberculosis in ethiopia contradicts the virgin soil hypothesis for human tuberculosis in Sub-Saharan Africa
Current Biology 25:3260–3266.

https://doi.org/10.1016/j.cub.2015.10.061
- PubMed
- Google Scholar
1. Coscolla M
2. Gagneux S
(2010) Does M. tuberculosis genomic diversity explain disease diversity?
Drug Discovery Today. Disease Mechanisms 7:e43–e59.

https://doi.org/10.1016/j.ddmec.2010.09.004
- PubMed
- Google Scholar
1. Coussens AK
2. Wilkinson RJ
3. Nikolayevskyy V
4. Elkington PT
5. Hanifa Y
6. Islam K
7. Timms PM
8. Bothamley GH
9. Claxton AP
10. Packe GE
11. Darmalingam M
12. Davidson RN
13. Milburn HJ
14. Baker LV
15. Barker RD
16. Drobniewski FA
17. Mein CA
18. Bhaw-Rosun L
19. Nuamah RA
20. Griffiths CJ
21. Martineau AR
(2013) Ethnic variation in inflammatory profile in tuberculosis
PLOS Pathogens 9:e1003468.

https://doi.org/10.1371/journal.ppat.1003468
- PubMed
- Google Scholar
(2014) The role of ancestry in TB susceptibility of an admixed South African population
Tuberculosis 94:413–420.

https://doi.org/10.1016/j.tube.2014.03.012
- PubMed
- Google Scholar
1. de Jong BC
2. Hill PC
3. Aiken A
4. Awine T
5. Antonio M
6. Adetifa IM
7. Jackson-Sillah DJ
8. Fox A
9. Deriemer K
10. Gagneux S
11. Borgdorff MW
12. McAdam KPWJ
13. Corrah T
14. Small PM
15. Adegbola RA
(2008) Progression to active tuberculosis, but not transmission, varies by Mycobacterium tuberculosis lineage in The Gambia
The Journal of Infectious Diseases 198:1037–1043.

https://doi.org/10.1086/591504
- PubMed
- Google Scholar
(2010) Mycobacterium africanum--review of an important cause of human tuberculosis in West Africa
PLOS Neglected Tropical Diseases 4:e744.

https://doi.org/10.1371/journal.pntd.0000744
- PubMed
- Google Scholar
1. Dooley KE
2. Chaisson RE
(2009) Tuberculosis and diabetes mellitus: convergence of two epidemics
The Lancet. Infectious Diseases 9:737–746.

https://doi.org/10.1016/S1473-3099(09)70282-8
- PubMed
- Google Scholar
(2004) Correlation of virulence, lung pathology, bacterial load and delayed type hypersensitivity responses after infection with different Mycobacterium tuberculosis genotypes in a BALB/c mouse model
Clinical and Experimental Immunology 137:460–468.

https://doi.org/10.1111/j.1365-2249.2004.02551.x
- PubMed
- Google Scholar
Preprint
1. Du DH
2. Geskus RB
3. Zhao Y
4. Codecasa LR
5. Cirillo DM
6. van Crevel R
7. Pascapurnama DN
8. Chaidir L
9. Niemann S
10. Diel R
11. Omar SV
12. Grandjean L
13. Rokadiya S
14. Ortitz AT
15. Lân NH
16. Hà ÐTM
17. Smith EG
18. Robinson E
19. Dedicoat M
20. Nhat LTH
21. Thwaites GE
22. Van LH
23. Thuong NTT
24. Walker TM
(2023) The effect of M. tuberculosis lineage on clinical phenotype
medRxiv.

https://doi.org/10.1101/2023.03.14.23287284
- Google Scholar
1. Durbin R
(2014) Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT)
Bioinformatics 30:1266–1272.

https://doi.org/10.1093/bioinformatics/btu014
- PubMed
- Google Scholar
(2013) HIV infection disrupts the sympatric host-pathogen relationship in human tuberculosis
PLOS Genetics 9:e1003318.

https://doi.org/10.1371/journal.pgen.1003318
- PubMed
- Google Scholar
(2022) Demographic and selection histories of populations across the sahel/savannah belt
Molecular Biology and Evolution 39:msac209.

https://doi.org/10.1093/molbev/msac209
- PubMed
- Google Scholar
1. Frascella B
2. Richards AS
3. Sossen B
4. Emery JC
5. Odone A
6. Law I
7. Onozaki I
8. Esmail H
9. Houben RMGJ
(2021) Subclinical tuberculosis disease-a review and analysis of prevalence surveys to inform definitions, burden, associations, and screening methodology
Clinical Infectious Diseases 73:e830–e841.

https://doi.org/10.1093/cid/ciaa1402
- PubMed
- Google Scholar
(2011) Signatures of environmental genetic adaptation pinpoint pathogens as the main selective pressure through human evolution
PLOS Genetics 7:e1002355.

https://doi.org/10.1371/journal.pgen.1002355
- PubMed
- Google Scholar
1. Gagneux S
2. DeRiemer K
3. Van T
4. Kato-Maeda M
5. de Jong BC
6. Narayanan S
7. Nicol M
8. Niemann S
9. Kremer K
10. Gutierrez MC
11. Hilty M
12. Hopewell PC
13. Small PM
(2006) Variable host-pathogen compatibility in Mycobacterium tuberculosis
PNAS 103:2869–2873.

https://doi.org/10.1073/pnas.0511240103
- PubMed
- Google Scholar
1. Gagneux S
(2018) Ecology and evolution of Mycobacterium tuberculosis
Nature Reviews. Microbiology 16:202–213.

https://doi.org/10.1038/nrmicro.2018.8
- PubMed
- Google Scholar
(2007) Inferring human population sizes, divergence times and rates of gene flow from mitochondrial, X and Y chromosome resequencing data
Genetics 177:2195–2207.

https://doi.org/10.1534/genetics.107.077495
- PubMed
- Google Scholar
1. Gröschel MI
2. Pérez-Llanos FJ
3. Diel R
4. Vargas R Jr
5. Escuyer V
6. Musser K
7. Trieu L
8. Meissner JS
9. Knorr J
10. Klinkenberg D
11. Kouw P
12. Homolka S
13. Samek W
14. Mathema B
15. van Soolingen D
16. Niemann S
17. Ahuja SD
18. Farhat MR
(2024) Differential rates of Mycobacterium tuberculosis transmission associate with host-pathogen sympatry
Nature Microbiology 9:2113–2127.

https://doi.org/10.1038/s41564-024-01758-y
- PubMed
- Google Scholar
1. Guyeux C
2. Senelle G
3. Le Meur A
4. Supply P
5. Gaudin C
6. Phelan JE
7. Clark TG
8. Rigouts L
9. de Jong B
10. Sola C
11. Refrégier G
(2024) Newly identified mycobacterium africanum lineage 10, Central Africa
Emerging Infectious Diseases 30:560–563.

https://doi.org/10.3201/eid3003.231466
- PubMed
- Google Scholar
(2017) Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations
PLOS Genetics 13:e1006976.

https://doi.org/10.1371/journal.pgen.1006976
- PubMed
- Google Scholar
1. Holt KE
2. McAdam P
3. Thai PVK
4. Thuong NTT
5. Ha DTM
6. Lan NN
7. Lan NH
8. Nhu NTQ
9. Hai HT
10. Ha VTN
11. Thwaites G
12. Edwards DJ
13. Nath AP
14. Pham K
15. Ascher DB
16. Farrar J
17. Khor CC
18. Teo YY
19. Inouye M
20. Caws M
21. Dunstan SJ
(2018) Frequent transmission of the Mycobacterium tuberculosis Beijing lineage and positive selection for the EsxW Beijing variant in Vietnam
Nature Genetics 50:849–856.

https://doi.org/10.1038/s41588-018-0117-9
- PubMed
- Google Scholar
(2017) Alcohol consumption as a risk factor for tuberculosis: meta-analyses and burden of disease
The European Respiratory Journal 50:1700216.

https://doi.org/10.1183/13993003.00216-2017
- PubMed
- Google Scholar
Software
1. John JS
(2011) SeqPrep, version 575507b
Github.

https://github.com/jstjohn/SeqPrep
1. Koboldt DC
2. Zhang Q
3. Larson DE
4. Shen D
5. McLellan MD
6. Lin L
7. Miller CA
8. Mardis ER
9. Ding L
10. Wilson RK
(2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing
Genome Research 22:568–576.

https://doi.org/10.1101/gr.129684.111
- PubMed
- Google Scholar
(2024) Outcomes for people with TB by disease severity at presentation
The International Journal of Tuberculosis and Lung Disease 28:142–147.

https://doi.org/10.5588/ijtld.23.0254
- PubMed
- Google Scholar
1. Lee JC
2. Espéli M
3. Anderson CA
4. Linterman MA
5. Pocock JM
6. Williams NJ
7. Roberts R
8. Viatte S
9. Fu B
10. Peshu N
11. Hien TT
12. Phu NH
13. Wesley E
14. Edwards C
15. Ahmad T
16. Mansfield JC
17. Gearry R
18. Dunstan S
19. Williams TN
20. Barton A
21. Vinuesa CG
22. Parkes M
23. Lyons PA
24. Smith KGC
25. UK IBD Genetics Consortium
(2013) Human SNP links differential outcomes in inflammatory and infectious disease to a FOXO3-regulated pathway
Cell 155:57–69.

https://doi.org/10.1016/j.cell.2013.08.034
- PubMed
- Google Scholar
1. Li H
2. Durbin R
(2009) Fast and accurate short read alignment with Burrows-Wheeler transform
Bioinformatics 25:1754–1760.

https://doi.org/10.1093/bioinformatics/btp324
- PubMed
- Google Scholar
1. Li H
(2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
Bioinformatics 27:2987–2993.

https://doi.org/10.1093/bioinformatics/btr509
- PubMed
- Google Scholar
Website
1. Llc M
(2026) Dar es Salaam, Tanzania Metro Area Population (1950–2026). Macrotrends
Accessed February 23, 2026.

https://www.macrotrends.net/global-metrics/cities/22894/dar-es-salaam/population
1. Loh P-R
2. Danecek P
3. Palamara PF
4. Fuchsberger C
5. A Reshef Y
6. K Finucane H
7. Schoenherr S
8. Forer L
9. McCarthy S
10. Abecasis GR
11. Durbin R
12. L Price A
(2016) Reference-based phasing using the haplotype reference consortium panel
Nature Genetics 48:1443–1448.

https://doi.org/10.1038/ng.3679
- PubMed
- Google Scholar
1. López B
2. Aguilar D
3. Orozco H
4. Burger M
5. Espitia C
6. Ritacco V
7. Barrera L
8. Kremer K
9. Hernandez-Pando R
10. Huygen K
11. van Soolingen D
(2003) A marked difference in pathogenesis and immune response induced by different Mycobacterium tuberculosis genotypes
Clinical and Experimental Immunology 133:30–37.

https://doi.org/10.1046/j.1365-2249.2003.02171.x
- PubMed
- Google Scholar
1. Macallan DC
(1999) Malnutrition in tuberculosis
Diagnostic Microbiology and Infectious Disease 34:153–157.

https://doi.org/10.1016/s0732-8893(99)00007-3
- PubMed
- Google Scholar
1. Mallick S
2. Li H
3. Lipson M
4. Mathieson I
5. Gymrek M
6. Racimo F
7. Zhao M
8. Chennagiri N
9. Nordenfelt S
10. Tandon A
11. Skoglund P
12. Lazaridis I
13. Sankararaman S
14. Fu Q
15. Rohland N
16. Renaud G
17. Erlich Y
18. Willems T
19. Gallo C
20. Spence JP
21. Song YS
22. Poletti G
23. Balloux F
24. van Driem G
25. de Knijff P
26. Romero IG
27. Jha AR
28. Behar DM
29. Bravi CM
30. Capelli C
31. Hervig T
32. Moreno-Estrada A
33. Posukh OL
34. Balanovska E
35. Balanovsky O
36. Karachanak-Yankova S
37. Sahakyan H
38. Toncheva D
39. Yepiskoposyan L
40. Tyler-Smith C
41. Xue Y
42. Abdullah MS
43. Ruiz-Linares A
44. Beall CM
45. Di Rienzo A
46. Jeong C
47. Starikovskaya EB
48. Metspalu E
49. Parik J
50. Villems R
51. Henn BM
52. Hodoglugil U
53. Mahley R
54. Sajantila A
55. Stamatoyannopoulos G
56. Wee JTS
57. Khusainova R
58. Khusnutdinova E
59. Litvinov S
60. Ayodo G
61. Comas D
62. Hammer MF
63. Kivisild T
64. Klitz W
65. Winkler CA
66. Labuda D
67. Bamshad M
68. Jorde LB
69. Tishkoff SA
70. Watkins WS
71. Metspalu M
72. Dryomov S
73. Sukernik R
74. Singh L
75. Thangaraj K
76. Pääbo S
77. Kelso J
78. Patterson N
79. Reich D
(2016) The simons genome diversity project: 300 genomes from 142 diverse populations
Nature 538:201–206.

https://doi.org/10.1038/nature18964
- PubMed
- Google Scholar
1. Manca C
2. Reed MB
3. Freeman S
4. Mathema B
5. Kreiswirth B
6. Barry CE III
7. Kaplan G
(2004) Differential monocyte activation underlies strain-specific Mycobacterium tuberculosis pathogenesis
Infection and Immunity 72:5511–5514.

https://doi.org/10.1128/IAI.72.9.5511-5514.2004
- PubMed
- Google Scholar
1. McHenry ML
2. Bartlett J
3. Igo RP Jr
4. Wampande EM
5. Benchek P
6. Mayanja-Kizza H
7. Fluegge K
8. Hall NB
9. Gagneux S
10. Tishkoff SA
11. Wejse C
12. Sirugo G
13. Boom WH
14. Joloba M
15. Williams SM
16. Stein CM
(2020a) Interaction between host genes and Mycobacterium tuberculosis lineage can affect tuberculosis severity: Evidence for coevolution?
PLOS Genetics 16:e1008728.

https://doi.org/10.1371/journal.pgen.1008728
- PubMed
- Google Scholar
(2020b) Genetics and evolution of tuberculosis pathogenesis: new perspectives and approaches
Infection, Genetics and Evolution 81:104204.

https://doi.org/10.1016/j.meegid.2020.104204
- PubMed
- Google Scholar
1. McHenry ML
2. Wampande EM
3. Joloba ML
4. Malone LL
5. Mayanja-Kizza H
6. Bush WS
7. Boom WH
8. Williams SM
9. Stein CM
(2021) Interaction between M. tuberculosis lineage and human genetic variants reveals novel pathway associations with severity of TB
Pathogens 10:1487.

https://doi.org/10.3390/pathogens10111487
- PubMed
- Google Scholar
1. McKenna A
2. Hanna M
3. Banks E
4. Sivachenko A
5. Cibulskis K
6. Kernytsky A
7. Garimella K
8. Altshuler D
9. Gabriel S
10. Daly M
11. DePristo MA
(2010) The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Genome Research 20:1297–1303.

https://doi.org/10.1101/gr.107524.110
- PubMed
- Google Scholar
1. Menardo F
2. Loiseau C
3. Brites D
4. Coscolla M
5. Gygli SM
6. Rutaihwa LK
7. Trauner A
8. Beisel C
9. Borrell S
10. Gagneux S
(2018) Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity
BMC Bioinformatics 19:164.

https://doi.org/10.1186/s12859-018-2164-8
- PubMed
- Google Scholar
1. Menardo F
2. Rutaihwa LK
3. Zwyer M
4. Borrell S
5. Comas I
6. Conceição EC
7. Coscolla M
8. Cox H
9. Joloba M
10. Dou HY
11. Feldmann J
12. Fenner L
13. Fyfe J
14. Gao Q
15. García de Viedma D
16. Garcia-Basteiro AL
17. Gygli SM
18. Hella J
19. Hiza H
20. Jugheli L
21. Kamwela L
22. Kato-Maeda M
23. Liu Q
24. Ley SD
25. Loiseau C
26. Mahasirimongkol S
27. Malla B
28. Palittapongarnpim P
29. Rakotosamimanana N
30. Rasolofo V
31. Reinhard M
32. Reither K
33. Sasamalo M
34. Silva Duarte R
35. Sola C
36. Suffys P
37. Batista Lima KV
38. Yeboah-Manu D
39. Beisel C
40. Brites D
41. Gagneux S
(2021) Local adaptation in populations of Mycobacterium tuberculosis endemic to the Indian Ocean Rim
F1000Research 10:60.

https://doi.org/10.12688/f1000research.28318.2
- PubMed
- Google Scholar
1. Michel AL
2. Bengis RG
3. Keet DF
4. Hofmeyr M
5. de Klerk L
6. Cross PC
7. Jolles AE
8. Cooper D
9. Whyte IJ
10. Buss P
11. Godfroid J
(2006) Wildlife tuberculosis in South African conservation areas: implications and challenges
Veterinary Microbiology 112:91–100.

https://doi.org/10.1016/j.vetmic.2005.11.035
- PubMed
- Google Scholar
1. Newport MJ
2. Finan C
(2011) Genome-wide association studies and susceptibility to infectious diseases
Briefings in Functional Genomics 10:98–107.

https://doi.org/10.1093/bfgp/elq037
- PubMed
- Google Scholar
1. Obenchain V
2. Lawrence M
3. Carey V
4. Gogarten S
5. Shannon P
6. Morgan M
(2014) VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants
Bioinformatics 30:2076–2078.

https://doi.org/10.1093/bioinformatics/btu168
- PubMed
- Google Scholar
(2012) “Lethal” combination of Mycobacterium tuberculosis Beijing genotype and human CD209 -336G allele in Russian male population
Infection, Genetics and Evolution 12:732–736.

https://doi.org/10.1016/j.meegid.2011.10.005
- PubMed
- Google Scholar
1. Patin E
2. Siddle KJ
3. Laval G
4. Quach H
5. Harmant C
6. Becker N
7. Froment A
8. Régnault B
9. Lemée L
10. Gravel S
11. Hombert J-M
12. Van der Veen L
13. Dominy NJ
14. Perry GH
15. Barreiro LB
16. Verdu P
17. Heyer E
18. Quintana-Murci L
(2014) The impact of agricultural emergence on the genetic history of African rainforest hunter-gatherers and agriculturalists
Nature Communications 5:3163.

https://doi.org/10.1038/ncomms4163
- PubMed
- Google Scholar
1. Patin E
2. Lopez M
3. Grollemund R
4. Verdu P
5. Harmant C
6. Quach H
7. Laval G
8. Perry GH
9. Barreiro LB
10. Froment A
11. Heyer E
12. Massougbodji A
13. Fortes-Lima C
14. Migot-Nabias F
15. Bellis G
16. Dugoujon J-M
17. Pereira JB
18. Fernandes V
19. Pereira L
20. Van der Veen L
21. Mouguiama-Daouda P
22. Bustamante CD
23. Hombert J-M
24. Quintana-Murci L
(2017) Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America
Science 356:543–546.

https://doi.org/10.1126/science.aal1988
- PubMed
- Google Scholar
(2023) Evolutionary genetics and admixture in african populations
Genome Biology and Evolution 15:evad054.

https://doi.org/10.1093/gbe/evad054
- PubMed
- Google Scholar
1. Phelan J
2. Gomez-Gonzalez PJ
3. Andreu N
4. Omae Y
5. Toyo-Oka L
6. Yanai H
7. Miyahara R
8. Nedsuwan S
9. de Sessions PF
10. Campino S
11. Sallah N
12. Parkhill J
13. Smittipat N
14. Palittapongarnpim P
15. Mushiroda T
16. Kubo M
17. Tokunaga K
18. Mahasirimongkol S
19. Hibberd ML
20. Clark TG
(2023) Genome-wide host-pathogen analyses reveal genetic interaction points in tuberculosis disease
Nature Communications 14:549.

https://doi.org/10.1038/s41467-023-36282-w
- PubMed
- Google Scholar
(2012) A genome wide association study of pulmonary tuberculosis susceptibility in Indonesians
BMC Medical Genetics 13:5.

https://doi.org/10.1186/1471-2350-13-5
- PubMed
- Google Scholar
Software
1. pysam-developers
(2026) Pysam, version 7929385
Github.

https://github.com/pysam-developers/pysam
1. Ralph AP
2. Ardian M
3. Wiguna A
4. Maguire GP
5. Becker NG
6. Drogumuller G
7. Wilks MJ
8. Waramori G
9. Tjitra E
10. Kenagalem E
11. Pontororing GJ
12. Anstey NM
13. Kelly PM
(2010) A simple, valid, numerical score for grading chest x-ray severity in adult smear-positive pulmonary tuberculosis
Thorax 65:863–869.

https://doi.org/10.1136/thx.2010.136242
- PubMed
- Google Scholar
1. Reed FA
2. Tishkoff SA
(2006) African human diversity, origins and migrations
Current Opinion in Genetics & Development 16:597–605.

https://doi.org/10.1016/j.gde.2006.10.008
- PubMed
- Google Scholar
1. Reed MB
2. Pichler VK
3. McIntosh F
4. Mattia A
5. Fallow A
6. Masala S
7. Domenech P
8. Zwerling A
9. Thibert L
10. Menzies D
11. Schwartzman K
12. Behr MA
(2009) Major Mycobacterium tuberculosis lineages associate with patient country of origin
Journal of Clinical Microbiology 47:1119–1128.

https://doi.org/10.1128/JCM.02142-08
- PubMed
- Google Scholar
1. Rutaihwa LK
2. Menardo F
3. Stucki D
4. Gygli SM
5. Ley SD
6. Malla B
7. Feldmann J
8. Borrell S
9. Beisel C
10. Middelkoop K
11. Carter EJ
12. Diero L
13. Ballif M
14. Jugheli L
15. Reither K
16. Fenner L
17. Brites D
18. Gagneux S
(2019) Multiple introductions of Mycobacterium tuberculosis lineage 2–Beijing into africa over centuries
Frontiers in Ecology and Evolution 7:112.

https://doi.org/10.3389/fevo.2019.00112
- Google Scholar
1. Schlebusch CM
2. Skoglund P
3. Sjödin P
4. Gattepaille LM
5. Hernandez D
6. Jay F
7. Li S
8. De Jongh M
9. Singleton A
10. Blum MGB
11. Soodyall H
12. Jakobsson M
(2012) Genomic variation in seven Khoe-San groups reveals adaptation and complex African history
Science 338:374–379.

https://doi.org/10.1126/science.1227721
- PubMed
- Google Scholar
1. Selwyn PA
2. Hartel D
3. Lewis VA
4. Schoenbaum EE
5. Vermund SH
6. Klein RS
7. Walker AT
8. Friedland GH
(1989) A prospective study of the risk of tuberculosis among intravenous drug users with human immunodeficiency virus infection
The New England Journal of Medicine 320:545–550.

https://doi.org/10.1056/NEJM198903023200901
- PubMed
- Google Scholar
1. Semo A
2. Gayà-Vidal M
3. Fortes-Lima C
4. Alard B
5. Oliveira S
6. Almeida J
7. Prista A
8. Damasceno A
9. Fehn AM
10. Schlebusch C
11. Rocha J
(2020) Along the indian ocean coast: genomic variation in mozambique provides new insights into the bantu expansion
Molecular Biology and Evolution 37:406–416.

https://doi.org/10.1093/molbev/msz224
- PubMed
- Google Scholar
(2021) Genetic substructure and complex demographic history of South African Bantu speakers
Nature Communications 12:2080.

https://doi.org/10.1038/s41467-021-22207-y
- PubMed
- Google Scholar
1. Silva ML
2. Cá B
3. Osório NS
4. Rodrigues PNS
5. Maceiras AR
6. Saraiva M
(2022) Tuberculosis caused by Mycobacterium africanum: Knowns and unknowns
PLOS Pathogens 18:e1010490.

https://doi.org/10.1371/journal.ppat.1010490
- PubMed
- Google Scholar
(1993) Tuberculosis and poverty
BMJ 307:759–761.

https://doi.org/10.1136/bmj.307.6907.759
- Google Scholar
1. Stanley S
2. Spaulding CN
3. Liu Q
4. Chase MR
5. Ha DTM
6. Thai PVK
7. Lan NH
8. Thu DDA
9. Quang NL
10. Brown J
11. Hicks ND
12. Wang X
13. Marin M
14. Howard NC
15. Vickers AJ
16. Karpinski WM
17. Chao MC
18. Farhat MR
19. Caws M
20. Dunstan SJ
21. Thuong NTT
22. Fortune SM
(2024) Identification of bacterial determinants of tuberculosis infection and treatment outcomes: a phenogenomic analysis of clinical strains
The Lancet. Microbe 5:e570–e580.

https://doi.org/10.1016/S2666-5247(24)00022-3
- PubMed
- Google Scholar
1. Stein CM
2. Sausville L
3. Wejse C
4. Sobota RS
5. Zetola NM
6. Hill PC
7. Boom WH
8. Scott WK
9. Sirugo G
10. Williams SM
(2017) Genomics of human pulmonary tuberculosis: from genes to pathways
Current Genetic Medicine Reports 5:149–166.

https://doi.org/10.1007/s40142-017-0130-9
- PubMed
- Google Scholar
1. Steiner A
2. Stucki D
3. Coscolla M
4. Borrell S
5. Gagneux S
(2014) KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes
BMC Genomics 15:881.

https://doi.org/10.1186/1471-2164-15-881
- PubMed
- Google Scholar
1. Stucki D
2. Brites D
3. Jeljeli L
4. Coscolla M
5. Liu Q
6. Trauner A
7. Fenner L
8. Rutaihwa L
9. Borrell S
10. Luo T
11. Gao Q
12. Kato-Maeda M
13. Ballif M
14. Egger M
15. Macedo R
16. Mardassi H
17. Moreno M
18. Tudo Vilanova G
19. Fyfe J
20. Globan M
21. Thomas J
22. Jamieson F
23. Guthrie JL
24. Asante-Poku A
25. Yeboah-Manu D
26. Wampande E
27. Ssengooba W
28. Joloba M
29. Henry Boom W
30. Basu I
31. Bower J
32. Saraiva M
33. Vaconcellos SEG
34. Suffys P
35. Koch A
36. Wilkinson R
37. Gail-Bekker L
38. Malla B
39. Ley SD
40. Beck H-P
41. de Jong BC
42. Toit K
43. Sanchez-Padilla E
44. Bonnet M
45. Gil-Brusola A
46. Frank M
47. Penlap Beng VN
48. Eisenach K
49. Alani I
50. Wangui Ndung’u P
51. Revathi G
52. Gehre F
53. Akter S
54. Ntoumi F
55. Stewart-Isherwood L
56. Ntinginya NE
57. Rachow A
58. Hoelscher M
59. Cirillo DM
60. Skenders G
61. Hoffner S
62. Bakonyte D
63. Stakenas P
64. Diel R
65. Crudu V
66. Moldovan O
67. Al-Hajoj S
68. Otero L
69. Barletta F
70. Jane Carter E
71. Diero L
72. Supply P
73. Comas I
74. Niemann S
75. Gagneux S
(2016) Mycobacterium tuberculosis lineage 4 comprises globally distributed and geographically restricted sublineages
Nature Genetics 48:1535–1543.

https://doi.org/10.1038/ng.3704
- PubMed
- Google Scholar
1. Tishkoff SA
2. Verrelli BC
(2003) Patterns of human genetic diversity: implications for human evolutionary history and disease
Annual Review of Genomics and Human Genetics 4:293–340.

https://doi.org/10.1146/annurev.genom.4.070802.110226
- PubMed
- Google Scholar
1. Tishkoff SA
2. Reed FA
3. Friedlaender FR
4. Ehret C
5. Ranciaro A
6. Froment A
7. Hirbo JB
8. Awomoyi AA
9. Bodo J-M
10. Doumbo O
11. Ibrahim M
12. Juma AT
13. Kotze MJ
14. Lema G
15. Moore JH
16. Mortensen H
17. Nyambo TB
18. Omar SA
19. Powell K
20. Pretorius GS
21. Smith MW
22. Thera MA
23. Wambebe C
24. Weber JL
25. Williams SM
(2009) The genetic structure and history of Africans and African Americans
Science 324:1035–1044.

https://doi.org/10.1126/science.1172257
- PubMed
- Google Scholar
1. Uren C
2. Möller M
3. van Helden PD
4. Henn BM
5. Hoal EG
(2017) Population structure and infectious disease risk in southern Africa
Molecular Genetics and Genomics 292:499–509.

https://doi.org/10.1007/s00438-017-1296-2
- PubMed
- Google Scholar
Software
(2022) Compositions: compositional data analysis 2022, version 2.0-9
CRAN.

https://cran.r-project.org/web/packages/compositions/index.html
1. Wang K
2. Goldstein S
3. Bleasdale M
4. Clist B
5. Bostoen K
6. Bakwa-Lufu P
7. Buck LT
8. Crowther A
9. Dème A
10. McIntosh RJ
11. Mercader J
12. Ogola C
13. Power RC
14. Sawchuk E
15. Robertshaw P
16. Wilmsen EN
17. Petraglia M
18. Ndiema E
19. Manthi FK
20. Krause J
21. Roberts P
22. Boivin N
23. Schiffels S
(2020) Ancient genomes reveal complex patterns of population movement, interaction, and replacement in sub-Saharan Africa
Science Advances 6:eaaz0183.

https://doi.org/10.1126/sciadv.aaz0183
- PubMed
- Google Scholar
1. Wejse C
2. Gustafson P
3. Nielsen J
4. Gomes VF
5. Aaby P
6. Andersen PL
7. Sodemann M
(2008) TBscore: Signs and symptoms from tuberculosis patients in a low-resource setting have predictive value and may be used to assess clinical course
Scandinavian Journal of Infectious Diseases 40:111–120.

https://doi.org/10.1080/00365540701558698
- PubMed
- Google Scholar
Report
1. WHO
(2023) Global Tuberculosis Report
World Health Organization.

https://www.who.int/teams/global-programme-on-tuberculosis-and-lung-health/tb-reports/global-tuberculosis-report-2023
- Google Scholar
1. Wiens KE
2. Woyczynski LP
3. Ledesma JR
4. Ross JM
5. Zenteno-Cuevas R
6. Goodridge A
7. Ullah I
8. Mathema B
9. Djoba Siawaya JF
10. Biehl MH
11. Ray SE
12. Bhattacharjee NV
13. Henry NJ
14. Reiner RC Jr
15. Kyu HH
16. Murray CJL
17. Hay SI
(2018) Global variation in bacterial strains that cause tuberculosis disease: a systematic review and meta-analysis
BMC Medicine 16:196.

https://doi.org/10.1186/s12916-018-1180-x
- PubMed
- Google Scholar
Website
1. World Health Organization
(2021) Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance
Accessed June 25, 2021.

https://www.who.int/publications/i/item/9789240028173
1. Xu ZM
2. Rüeger S
3. Zwyer M
4. Brites D
5. Hiza H
6. Reinhard M
7. Rutaihwa L
8. Borrell S
9. Isihaka F
10. Temba H
11. Maroa T
12. Naftari R
13. Hella J
14. Sasamalo M
15. Reither K
16. Portevin D
17. Gagneux S
18. Fellay J
(2022) Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations
PLOS Computational Biology 18:e1009628.

https://doi.org/10.1371/journal.pcbi.1009628
- PubMed
- Google Scholar
1. Xu ZM
2. Zwyer M
3. Hiza H
4. Schmidiger S
5. Sasamalo M
6. Reinhard M
7. Doetsch A
8. Borrell S
9. Naret O
10. Rüeger S
11. Lawless D
12. Tang S
13. Isihaka F
14. Temba H
15. Maroa T
16. Naftari R
17. Beisel C
18. Hella J
19. Reither K
20. Brites D
21. Portevin D
22. Gagneux S
23. Fellay J
(2025) Genome-to-genome analysis reveals associations between human and mycobacterial genetic variation in tuberculosis patients from Tanzania
BMC Medical Genomics 18:99.

https://doi.org/10.1186/s12920-025-02164-x
- PubMed
- Google Scholar
1. Zeileis A
2. Hothorn T
(2002)
Diagnostic checking in regression relationships

R News 2:7–10.
- Google Scholar
1. Zheng R
2. Li Z
3. He F
4. Liu H
5. Chen J
6. Chen J
7. Xie X
8. Zhou J
9. Chen H
10. Wu X
11. Wu J
12. Chen B
13. Liu Y
14. Cui H
15. Fan L
16. Sha W
17. Liu Y
18. Wang J
19. Huang X
20. Zhang L
21. Xu F
22. Wang J
23. Feng Y
24. Qin L
25. Yang H
26. Liu Z
27. Cui Z
28. Liu F
29. Chen X
30. Gao S
31. Sun S
32. Shi Y
33. Ge B
(2018) Genome-wide association study identifies two risk loci for tuberculosis in Han Chinese
Nature Communications 9:4072.

https://doi.org/10.1038/s41467-018-06539-w
- PubMed
- Google Scholar
(2021) A new nomenclature for the livestock-associated Mycobacterium tuberculosis complex based on phylogenomics
Open Research Europe 1:100.

https://doi.org/10.12688/openreseurope.14029.2
- PubMed
- Google Scholar
Zwyer M, Rutaihwa LK, Windels E, Hella J, Menardo F, Sasamalo M, Sommer G, Schmülling L, Borrell S, Reinhard M, Dötsch A, Hiza H, Stritt C, Sikalengo G, Fenner L, De Jong BC, Kato-Maeda M, Jugheli L, Ernst JD, Niemann S, Jeljeli L, Ballif M, Egger M, Rakotosamimanana N, Yeboah-Manu D, Asare P, Malla B, Dou HY, Zetola N, Wilkinson RJ, Cox H, Carter EJ, Gnokoro J, Yotebieng M, Gotuzzo E, Abimiku A, Avihingsanon A, Xu ZM, Fellay J, Portevin D, Reither K, Stadler T, Gagneux S, Brites D
(2023) Back-to-Africa introductions of Mycobacterium tuberculosis as the main cause of tuberculosis in Dar es Salaam, Tanzania
PLOS Pathogens 19:e1010893.

https://doi.org/10.1371/journal.ppat.1010893
Software
Zwyer M
(2026) TB-dar_mtb, version swh:1:rev:7df5cba38dfee39493433d209a5dd41908bc5c1d
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:d3e82d79496f15682ee1d98d4f9a15275fccbdf8;origin=https://github.com/mzwyer/TB-Dar_Mtb;visit=swh:1:snp:cca76154ca9ac26628c945fcc74f6d99694f6880;anchor=swh:1:rev:7df5cba38dfee39493433d209a5dd41908bc5c1d

Article and author information

Author details

Michaela Zwyer
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3864-1503
Zhi Ming Xu
1. Swiss Institute of Bioinformatics, Lausanne, Switzerland
2. School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Contribution
Formal analysis, Writing – review and editing

Competing interests
No competing interests declared
Amanda Ross
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Formal analysis, Writing – review and editing

Competing interests
No competing interests declared
Jerry Hella

Department of Intervention and Clinical Trials, Ifakara Health Institute, Ifakara, United Republic of Tanzania

Contribution
Resources, Methodology, Project administration

Competing interests
No competing interests declared
Mohamed Sasamalo

Department of Intervention and Clinical Trials, Ifakara Health Institute, Ifakara, United Republic of Tanzania

Contribution
Resources, Data curation, Formal analysis, Investigation, Methodology

Competing interests
No competing interests declared
Maxime Rotival

Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, Paris, France

Contribution
Formal analysis, Methodology, Writing – review and editing

Competing interests
No competing interests declared
Hellen Charles Hiza
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Resources, Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Liliana K Rutaihwa

FIND, Foundation for Innovative New Diagnostics, Geneva, Switzerland

Contribution
Resources

Competing interests
No competing interests declared
Sonia Borrell
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Resources, Project administration

Competing interests
No competing interests declared
Klaus Reither
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Funding acquisition, Investigation, Project administration, Writing – review and editing

Competing interests
No competing interests declared
Jacques Fellay
1. Swiss Institute of Bioinformatics, Lausanne, Switzerland
2. School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
3. Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
Contribution
Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-8240-939X
Damien Portevin
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0003-2949-9557
Lluis Quintana-Murci
1. Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, Paris, France
2. Chair of Human Genomics and Evolution, Collège de France, Paris, France
Contribution
Formal analysis, Supervision, Methodology, Writing – review and editing

Competing interests
No competing interests declared
Sebastien Gagneux
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Conceptualization, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
sebastien.gagneux@swisstph.ch

Competing interests
No competing interests declared

ORCID iD: 0000-0001-7783-9048
Daniela Brites
1. Swiss Tropical and Public Health Institute, Allschwil, Switzerland
2. University of Basel, Basel, Switzerland
Contribution
Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review and editing

For correspondence
d.brites@swisstph.ch

Competing interests
No competing interests declared

ORCID iD: 0000-0002-8090-2287

Funding

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (CRSII5_177163)

Klaus Reither
Jacques Fellay
Sebastien Gagneux

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (CRSII5_213514)

Klaus Reither
Jacques Fellay
Damien Portevin

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (10001893)

Sebastien Gagneux

European Research Council (883582)

Sebastien Gagneux

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (10000213)

Sebastien Gagneux

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Calculations were performed at the sciCORE (http://scicore.unibas.ch/) scientific computing core facility at the University of Basel. We thank all TB-DAR staff and study participants. This work was supported by the Swiss National Science Foundation (Grant Nos. CRSII5_177163, CRSII5_213514, 10000213, and 10001893) and the European Research Council (Grant No. 883582). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Ethics

Ethical approval for the TB-DAR cohort was obtained from the Ethikkommission Nordwest- und Zentralschweiz (EKNZ UBE-15/42), the Ifakara Health Institute-Institutional Review Board (IHI/IRB/EXT/No. 24-2020), and the National Institute for Medical Research in Tanzania-Medical Research Coordinating Committee (NIMR/HQ/R.8c/Vol.I/1622). Written informed consent was obtained from every patient recruited into the TB-DAR cohort.

Version history

Sent for peer review: October 21, 2024
Preprint posted: October 26, 2024
Reviewed Preprint version 1: January 6, 2025
Reviewed Preprint version 2: August 6, 2025
Version of Record published: March 24, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.103533. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.