Introduction

The study of tuberculosis (TB) pioneered infectious disease research in the modern scientific era, contributing to the formulation of Koch’s postulates demonstrating that an illness can have an infectious origin (1). Mycobacterium tuberculosis (Mtb) infects and survives within macrophages, subverting the host immune response by multiple mechanisms including inhibition of phagosome maturation and downregulation of antigen presenting molecules, leading to the formation of complex immune aggregates, known as granulomas (2). Even though TB was the first definitively identified infection, it remains the world’s deadliest infectious disease despite well over 100 years of research (3). This contrast raises a critical question: why is Mtb proving so resistant to human efforts to control it?

Mtb has eluded attempts to develop a fully protective vaccine, despite a partially effective vaccine being available since the 1920s. Mycobacterium bovis BCG was developed by sequential culture of M. bovis and protects children against disseminated TB but provides limited protection against adult disease (4, 5). Although exciting progress has been made with a vaccine that reduces progression to overt TB disease by 50% when given to those with immunological evidence of latent infection in a phase II study (6), the immune mechanisms underpinning protection have not yet been identified. Subsequently, a major trial of an alternative vaccine showed no efficacy in preventing recurrence. Indeed, despite the vaccine being highly immunogenic, the relapse rate tended to be higher in the vaccine group (7). Similarly, repeat BCG vaccination does not increase protection, despite inducing a strong Mtb-specific CD4 T cell response (8).

These difficulties highlight the priority of understanding the host-pathogen interaction more fully. We have insufficient knowledge of key steps in disease progression to develop transformative interventions. Heterogeneity across the spectrum of human TB is well described (9, 10), but the majority of fundamental investigations into disease mechanisms are based on the premise of a consistent underlying process that can be understood through reductionist scientific approaches. However, clinical observations demonstrate that there are multiple paths to TB, and so seeking to define a single mechanism is likely to be flawed.

Immunological insights from historic and recent clinical observations

TB has co-evolved with humans for millennia, with some estimates suggesting up to 70,000 years (11), though other analyses suggest the most recent common ancestor was ~6,000 years before present (12, 13). The field of paleoarchaeology provides extensive evidence of TB from the early Neolithic period in the Middle East, with approximately 5% of skeletons from a 10,000-year-old village showing signs of TB (14). This was just before animal domestication and pottery, in hunter-gatherers who built stone houses, and so Mtb was already successfully transmitting in humans before the subsequent population growth that occurred with farming (14). Potentially, to survive in relatively small hunter-gatherer communities, Mtb may have needed reduced virulence to avoid excessive deaths and a latent period to permit sustainable transmission in low population numbers (15). With the expansion and increased density of human populations, more rapid progression to TB disease can be sustained, consistent with analysis that most TB progression in high incidence settings occurs within 1–2 years of exposure (16).

Mtb successfully persisted over the ages and then flourished in the crowded populations that occurred with the industrial revolution (17), giving rise to the modern TB era approximately 250 years ago. The fundamental cause of TB remained unknown until Koch’s seminal work (1). Early investigations demonstrated that the first Mtb infection point was the lung base, while Mtb exits from the apices of the lungs (18). This life cycle must involve several distinct host-pathogen interactions, as initially immune evasion is required for Mtb to survive, but then later immune engagement is necessary to cause the inflammation and lung destruction needed to optimise transmission (19). Cavitary lung disease leads to proliferation of extracellular bacteria and increased transmission (20). Notably, most people (approximately 90%) initially infected with Mtb never progress to active, clinical disease (3). In addition, in the pre-antibiotic era, the progression and regression of different lesions in the same individual were observed on chest radiographs, and one third of patients with active TB disease self-healed, showing that the host-pathogen interaction is finely balanced at all stages of infection (17).

Modern immunological techniques and the development of biologic therapies that target specific immune processes have provided extensive insight into TB disease mechanisms. The greatly increased occurrence of TB in the context of HIV co-infection, for example, highlighted immunodeficiency as a major driver of disease (21). Similarly, the occurrence of TB after anti-TNF-α therapy for autoimmune disease confirmed the importance of TNF-α in control of latent infection (22). Furthermore, genetic investigations have identified numerous immunodeficiencies via studies of Mendelian Susceptibility to Mycobacterial Diseases (MSMD), with mutations typically along the IL-12/IFN-γ/STAT signalling pathway (23–25). With less clearly defined immunologic mechanisms, malnutrition is a significant risk factor for TB (26), and food supplementation reduces TB incidence in contacts (27). Therefore, diverse immune deficiencies can lead to active TB.

The vast majority of patients who develop TB, however, have no clear identifiable immunodeficiency. Indeed, Comstock’s seminal study from the 1970s showed that children from Haiti with the strongest recall responses to Mtb antigens actually had the greatest subsequent risk of developing TB (28). These observations have been replicated in more recent studies using IFN-γ release assays (IGRAs) in response to TB antigens, where higher IFN-γ production associates with increased risk of progressing to disease in both children and adults (29, 30). TB is commonest in young adults in their immunological prime, and more frequent in males than females, characterised by an excessive inflammatory response (31). The implication that TB can also be caused by immune excess is now supported by recently introduced cancer immunotherapies (32). Anti-PD-1 treatment, which activates the immune response and represents the immunological opposite to anti-TNF therapy, should control TB if immunodeficiency were the critical component. However, anti-PD-1 treatment can lead to rapid reactivation of latent TB infection, first identified in case reports (33). This finding is supported by studies in mice (34, 35), the non-human primate (36) and 3D cellular models (37), and ultimately has been validated by patient registry studies (38, 39). Similarly, type II diabetes is associated with an increased risk of TB (26), characterised by a hyper-inflammatory immune response (40). Therefore, diverse clinical evidence demonstrates that there are multiple immunological disturbances that can lead to TB disease (41) (Figure 1).

Figure 1. Multiple pathways can lead to TB disease, from opposing immunological extremes.

Generally, these can be viewed as immune deficiencies or immune excess, but the majority of patients who develop TB are relatively young with a competent immune response, illustrating the complexity of this spectrum. Though not quantitative, font size relates to contribution to global incidence. Left arrow: Ghon focus at lung base; Right arrow: cavity at lung apex.

Even when TB develops in the face of a “normal” immune system, there are likely to be many different subgroups that have not yet been identified due to limitations in standard immunological profiling. A quarter of the world’s population are thought to be exposed to Mtb (42), and so the vast majority of those infected with Mtb remain healthy lifelong (3), while those who progress to active TB likely represent distinct outlier populations. The diverse causes of active TB, such as anti-TNF, MSMD, HIV, diabetes and anti-PD-1, demonstrate that patients do not reach TB by a single pathway, but instead lose the immunological balance that controls Mtb by multiple paths. These observations may explain why genetic studies have generally failed to find consistent predispositions. Evidence of heritability can be demonstrated but is often hard to validate in different populations (43). Mutations in the macrophage endosomal protein NRAMP1, for example, were shown to associate with TB in an early seminal study (44), but since then few consistent traits of genetic susceptibility to TB have been identified (45).

Recent genomic studies demonstrate the complex co-evolution of host and Mtb. The International Tuberculosis Host Genetics Consortium’s (ITHGC) first analysis found one significant host genetic variant, in the human leukocyte antigen (HLA)-II region (rs28383206), which conferred TB susceptibility in nine genome-wide association studies (GWAS) across three continents (43). However, other variants previously associated with TB susceptibility were not replicated. Another approach using genome-to-genome analysis of paired human and Mtb genomes in Peru identified another determinant on chromosome 6, rs3130660, in the flotillin-1 (FLOT1) gene (46). Together with our understanding of the co-adaptation of different Mtb lineages with human migrations (47), this suggests that the host immune response to Mtb is likely to depend on ancestry-related genetic factors and complex host-pathogen dynamics, which remain incompletely understood.

Studies of individuals who resist Mtb infection despite recurrent exposure generate a diverse list of potentially protective features, including T cell subsets or activation (48–50), TNF-α responses (51), antibody activity (52) and innate immune training (53). These human observations are supported by murine genetic experiments, which show that increased susceptibility to Mtb can result from a wide range of immunological alterations (54). Similarly, infant vaccine studies show that there are distinct patterns of response that may determine vaccine efficacy (55). However, despite this evidence of diversity, the majority of fundamental studies continue to seek a single underlying mechanism that leads to TB disease progression, which is incompatible with clinical observations.

Ultimately, to transmit efficiently, Mtb needs to cause pulmonary disease to then spread by airborne droplets (20), and the route taken does not matter to Mtb, as long as it ends at pulmonary TB. Increasing evidence suggests that asymptomatic transmission may be important in high incidence settings, potentially in the absence of overt pulmonary disease (56–58). Experimentally, TB is a fundamentally challenging disease to model, as the interaction is prolonged and Mtb is an obligate human pathogen (59). The primary driver of advances in immunology in the last decades has been transgenic mice (60), but the mouse model of TB does not accurately reflect human disease (61). Disease heterogeneity has been highlighted in describing clinical TB endotypes observed during active disease (62), with different endotypes exhibiting diverse immunological characteristics and association with outcome (63). However, we propose that insufficient attention has been given to the different immunological pathways that may converge on the same disease phenotype.

Mtb’s single successful establishment in humanity

Just as new tools are providing insights into the complexity of human TB progression, advances in mycobacterial genomics are highlighting the unique nature of the human-Mtb relationship (64). Mtb is a near clonal organism, with evidence suggesting that there has been only one successful and sustained penetration into the human population (47, 64, 65). The entire spectrum of Mtb strains globally only differs by a total of approximately 2,000 SNPs (64). This genetic conservation has persisted from the most recent common ancestor (66) during the expansion of Mtb in human populations since the industrial revolution, which created much denser human aggregations suitable for transmission (17). This suggests that Mtb was already close to being optimised for human hosts. Further evolution may have been to increase transmission within specific populations as humans diverged genetically (65), with a recent study suggesting that in fact Mtb may be becoming attenuated to increase spread in populations (67).

A very similar organism, M. canettii, can cause disease but cannot transmit from human to human (68). Similarly, M. bovis is 99.9% identical to Mtb (69), but has never achieved sustained human-to-human transmission, despite many millions of human exposures during the pre-pasteurisation era and common lymph node infections (65). Therefore, human TB is caused exclusively by Mtb, unlike other very closely related mycobacteria, which maintain infection cycles in other higher mammals but not humans (65). Clues to the key pathogenic mechanisms may lie in the differences between mycobacterial species (70), but ultimately similarities between Mtb strains must be critical to their ongoing success. For example, the hyperconservation of T cell epitopes may be evidence that Mtb manipulates the host immune response to favour disease and transmission (71). Given the size of the Mtb genome, identifying the critical conserved features will be challenging, especially as half of its genes still lack a known function 25 years after it was first sequenced (72).

One highly intriguing proposition is that latent TB may itself give humans an evolutionary advantage (72). Exposure to Mtb modifies innate immune training via reprogramming hematopoietic stem cells (73), and so a mechanism whereby Mtb could protect from other fatal infections is plausible. Similarly, M. bovis BCG, a live-attenuated strain of M. bovis, causes innate immune training (74), suggesting this effect is common to Mtb and BCG. BCG vaccination reduces mortality much more significantly than can be explained by the effect on TB incidence alone (75), with a protective effect confirmed in diverse studies (76). For example, BCG reduces viral infections in infants in Africa (77) and experimentally challenged adults (78), and associates with improved survival in Europe (79). Mtb and BCG can protect against SARS-CoV-2 infection (80, 81), although this did not translate to efficacy in a clinical trial (82). Together, these observations strongly imply that mycobacterial infection protects from other infectious causes of death.

Mtb almost certainly first became established in humans in East Africa (47, 65), potentially about 70,000 years ago (11), although this timeline is debated. Several human migrations out of Africa are thought to have preceded this date, but all ultimately became extinct, with the first sustained human dispersal 60,000–70,000 years ago (83, 84). Mtb diversity mirrors human mitochondrial genome diversity, further implying that Mtb disseminated with human populations from East Africa (47). The primary selective pressure in these early communities would have been infectious disease (85). This raises a novel hypothesis that a survival advantage for the first successful human migrants out of Africa was Mtb circulating in the community, reducing mortality from other infectious diseases and thereby enabling sustainable population growth.

Mtb transmission may have benefitted humans by increasing innate immune resistance to infection at the cost of 10% disease penetrance that permits Mtb propagation. This selective pressure over many millennia would progressively remove genotypes that lead to high susceptibility to TB, but equally would select against individuals with complete resistance to initial Mtb infection (Figure 2). This could explain why consistent genetic traits for susceptibility or resistance to TB have been hard to identify (45). More recent mass infection events, such as the smallpox epidemics that killed approximately 25% of the population (85), may have further favoured individuals immunologically trained by Mtb. Likewise, successive waves of plague killed approximately 25% of the population, at the end of the Roman Empire and then in the European Middle Ages, adding to selective pressure from endemic infections (86, 87). If humans have been selected to be permissive to Mtb infection but resistant to TB disease, which could be regarded as colonisation, it suggests the development of active disease must be a relatively unusual event in a subset of outlier individuals. In some sense, we could be regarded as having a symbiotic relationship with Mtb, with disease representing a necessary evil, caused by an imbalance in the predominantly stable host-pathogen interaction (88, 89).

Figure 2. Selection pressure of prolonged co-evolution favours individuals permissive to asymptomatic Mtb colonisation but resistant to active disease.

Over millennia, Mtb circulation in society will remove genetic traits that cause high susceptibility to active TB disease. Perhaps less intuitively, if Mtb generates trained immunity that protects against other fatal diseases, individuals with low susceptibility to initial Mtb infection will also be selected against due to increased mortality from other infections. The resulting population would then reflect modern humans: highly susceptible to initial Mtb colonisation but with low susceptibility to TB disease.

The host-pathogen interaction at a cellular level

The early histological era described the wide range of human TB lesions and granuloma types, and identified TB as a disease characterised by spatial organisation (90). Classically, the granuloma has been proposed to be where the outcome of the host-pathogen interaction is determined (91). Recent methodological advances are permitting much greater dissection of events and further highlight the importance of spatial organisation within the granuloma (92–94). However, just as the early X-ray era showed some lesions progressing and some regressing, these studies demonstrate the great heterogeneity between granuloma types. Studies in the non-human primate have allowed investigation into features of progressing and controlling granulomas, identifying potential correlates of immune control (95), but even this model only partially recapitulates human disease.

Furthermore, the recent spatial studies highlight the complexity of cellular players, including the established fulcrum of macrophages and T cells, but additionally the importance of B cells, neutrophils and fibroblasts. For example, fibroblasts are emerging as important immune regulators in other lung diseases (96), and so it seems highly likely that they play an active role in TB-related inflammation. Fibroblast zonation can lead to feed-forward inflammatory loops and so may propagate disease (97). Consequently, the full spectrum of cell types in Mtb-infected lesions needs to be considered. Given that multiple underlying immunological pathways can lead to active TB, it seems unlikely that a single cellular component will fully explain the balance between Mtb containment and progression to active disease. However, the majority of studies continue to look for a single consistent immune mechanism; discussions rarely state “this is one of several potential routes to TB disease” (49, 95, 98–100).

Considering the clinical and experimental evidence, immunological failure is likely to be a multistep process, whereby either one large deficit, or numerous small imbalances, can lead to progression and disease (Figure 3). Mtb and humans interact over many years, as the pathogen is difficult to eradicate due to its highly evolved survival mechanisms (101), and therefore in those individuals in whom it survives, there is a long period where it can escape host control. Potentially these different paths may ultimately cross or converge; if so, understanding the key nodes will allow more broadly effective treatments to emerge. For example, lung extracellular matrix degradation could be regarded as a final common pathway (102), but, again, this may result from different collagenases including macrophage-derived MMP-1 or neutrophil-derived MMP-8 (103, 104).

Figure 3. Schematic of the interactions needed for Mtb to escape the host immune response.

As control of Mtb requires a co-ordinated host response, there are multiple sequences of immune events that can ultimately result in progression to active TB disease. A major immune disturbance, such as TNF-α or PD-1 inhibition, gives a relatively direct pathway to active TB. However, most individuals develop TB due to a series of less apparent immune events and no clear global immune disturbance that can be identified by current immune profiling approaches.

Emerging methodologies and the challenge of data analysis

The recent adoption of “omic” methodologies, including single cell transcriptomics, spatial transcriptomics, and proteomics, offers unprecedented opportunities to dissect the mechanisms of human TB pathogenesis and accelerate the development of effective interventions. These approaches generate large-scale, complex datasets that capture the heterogeneity of TB lesions. However, this complexity also presents significant analytical challenges. Standard computational pipelines, if not adapted to account for the biological and technical variability in these data, are unlikely to deliver robust or reproducible insights. A major obstacle is the integration of data from multiple studies and platforms, which can differ both within a single omic layer (horizontal integration) and across multiple omic layers (vertical integration) (105). Such integration risks data loss and inconsistent results, especially when data are not harmonised.

The prolonged co-evolution between host and pathogen has resulted in multiple immunological pathways to active TB, significantly adding to the biological heterogeneity that complicates the analysis and interpretation of multi-omic data. Addressing this complexity requires well-annotated clinical cohorts that capture the full spectrum of TB heterogeneity, ideally with longitudinal outcome data rather than single time-point snapshots. Comprehensive clinical descriptors and associated data such as laboratory analyses and chest X-rays will permit definition of each clinical phenotype. To accommodate the diversity of disease pathways, novel bioinformatic approaches are needed that move beyond the assumption of a single sequence of events that results in active TB.
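The analytical consequence of multiple pathways can be illustrated with a toy simulation. This is a minimal sketch on purely synthetic numbers, not an analysis of any TB dataset: it assumes a single hypothetical immune marker that is shifted downwards in half of the simulated cases ("immune deficiency") and upwards in the other half ("immune excess"). The function name `simulate_marker`, the shift of two standard deviations and the group sizes are all illustrative assumptions.

```python
import random
from statistics import mean, pvariance


def simulate_marker(n_per_subgroup: int = 500, shift: float = 2.0, seed: int = 0):
    """Simulate one hypothetical marker in controls and in two opposing case subgroups."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Controls: marker centred on zero.
    controls = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_subgroup)]
    # Cases: half shifted down ("deficiency"), half shifted up ("excess").
    low_cases = [rng.gauss(-shift, 1.0) for _ in range(n_per_subgroup)]
    high_cases = [rng.gauss(shift, 1.0) for _ in range(n_per_subgroup)]
    return controls, low_cases + high_cases


controls, cases = simulate_marker()

# A comparison of group means assumes a single direction of effect.
mean_difference = mean(cases) - mean(controls)
# The spread of the marker still separates cases from controls.
variance_ratio = pvariance(cases) / pvariance(controls)

print(f"mean difference (cases - controls): {mean_difference:.2f}")
print(f"variance ratio (cases / controls): {variance_ratio:.2f}")
```

In this simulated setting the opposing subgroups largely cancel in the group mean while the variance in cases is inflated, so an analysis built on a single case-versus-control mean comparison would tend to miss a marker that is relevant to both pathways.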

The power of multi-omic approaches to reveal complex molecular relationships depends on both the quality of the omic data and the fit between experimental design and data integration strategies. For example, correlation-based integration requires matched samples across omics, sufficient sample numbers, and comparable variance structures. These requirements are often overlooked, leading to insufficient power, noisy data, and unrealistic integration (106). Each omic technology brings its own challenges. Signal-to-noise ratios differ across modalities, and appropriate algorithms are needed to estimate sample size and power for each. Notably, using the same sample numbers across modalities does not ensure comparable statistical power; achieving equal power can require unbalanced sample sizes (107). Other recognised challenges include the interpretation and validation of multi-omic models, standardised annotation, and data sharing (107). Recent developments include web-based, user-friendly tools that enable both knowledge- and data-driven multi-omic integration (108). Importantly, a wider use of artificial intelligence (AI) is transforming omics data analysis by providing robust methods to interpret complex biological datasets (109). Machine learning and deep learning techniques are now routinely applied to DNA, transcriptomic, proteomic, and metabolomic data, enabling more integrated and comprehensive analyses (109). In proteomics, for example, AI-driven approaches have enhanced peptide measurement predictions and accelerated biomarker discovery, often outperforming conventional assays (110). To improve interpretability, explainable AI (XAI) methods are increasingly employed, with feature relevance mapping and visual explanations emerging as preferred post hoc strategies (111). However, despite these developments, significant challenges remain in implementing XAI. Further research is needed to overcome these barriers and unlock the full translational potential of AI in omics analysis (110, 111). Ultimately, the utility of these innovations will depend on the functional validation of the resulting models.
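The point that equal sample numbers do not give equal power can be made concrete with a simple calculation. The sketch below uses the textbook normal approximation for a two-sided, two-group comparison of means, in which the number of samples per group is approximately 2(z₁₋α/₂ + z_power)² / d² for a standardised effect size d. The effect sizes assigned to each modality are hypothetical placeholders rather than estimates from any study, and the calculation ignores the multiple-testing correction that real omic analyses require; it is not the method of the power-estimation tools cited above.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardised effect sizes (Cohen's d); a noisier modality
# is represented here by a smaller detectable effect.
assumed_effect_sizes = {
    "transcriptomics": 1.0,
    "proteomics": 0.7,
    "metabolomics": 0.5,
}

for modality, d in assumed_effect_sizes.items():
    print(f"{modality}: {samples_per_group(d)} samples per group for 80% power")
```

Under these assumed effect sizes, the modality with the weakest signal needs several times more samples than the strongest to reach the same power, which is why a balanced design across modalities can leave some layers underpowered.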

Given the complexity of human TB, careful study design and unbiased integration methods that accommodate data limitations are essential. Spatial context is particularly important, as TB pathology involves three-dimensional immune responses. Extracellular matrix remodelling is a hallmark of TB granulomas [81], and the matrix regulates host cell biology [84]. Consequently, multiple data inputs such as matrix composition and organisation may be needed alongside host and Mtb transcriptomic data. Ultimately, local cellular events must be modelled at the tissue level, providing a second layer of computational complexity [85].

Addressing these challenges will require substantial investment in analytical capacity. Multi-omics is already transforming our understanding of disease heterogeneity, facilitating the identification of previously unrecognized subgroups, refining prognostic and therapeutic approaches, and providing deeper mechanistic insights. This strategy has successfully stratified rare tumours (112), profiled healthy populations (113), and enabled health screening to reveal previously hidden disease and risk subgroups (114), supporting the shift toward precision medicine. As human disease processes are rarely uniform, advances in multi-omic study design and analysis for TB will likely benefit a broad range of conditions. Ultimately, this comprehensive understanding of disease pathophysiology can then lead to more targeted and individualised treatment, although implementation will require developments in companion diagnostics to accurately stratify patients.

Implications of diverse disease pathways for new TB interventions

Ultimately, the complexity of human TB underlies our inability to deploy a transformative intervention, and so global mortality remains depressingly high. The human-Mtb interaction is so closely co-evolved that experimental findings need to be interpreted in light of human disease phenomena. Multi-omic studies in TB are currently being undertaken on small numbers of patients in specific regions, due to cost and challenges in obtaining appropriate clinical samples, and so are unlikely to reflect the global heterogeneity. Therefore, results need to be interpreted with caution and wider studies will be needed to confirm the generalisability of findings. A critical aspect will be carefully curated and fully accessible metadata, so that as the body of omic datasets on human TB grows, these datasets can be accurately integrated into wider analyses. Recurrent mining of these datasets is likely to be fundamental to understanding the breadth of TB pathogenesis. Missing metadata makes interpretation difficult, and in the worst cases misleading. In addition, innovative computational approaches will be required whereby the analysis specifically accommodates multiple immune pathways to disease.

Given the toll that TB takes in the poorest parts of the world, we have a moral imperative to end the epidemic (115). To achieve this, we must acknowledge the complexity of human TB that has resulted from our prolonged co-evolution with the pathogen and the selective pressure of persistent Mtb exposure over millennia. Success depends on integrating the full spectrum of TB disease into our bioinformatic analyses, so that an understanding of TB’s complexity can then inform logical interventions. Seeking a single mechanistic explanation of TB disease seems unlikely to succeed.

Acknowledgements

PE was supported by the MRC (MR/W025728/1), MR by the Rosetrees Trust (CF-2021-2\126), SM by the MRC (MR/S024220/1), LBT by Academy of Medical Sciences Springboard (REF: SBF0010\1085) and AL by the Wellcome Trust (210662/Z/18/Z) and the BMGF (OPP1137006).