Introduction

The study of tuberculosis (TB) pioneered infectious disease research in the modern scientific era, leading to the formulation of Koch’s postulates demonstrating that an illness can have an infectious origin (1). Mycobacterium tuberculosis (Mtb) infects and survives within macrophages, subverting the host immune response by multiple mechanisms, including inhibition of phagosome maturation and downregulation of antigen-presenting molecules, and drives the formation of complex immune aggregates known as granulomas (2). Although TB was one of the first diseases to be definitively attributed to a specific microbe, it remains the world’s deadliest infectious disease after well over 100 years of research (3). This contrast raises a critical question: why is Mtb proving so resistant to human efforts to control it?

Mtb has eluded attempts to develop a fully protective vaccine, although a partially effective vaccine has been available since the 1920s. Mycobacterium bovis BCG was developed by sequential culture of M bovis and protects children against disseminated TB, but provides limited protection against adult disease (4, 5). Although exciting progress has been made with a vaccine that reduced TB disease by approximately 50% in a phase II study (6), the immune mechanisms underpinning protection have not yet been identified. By contrast, a major trial of an alternative vaccine showed no efficacy in preventing recurrence; indeed, although the vaccine was highly immunogenic, the relapse rate tended to be higher in the vaccine group (7). Similarly, repeat BCG vaccination does not increase protection, despite inducing a strong Mtb-specific CD4 T cell response (8).

These difficulties highlight the urgent need to understand the host-pathogen interaction more fully, as our knowledge of key steps in disease progression is insufficient to develop transformative interventions. Currently, almost all investigations into the immune response to Mtb are based on the premise that a unifying disease mechanism exists and that it can be understood through reductionist scientific approaches. However, clinical observations demonstrate that there are multiple paths to TB, and so seeking to define a single mechanism is likely to be a flawed approach.

Immunological insights from historic and recent clinical observations

TB has co-evolved with humans for millennia, with some estimates suggesting up to 70,000 years (9). The field of paleoarchaeology, for example, provides extensive evidence of TB from the early Neolithic period in the Middle East, with approximately 5% of skeletons from a 10,000-year-old village showing signs of TB (10). This hunter-gatherer community, which built stone houses, existed just before the advent of animal domestication and pottery, indicating that Mtb was already transmitting successfully in humans before the population growth that accompanied farming (10). To persist in relatively small hunter-gatherer communities, Mtb may have needed reduced virulence to avoid excessive deaths, and a latent period to permit sustained transmission in small populations (11).

Mtb persisted successfully over the ages and then flourished in the denser populations that arose with the industrial revolution (12), giving rise approximately 250 years ago to the modern TB era of high transmission. The fundamental cause of TB remained unknown until Koch’s seminal work (1). Early investigations demonstrated that the initial site of Mtb infection is the lung base, whereas Mtb exits from the lung apices (13). This life cycle must involve several distinct host-pathogen interactions: immune evasion is initially required for Mtb to survive, but immune engagement is later necessary to cause the inflammation and lung destruction needed to optimise transmission (14). Notably, most people (approximately 90%) initially infected with Mtb never progress to active, clinical disease (3). In addition, in the pre-antibiotic era, chest radiographs showed different lesions progressing and regressing in the same individual, and one third of patients with active TB disease self-healed, showing that the host-pathogen interaction is finely balanced at all stages of infection (12).

Modern immunological techniques and the development of biologic therapies that target specific immune processes have provided extensive insight into TB disease mechanisms. The greatly increased occurrence of TB in the context of HIV co-infection, for example, highlighted immunodeficiency as a major driver of disease (15). Similarly, the occurrence of TB after anti-TNF-α therapy for autoimmune disease confirmed the importance of TNF-α in the control of latent infection (16). Furthermore, genetic studies of Mendelian Susceptibility to Mycobacterial Diseases (MSMD) have identified numerous immunodeficiencies, with mutations typically along the IL-12/IFN-γ/STAT signalling pathway (17–19). Malnutrition, acting through less clearly defined immunological mechanisms, is also a significant risk factor for TB (20), and food supplementation reduces TB incidence in contacts (21). Therefore, diverse immune deficiencies can lead to active TB.

The vast majority of patients who develop TB, however, have no clearly identifiable immunodeficiency. Indeed, Comstock’s seminal study from the 1970s showed that children from Haiti with the strongest recall responses to Mtb antigens actually had the greatest subsequent risk of developing TB (22). These observations have been replicated in more recent studies using IFN-γ release assays (IGRAs) in response to TB antigens, in which higher IFN-γ production is associated with an increased risk of progressing to disease in both children and adults (23, 24). TB is commonest in young adults in their immunological prime, is more frequent in males than in females, and is characterised by an excessive inflammatory response (25). The implication that TB can also be caused by immune excess is now supported by recently introduced cancer immunotherapies (26). Anti-PD-1 treatment, which activates the immune response and represents the immunological opposite of anti-TNF therapy, would be expected to control TB if immunodeficiency were the critical component. However, anti-PD-1 treatment can lead to rapid reactivation of latent TB infection, as first identified in case reports (27). This finding is supported by studies in mice (28, 29), non-human primates (30) and 3D cellular models (31), and has ultimately been validated by patient registry studies (32, 33). Similarly, type II diabetes is associated both with an increased risk of TB (20) and with a hyper-inflammatory immune response (34). Therefore, diverse clinical evidence demonstrates that there are multiple immunological routes to TB disease (35) (Figure 1).

Figure 1. Many routes can lead to TB via opposing immunological extremes.

Generally, these can be viewed as immune deficiencies or immune excess, but the majority of patients who develop TB are relatively young with a competent immune response. Font size indicates the approximate contribution of each route to global incidence (not quantitative). Left arrow: Ghon focus at lung base; right arrow: cavity at lung apex.

Even when TB develops in the face of a “normal” immune system, there are likely to be many different subgroups that have not yet been identified because of limitations in standard immunological profiling. A quarter of the world’s population is thought to have been exposed to Mtb (36), yet the vast majority of those infected remain healthy lifelong (3); those who progress to active TB therefore likely represent distinct outlier populations. The diverse causes of active TB, such as anti-TNF therapy, MSMD, HIV, diabetes and anti-PD-1 therapy, demonstrate that patients do not reach TB by a single route, but instead can lose the immunological balance that controls Mtb by multiple paths. This may explain why genetic studies have generally failed to find consistent predispositions: evidence of heritability can be demonstrated but is often hard to validate in different populations (37). Mutations in the macrophage endosomal protein NRAMP1, for example, were shown to associate with TB in an early seminal study (38), but since then few consistent genetic traits of susceptibility to TB have been identified (39). Similarly, studies of individuals who resist Mtb infection despite recurrent exposure generate a diverse list of potentially protective features, including T cell subsets or activation (40–42), TNF-α responses (43), antibody activity (44) and innate immune training (45). These human observations are supported by murine genetic experiments, which show that increased susceptibility to Mtb can result from a wide range of immunological alterations (46). However, despite this evidence of diversity, the TB field continues to seek a single unifying explanation for disease progression, which is incompatible with clinical observations.

Ultimately, to transmit efficiently, Mtb needs to cause pulmonary disease in order to spread by airborne droplets (47); for Mtb, the route taken does not matter, as long as it ends in pulmonary TB. Experimentally, TB is a fundamentally challenging disease to model, as the interaction is prolonged and Mtb is an obligate human pathogen (48). The primary driver of advances in immunology over recent decades has been transgenic mice (49), but the mouse model of TB does not accurately reflect human disease (50). Disease heterogeneity has been highlighted by descriptions of clinical TB endotypes during active disease (51), but less attention has been given to the different immunological pathways that can lead to the same disease phenotype.

Mtb’s single successful colonisation of humans

Just as new tools are providing insights into the complexity of human TB progression, advances in mycobacterial genomics are highlighting the unique nature of the human-Mtb relationship (52). Mtb is a near-clonal organism, with evidence suggesting that there has been only one successful and sustained penetration into the human population (52, 53). Globally, Mtb strains differ by a total of only approximately 2,000 SNPs (52). This genetic conservation has persisted even though Mtb has flourished in human populations since the industrial revolution, suggesting that Mtb was already close to being optimised for human hosts. Further evolution may have served to increase transmission within specific populations as humans diverged genetically (53).

A very similar organism, M canettii, can cause disease but cannot transmit from human to human (54). Similarly, M bovis is 99.9% identical to Mtb (55) but has never achieved sustained human-to-human transmission, despite many millions of human exposures and common lymph node infections in the pre-pasteurisation era (53). Therefore, only Mtb sustains human TB transmission cycles, whereas other very closely related mycobacteria maintain infection cycles in other higher mammals but not in humans (53). Clues to the key pathogenic mechanisms may lie in the differences between mycobacterial species, but ultimately the similarities between Mtb strains must be critical to their ongoing success. For example, the hyperconservation of T cell epitopes may be evidence that Mtb manipulates the host immune response to favour disease and transmission (56). Given the size of the Mtb genome, identifying the critical conserved features will be challenging, especially as half of its genes still lack a known function 25 years after the genome was first sequenced (57).

One highly intriguing proposition is that latent TB may itself give humans an evolutionary advantage (57). Exposure to Mtb modifies innate immune training (58), so it is plausible that Mtb could protect against other fatal infections. Similarly, M bovis BCG, a live-attenuated strain of M bovis, induces innate immune training (59), suggesting that this effect is common to Mtb and BCG. BCG vaccination reduces mortality much more than can be explained by its effect on TB incidence alone (60), a protective effect confirmed in diverse studies (61). For example, BCG reduces viral infections in infants in Africa (62) and in experimentally challenged adults (63), and is associated with improved survival in Europe (64). Together, these observations strongly suggest that mycobacterial infection protects against other infectious causes of death.

Mtb almost certainly first colonised humans in East Africa (53), potentially about 70,000 years ago (9). Several human migrations out of Africa are thought to have preceded this date, but all ultimately became extinct, with the first sustained human dispersal occurring 60,000–70,000 years ago (65, 66). The primary selective pressure in these early communities would have been infectious disease (67). This raises the novel hypothesis that Mtb circulating in the community gave the first successful human migrants out of Africa a survival advantage, by reducing mortality from other infectious diseases and thereby enabling sustainable population growth.

Mtb transmission may have benefitted humans by increasing innate immune resistance to infection, at the cost of the approximately 10% disease penetrance that permits Mtb propagation. Over many millennia, this selective pressure would progressively remove genotypes that confer high susceptibility to TB, but would equally tend to select against individuals with complete resistance to Mtb colonisation (Figure 2). This could explain why consistent genetic traits for susceptibility or resistance to TB have been hard to identify (39). More recent mass infection events, such as the smallpox epidemics that killed approximately 25% of the population (67), may have further favoured individuals immunologically trained by Mtb. Likewise, successive waves of plague at the end of the Roman Empire and then in the European Middle Ages killed approximately 25% of the population, adding to the selective pressure from endemic infections (68, 69). If humans have been selected to be permissive to Mtb colonisation but broadly resistant to disease, the development of active disease must be a relatively unusual event in a subset of outlier individuals. In some sense, we could be regarded as having a symbiotic relationship with Mtb, with disease representing a necessary evil caused by an imbalance in a predominantly stable host-pathogen interaction (70, 71).

Figure 2. Selection pressure of prolonged co-evolution favours individuals permissive to colonisation but resistant to active disease.

Over millennia, Mtb circulation in society will remove genetic traits that result in high susceptibility to active TB. Perhaps less intuitively, if Mtb generates trained immunity that protects against other fatal infections, individuals highly resistant to Mtb colonisation will also be selected against. The resulting population would then resemble modern humans: susceptible to initial Mtb infection but generally resistant to TB disease.

The host-pathogen interaction at a cellular level

Early histological studies described the wide range of human TB lesions and granuloma types, and identified TB as a disease characterised by spatial organisation (72). Classically, the granuloma has been proposed as the site where the outcome of the host-pathogen interaction is determined (73). Recent methodological advances are permitting much finer dissection of events and further highlight the importance of spatial organisation within the granuloma (74–76). However, just as the early X-ray era showed some lesions progressing and others regressing, these studies demonstrate great heterogeneity between granuloma types. Studies in non-human primates have allowed investigation of the features of progressing and controlling granulomas, identifying potential correlates of immune control (77), but even this model only partially recapitulates human disease.

Furthermore, the recent spatial studies highlight the complexity of the cellular players, including not only the established fulcrum of macrophages and T cells but also B cells, neutrophils and fibroblasts. For example, fibroblasts are emerging as important immune regulators in other lung diseases (78), and so it seems highly likely that they play an active role in TB-related inflammation. Fibroblast zonation can lead to feed-forward inflammatory loops and so may propagate disease (79). Consequently, the full spectrum of cell types in Mtb-infected lesions needs to be considered. Given that multiple immunological routes can lead to active TB, it seems unlikely that a single cellular component will fully explain the balance between Mtb containment and progression to active disease. However, the majority of studies continue to look for a single protective immune mechanism (41, 77, 80–82).

Considering the clinical and experimental evidence, immunological failure is likely to be a multistep process, whereby either one large deficit or numerous small imbalances can lead to progression and disease (Figure 3). Mtb and humans interact over many years, as the pathogen is difficult to eradicate because of its highly evolved survival mechanisms (83); therefore, in individuals in whom it survives, there is a long period during which it can escape host control. These different paths may ultimately cross or converge; if so, understanding the key nodes will allow more broadly effective treatments to emerge. For example, lung extracellular matrix degradation could be regarded as a final common pathway (84), but again this may result from different collagenases, including macrophage-derived MMP-1 or neutrophil-derived MMP-8 (85, 86).

Figure 3. Escaping the human immune response from the pathogen’s perspective.

As control of Mtb requires a co-ordinated host response, there are multiple routes by which Mtb can escape. A major immune disturbance, such as TNF-α or PD-1 inhibition, provides a relatively direct exit route. However, most individuals develop TB through a series of less apparent immune events, without any obvious global immune disturbance.

Emerging methodologies and the challenge of data analysis

The recent adoption of “omic” methodologies, including single cell transcriptomics, spatial transcriptomics, and proteomics, offers unprecedented opportunities to dissect the mechanisms of human TB pathogenesis and to accelerate the development of effective interventions. These approaches generate large-scale, complex datasets that capture the heterogeneity of TB lesions, but this complexity also presents significant analytical challenges. Standard computational pipelines, if not adapted to account for the biological and technical variability in these data, are unlikely to deliver robust or reproducible insights. A major obstacle is integrating data from multiple studies and platforms, either within a single omic layer (horizontal integration) or across multiple omic layers (vertical integration) (87). Such integration risks data loss and inconsistent results, especially when data are not harmonised.

The prolonged co-evolution between host and pathogen has resulted in multiple routes to active TB, significantly adding to the biological heterogeneity that complicates the analysis and interpretation of multi-omic data. Addressing this complexity requires well-annotated clinical cohorts that capture the full spectrum of TB heterogeneity, ideally with longitudinal outcome data rather than single time-point snapshots. To accommodate the diversity of disease pathways, novel bioinformatic approaches are needed that move beyond the assumption of a single, unifying route to active TB.
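The risk of assuming a single route can be illustrated with a toy example. Below, hypothetical standardised inflammatory scores are assigned to patients drawn from two unlabelled subgroups at opposing immunological extremes (deficiency and excess). A pooled patient-versus-control comparison sees almost no difference, whereas even a minimal unsupervised step, here a one-dimensional k-means with k = 2 (Lloyd's algorithm), separates the subgroups. The scores and the `two_means_1d` helper are assumptions for illustration only; real multi-omic stratification would use many features and a more carefully validated clustering method.

```python
from statistics import mean

# Hypothetical standardised inflammatory scores; illustrative values only.
controls = [-0.3, 0.1, 0.2, -0.1, 0.1]
# Patients drawn from two unlabelled subgroups at opposing extremes.
patients = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.0]


def two_means_1d(values, iterations=20):
    """Minimal 1D k-means (Lloyd's algorithm) with k = 2, seeded at min and max."""
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Keep the previous centre if a cluster happens to be empty.
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# A pooled comparison suggests almost no difference from controls...
print(f"pooled patients: {mean(patients):.2f}; controls: {mean(controls):.2f}")
# ...whereas clustering recovers the two opposing subgroups.
centres, groups = two_means_1d(patients)
print(f"subgroup centres: {[round(c, 2) for c in centres]}")
```

The opposing subgroups cancel in the pooled mean, which is the analytical counterpart of seeking one unifying mechanism when several routes lead to the same phenotype.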

The power of multi-omic approaches to reveal complex molecular relationships depends on both the quality of the omic data and the fit between experimental design and data integration strategy. For example, correlation-based integration requires matched samples across omics, sufficient sample numbers, and comparable variance structures. These requirements are often overlooked, leading to insufficient power, noisy data, and unrealistic integration (88). Each omic technology also brings its own challenges. Signal-to-noise ratios differ across modalities, and appropriate algorithms are needed to estimate sample size and power for each. Notably, using the same sample numbers across modalities does not ensure comparable statistical power; achieving equal power can require unbalanced sample sizes (89). Other recognised challenges include the interpretation and validation of multi-omic models, standardised annotation, and data sharing (89). Recent developments include web-based, user-friendly tools that enable both knowledge- and data-driven multi-omic integration (90). Wider use of artificial intelligence (AI) is also transforming omics data analysis by providing robust methods to interpret complex biological datasets (91). Machine learning and deep learning techniques are now routinely applied to genomic, transcriptomic, proteomic, and metabolomic data, enabling more integrated and comprehensive analyses (91). In proteomics, for example, AI-driven approaches have improved peptide measurement predictions and accelerated biomarker discovery, often outperforming conventional assays (92). To improve interpretability, explainable AI (XAI) methods are increasingly employed, with feature relevance mapping and visual explanations emerging as preferred post hoc strategies (93). However, significant challenges remain in implementing XAI, and further research is needed to overcome these barriers and unlock the full translational potential of AI in omics analysis (92, 93). Ultimately, the utility of these innovations will depend on the functional validation of the resulting models.
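The point that equal sample numbers do not give equal power can be illustrated with the standard normal-approximation formula for a two-group comparison, n = 2 × (z(1 − α/2) + z(1 − β))² / d² per group, where d is the effect size divided by the measurement standard deviation. The sketch assumes a two-sided z-test and two hypothetical modalities that measure the same biological effect with different technical noise (the standard deviations are assumed values). It is a textbook approximation, not the power method of the cited work (89); real omic power calculations must also handle multiple testing and modality-specific data distributions.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def n_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided, two-sample z-test."""
    d = effect / sd  # standardised effect size
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def power_at(n, effect, sd, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    d = effect / sd
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * sqrt(n / 2) - z_alpha)


# Hypothetical: one biological effect measured by two modalities with different noise.
effect = 1.0
assumed_sd = {"transcriptomics": 1.0, "proteomics": 2.0}
for modality, sd in assumed_sd.items():
    print(
        f"{modality}: n/group for 80% power = {n_per_group(effect, sd)}; "
        f"power with 20/group = {power_at(20, effect, sd):.2f}"
    )
```

Under these assumptions, doubling the noise halves d and so roughly quadruples the required samples per group (16 versus 63 here), which is why a balanced design across modalities can leave one of them badly underpowered.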

Given the complexity of human TB, careful study design and unbiased integration methods that accommodate data limitations are essential. Spatial context is particularly important, as TB pathology involves three-dimensional immune responses. Extracellular matrix remodelling is a hallmark of TB granulomas (81), and the matrix regulates host cell biology (84). Consequently, data inputs such as matrix composition and organisation may be needed alongside host and Mtb transcriptomic data. Ultimately, local cellular events must be modelled at the tissue level, adding a second layer of computational complexity (85).
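One simple way to turn spatial position into a local context feature is to count, for each cell of one type, the neighbouring cells of another type within a fixed three-dimensional radius. The sketch below assumes hypothetical cell coordinates in micrometres and a hypothetical 20 µm radius; it is a generic neighbourhood-count feature rather than a method from the studies cited here, and real spatial datasets would need a spatial index (for example a k-d tree) instead of this quadratic loop.

```python
from math import dist

# Hypothetical cells as (cell type, x, y, z) in micrometres; illustrative only.
cells = [
    ("macrophage", 0.0, 0.0, 0.0),
    ("macrophage", 50.0, 0.0, 0.0),
    ("fibroblast", 10.0, 5.0, 0.0),
    ("fibroblast", 12.0, -4.0, 3.0),
    ("T cell", 48.0, 6.0, 2.0),
    ("fibroblast", 200.0, 0.0, 0.0),
]


def neighbour_counts(cells, centre_type, neighbour_type, radius):
    """For each centre_type cell, count neighbour_type cells within radius (3D Euclidean)."""
    counts = []
    for i, (ctype, *cpos) in enumerate(cells):
        if ctype != centre_type:
            continue
        n = sum(
            1
            for j, (ntype, *npos) in enumerate(cells)
            if j != i and ntype == neighbour_type and dist(cpos, npos) <= radius
        )
        counts.append(n)
    return counts


counts = neighbour_counts(cells, "macrophage", "fibroblast", radius=20.0)  # hypothetical radius
print(counts)  # → [2, 0]
```

Per-cell features of this kind can then sit alongside transcriptomic and matrix measurements, so that the same cell type in different local environments is not treated as one population.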

Addressing these challenges will require substantial investment in analytical capacity. Multi-omics is already transforming our understanding of disease heterogeneity, facilitating the identification of previously unrecognised subgroups, refining prognostic and therapeutic approaches, and providing deeper mechanistic insights. This strategy has successfully stratified rare tumours (94), profiled healthy populations (95), and enabled health screening to reveal previously hidden disease and risk subgroups (96), supporting the shift toward precision medicine. As human disease processes are rarely uniform, advances in multi-omic study design and analysis for TB will likely benefit a broad range of conditions.

Implications of diverse disease pathways for new TB interventions

Ultimately, the complexity of human TB underlies our inability to deploy a transformative intervention, and so global mortality remains depressingly high. The human-Mtb interaction is so closely co-evolved that experimental findings need to be interpreted in light of human disease phenomena. Multi-omic studies in TB are currently performed on small numbers of samples from specific regions, owing to cost and the difficulty of obtaining appropriate clinical samples, and so are unlikely to reflect global heterogeneity. Results therefore need to be interpreted with caution, and wider studies will be needed to confirm the generalisability of findings. A critical aspect will be carefully curated and fully accessible metadata, so that as the body of omic datasets on human TB grows, they can be accurately integrated into wider analyses; missing metadata makes interpretation difficult and, in the worst cases, misleading. Recurrent mining of these datasets is likely to be fundamental to understanding the breadth of TB pathogenesis. In addition, innovative computational approaches will be required in which the analysis explicitly accommodates multiple immune pathways to disease.

Given the toll that TB takes in the poorest parts of the world, we have a moral imperative to end the epidemic (97). To achieve this, we must acknowledge the complexity of human TB that has resulted from our prolonged co-evolution with the pathogen and the selective pressure of persistent Mtb exposure over millennia. Success depends on integrating the full spectrum of TB disease into our bioinformatic analyses, so that an understanding of TB’s complexity can then inform rational interventions. Seeking a single unified explanation of TB disease seems unlikely to succeed.

Acknowledgements

PE was supported by the MRC (MR/W025728/1), MR by the Rosetrees Trust (CF-2021-2\126), SM by the MRC (MR/S024220/1) and AL by the Wellcome Trust (210662/Z/18/Z) and the BMGF (OPP1137006).