Tuberculosis: Fighting an old disease with next-generation sequencing

  1. Anzaan Dippenaar
  2. Robin M Warren (corresponding author)
  1. Stellenbosch University, South Africa

Mycobacterium tuberculosis is the causative agent of tuberculosis, a disease that is a major threat to human health worldwide. It is estimated that approximately 9 million people were diagnosed with tuberculosis during 2013, and that 1.5 million died from the disease. The global tuberculosis epidemic is being driven by co-infection with HIV and by the emergence and spread of drug-resistant strains of M. tuberculosis. The World Health Organization has reported that 3.5% of new tuberculosis patients, and 20.5% of patients who had been treated before, had multidrug-resistant forms of the disease in 2013. The diagnosis and treatment of drug-resistant tuberculosis are clearly major global health challenges (World Health Organization, 2014).

The enormity of the tuberculosis epidemic has created a desperate need to develop methods to monitor the dynamics of the disease. In the early 1990s, the discovery of repetitive elements in the genome of M. tuberculosis laid the foundation for the development of the science of molecular epidemiology (van Embden et al., 1993). These methods have shown that in situations where tuberculosis is common, epidemics are driven by the transmission of the bacteria between individuals. However, in low incidence settings, epidemics are driven by the ‘reactivation’ of bacteria that have been lying dormant in individuals since an earlier infection. It is also known that recurrent disease—when the symptoms reappear after a patient has apparently been cured—can occur through a second infection event, and that drug resistance is spread by transmission (Mathema et al., 2006).

Traditional methods to identify strains of M. tuberculosis rely on the analysis of small windows of the genome, and it has been assumed that the DNA sequences in these windows are variable enough to allow researchers to separate strains of M. tuberculosis that are evolutionarily close or distant. However, the true complexity of disease dynamics cannot be resolved by tracking strains using a small section of the genome. The development of next-generation sequencing platforms has made it possible to view the complete genetic information of the bacteria, which should improve the accuracy of efforts to monitor strains of M. tuberculosis as they move through space and time (Roetzer et al., 2013). Rapid whole genome sequencing promises to be the ultimate tool for epidemiological investigations, diagnosis, and for testing whether strains of bacteria are susceptible to particular drugs.

Now, in eLife, Judith Glynn of the London School of Hygiene and Tropical Medicine (LSHTM) and co-workers—with Guerra-Assunção as first author—report how a long-term large-scale whole genome sequencing strategy has been used to decipher the tuberculosis epidemic in a high prevalence setting with multiple sources of infection (Guerra-Assunção et al., 2015). They analysed the whole genome sequences of 1687 M. tuberculosis samples (isolates) collected from patients in the Karonga District of Malawi over a period of 15 years. This represents 72% of the total number of confirmed tuberculosis cases during that time. The various strains of M. tuberculosis can be grouped into seven ‘lineages’ that each contain bacteria descending from a common ancestor. Guerra-Assunção et al. found that the epidemic was largely driven by members of one lineage, which implies that either this lineage arrived in the area earlier than the others, or that the members of this lineage were more successful.

The genome of M. tuberculosis consists of ∼4.4 million bases and is generally believed to be relatively stable (Jagielski et al., 2014). To identify isolates that were directly related in a transmission network (i.e., recently transmitted from one patient to the next), Guerra-Assunção et al. used a cut-off of at most ten single nucleotide polymorphism differences between the genomes of the isolates. Next, they developed a clustering formula to group together directly related isolates. Using this formula in combination with network analysis (where isolates are linked according to genome sequence similarity), they found that strains from certain lineages were more likely to be transmitted between patients than others. This suggests that there are differences in the abilities of bacteria in the different lineages to cause disease. In this high-incidence setting, 66% of identified cases clustered together, of whom 38% had evidence of recent infection, implying ongoing transmission of the bacteria. Because this leaves the majority of cases without evidence of recent transmission, it indicates that reactivation of previous infection was the primary driving force behind this epidemic.
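The thresholding step described above can be illustrated with a short Python sketch. It is not the clustering formula developed by Guerra-Assunção et al., which is not given here; it assumes the simplest possible rule, single-linkage clustering, in which two isolates are linked when their aligned sequences differ at ten or fewer positions and a cluster is any connected group of linked isolates. The function names, the isolate labels and the 40-base toy sequences are all hypothetical.

```python
from itertools import combinations

SNP_CUTOFF = 10  # maximum SNP differences for two isolates to be linked (from the text)


def snp_distance(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cluster_isolates(sequences, cutoff=SNP_CUTOFF):
    """Group isolates into single-linkage clusters.

    Assumption: isolates within `cutoff` SNPs are linked, and clusters are the
    connected components of the resulting network (tracked with union-find).
    """
    parent = {name: name for name in sequences}

    def find(name):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if snp_distance(sequences[a], sequences[b]) <= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy alignment (real genomes are ~4.4 million bases long).
isolates = {
    "iso1": "A" * 40,
    "iso2": "C" * 5 + "A" * 35,             # 5 SNPs from iso1
    "iso3": "C" * 5 + "A" * 25 + "G" * 10,  # 10 SNPs from iso2, 15 from iso1
    "iso4": "T" * 40,                       # 40 SNPs from every other isolate
}

print(cluster_isolates(isolates))
# [['iso1', 'iso2', 'iso3'], ['iso4']]
```

In this toy example iso1 and iso3 differ at 15 positions, yet they fall into the same cluster because both are within the cut-off of iso2. Chaining of this kind is a property of the single-linkage assumption and may not match the behaviour of the published method.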

Glynn, Guerra-Assunção and co-workers (who are based at the LSHTM, the Karonga Prevention Study in Malawi and the Wellcome Trust Sanger Institute) also showed that the proportion of tuberculosis cases due to reactivation increased over the duration of the 15-year study, as demonstrated by a marked decrease in transmission from 45% in 1999–2001 to 30% in 2008–2010. Guerra-Assunção et al. suggest that this decrease is due to the implementation of antiretroviral therapy and isoniazid preventative therapy in Karonga. However, this is counter-intuitive because both treatments should protect against reactivation, which raises an important question as to how reactivation may work in this context. Significantly, this study shows that the tuberculosis control program in Karonga has reduced transmission of the bacteria. It also demonstrates that whole genome sequencing can provide new insights into tuberculosis epidemics, which could be used to advise and fine-tune control programs.

Despite the advantages of whole genome sequencing, it is important to acknowledge the complexity of the technology and data analysis. This raises the question of how useful it could be in high-incidence settings where tens of thousands of cases are diagnosed annually. Furthermore, the current technology is restricted to clinical isolates that need to undergo a lengthy culturing and DNA extraction process, which prevents its use as a real-time monitoring tool. Additionally, whole genome sequencing is labor-intensive and financially demanding, although costs have decreased significantly over the last decade. Regardless of these challenges, this technology has the potential to immediately revolutionise drug susceptibility testing by identifying the complete repertoire of mutations in target genes that confer drug resistance (Steiner et al., 2014). Application of this technology would decrease diagnostic delay, thereby reducing transmission, morbidity and mortality and, at the same time, improving treatment outcome.


References

  1. van Embden JD, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, Hermans P, Martin C, McAdam R, Shinnick TM (1993) Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. Journal of Clinical Microbiology 31:406–409.

Article and author information

Author details

  1. Anzaan Dippenaar

    DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for TB Research and the Division of Molecular Biology and Human Genetics, Stellenbosch University, Stellenbosch, South Africa
    Competing interests
    No competing interests declared.
  2. Robin M Warren

    DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for TB Research and the Division of Molecular Biology and Human Genetics, Stellenbosch University, Stellenbosch, South Africa
    For correspondence
    Competing interests
    No competing interests declared.

Publication history

  1. Version of Record published: March 3, 2015 (version 1)


© 2015, Dippenaar and Warren

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Anzaan Dippenaar, Robin M Warren (2015) Tuberculosis: Fighting an old disease with next-generation sequencing. eLife 4:e06782.
