Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area

  1. JA Guerra-Assunção
  2. AC Crampin
  3. RMGJ Houben
  4. T Mzembe
  5. K Mallard
  6. F Coll
  7. P Khan
  8. L Banda
  9. A Chiwaya
  10. RPA Pereira
  11. R McNerney
  12. PEM Fine
  13. J Parkhill
  14. TG Clark
  15. JR Glynn (corresponding author)
  1. London School of Hygiene and Tropical Medicine, United Kingdom
  2. Karonga Prevention Study, Malawi
  3. Wellcome Trust Sanger Institute, United Kingdom
4 figures and 3 tables

Figures

Figure 1
Phylogenetic tree of all samples from Karonga.

Lineages form monophyletic groups within the phylogeny, as expected. Lineage 1 (Indo-Oceanic) is shown in dark blue, Lineage 2 (Beijing/East Asian) in light blue, Lineage 3 (East African Indian) in green, and Lineage 4 (Euro-American) in red.

https://doi.org/10.7554/eLife.05166.003
Figure 2 with 1 supplement
Pairwise SNP distances between all pairs of samples with known RFLP.

The y axis shows the relative frequency within each subgroup: same RFLP pattern (red), different RFLP patterns (blue); same individual, same RFLP (green). (A) shows the full data set, and (B) is part of the same figure drawn at a larger scale (each bar corresponds to 1 SNP) to show the smaller distances more clearly.

https://doi.org/10.7554/eLife.05166.005
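The pairwise SNP distances in Figure 2 can be illustrated as a Hamming distance over a SNP alignment, with each pair assigned to one of the three colour-coded subgroups. This is a minimal sketch, not the authors' pipeline. It assumes each sample is an equal-length string of SNP alleles, and that a position is skipped if either sample has a missing call ('N' or '-'). It also assumes a pair of specimens from the same individual with different RFLP patterns counts as "different RFLP". The names `snp_distance` and `group_pairs` are hypothetical.

```python
from itertools import combinations

MISSING = {"N", "-"}  # assumed symbols for a missing call


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both samples have a call and the calls differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a not in MISSING and b not in MISSING and a != b
    )


def group_pairs(samples):
    """samples: list of (sample_id, person_id, rflp_pattern, snp_string).

    Returns a dict mapping each subgroup to the list of its pairwise SNP distances.
    """
    groups = {"same RFLP": [], "different RFLP": [], "same individual, same RFLP": []}
    for (_, p1, r1, s1), (_, p2, r2, s2) in combinations(samples, 2):
        d = snp_distance(s1, s2)
        if r1 != r2:
            groups["different RFLP"].append(d)
        elif p1 == p2:
            groups["same individual, same RFLP"].append(d)
        else:
            groups["same RFLP"].append(d)
    return groups
```

Histograms of each list, normalised within the subgroup, correspond to the relative frequencies on the y axis.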
Figure 2—figure supplement 1
Pairwise mutation rates between all pairs of samples with known RFLP, calculated as the number of SNPs divided by the number of days between the dates of disease onset of the two individuals.

The y axis shows the relative frequency within each subgroup: same RFLP pattern (red), different RFLP patterns (blue); same individual, same RFLP (green). (A) shows the full data set, and (B) is part of the same figure drawn at a larger scale (each bar corresponds to 0.001 SNP/day) to show the smaller rates more clearly.

https://doi.org/10.7554/eLife.05166.006
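The rate in Figure 2—figure supplement 1 is a simple ratio, sketched below. The caption does not say how pairs with the same onset date were handled. As an assumption, this sketch returns `None` for such pairs rather than dividing by zero. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def snps_per_day(snps: int, onset_a: date, onset_b: date) -> Optional[float]:
    """Pairwise rate = SNPs / days between the two onset dates (order-insensitive).

    Returns None when both onsets fall on the same day, where the rate is undefined
    (an assumption; the handling of such pairs is not stated in the caption).
    """
    days = abs((onset_b - onset_a).days)
    if days == 0:
        return None
    return snps / days
```

For scale, the 0.001 SNP/day bin width in panel B corresponds to about 0.37 SNPs per year (0.001 × 365.25).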
Figure 3 with 1 supplement
Examples of clusters built using SeqTrack.

All clusters are shown in Figure 3—figure supplement 1. Each polygon represents a patient; larger polygons represent two or more patients with identical sequences. The patient details are written inside the polygon: F = female, M = male. The number is the year in which the disease episode started. The shapes describe drug resistance of the strain: squares = drug sensitive, circles = drug resistant. The colour of the polygon refers to the HIV status of the patient: red = positive, blue = negative, grey = unknown (or multiple patients). The colour of the edge refers to the lineage: Lineage 1 (Indo-Oceanic) dark blue (panel B), Lineage 2 (Beijing/East Asian) light blue (panel C), Lineage 3 (East African Indian) green (panel A), and Lineage 4 (Euro-American) red (panel D). The numbers on the arrows between the polygons are the numbers of SNPs between them.

https://doi.org/10.7554/eLife.05166.007
Figure 3—figure supplement 1
Clusters built using SeqTrack.

Each polygon represents a patient; larger polygons represent two or more patients with identical sequences. The patient details are written inside the polygon: F = female, M = male. The number is the year in which the disease episode started. The shapes describe drug resistance of the strain: squares = drug sensitive, circles = drug resistant, octagons = unknown. The colour of the polygon refers to the HIV status of the patient: red = positive, blue = negative, grey = unknown. The colour of the edge refers to the lineage: Lineage 1 (Indo-Oceanic) dark blue, Lineage 2 (Beijing/East Asian) light blue, Lineage 3 (East African Indian) green, and Lineage 4 (Euro-American) red. The numbers on the arrows between the polygons are the numbers of SNPs between them.

https://doi.org/10.7554/eLife.05166.008
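The arrows in these SeqTrack clusters link each case to a likely source. The core idea is that a case's source is an earlier case with the fewest SNPs between them. It can be sketched as a greedy rule. This is only a simplification for illustration and not the SeqTrack implementation (available in the R package adegenet), which applies additional rules. The tie-breaking rule, the data layout and the `max_snps` cut-off parameter are all hypothetical. The study's own cut-offs are not given in this section.

```python
from datetime import date
from typing import Dict, FrozenSet, Optional, Tuple


def assign_ancestors(
    onsets: Dict[str, date],
    snps: Dict[FrozenSet[str], int],
    max_snps: Optional[int] = None,
) -> Dict[str, Optional[Tuple[str, int]]]:
    """Greedy ancestor assignment: each case links to the earlier case with the fewest SNPs.

    Ties are broken in favour of the most recent earlier case, then by case ID
    (an assumption). max_snps is a hypothetical cut-off: a case whose closest
    earlier case exceeds it is left without an ancestor.
    """
    ancestors = {}
    for case, onset in onsets.items():
        candidates = [
            (snps[frozenset((case, other))], -other_onset.toordinal(), other)
            for other, other_onset in onsets.items()
            if other_onset < onset  # only strictly earlier cases can be sources
        ]
        if not candidates:
            ancestors[case] = None
            continue
        distance, _, best = min(candidates)
        if max_snps is not None and distance > max_snps:
            ancestors[case] = None
        else:
            ancestors[case] = (best, distance)
    return ancestors
```

Because only strictly earlier cases are candidates, the earliest case in each cluster has no ancestor and becomes the root of its tree.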
Figure 4 with 2 supplements
Distribution of clusters and SNPs.

(A) Number of clusters of different sizes and percentage of patients in clusters of different sizes. Cluster size 1 refers to unclustered patients. (B) Cluster size by lineage. The p values are for the comparison of each lineage with lineage 4 (Wilcoxon rank sum test). (C) Relationship between the number of SNPs separating the two individuals in a pair and the time interval between their dates of disease onset. (Random noise has been introduced to allow multiple similar results to be visualized.) Linear regression gives r² = 10%, p < 0.001, slope 0.26 SNPs per year (95% CI 0.21–0.31). (D) Number of SNPs between individuals in clusters, by lineage. The p values are for the comparison of each lineage with lineage 4 (Wilcoxon rank sum test).

https://doi.org/10.7554/eLife.05166.009
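Panels B and D compare each lineage with lineage 4 using the Wilcoxon rank sum test. Below is a minimal standard-library sketch. It uses the large-sample normal approximation with a correction for ties, which matter here because cluster sizes and SNP counts repeat often, and no continuity correction. Statistical packages differ in these details, for example by adding a continuity correction or using exact p values for small samples. A sketch like this may therefore not reproduce the figure's p values exactly. The function names are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(group, reference):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected variance)."""
    values = list(group) + list(reference)
    ranks = midranks(values)
    n1, n2 = len(group), len(reference)
    n = n1 + n2
    w = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(values).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

Called as `rank_sum_test(sizes_lineage_2, sizes_lineage_4)`, with hypothetical lists of cluster sizes, it returns the rank sum, the z statistic and a two-sided p value.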
Figure 4—figure supplement 1
Relationship between the number of SNPs and the number of days between samples from individuals with more than one specimen available from the same episode of disease or from a relapse.

For each individual, we selected the first and last specimens if there were more than two. (Random noise has been introduced to allow multiple similar results to be visualized.) The slope is given in SNPs/year.

https://doi.org/10.7554/eLife.05166.010
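The selection rule above keeps only the first and last specimens per individual. It can be sketched as follows. Several assumptions apply, and the names are hypothetical. Each specimen is given as (person ID, collection date, set of variant positions relative to a common reference). The SNP count between two specimens is the size of the symmetric difference of their variant sets, which ignores missing calls. Dividing the SNP count by days/365.25 converts a pair to SNPs per year.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, FrozenSet, Iterable, Tuple

# (person_id, collection_date, variant positions) -- an assumed layout
Specimen = Tuple[str, date, FrozenSet[int]]


def first_last_pairs(specimens: Iterable[Specimen]) -> Dict[str, Tuple[int, int]]:
    """Map each person with 2+ specimens to (days between first and last, SNPs between them)."""
    by_person = defaultdict(list)
    for person_id, collected, variants in specimens:
        by_person[person_id].append((collected, variants))
    pairs = {}
    for person_id, records in by_person.items():
        if len(records) < 2:
            continue  # a single specimen gives no within-host comparison
        records.sort(key=lambda r: r[0])  # order by collection date only
        (first_date, first_vars), (last_date, last_vars) = records[0], records[-1]
        snps = len(first_vars ^ last_vars)  # positions variant in one specimen but not the other
        pairs[person_id] = ((last_date - first_date).days, snps)
    return pairs
```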
Figure 4—figure supplement 2
Relationship between number of SNPs and the number of days between dates of disease onset for transmissions identified from the network, by lineage.

(Random noise has been introduced to allow multiple similar results to be visualized.) The slopes are given in SNPs/year.

https://doi.org/10.7554/eLife.05166.011
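The slopes in SNPs/year here and in Figure 4C come from linear regression of the number of SNPs on the time between onsets. The sketch below shows ordinary least squares, fitted per lineage. Its 95% interval uses a normal approximation (slope ± 1.96 × SE), which is an assumption. A t critical value would give a slightly wider interval for small lineages. The jitter added for plotting plays no part in this fit. The function names and record layout are hypothetical.

```python
import math
from collections import defaultdict


def ols_slope(xs, ys):
    """Fit y = a + b*x; return (slope, (ci_low, ci_high), r_squared)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, (slope - 1.96 * se, slope + 1.96 * se), r_squared


def slopes_by_lineage(records):
    """records: iterable of (lineage, years_between_onsets, snps)."""
    groups = defaultdict(lambda: ([], []))
    for lineage, years, snps in records:
        xs, ys = groups[lineage]
        xs.append(years)
        ys.append(snps)
    return {lineage: ols_slope(xs, ys) for lineage, (xs, ys) in groups.items()}
```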

Tables

Table 1

Characteristics of patients included in the analysis and distribution of lineages

https://doi.org/10.7554/eLife.05166.004
Characteristic | Lineage 1, n (%) | Lineage 2, n (%) | Lineage 3, n (%) | Lineage 4, n (%) | Overall, N | p*
Overall | 269 (16.0) | 74 (4.4) | 205 (12.2) | 1139 (67.5) | 1687 |
Age
  <20 | 9 (12.3) | 7 (9.6) | 9 (12.3) | 48 (65.7) | 73 |
  20–29 | 46 (10.3) | 26 (5.8) | 48 (10.7) | 327 (73.2) | 447 |
  30–39 | 109 (18.4) | 17 (2.9) | 81 (13.7) | 386 (65.1) | 593 |
  40–49 | 61 (19.8) | 18 (5.8) | 39 (12.7) | 190 (61.7) | 308 |
  50+ | 44 (16.5) | 6 (2.3) | 28 (10.5) | 188 (70.7) | 266 | 0.001
Sex
  Female | 130 (14.6) | 47 (5.3) | 94 (10.6) | 617 (69.5) | 888 |
  Male | 139 (17.4) | 27 (3.4) | 111 (13.9) | 522 (65.3) | 799 | 0.02
Year
  1995–1998 | 55 (15.5) | 8 (2.3) | 29 (8.2) | 263 (74.1) | 355 |
  1999–2001 | 43 (11.5) | 23 (6.1) | 43 (11.5) | 266 (70.9) | 375 |
  2002–2004 | 80 (19.4) | 22 (5.3) | 54 (13.1) | 257 (62.2) | 413 |
  2005–2007 | 54 (17.4) | 11 (3.5) | 44 (14.2) | 202 (65.0) | 311 |
  2008–2010 | 37 (15.9) | 10 (4.3) | 35 (15.0) | 151 (64.8) | 233 | 0.004
TB type
  Smear+ | 212 (17.3) | 52 (4.3) | 156 (12.8) | 804 (65.7) | 1224 |
  Smear− | 46 (12.1) | 19 (5.0) | 38 (10.0) | 276 (72.8) | 379 |
  Extrapulmonary | 11 (13.1) | 3 (3.6) | 11 (13.1) | 59 (70.2) | 84 | 0.1
HIV status
  Negative | 47 (10.8) | 23 (5.3) | 57 (13.0) | 310 (70.9) | 437 |
  Positive | 148 (19.3) | 28 (3.6) | 107 (13.9) | 486 (63.2) | 769 | 0.001
Previous TB
  No | 251 (16.7) | 66 (4.4) | 171 (11.4) | 1019 (67.6) | 1507 |
  Yes | 18 (10.0) | 8 (4.4) | 34 (18.9) | 120 (66.7) | 180 | 0.007
Isoniazid resistance
  Resistant | 20 (17.2) | 0 (0.0) | 21 (18.1) | 75 (64.7) | 116 |
  Sensitive | 244 (15.9) | 74 (4.8) | 181 (11.8) | 1033 (67.4) | 1532 | 0.03
Residence
  Karonga | 198 (16.4) | 53 (4.4) | 148 (12.3) | 806 (66.9) | 1205 |
  Malawi | 48 (16.6) | 13 (4.5) | 32 (11.1) | 196 (67.8) | 289 |
  Other country | 11 (11.5) | 7 (7.3) | 17 (17.7) | 61 (63.5) | 96 | 0.4
Birth place
  Karonga | 174 (17.0) | 46 (4.5) | 135 (13.2) | 667 (65.3) | 1022 |
  Malawi | 55 (16.3) | 14 (4.1) | 31 (9.2) | 238 (70.4) | 338 |
  Other country | 34 (11.7) | 14 (4.8) | 37 (12.7) | 206 (70.8) | 291 | 0.2

* From χ² comparison between lineages.
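The p values in Table 1 come from χ² comparisons of the lineage distribution across categories. Below is a minimal standard-library sketch. It assumes a Pearson χ² test of independence on the counts, without continuity correction. The exact test settings and software are not stated here, and the helper names are hypothetical. The upper-tail probability uses the series expansion of the regularized lower incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # series for the lower incomplete gamma: z^a e^-z * sum z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-square test on a table of counts (rows = categories, columns = lineages)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

Applied to the sex rows, `chi2_independence([[130, 47, 94, 617], [139, 27, 111, 522]])` gives roughly χ² ≈ 10.4 on 3 degrees of freedom and p ≈ 0.016. This is consistent with the reported p = 0.02.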

Table 2

Characteristics associated with disease due to recent infection

https://doi.org/10.7554/eLife.05166.012
Characteristic | Linked n/N | Linked % | Unadjusted OR (95% CI) | p (likelihood ratio test) | OR (95% CI) adjusted for age, sex, year, lineage | OR (95% CI) adjusted for other variables included in model* | p (likelihood ratio test)
Overall | 409/1074 | 38.1 | | | | |
Lineage
  1 | 56/183 | 30.6 | 0.76 (0.53–1.1) | | 0.81 (0.57–1.2) | 0.81 (0.57–1.2) |
  2 | 34/52 | 65.4 | 3.2 (1.8–5.9) | | 3.0 (1.6–5.4) | 3.2 (1.7–5.8) |
  3 | 58/129 | 45.0 | 1.4 (0.96–2.1) | | 1.5 (1.0–2.2) | 1.5 (1.0–2.2) |
  4 | 261/710 | 36.8 | 1 | <0.001 | 1 | 1 | <0.001
Age
  <20 | 19/36 | 65.8 | 2.9 (1.4–6.0) | | 2.5 (1.2–5.4) | 2.6 (1.2–5.6) |
  20–29 | 113/276 | 45.8 | 1.8 (1.2–2.7) | | 1.6 (1.1–2.5) | 1.8 (1.2–2.8) |
  30–39 | 152/404 | 39.6 | 1.5 (1.0–2.3) | | 1.5 (0.99–2.2) | 1.6 (1.0–2.3) |
  40–49 | 81/201 | 44.2 | 1.7 (1.1–2.7) | | 1.0 (1.0–2.6) | 1.7 (1.1–2.6) |
  50+ | 44/157 | 33.5 | 1 | 0.007 | 1 | 1 | 0.03
Sex
  Female | 229/575 | 39.8 | 1 | | | |
  Male | 180/499 | 36.1 | 0.85 (0.67–1.1) | 0.05 | 0.93 (0.72–1.2) | 0.94 (0.72–1.2) | 0.4
Year
  1999–2001 | 141/311 | 45.3 | 1 | | 1 | 1 | <0.001
  2002–2004 | 117/322 | 36.3 | 0.69 (0.50–0.95) | | 0.73 (0.52–1.0) | 0.69 (0.50–0.97) |
  2005–2007 | 92/244 | 37.7 | 0.73 (0.52–1.0) | | 0.78 (0.55–1.1) | 0.70 (0.49–1.0) |
  2008–2010 | 59/197 | 30.0 | 0.52 (0.35–0.75) | 0.001 | 0.53 (0.36–0.77) | 0.48 (0.32–0.70) |
TB type
  Smear-positive pulmonary | 312/821 | 38.0 | 1 | | 1 | |
  Smear-negative pulmonary | 97/253 | 38.3 | 1.0 (0.76–1.4) | 0.9 | 0.95 (0.71–1.3) | |
HIV status
  HIV− | 102/283 | 36.0 | 1 | | | |
  HIV+ no ART | 173/436 | 39.7 | 1.2 (0.85–1.6) | | 1.1 (0.75–1.5) | |
  HIV+ on ART | 27/77 | 35.1 | 0.96 (0.56–1.6) | 0.5 | 1.0 (0.56–1.8) | |
INH resistance
  No | 375/979 | 38.3 | 1 | | 1 | |
  Yes | 28/64 | 43.8 | 1.3 (0.75–2.1) | 0.4 | 1.4 (0.81–2.3) | |
  Unknown | | | | | | |
Recent residence
  Karonga | 328/816 | 40.2 | 1 | | 1 | | 0.005
  Other Malawi | 56/176 | 31.8 | 0.69 (0.49–0.98) | | 0.58 (0.41–0.84) | 0.58 (0.40–0.84) |
  Other country | 16/54 | 29.6 | 0.63 (0.34–1.1) | 0.04 | 0.48 (0.26–0.91) | 0.48 (0.26–0.91) |
Birth place
  Karonga | 267/659 | 40.5 | 1 | | 1 | |
  Other Malawi | 81/227 | 35.7 | 0.81 (0.60–1.1) | | 0.79 (0.57–1.1) | |
  Other country | 59/180 | 32.8 | 0.72 (0.51–1.0) | 0.1 | 0.67 (0.47–0.97) | |
Note: In this analysis, individuals were defined as linked (‘backwards links’) if, using the cut-offs described in the text, their closest link was with a patient within the previous 5 years. Extrapulmonary cases, recurrent cases and cases before 1999 were excluded. Odds ratios (OR) were calculated using logistic regression.

* In this model a dummy variable was used for the 32 individuals with missing data on recent residence.

Test for trend.
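The unadjusted odds ratios in Table 2 can be checked from the n/N columns. For a single binary comparison, such as lineage 2 against the lineage 4 reference, logistic regression gives the cross-product odds ratio. Its Wald interval is the Woolf interval on the log scale. The sketch below covers only that unadjusted case. The adjusted columns require a multivariable model. A zero cell would make the standard error undefined. The function name is hypothetical.

```python
import math


def odds_ratio(linked_exposed, total_exposed, linked_ref, total_ref, z=1.96):
    """Odds ratio of being linked (exposed vs reference) with a Woolf (log-scale Wald) CI."""
    a = linked_exposed
    b = total_exposed - linked_exposed
    c = linked_ref
    d = total_ref - linked_ref
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    low = math.exp(math.log(or_) - z * se)
    high = math.exp(math.log(or_) + z * se)
    return or_, low, high
```

For lineage 2 (34/52) against lineage 4 (261/710), `odds_ratio(34, 52, 261, 710)` gives about 3.25 (1.80–5.87). This rounds to the 3.2 (1.8–5.9) reported in the table.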

Table 3

Characteristics associated with transmissibility

https://doi.org/10.7554/eLife.05166.013
Characteristic | Any linked n/N | Any linked % | Unadjusted OR (95% CI) | p | OR (95% CI) adjusted for age, sex, year, lineage, smear status | p (likelihood ratio test)
Overall | 431/1346 | 32.0 | | | |
Lineage
  1 | 59/217 | 27.2 | 0.87 (0.63–1.2) | | 0.94 (0.66–1.3) |
  2 | 27/61 | 44.3 | 1.7 (1.0–2.7) | | 1.9 (1.1–3.2) |
  3 | 65/154 | 42.2 | 1.6 (1.2–2.3) | | 1.9 (1.4–2.7) |
  4 | 280/914 | 30.6 | 1 | 0.006 | 1 | <0.001
Age
  <20 | 20/50 | 40.0 | 2.3 (1.2–4.4) | | 1.9 (0.98–3.7) |
  20–29 | 134/349 | 38.4 | 2.3 (1.5–3.3) | | 2.2 (1.5–3.3) |
  30–39 | 159/490 | 32.5 | 1.7 (1.2–2.5) | | 2.0 (1.3–2.9) |
  40–49 | 71/238 | 29.8 | 1.6 (1.0–2.4) | | 1.7 (1.1–2.7) |
  50+ | 47/219 | 21.5 | 1 | <0.001 | 1 | 0.002
Sex
  Female | 239/718 | 33.3 | 1 | | 1 |
  Male | 192/628 | 30.6 | 0.87 (0.69–1.1) | 0.2 | 0.93 (0.73–1.2) | 0.5
Year
  1995–1998 | 159/314 | 50.6 | 1 | | 1 |
  1999–2001 | 119/345 | 34.5 | 0.49 (0.36–0.66) | | 0.42 (0.31–0.58) |
  2002–2004 | 95/389 | 24.4 | 0.30 (0.22–0.41) | | 0.27 (0.19–0.37) |
  2005–2007 | 58/298 | 19.5 | 0.22 (0.16–0.32) | <0.001 | 0.20 (0.14–0.29) | <0.001
TB type
  Smear-positive pulmonary | 338/1003 | 33.7 | 1 | | 1 |
  Smear-negative pulmonary | 93/343 | 27.1 | 0.72 (0.55–0.94) | 0.01 | 0.73 (0.55–0.96) | <0.001
HIV status
  HIV− | 91/318 | 28.6 | 1 | | 1 |
  HIV+ no ART | 170/540 | 31.5 | 1.1 (0.83–1.5) | | 1.1 (0.81–1.6) |
  HIV+ on ART | 11/48 | 22.9 | 0.70 (0.35–1.4) | 0.3 | 1.4 (0.62–3.1) | 0.6
Previous TB
  No | 391/1200 | 32.6 | 1 | | 1 |
  Yes | 40/146 | 27.4 | 0.77 (0.53–1.1) | 0.2 | 0.85 (0.58–1.3) | 0.4
INH resistance
  No | 402/1237 | 32.5 | 1 | | 1 |
  Yes | 29/100 | 29.0 | 0.86 (0.55–1.3) | 0.5 | 0.86 (0.54–1.4) | 0.5
Recent residence
  Karonga | 284/942 | 30.2 | 1 | | 1 |
  Other Malawi | 80/234 | 34.2 | 1.2 (0.89–1.6) | | 1.0 (0.74–1.4) |
  Other country | 20/74 | 27.0 | 0.88 (0.52–1.5) | 0.4 | 0.57 (0.33–0.98) | 0.09
Birth place
  Karonga | 276/811 | 34.0 | 1 | | 1 |
  Other Malawi | 80/272 | 29.4 | 0.83 (0.62–1.1) | | 0.82 (0.60–1.1) |
  Other country | 64/234 | 27.4 | 0.77 (0.56–1.1) | 0.2 | 0.71 (0.51–0.99) | 0.08

Note: The numbers of likely transmissions (‘forward links’) were compared by individual characteristics using ordered logistic regression. Extrapulmonary cases and cases occurring after 2007 were excluded.
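Table 3 uses ordered logistic regression, a proportional-odds model, on the number of forward links. The sketch below shows the model's key assumption under the parameterisation P(Y > k | x) = logistic(βx − κ_k). A single coefficient β shifts the log-odds of every cumulative split by the same amount, so exp(β) is one odds ratio that applies at every cut. For illustration only, the outcome is grouped into three ordered categories, such as 0, 1 and 2+ forward links. That grouping and the cut-points are hypothetical, not values fitted in the study.

```python
import math
from typing import List


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x: float, beta: float, cutpoints: List[float]) -> List[float]:
    """P(Y = k) for k = 0..len(cutpoints) under P(Y > k | x) = logistic(beta*x - cutpoint_k).

    cutpoints must be strictly increasing.
    """
    exceed = [logistic(beta * x - c) for c in cutpoints]  # P(Y > k) for each split k
    upper = [1.0] + exceed
    lower = exceed + [0.0]
    return [u - l for u, l in zip(upper, lower)]


def cumulative_odds_ratios(beta: float, cutpoints: List[float]) -> List[float]:
    """Odds of Y > k for x = 1 versus x = 0, one ratio per split k."""
    ratios = []
    for c in cutpoints:
        p1, p0 = logistic(beta - c), logistic(-c)
        ratios.append((p1 / (1 - p1)) / (p0 / (1 - p0)))
    return ratios
```

With β = ln 1.9, the adjusted odds ratio shown for lineage 2, every entry returned by `cumulative_odds_ratios` equals 1.9 whatever the cut-points. This is the proportional-odds property that lets the table report one OR per category.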


Guerra-Assunção JA, Crampin AC, Houben RMGJ, Mzembe T, Mallard K, Coll F, Khan P, Banda L, Chiwaya A, Pereira RPA, McNerney R, Fine PEM, Parkhill J, Clark TG, Glynn JR (2015) Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. eLife 4:e05166. https://doi.org/10.7554/eLife.05166