Pneumococcal genetic variability in age-dependent bacterial carriage

  1. Philip HC Kremer (corresponding author)
  2. Bart Ferwerda
  3. Hester J Bootsma
  4. Nienke Y Rots
  5. Alienke J Wijmenga-Monsuur
  6. Elisabeth AM Sanders
  7. Krzysztof Trzciński
  8. Anne L Wyllie
  9. Paul Turner
  10. Arie van der Ende
  11. Matthijs C Brouwer
  12. Stephen D Bentley
  13. Diederik van de Beek
  14. John A Lees (corresponding author)
  1. Department of Neurology, Amsterdam UMC, University of Amsterdam, Netherlands
  2. Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Netherlands
  3. Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Netherlands
  4. Department of Pediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital, Netherlands
  5. Epidemiology of Microbial Diseases, Yale School of Public Health, United States
  6. Cambodia Oxford Medical Research Unit, Angkor Hospital for Children, Cambodia
  7. Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, United Kingdom
  8. Department of Medical Microbiology and Infection Prevention, Amsterdam UMC, Netherlands
  9. The Netherlands Reference Laboratory for Bacterial Meningitis, Netherlands
  10. Parasites and Microbes, Wellcome Sanger Institute, United Kingdom
  11. European Molecular Biology Laboratory–European Bioinformatics Institute, United Kingdom
  12. MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, United Kingdom
4 figures, 2 tables and 14 additional files

Figures

Figure 1 with 1 supplement
Serotype and strain (global pneumococcal sequence clusters [GPSC]) distribution by age and between cohorts.

Blue dots represent frequency of serotype and strain in child carriage, yellow dots represent frequency in adult carriage. Red and green dots show odds ratio of prevalence in children in the Maela and Dutch cohorts, respectively, on a log scale for serotype. Lines show differences. Top row: dominant serotypes, ordered by presence in cohort, and internally by overall frequency. Vaccine serotypes shown in red. (A) Serotype frequency in the Dutch cohort. (B) Serotype frequency in the Maela cohort. (C) Comparison of adult/child log odds in each cohort for serotype. Second row: dominant strains (GPSCs), ordered by presence in cohort, and internally by overall frequency. (D) Strain frequency in Dutch cohort. (E) Strain frequency in Maela cohort. (F) Comparison of adult/child log odds in each cohort for strain.
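
As a rough illustration of the quantity compared in panels C and F, the log odds ratio of carrying a given serotype or strain in children versus adults can be computed from a 2×2 table of counts. This is a minimal sketch: the counts are invented, the function name is hypothetical, and the 0.5 continuity correction is an assumption rather than a step confirmed by the article.

```python
import math

def log_odds_ratio(child_with, child_total, adult_with, adult_total, correction=0.5):
    """Log odds ratio of carrying one serotype in children versus adults.

    The continuity correction is an assumption made for this sketch; it
    avoids division by zero when a serotype is absent from one age group.
    """
    a = child_with + correction                   # children carrying the serotype
    b = child_total - child_with + correction     # children carrying anything else
    c = adult_with + correction                   # adults carrying the serotype
    d = adult_total - adult_with + correction     # adults carrying anything else
    return math.log((a * d) / (b * c))

# Hypothetical counts, not taken from the study.
example = log_odds_ratio(child_with=30, child_total=200, adult_with=10, adult_total=200)
```

A positive value indicates higher odds of carriage in children, a negative value higher odds in adults, and zero no difference.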

Figure 1—figure supplement 1
Histogram for child age (in months) in (A) Dutch cohort (red bars) and (B) Maela cohort (blue bars).

Due to the differences in sampling strategies in the studies from which the samples were obtained, children in the Dutch cohort were sampled at ages 9–11 months and 22–24 months, while in the Maela cohort children were sampled in the age range 1–24 months.

Figure 2 with 1 supplement
Phylogenetic tree of carriage samples from both cohorts.

The rings show metadata for the samples. Depicted from inside to outside, these are serotype, sequence cluster (global pneumococcal sequence clusters [GPSC]), age, and source (Maela, Netherlands). Scale bar: 0.013 substitutions per site. An interactive version is available online (project link in the original article).

Figure 2—figure supplement 1
Phylogenetic tree of carriage samples from both cohorts.

The rings show metadata for the samples. Depicted from inside to outside, these are presence or absence of the unitig upstream of the aSec gene, age (adult or child), and source (Maela or Netherlands). Scale bar: 0.013 substitutions per site.

Figure 3
Prediction of host age from pan-genomic variation in each cohort.

The smoothed receiver-operating characteristic (ROC) curve based on a linear predictor (elastic net fitted to unitigs, with strains used as folds for cross-validation) is shown. Area under the curve (AUC) is 0.5 for no predictive ability and 1 for perfect prediction.
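
To make the AUC scale concrete, the sketch below computes AUC as the fraction of (child, adult) sample pairs in which the child sample receives the higher predicted score, with ties counted as half. It assumes hypothetical scores and labels, and it does not reproduce the elastic net model, the strain-based cross-validation, or the curve smoothing described in the caption.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (for example child), 0 for the negative class.
    Tied scores contribute half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Scores that separate the two classes perfectly give 1.0, and scores that carry no information (here, all tied) give 0.5.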

Figure 4 with 2 supplements
Association of variants after meta-analysis with carriage age 0–24 months.

(A) Minus log-transformed p-value on the y-axis and position of unitig and single-nucleotide polymorphism (SNP) variants on the S. pneumoniae genome on the x-axis (Manhattan plot). (B) Minus log-transformed p-value on the y-axis and sorted lowest to highest p-value for rare variant burden in genes (purple) and clusters of orthologous genes (COGs, blue) on the x-axis.
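
The y-axis transform can be sketched as follows. The caption does not state the base of the logarithm; base 10 is assumed here because it is the usual convention for Manhattan plots, and the function name is hypothetical.

```python
import math

def minus_log10(p_value):
    """Return -log10(p) for a p-value in (0, 1]; smaller p-values give taller points."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)
```

Under this assumption a p-value of 0.001 plots at a height of 3, and a p-value of 1 plots at 0.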

Figure 4—figure supplement 1
Association of variants in the Dutch cohort with carriage age 0–24 months.

(A) Minus log-transformed p-value on the y-axis and position of unitig and single-nucleotide polymorphism (SNP) variants on the S. pneumoniae genome on the x-axis (Manhattan plot). (B) Minus log-transformed p-value on the y-axis and sorted lowest to highest p-value for rare variant burden in genes (purple) and clusters of orthologous genes (COGs, blue) on the x-axis. Variants of interest have annotations added.

Figure 4—figure supplement 2
Association of variants in the Maela cohort with carriage age 0–24 months.

(A) Minus log-transformed p-value on the y-axis and position of unitig and single-nucleotide polymorphism (SNP) variants on the S. pneumoniae genome on the x-axis (Manhattan plot). (B) Minus log-transformed p-value on the y-axis and sorted lowest to highest p-value for rare variant burden in genes (purple) and clusters of orthologous genes (COGs, blue) on the x-axis. Variants of interest have annotations added.

Tables

Table 1
Chi-squared values for serotypes in the Dutch and Maela cohorts and the age group that the serotype is affiliated with.
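| Serotype | Dutch cohort: χ2 p-value | Dutch cohort: age group | Maela cohort: χ2 p-value | Maela cohort: age group |
|---|---|---|---|---|
| Non-typeable | 0.188 | Adults | 3.0 × 10⁻⁴ | Adults |
| 19A | 0.089 | Children | 0.690 | Children |
| 11A | 0.591 | Children | 0.285 | Adults |
| 19F | 1 | Adults | 0.131 | Children |
| 6C | 0.022 | Children | 1 | Adults |
| 6B | 0.099 | Children | 0.040 | Children |
| 35F | 0.279 | Children | 0.100 | Children |
| 3 | 2.5 × 10⁻⁵ | Adults | 0.129 | Adults |
| 6A | 0.709 | Children | 1 | Children |
| 23A | 1 | Adults | - | - |
| 15B | 0.023 | Children | - | - |
| 17F | 0.943 | Children | - | - |
| 23B | 0.727 | Children | - | - |
| 10A | 0.155 | Adults | - | - |
| 15C | 1.000 | Adults | - | - |
| 35B | 0.775 | Adults | - | - |
| 22F | 1 | Adults | - | - |
| 33F | 0.132 | Adults | - | - |
| 23F | - | - | 0.040 | Children |
| 14 | - | - | 0.949 | Children |
| 35C | - | - | 0.961 | Children |
| 34 | - | - | 0.690 | Children |
| 13 | - | - | 0.756 | Adults |
| 10B | - | - | 0.756 | Adults |
| 4 | - | - | 0.966 | Children |
| 5 | - | - | 0.710 | Adults |
| 33B | - | - | 1 | Children |
| 28F | - | - | 0.652 | Children |
| 19B | - | - | 0.710 | Adults |
| 7F | - | - | 0.971 | Children |
| 20 | - | - | 0.971 | Children |
| 18C | - | - | 1 | Adults |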
  1. χ2, chi-square; -, not applicable.
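
For orientation, a p-value of the kind tabulated above can be obtained from a 2×2 table of counts (serotype present or absent, by child or adult). The sketch below uses a plain Pearson chi-squared test with one degree of freedom and no continuity correction. Whether the authors applied a correction is not stated here, so this is an assumption, and the counts used in the example are invented.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the upper tail of
    the chi-squared distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 30 of 200 children and 10 of 200 adults carry the serotype.
stat, p_value = chi2_2x2(30, 170, 10, 190)
```

The age group reported next to each p-value would then be whichever group shows the higher proportion of carriage for that serotype.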

Table 2
Chi-squared values for strains in the Dutch and Maela cohorts and the age group that the strain is affiliated with.
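| GPSC | Dutch cohort: χ2 p-value | Dutch cohort: age group | Maela cohort: χ2 p-value | Maela cohort: age group |
|---|---|---|---|---|
| 60 | 0.568 | Adults | 0.727 | Adults |
| 4 | 0.298 | Children | - | - |
| 3 | 0.392 | Adults | - | - |
| 7 | 0.858 | Children | - | - |
| 11 | 0.03 | Children | - | - |
| 35 and 36 | 0.617 | Adults | - | - |
| 29 | 0.049 | Children | - | - |
| 46 | 0.563 | Children | - | - |
| 75 | 0.666 | Adults | - | - |
| 19 | 0.978 | Children | - | - |
| 12 | 1.2 × 10⁻⁴ | Adults | - | - |
| 44 | 1 | Adults | - | - |
| 24 | 0.094 | Children | - | - |
| 49 | 1 | Children | - | - |
| 109 | 0.817 | Adults | - | - |
| 16 | 0.249 | Adults | - | - |
| 38 | 2.1 × 10⁻⁴ | Adults | - | - |
| 146 | 0.489 | Children | - | - |
| 99 | 1 | Children | - | - |
| 15 | 0.22 | Adults | - | - |
| 42 | - | - | 0.134 | Children |
| 1 | - | - | 0.276 | Adults |
| 28 | - | - | 0.110 | Children |
| 73 | - | - | 0.253 | Children |
| 10 | - | - | 0.777 | Adults |
| 9 | - | - | 1 | Children |
| 30 | - | - | 0.993 | Children |
| 20 | - | - | 7.0 × 10⁻³ | Adults |
| 128 | - | - | 0.042 | Children |
| 66 | - | - | 1 | Children |
| 87 | - | - | 0.450 | Adults |
| 63 | - | - | 1 | Adults |
| 45 | - | - | 0.129 | Adults |
| 130 | - | - | 1 | Adults |
| 74 | - | - | 0.040 | Adults |
| 149 | - | - | 0.686 | Adults |
| 8 | - | - | 0.364 | Children |
| 25 | - | - | 1 | Adults |
| 187 | - | - | 0.371 | Adults |
| 154 | - | - | 1 | Adults |
| 118 | - | - | 0.995 | Children |
| 110 | - | - | 0.995 | Children |
| 106 | - | - | 0.073 | Adults |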
  1. χ2, chi-square; -, not applicable; GPSC, global pneumococcal sequence clusters.

Additional files

Supplementary file 1

Serotypes in the Dutch cohort, and the number of samples isolated from child or adult.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp1-v2.xlsx
Supplementary file 2

Number of samples for each of the vaccine serotypes found in the Dutch cohort.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp2-v2.txt
Supplementary file 3

Serotypes in the Maela cohort, and the number of samples isolated from child or mother.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp3-v2.xlsx
Supplementary file 4

Strains (global pneumococcal sequence clusters [GPSCs]) in the Dutch cohort, and the number of samples isolated from child or adult.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp4-v2.xlsx
Supplementary file 5

Strains (global pneumococcal sequence clusters [GPSCs]) in the Maela cohort, and the number of samples isolated from child or mother.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp5-v2.xlsx
Supplementary file 6

Serotypes and strains (global pneumococcal sequence clusters [GPSCs]) in the subset of the Maela cohort with unique samples only, and the number of samples isolated from child or mother for each, including percentages.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp6-v2.xlsx
Supplementary file 7

Serotypes and strains (global pneumococcal sequence clusters [GPSCs]) in the subset of the Maela cohort with unique paired (mother–child) samples only, and the number of samples isolated from child or mother for each, including percentages.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp7-v2.xlsx
Supplementary file 8

Unitigs associated with carriage age in the Dutch cohort when not corrected for population structure of the bacterial population (lrt-p-value).

The other columns provide parameters of the regression line for the unitig. The final column (annotation) provides the location of the unitig in the Streptococcus_pneumoniae_D39V genome.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp8-v2.txt
Supplementary file 9

Unitigs associated with carriage age in the Maela cohort when not corrected for population structure of the bacterial population (lrt-p-value).

The other columns provide parameters of the regression line for the unitig. The final column (annotation) provides the location of the unitig in the Streptococcus_pneumoniae_D39V genome.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp9-v2.txt
Supplementary file 10

Unitigs representing the top hits for carriage age after meta-analysis of both cohorts.

These unitigs are not found in any currently available reference genome, but are found to be upstream of an accessory Sec-dependent serine-rich glycoprotein adhesin in a subset of samples from these cohorts.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp10-v2.txt
Supplementary file 11

Statistics on the assembly of the sequences from the Dutch cohort.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp11-v2.txt
Supplementary file 12

Sample name, sample accession, lane name, and lane accession in the European Nucleotide Archive for the sequences from the Dutch cohort.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp12-v2.txt
Supplementary file 13

Sample name, sample accession, lane name, and lane accession in the European Nucleotide Archive for the sequences from the Maela cohort.

https://cdn.elifesciences.org/articles/69244/elife-69244-supp13-v2.txt
Transparent reporting form
https://cdn.elifesciences.org/articles/69244/elife-69244-transrepform1-v2.docx

Cite this article

Philip HC Kremer, Bart Ferwerda, Hester J Bootsma, Nienke Y Rots, Alienke J Wijmenga-Monsuur, Elisabeth AM Sanders, Krzysztof Trzciński, Anne L Wyllie, Paul Turner, Arie van der Ende, Matthijs C Brouwer, Stephen D Bentley, Diederik van de Beek, John A Lees (2022)
Pneumococcal genetic variability in age-dependent bacterial carriage
eLife 11:e69244.
https://doi.org/10.7554/eLife.69244