
Improved characterisation of MRSA transmission using within-host bacterial sequence diversity

  1. Matthew D Hall (corresponding author)
  2. Matthew TG Holden
  3. Pramot Srisomang
  4. Weera Mahavanakul
  5. Vanaporn Wuthiekanun
  6. Direk Limmathurotsakul
  7. Kay Fountain
  8. Julian Parkhill
  9. Emma K Nickerson
  10. Sharon J Peacock
  11. Christophe Fraser
  1. University of Oxford, United Kingdom
  2. University of St Andrews, United Kingdom
  3. Sunpasitthiprasong Hospital, Thailand
  4. Mahidol University, Thailand
  5. University of Bath, United Kingdom
  6. University of Cambridge, United Kingdom
  7. Cambridge University Hospitals NHS Foundation, United Kingdom
Research Article
Cite this article as: eLife 2019;8:e46402 doi: 10.7554/eLife.46402

Abstract

Methicillin-resistant Staphylococcus aureus (MRSA) transmission in the hospital setting has been a frequent subject of investigation using bacterial genomes, but previous approaches have not yet fully utilised the extra deductive power provided when multiple pathogen samples are acquired from each host. Here, we used a large dataset of MRSA sequences from multiply-sampled patients to reconstruct colonisation of individuals in a high-transmission setting in a hospital in Thailand. We reconstructed transmission trees for MRSA. We also investigated transmission between anatomical sites on the same individual, finding that this either occurs repeatedly or involves a wide transmission bottleneck. We examined the between-subject bottleneck, finding considerable variation in the amount of diversity transmitted. Finally, we compared our approach to the simpler method of identifying transmission pairs using single nucleotide polymorphism (SNP) counts. This suggested that the optimum threshold for identifying a pair is 39 SNPs, if sensitivities and specificities are equally weighted.

Introduction

Whole-genome sequencing (WGS) has proved a valuable tool in the investigation of the transmission of infectious agents. Staphylococcus aureus, and in particular methicillin-resistant S. aureus (MRSA), has been a frequent subject, often in a hospital environment where it is responsible for a considerable burden of disease. The focus of previous studies has ranged from identifying bacterial spread between continents (Harris et al., 2010), to reconstructing the detailed timelines of outbreaks in single hospital wards (Köser et al., 2012; Harris et al., 2013). Several prospective studies have also been conducted in single settings (Price et al., 2014; Nübel et al., 2013).

In a previous study, Tong et al. (2015) acquired and sequenced MRSA isolates from the high-transmission setting of two intensive care units (ICUs) in a Thai hospital. All were of sequence type 239 (ST 239). They showed that bacterial genetic diversity in this single location provided evidence of at least five circulating clades in a state of temporal flux, with multiple clades colonising different subjects over the course of the study period. By sequencing a very large number of isolates from a single subject, they also demonstrated the existence of a considerable within-host ‘cloud of diversity’. As a result, they cautioned against using a single pathogen genome as a representative of the colonisation of a single subject.

In the time since the publication of that study, methodological work has emerged to demonstrate how phylogenetic reconstruction using multiple genomes per subject can be used to identify individuals involved in pathogen transmission or colonisation events, and to reconstruct the direction of that transmission (Romero-Severson et al., 2016). The software package phyloscanner (Wymant et al., 2017) was subsequently developed to perform analyses of this kind. Thus there are now additional reasons to perform multiple sampling beyond the cautionary point made by Tong et al.

An obvious extension of multiple sampling is to sample from different body sites of the same host, and from there investigate the dynamics of colonisation at the within-host level. The anterior nares are the primary ecological niche for human S. aureus colonisation (Kluytmans et al., 1997), but the sensitivity of detection of MRSA carriage based on nasal screening alone is only 68–75% (McKinnell et al., 2013). Colonisation of multiple anatomical sites excluding the nose, however, appears rare (Baker et al., 2010; Eveillard et al., 2006) and eradication of nasal S. aureus is often followed by disappearance of the bacterium from other sites (Parras et al., 1995; Reagan et al., 1991). The process by which S. aureus spreads between sites has been little studied.

Another potential investigation is the size of the transmission bottleneck of an MRSA strain during the colonisation of a given individual, in other words the quantity of genetic diversity that is passed from one subject to another at transmission. This is an important concern when transmission routes are to be reconstructed using pathogen genomes, but inference is challenging in the absence of previous studies of this nature for S. aureus. Most previous work has not had access to dense within-host sampling and has used one sequence per host, which severely limits what can be determined about the bottleneck. Recently, Worby et al. (2017) demonstrated that shared genomic variants, identified using deep sequencing data, can be a powerful tool in identifying transmission pairs, but concluded that they are much more useful where the bottleneck is wide, as shared variants are rare if it is narrow. Selection of a suitable method for identifying pairs therefore requires quantification of the bottleneck width.

As the availability of multiple samples per host in standard microbiological practice is limited, genetic studies of MRSA outbreaks have, at least until recently, usually used only a single pathogen sequence per patient (Harris et al., 2010; Harris et al., 2013; Nübel et al., 2013; Köser et al., 2012). The investigation of transmission between individuals has then usually relied on selecting a threshold for the number of single nucleotide polymorphisms (SNPs) separating isolates to identify possible transmission pairs, together with epidemiological investigation. Tong et al. (2015), in common with other studies (Young et al., 2012; Golubchik et al., 2013; Price et al., 2014), expressed caution that high S. aureus diversity in the source individual would reduce the sensitivity of this approach. Nevertheless, it may be the best available method for reasons of cost or circumstance. In those cases, the SNP threshold must be carefully chosen. The value used in past work has been between 23 and 40 SNPs (Price et al., 2014; Long et al., 2014; Azarian et al., 2015; Uhlemann et al., 2014), but at the same time pairwise distances as large as 40 SNPs have been encountered within-host (Golubchik et al., 2013). As sampling within-host diversity provides an alternative, more sophisticated method to identify transmission pairs, the results of an analysis utilising it can be used to test different threshold values, in order to aid researchers who have access only to single genomes.
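To make the idea of testing threshold values concrete, the following is a minimal sketch, not the procedure used in the study. It assumes that each pair of subjects has a SNP distance and a yes/no label from some reference method, and it picks the threshold that maximises sensitivity plus specificity (equal weighting). The function name and the toy data are hypothetical.

```python
def best_snp_threshold(distances, is_pair):
    """Return (threshold, sensitivity, specificity) maximising sens + spec.

    distances: SNP distance for each pair of subjects (hypothetical input).
    is_pair:   True where a reference method classed the pair as a
               transmission pair. Both classes must be present.
    A pair is called 'linked' when its distance is <= threshold.
    """
    n_pos = sum(is_pair)
    n_neg = len(is_pair) - n_pos
    best = None
    for t in sorted(set(distances)):
        tp = sum(d <= t and p for d, p in zip(distances, is_pair))
        tn = sum(d > t and not p for d, p in zip(distances, is_pair))
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best


# Invented distances and labels, for illustration only.
toy_distances = [2, 5, 11, 30, 45, 60, 120, 300]
toy_is_pair = [True, True, True, True, False, True, False, False]
threshold, sens, spec = best_snp_threshold(toy_distances, toy_is_pair)
```

Only observed distances are tried as candidate thresholds, and ties are resolved in favour of the smallest threshold; other weightings of sensitivity and specificity would need a different objective.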

In this paper, we return to the study previously described by Tong et al. We use a considerably expanded version of that dataset, which adds more extensive within-host sampling. Our objectives were to reconstruct transmission between patients in the study using the new methodological insights mentioned above, to examine the bottleneck at MRSA transmission, to investigate the movement of the bacterium between body sites of individual hosts (that is, its phyloanatomy [Salemi and Rife, 2016]), and to compare our identification of transmission pairs with those obtained using an SNP-based approach.

Results

Patients and setting

Recruitment to the study was described previously (Tong et al., 2015). Briefly, consenting patients admitted to two ICUs (a paediatric unit and a general adult surgical unit) at Sunpasitthiprasong Hospital, Ubon Ratchathani, Thailand, during a period of three months in 2008 were recruited. MRSA screening was performed on admission and then twice weekly until discharge. Nasal swabs and fingertip cultures were also taken from ICU health care workers (HCWs) at three time points. Each screen consisted of swabbing the anterior nares, throat (or endotracheal suction tube if intubated), axilla, catheter urine if catheterised, and wounds if present (including pressure sores). Microbiological culture and bacterial identification have also been described previously (Tong et al., 2015). Up to ten colonies on primary culture plates were saved per sample. All colonies from nasal swab cultures were selected for sequencing, together with up to one colony from cultures from each other positive body site. For one subject (an adult ICU patient designated T126), additional colonies were collected, up to a maximum of 29 for some time points, also as described previously (Tong et al., 2015).

The complete set of subjects screened consisted of 169 adult patients, 98 child patients and 37 HCWs. To the dataset of Tong et al., which comprised up to three sequences from each subject for a total of 76, we added 923 more, for a total of 999. Of these, four were excluded as suspected contaminants, leaving 995, from 55 subjects (compared to 51 in the previous study). Five subjects were HCWs, 21 were surgical ICU patients, and 29 were paediatric ICU patients. The number of sequences per subject ranged from 1 to 239 (mean 18). In addition, 20 sequences (one per patient) for isolates originating from an earlier study in the same hospital (Harris et al., 2010) were included for the purposes of comparison. All isolates belonged to ST 239.

Phylogeny reconstruction

The complete core chromosome sequence alignment was 3,043,210 base pairs in length. Bayesian phylogenetic analysis was conducted using ExaBayes 1.5 (Aberer et al., 2014). Figure 1 displays the 50% majority-rule consensus tree, rooted using the TW20 strain (Holden et al., 2010) as an outgroup. Eight clades are highlighted: those designated 1 to 5 are the five previously identified by Tong et al. (2015) but much enlarged by additional data, whereas clades 6 to 8 all include some sequences that were isolated tips in the phylogeny in that paper but are now part of larger clades. Of the 20 sequences from the earlier Harris et al. study, 16 were part of these clades (but generally basal tips within them) whereas the remaining four were isolated.

Figure 1
50% majority-rule consensus tree of the posterior distribution of ExaBayes phylogenies.

Branch lengths are in substitutions per site. Eight clades are highlighted. Clades 1 to 5 correspond to those identified by Tong et al. (2015), while the additional three are newly designated.

Identification of potential multiple colonisation events

The transmission process amongst the study subjects was investigated using phyloscanner v1.4.2 (Wymant et al., 2017). This tool, intended for the analysis of large genetic datasets of within- and between-host pathogen data, reconstructs the relative positions of hosts in the chain of transmission.

This reconstruction was performed with an awareness of the possibility that subjects may experience multiple independent infection or colonisation events, in which case each such event should be treated as a separate entity when investigating transmission. An initial phyloscanner investigation suggested that sequences from five subjects (designated T035, T099, T159, T271 and T327) formed two phylogenetic clades where the median patristic distance between clade MRCAs (across the ExaBayes posterior) was greater than 100 substitutions, sufficiently diverse that they were unlikely to be the product of single-strain colonisation events. The sequences from each of these were separated into two groups and each was subsequently treated independently. We will refer to links in the transmission chain as ‘colonisations’; each isolate belongs to a single colonisation. Numerical codes prefixed with a ‘T’ are used for study subjects and codes with ‘C’ for colonisations. In most cases the numbers after this prefix match; for example C009 is the sole colonisation of subject T009. For the five multiply-colonised subjects, a lowercase letter is used to differentiate them, for example C035a and C035b are the two colonisations of subject T035.

Trace colonisations

Of 60 identified colonisations from the 55 subjects, cultures for 33 came only from a single positive swab. Those for the remaining 27 were obtained either on at least two time points, from at least two body sites, or both. The median number of available sequences per colonisation was 1, but this rises to 14.5 with these singletons excluded. In 25 of the 33 single-swab colonisations, previous and/or subsequent examinations of the same patient did not identify MRSA belonging to the same colonisation, while in eight only one examination was performed. In what follows we refer to these 33 as trace colonisations.
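The trace/non-trace distinction can be expressed as a small rule: a colonisation is a trace colonisation when all of its isolates come from one swab. The sketch below assumes a swab is identified by its sampling date and body site; the record layout, the function name and the dates are hypothetical.

```python
from collections import defaultdict


def classify_colonisations(isolates):
    """Map each colonisation ID to 'trace' or 'non-trace'.

    isolates: iterable of (colonisation_id, sample_date, body_site) tuples,
    one per sequenced isolate. A swab is identified by its (date, site) pair.
    """
    swabs = defaultdict(set)
    for colonisation, date, site in isolates:
        swabs[colonisation].add((date, site))
    return {c: "trace" if len(s) == 1 else "non-trace" for c, s in swabs.items()}


# Invented records: two colonies from one swab versus two distinct swabs.
toy_isolates = [
    ("CX01", "day 1", "nose"),
    ("CX01", "day 1", "nose"),
    ("CX02", "day 1", "nose"),
    ("CX02", "day 4", "axilla"),
]
labels = classify_colonisations(toy_isolates)
```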

When compared to those from multiple isolations, sequences from trace colonisations showed greater similarity to those from isolates collected from other study subjects at an earlier time. Figure 2 displays the distribution of the median number of SNPs separating the isolates in each colonisation from the most closely related isolate sampled on an earlier date. There was strong evidence supporting a difference in the distribution of these distances between trace and non-trace colonisations (Mann-Whitney U test p=0.018), suggesting that the nature of trace isolates could be fundamentally different to that of typical isolates from a non-trace colonisation, rather than being simply the product of colonisations of a similar nature subject to sparser sampling. A similar pattern was observed when the median SNP distance was calculated from only those isolates in a non-trace colonisation that were acquired at the time of the first positive swab (Figure 2—figure supplement 1) although in this case statistical support was lacking (p=0.119). For six trace colonisations (C104, C105, C223, C225, C270, and C271b), our subsequent phyloscanner analysis inferred an infector from amongst the patients already or previously present in the hospital at the time that the sample in question was taken. For the six patients corresponding to these colonisations, the positive swab was acquired on the day of or day after ICU admission, and within three days of hospital admission. Such a short time from hospital admission to a positive swab with a sample very similar to those already existing in the hospital was not observed for any non-trace isolates. Four of the six swabbed negative for MRSA on a subsequent occasion. These observations suggest that at least some of the trace category were of a different nature to colonisations providing multiple isolates, and that they potentially represent incidental, transient exposure of patients to strains in the overall hospital environment. 
The possibility of sample contamination also cannot be excluded in any individual example.
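The distance summarised in Figure 2 can be sketched as follows, under the assumption that sequences are aligned strings of equal length and that "earlier" means sampled before the first positive sample of the colonisation in question. The record layout and names are hypothetical, and this is an illustration rather than the code used for the analysis.

```python
from statistics import median


def snp_distance(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def median_distance_to_earlier(colonisation, isolates):
    """Median, over a colonisation's isolates, of the SNP distance to the
    closest isolate sampled earlier from a different subject.

    isolates: list of dicts with keys 'colonisation', 'subject', 'day', 'seq'.
    Returns None when no isolate predates the colonisation.
    """
    own = [i for i in isolates if i["colonisation"] == colonisation]
    first_day = min(i["day"] for i in own)
    subject = own[0]["subject"]
    earlier = [i for i in isolates
               if i["day"] < first_day and i["subject"] != subject]
    if not earlier:
        return None
    return median(min(snp_distance(o["seq"], e["seq"]) for e in earlier)
                  for o in own)


# Invented six-site alignment.
toy_isolates = [
    {"colonisation": "C1", "subject": "T1", "day": 0, "seq": "AAAAAA"},
    {"colonisation": "C2", "subject": "T2", "day": 3, "seq": "AAAAAT"},
    {"colonisation": "C2", "subject": "T2", "day": 5, "seq": "AAAATT"},
    {"colonisation": "C3", "subject": "T3", "day": 4, "seq": "CCAAAA"},
]
```

The per-colonisation medians for the trace and non-trace groups could then be compared with a rank-based test such as `scipy.stats.mannwhitneyu`.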

Figure 2 (with 1 supplement)
The median number of SNPs separating sequences from each colonisation from the most similar sequence from an isolate acquired before the date of the first positive sample from that colonisation.

Blue dots are colonisations isolated from a single swab only (trace colonisations), while red were acquired from colonisations where multiple swabs were isolated. For three colonisations (two trace) the first collection date was the commencement of the study and hence there was no such earlier isolate. The x-axis transfers to a log scale on the right of the dotted line.

Our main interest is in the transmission of established, rather than transient, colonisations, as these represent the bacterial reservoir from which further colonisations will be derived, meaning they should be prioritised when designing interventions for infection control. The only strong evidence available to us that a given sample is not the result of transient colonisation is the acquisition of multiple samples of the same strain. The confirmation of colonisation with multiple swabs also greatly reduces the possibility of contamination. As a result, we exercise caution and omit the trace category from consideration for the bulk of the main text of this paper. Versions with them included are presented as figure supplements.

Reconstructed transmission events

Phyloscanner leverages the signal that transmission leaves on the phylogenetic topology (Romero-Severson et al., 2016) in order to reconstruct the direction of transmission between colonised individuals. It works by performing ancestral state reconstruction on a phylogeny using maximum parsimony, using the set of colonisations as states. It identifies phylogenetic relationships between colonisations by examining the arrangement of subgraphs: contiguous blocks of phylogenetic nodes which all have the same reconstructed state. Pairs of colonisations that are closely related in the transmission chain are identified by proximity of their subgraphs and the direction of transmission between them inferred from their topological arrangement, as shown in Figure 3. We performed this procedure on the consensus phylogeny shown in Figure 1, but also, to allow for phylogenetic uncertainty, each of 100 random trees from the ExaBayes posterior.
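The logic of reconstructing host states and reading direction from subgraph arrangement can be illustrated with a toy example. The sketch below uses plain Fitch parsimony on a binary tree as a simplified stand-in; phyloscanner's own reconstruction is more elaborate, and the tree encoding (nested tuples with host names at the tips) and function names are assumptions made for illustration.

```python
def fitch_sets(node):
    """Bottom-up Fitch pass on a binary tree of nested tuples.

    Tips are strings naming the host each sequence came from.
    Returns (candidate_states, annotated_children).
    """
    if isinstance(node, str):
        return ({node}, [])
    children = [fitch_sets(child) for child in node]
    sets = [c[0] for c in children]
    shared = set.intersection(*sets)
    return (shared if shared else set.union(*sets), children)


def subgraph_links(annotated, parent_state=None, links=None):
    """Top-down pass: record (parent_host, host) each time a new subgraph starts.

    Ambiguous nodes that cannot inherit their parent's state are left
    unassigned (None), so no direction is claimed for them.
    """
    links = [] if links is None else links
    states, children = annotated
    if parent_state in states:
        state = parent_state
    elif len(states) == 1:
        state = next(iter(states))
        links.append((parent_state, state))
    else:
        state = None
    for child in children:
        subgraph_links(child, state, links)
    return links


# Toy tree: A is ancestral to two B lineages and one C lineage; D is a sibling clade.
toy_tree = (
    (("A", ("B", "B")), (("A", "B"), ("A", ("C", "C")))),
    ("D", "D"),
)
links = subgraph_links(fitch_sets(toy_tree))
```

In this toy tree the links are A to B twice (two subgraphs for B), A to C once, and an unassigned parent for both A and D, so nothing is inferred about how D relates to the others.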

Figure 3
phyloscanner identifies transmission pairs by ancestral state reconstruction of hosts (left) and subsequent classification of the topological relationships between the subgraphs reconstructed to each host (right).

In this example the hosts are designated A to D. Here host A is inferred to be the infector of hosts B and C. The transmission from A to C was of only a single pathogen lineage, while that from A to B was of two, with the result that host B has two subgraphs. The subgraph from host D forms a sibling clade to the rest of the phylogeny and, as a result, no inference is made about transmission.

Figure 4 superimposes the results of the reconstruction using the consensus phylogeny onto the timeline of patient stays in the hospital. The timings of positive and negative screening events are indicated by circles and crosses respectively. Horizontal lines represent hospital and ICU stays and are coloured by population (hospital ward or HCW). Reconstructed transmission events are indicated by grey arrows, appearing at a time indicating the upper bound for the date at which the transmission could have occurred (the earliest time of sampling amongst the tips in the recipient subgraph). Multiple arrows appearing between the same two subjects (for example between C012 and C159b) suggest the transmission of multiple lineages, either simultaneously or over a more extended period of time. All such events reconstructed here are within-ward, although four between-ward events were reconstructed with trace colonisations as recipients (see Figure 4—figure supplement 1). The great majority of events were consistent with the timeline of hospital and ICU stays, allowing that subjects may act as infectors subsequent to their departure from the premises due to environmental contamination or an unsampled intermediary carrier. Two exceptions are the descent of C271a from C327a when subject T271 left the hospital prior to subject T327’s arrival, and descent of C009 from C159a prior to the arrival of T159.

Figure 4 (with 1 supplement)
The reconstruction of the transmission process using the consensus tree overlaid on a timeline of hospital and ICU stays and sampling events.

Each row represents a colonisation, with thin lines representing the colonised subject’s presence in the hospital and thick lines their presence as a patient in an ICU. Colours of the lines and the y-axis labels indicate surgical ICU patients (green), paediatric ICU patients (red) and HCWs (blue). Crosses represent times of screens that were negative for MRSA, while circles those that returned positive swabs and sequenced isolates. The grey arrows represent reconstructed transmission events. These appear when at least one subgraph from the recipient is descended from an adjacent subgraph from the infector. Such a transmission may also involve unsampled intermediaries or the environment. The timings of these arrows represent the upper bound for the time at which they could have occurred rather than an exact estimate. The dotted vertical lines demarcate the period of sampling.

Epidemiologically plausible transmissions have high posterior support

To see if these apparently impossible reconstructed events were phylogenetically well-supported, we investigated whether the pattern persisted when the analysis was performed on the 100 sampled posterior trees. Specifically, for every subgraph of each host in each tree, we identified the last sampled colonisation in the transmission chain (if any), and checked whether that transmission was consistent with the timings of hospital stays and sample collection dates. Figure 5 shows the results. The transmission from C327a to C271a had posterior support of 1, but this would disappear entirely if a single tip from C271a was removed. Subgraphs from four other colonisations (and an additional six trace colonisations, see Figure 5—figure supplement 1) were the result of impossible infection events that had non-negligible support (in the range from 0.28 to 0.76). Two of these involved patient T159 as the source, and the remaining two involved the closely related triplet of C071, C092, and C099b.
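The per-tree check described here can be sketched as a simple comparison of dates, followed by a tally over the posterior sample. This is an illustrative reading of the procedure: the function names, the day-number encoding, and the choice to treat same-day arrival as possible are all assumptions.

```python
def classify_link(infector_arrival_day, recipient_first_sample_day):
    """Return 'none', 'possible' or 'conflict' for one reconstructed link.

    infector_arrival_day: day the inferred infector entered the hospital,
        or None if no sampled infector precedes the recipient in the chain.
    recipient_first_sample_day: earliest sampling day among the tips of the
        recipient subgraph (the upper bound on the transmission time).
    """
    if infector_arrival_day is None:
        return "none"
    if infector_arrival_day > recipient_first_sample_day:
        return "conflict"  # infector arrived after the recipient was already positive
    return "possible"


def support_by_category(per_tree_labels):
    """Fraction of posterior trees falling in each category."""
    n = len(per_tree_labels)
    return {k: per_tree_labels.count(k) / n
            for k in ("none", "possible", "conflict")}


# Invented outcomes for one subgraph across four posterior trees.
per_tree = [classify_link(10, 4), classify_link(2, 4),
            classify_link(None, 4), classify_link(2, 4)]
support = support_by_category(per_tree)
```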

Figure 5 (with 2 supplements)
Concordance of inferred infectors for each colonisation with recorded timings of hospital stays and sampling dates.

Each bar represents a colonisation, and the colours represent the proportions of the posterior set of trees where the transmission chain prior to that host involves no sampled subjects (blue), involves one or more sampled subjects all of which are possible given known timings of entry and departure to the hospital and sampling of isolates (green), or involves at least one sampled subject where the timings are in conflict, with the infector entering the hospital after isolates from the recipient were acquired (red).

We performed a separate analysis where the set of tips from each subject was randomly down-sampled to a maximum of five. This resulted in considerably more impossible reconstructed events (Figure 5—figure supplement 2).

Transmission clusters

Figure 6 presents the phyloscanner host relationship diagram, displaying the division of the 27 colonisations into 12 clusters of closely-related infections. (Clusters here indicate groups of colonisations linked by any number of reconstructed transmission events, and no genetic distance threshold was applied.) Links between colonisations are made up of three segments, whose colour represents the frequency of a given topological relationship between two colonisations in the ExaBayes posterior. The outer two, which have directional arrows, represent transmission in the direction of the arrow, while the central segment represents the ‘complex’ ancestral relationship (Wymant et al., 2017), which suggests a close genetic relationship for which the direction of transmission is unclear. Links are also labelled with the proportion of posterior trees featuring any one of these three relationships. To avoid excessive clutter in the figure, they are only displayed when this number is 0.5 or greater. Two of the large clusters revolve around the long-staying patients T012 in the paediatric ICU and T126 in the general ICU. 8 of the 12 clusters are singletons, representing colonisations not closely related to any other sample in this study. Figure 7 gives the equivalent diagram from an analysis where each subject’s isolates were further subdivided by the body site of origin; links with coloured backgrounds connect colonisations from the same patient. The same figures with trace colonisations included can be seen in Figure 6—figure supplement 1 and Figure 7—figure supplement 1. 
Separating colonisations by body site can clarify relationships, most strikingly for C249, whose close relationships with a large number of other colonisations, with no clear directionality, in Figure 6 is resolved into separate origins for the samples with an axillary origin, which show support for being transmitted from either C183 or C194, and the remainder, which are likely to have come from C197 or C358. Note that these diagrams are representations of the pairwise relationships between colonisations and do not attempt to resolve these into a single transmission history.
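Grouping colonisations into clusters of this kind amounts to finding connected components in the graph of pairwise links. The sketch below uses a union-find structure; applying the 0.5 display threshold as the criterion for a link is an assumption, and the node names and support values are invented.

```python
def clusters(nodes, links, min_support=0.5):
    """Group nodes into connected components over sufficiently supported links.

    links: iterable of (node_a, node_b, posterior_support) tuples.
    Returns a list of sets, largest cluster first.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, support in links:
        if support >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Invented colonisations and supports.
toy_nodes = ["X1", "X2", "X3", "X4", "X5", "X6"]
toy_links = [("X1", "X2", 1.0), ("X1", "X3", 0.97),
             ("X4", "X5", 0.99), ("X4", "X6", 0.2)]
toy_clusters = clusters(toy_nodes, toy_links)
```

Here X6 ends up as a singleton because its only link falls below the threshold, mirroring the singleton clusters in the diagram.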

Figure 6 (with 1 supplement)
The phyloscanner host relationship diagram.

Each node represents all the sequences for one colonisation. Node fill colours designate patients in the two hospital ICUs and the HCWs. Edges appear where colonisations share a relationship with posterior support of at least 0.5 and consist of three elements: arrows representing transmission in either direction and a central line segment representing the ‘complex’ topological relationship, which is indicative of transmission but the direction is ambiguous. Each of these is coloured according to the proportion of posterior trees showing the corresponding relationship. Edges are also labelled with the overall posterior support for any topology suggesting transmission.

Figure 7 (with 1 supplement)
The phyloscanner host relationship diagram for a separate analysis where samples taken from distinct body sites on the same subject were treated as separate ‘hosts’.

Each node represents all the sequences for one colonisation of one body site. Node fill colours designate patients in the two hospital ICUs and the HCWs. Edges appear where colonisations share a relationship with posterior support of at least 0.5 and consist of three elements: arrows representing transmission in either direction and a central line segment representing the ‘complex’ topological relationship, which is indicative of transmission but the direction is ambiguous. Each of these is coloured according to the proportion of posterior trees showing the corresponding relationship. Edges are also labelled with the overall posterior support for any topology suggesting transmission, and edges connecting colonisations from sites from the same subject have a grey background. Nodes are annotated with colonisation IDs and a code for body site: A = axilla, C = endotracheal suction, N = nose, T = throat, W = wound.

Between-subject transmission bottlenecks are of very variable size

We investigated the size of the bottleneck at transmission for six pairs of colonisations in detail. (By ‘bottleneck’ we refer to the total number of genetic lineages passed from one colonisation to another; in this study we cannot differentiate between multiple lineage transmission at a single time point and the transmission of multiple single lineages at different points.) All six had a reconstructed transition in one or both directions in at least 95% of posterior trees and, in the subgraphs involved in those transmissions, a posterior median of at least five tips belonging to the recipient. The latter condition ensured that a reasonable number of sequences were sampled from the recipient, to prevent the inference of narrow bottlenecks due simply to low sequence counts. These pairs are summarised in Table 1. The transition counts represent a lower limit on the number of lineages transmitted from one subject to another that are necessary to explain the phylogeny.
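Summarising per-tree transition counts by a posterior median and a 95% highest posterior density (HPD) interval can be sketched as below. The HPD interval is taken here as the shortest interval containing at least 95% of the sampled values, which is a common empirical definition and an assumption about how the intervals were obtained; the counts are invented.

```python
import math
from statistics import median


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of samples the interval must cover (guarding against float error).
    k = max(1, math.ceil(mass * n - 1e-9))
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return xs[start], xs[start + k - 1]


# Invented transition counts from 100 posterior trees.
toy_counts = [3] * 60 + [2] * 35 + [1] * 3 + [5] * 2
toy_median = median(toy_counts)
toy_hpd = hpd_interval(toy_counts)
```

When several windows have the same width, the one starting at the smallest value is returned.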

Table 1
Six pairs of colonisations for which transmission was reconstructed with posterior support of at least 0.95 and for which the recipient subgraphs in those reconstructed transmissions contained a posterior median of at least five tips.

Each row gives the posterior median and the limits of the 95% highest posterior density (HPD) interval for the number of reconstructed transmissions between the subjects, in either direction. The posterior median number of tips in the recipient subgraphs, and the nucleotide diversity amongst those tips, are also given. In five cases the inferred direction of transmission is clear, but for C183 and C194 it is not.

Colonisation A | Colonisation B | A to B transitions: median (95% HPD lower, upper) | Tips in descendant B subgraphs (median) | Nucleotide diversity transmitted to B (median) | B to A transitions: median (95% HPD lower, upper) | Tips in descendant A subgraphs (median) | Nucleotide diversity transmitted to A (median) | Direction
C012 | C137 | 3 (2, 3) | 79 | 1.46E-06 | 0 (0, 0) | 0 | NA | A to B
C012 | C159b | 34 (25, 38) | 39 | 2.20E-06 | 1 (0, 6) | 1 | 1.40E-06 | A to B
C126 | C234 | 3 (3, 3) | 10 | 1.92E-06 | 0 (0, 0) | 0 | NA | A to B
C126 | C271a | 1 (1, 1) | 21 | 3.11E-07 | 0 (0, 0) | 0 | NA | A to B
C126 | C327a | 5 (4, 5) | 9 | 2.68E-06 | 0 (0, 0) | 0 | NA | A to B
C183 | C194 | 15 (1, 23) | 31 | 3.41E-07 | 5 (0, 19) | 12 | 7.08E-07 | Unclear

Two very long-stay ICU patients, T012 and T126, both appeared to transmit to other patients through bottlenecks of quite variable strength. C012 transmitted to C137 through a relatively narrow bottleneck and C159b through an extremely wide one, and the corresponding patients’ ICU stays each overlapped with T012 for at least four weeks, with both occupying a neighbouring bed to T012 for some (but not all) of that time. C126 transmitted a single lineage to C271a in every posterior tree, but showed a wider bottleneck in transmitting to both C234 and C327a. Subject T271 occupied the adjacent bed to subject T126 for 10 days but neither T234 nor T327 ever did.

Finally, C183 and C194 had such similar sequences that the reconstruction suggested back-and-forth movement of lineages. However, this may be due to poor phylogenetic resolution rather than the literal truth, especially as subjects T183 and T194 were never simultaneously present in the ICU. A third patient in that ICU, T178, also swabbed positive with a strain identical to some found in both T183 and T194 and one bed was at different times occupied by all three subjects, while a fourth, T249, was also colonised with a closely-related strain and occupied a bed which was later briefly used by T183. We lack the resolution necessary to definitively determine the order of events, but it seems most likely that, if there was no back-and-forth transmission, the transmission bottleneck was wide.

Table 1 also presents figures for the nucleotide diversity in the recipient population for these six events. This shows no obvious correlation with bottleneck size, indicating that the diversity in a sample may have been transmitted, but may also have arisen within-host.
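For readers unfamiliar with the quantity, nucleotide diversity is commonly defined as the mean number of pairwise differences per site among a set of aligned sequences. The sketch below follows that standard definition; whether the study used exactly this estimator is an assumption, and the sequences are invented.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site among aligned sequences.

    Returns None when fewer than two sequences are supplied.
    """
    if len(seqs) < 2:
        return None
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * length)


# Three invented four-site sequences: pairwise differences are 1, 2 and 1.
toy_pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```

As a rough guide to scale, if diversity is expressed per site of the 3,043,210 bp core alignment, a value of 1.46E-06 corresponds to about 4.4 differences on average between two sequences.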

The majority of subjects showed no evidence of phylogeny/body site correlation

23 study subjects had tested positive for MRSA colonising at least two different body sites. In every case, one site was the anterior nares. Two subjects (T178 and T322) provided only a single sequence from each of two sites, which rendered them unsuitable for further investigation as only one phylogenetic topology is possible under that circumstance. To investigate phylogeny-site associations in the remaining 21 subjects, we ran ExaBayes on the sequences from each separately and analysed the results using BaTS (Parker et al., 2008), to identify deviations from the null hypothesis of no phylogeny-site association according to the association index (AI), parsimony score (PS) and the size of largest monophyletic clade (MC) for each trait.

Deviation from the null in the AI statistic indicates the presence of phylogenetic nodes whose descendants are more frequently associated with some sites than would be expected by chance. This is the most sensitive of the three statistics to the existence of an association, but such an association does not imply the existence of the distinct monophyletic clades that would be expected if sites were separately colonised by different lineages, or if a single lineage from one site was transmitted to another. For the PS and the MC statistics, deviation from the null is much more suggestive of site-specific monophyletic clades. However, the PS statistic is invariant if all but one tip in the tree comes from the same site, as is the MC statistic for any site with just a single tip, and hence these statistics are useless in this scenario. If some sites have only one tip and the AI statistic shows significant deviation from the null, then further examination is required to check if, for example, the tip in question is basal. See Supplementary file 1 for some example phylogenies and statistic values.
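The behaviour of the MC statistic, including its invariance for a site with a single tip, can be seen in a small example. The sketch below computes the largest monophyletic clade for a site on one tree and a permutation p-value obtained by shuffling site labels across tips. It is a simplified illustration on a single tree with hypothetical names, not the BaTS implementation, which works across a posterior set of trees.

```python
import random


def tips(node):
    """List the tip names below a node of a nested-tuple tree."""
    return [node] if isinstance(node, str) else [t for c in node for t in tips(c)]


def largest_monophyletic_clade(node, site_of, site):
    """Size of the largest clade whose tips all come from `site`."""
    members = tips(node)
    if all(site_of[t] == site for t in members):
        return len(members)
    if isinstance(node, str):
        return 0
    return max(largest_monophyletic_clade(c, site_of, site) for c in node)


def mc_p_value(tree, site_of, site, n_perm=1000, seed=0):
    """One-sided permutation p-value: shuffle site labels across tips."""
    rng = random.Random(seed)
    observed = largest_monophyletic_clade(tree, site_of, site)
    names, labels = list(site_of), list(site_of.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = dict(zip(names, labels))
        hits += largest_monophyletic_clade(tree, shuffled, site) >= observed
    return hits / n_perm


# Invented tree: four nasal tips in one clade, two throat tips, one wound tip.
toy_tree = ((("n1", "n2"), ("n3", "n4")), (("t1", "t2"), "w1"))
toy_sites = {"n1": "nose", "n2": "nose", "n3": "nose", "n4": "nose",
             "t1": "throat", "t2": "throat", "w1": "wound"}
```

For the wound site, which has a single tip, every shuffle gives a largest clade of size one, so the p-value is always 1 regardless of the tree.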

Table 2 summarises the results of this analysis. For 11 of 21 subjects there was no evidence of any association between a tip’s position in the phylogeny and the site the corresponding sequence was acquired from. For another six, the only statistic to show deviation from the null at the p=0.05 level was the AI. The lack of significance for the PS suggested that no site-specific monophyletic clades were present, but this cannot identify divergent singleton tips. Examination of the ExaBayes consensus trees by eye revealed singletons for only one patient, T271. In this case the divergence (431.2 SNPs) of a single isolate from an endotracheal suction tube from the remaining isolates was so large that it had already been taken into account by the separation of the patient’s sequences into colonisations C271a and C271b; the trace colonisation C271b is the single endotracheal sequence.

Table 2
Results of the investigation of body site/phylogeny associations.

Each row corresponds to a single BaTS analysis on a posterior set of phylogenies consisting just of the sequences from that subject. The p-values are given for the association index (AI), parsimony score (PS) and largest monophyletic clade (MC) size for all body sites present in the dataset. Values with an asterisk (*) correspond to statistics whose values are identical under both hypotheses due to singleton sequences from some sites.

Columns: Subject; Number of sequences; p-value for AI; p-value for PS; p-value for MC at each site (Axilla, Nose, Throat, Trachea, Wound)
T009310.02611*0.0311
T0122230.240.18110.14
T06511<0.0011*0.291*
T071130.1610.161
T092120.035111
T095110.0991*0.11*
T0991911*0.161*
T126239<0.001<0.0010.0040.10.0041*
T137790.00410.1211*
T159460.711*11
T183480.0411*0.221
T1881911*11*
T194320.1311*10.371*1*
T197350.7411111
T23211<0.001<0.0010.0010.001
T2341011*11*
T249140.002<0.0010.0170.0161
T271230.04311*0.4911*1*
T3271011*0.511*
T330811*0.261*
T358220.00410.061

The remaining four subjects are T009, T126, T232 and T249, where the ExaBayes topologies were analysed for the signals of separate colonisation events and single lineage transmission from one site to another. Consensus trees are displayed in Figure 8. Posterior support for either signal in the tree sets for T009 and T126 was negligible or absent (less than 0.05). For T249, the axillary samples formed a separate clade with posterior support of 1, but trees in which these were nested within the remaining diversity were absent from the posterior; it can be seen from Figure 7 that these probably originated with a different subject from the rest. For T232 there was considerable support for both nasal and throat samples forming clades (0.712 and 0.513, respectively), and also for the diversity of each being nested in that of the other (0.514 and 0.482).

Figure 8
50% majority-rule consensus phylogenies for the ExaBayes phylogenetic analyses of the sequences from patients T009, T126, T232 and T249.

Tips are coloured by body site of origin. Branch lengths are in substitutions per site. Trees were rooted using the TW20 outgroup (not shown).

Ultimately, only three of the 21 subjects (T232, T249 and T271) had phylogenies consistent with separate colonisation events affecting different body sites or groups of sites. Only T249 exhibited one nested clade from a single site, which would be indicative of a single transmission from another site through a narrow bottleneck, but statistical support for this remains limited.

Identification of transmission pairs from single genomes

Previous studies have not had access to such rich within-host sampling, and have frequently identified transmission pairs using SNP distance between individual isolates from hosts. As our data admits a more powerful approach based on the topology, we compared our results to those from a hypothetical study where only a single sequence is available from each subject. We used phyloscanner as a gold standard for identification of transmission pairs, and pairs of subjects that were definitively not transmission pairs. For the latter we used posterior support of less than 0.05 for having adjacent subgraphs and a topological relationship implying ancestry. We used two standards for positively identifying pairs. The first was the inverse of the previous, that is, support greater than 0.95 for having adjacent subgraphs and a topological relationship implying ancestry. This version, however, has some problems as a gold standard as it does not attempt to eliminate the possibility of unsampled intermediaries in the transmission chain. For a second version where these are much less likely, we used support greater than 0.95 for having adjacent subgraphs and a topological relationship that implied the transmission of multiple lineages. This is the equivalent of the ‘PP’ topology of Romero-Severson et al. (2016), which was shown to be highly suggestive of direct transmission. (Intuitively, for a missing intermediary to be present in this scenario, multiple lineages would need to be transmitted at least twice, which is a rarer event than repeated transmission of at least one lineage.) This classification, however, means that subjects from whom only trace colonisations were identified have to be excluded, as it is impossible to reconstruct multiple lineage transmission if one subject contributes only a single tip to the phylogeny (or both do). As this is an analysis from which there is little reason to remove trace colonisations, which would not be identifiable in our hypothetical single-sample study, we present both versions.

For each subject, we used the samples taken at the first positive swab as potential samples from a single-sequence study. For each pair of these acquired from different subjects, we calculated the raw SNP distance and used these figures to evaluate the performance of SNP distance as a means to determine transmission pairs and members of clusters, using the above two measures as gold standards. The results can be seen in Figure 9. Subfigure A shows how the sensitivity and specificity vary with the threshold. To prevent bias in these results due to variable sample counts per subject, these figures are weighted by the probability of (uniformly) selecting a given pair of sequences from all those available from the same pair of subjects. (The maximum threshold shown in this figure is 60 SNPs, but see Figure 9—figure supplement 1 for the range up to 500 SNPs.)
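The weighting described above can be sketched as follows. Each pair of subjects contributes the fraction of its sequence pairs that fall at or below the threshold, so that subjects with many first-swab sequences do not dominate. The function name, the data layout, the toy distances and the use of "at or below" rather than "strictly below" are all assumptions of this sketch, not details taken from the analysis itself.

```python
def weighted_rates(pairs, threshold):
    """pairs: list of (is_transmission_pair, distances), where distances holds
    the SNP distances for every sequence pair drawn from one pair of subjects.
    Returns (sensitivity, specificity) with each subject pair weighted equally."""
    tp = fn = tn = fp = 0.0
    for is_pair, distances in pairs:
        # Probability that a uniformly chosen sequence pair tests positive.
        p_pos = sum(d <= threshold for d in distances) / len(distances)
        if is_pair:
            tp += p_pos
            fn += 1 - p_pos
        else:
            fp += p_pos
            tn += 1 - p_pos
    return tp / (tp + fn), tn / (tn + fp)

# Made-up distances, purely for illustration.
toy = [
    (True, [3, 5, 40, 12]),     # subject pair with 2 x 2 sequences
    (True, [20]),               # one sequence from each subject
    (False, [150, 160, 35]),
    (False, [300]),
]
sens, spec = weighted_rates(toy, threshold=39)
print(round(sens, 3), round(spec, 3))
```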

Figure 9
Performance of SNP distance as a method for identifying transmission pairs.

(A) Plots of the sensitivity and specificity of using the number of SNPs to identify transmission pairs, for different distance thresholds. (B) ROC curves plotting true positive rate (sensitivity) against false positive rate (1-specificity) for different thresholds. The curve is annotated with selected threshold values. The gold standard for identifying transmission pairs in the version on the left is a topological relationship suggesting at least one transmitted lineage, while on the right at least two are required, a criterion which will occur much less often if there is a missing intermediary in transmission.

For both gold standards, SNP distance has generally good specificity for reasonable thresholds, staying above 0.75 for thresholds of less than 140 SNPs. Sensitivity reaches 0.75 at a threshold of 16 SNPs for the version which is indifferent to unsampled intermediaries, but this requires 37 SNPs for the strict version.

Subfigure 9B shows receiver operating characteristic (ROC) curves for the same investigation. The area under the curve statistic is 0.872 for the first gold standard and 0.836 for the second, while the threshold which minimises the distance to the upper-left corner of the plot (if sensitivities and specificities are equally weighted) is 39 SNPs for both analyses.
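As a minimal sketch of this selection rule, the function below picks the threshold whose ROC point lies closest to perfect classification (true positive rate 1, false positive rate 0), using Euclidean distance so that sensitivity and specificity carry equal weight. The function name and the ROC values are invented for illustration and are not taken from Figure 9.

```python
import math

def best_threshold(roc_points):
    """roc_points: iterable of (threshold, sensitivity, specificity).
    Returns the threshold closest to perfect classification (FPR 0, TPR 1),
    weighting sensitivity and specificity equally."""
    def distance(point):
        _, sens, spec = point
        return math.hypot(1 - spec, 1 - sens)
    return min(roc_points, key=distance)[0]

# Hypothetical ROC points: (SNP threshold, sensitivity, specificity).
toy_roc = [(10, 0.55, 0.99), (25, 0.70, 0.95), (40, 0.85, 0.90), (100, 0.95, 0.78)]
print(best_threshold(toy_roc))  # 40
```

A study that values specificity more highly could rescale one axis before taking the distance, which would generally shift the chosen threshold downwards.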

Discussion

In this study, we used a large genetic dataset including within-host diversity to reconstruct MRSA transmission in a hospital setting, utilising new methods that investigate the phylogenetic topology. We found great variation in the size of the transmission bottleneck between study subjects, and that in the majority of cases there was no suggestion of an association between the phylogenetic placement of an isolate and the body site that it was acquired from. Finally, we determined SNP thresholds for identifying transmission pairs with high sensitivity and specificity.

33 out of 60 colonisations consisted of a single swab (trace colonisations). If the nature of these was precisely the same as the multiply-isolated colonisations and they were merely sampled less often, one would expect to see no difference in the distribution of genetic distances to isolates already identified in the hospital between the two groups. We did not find this to be the case. One explanation for this would be transient carriage, due to incidental exposure of patients to bacterial lineages already well established in the hospital environment. Indeed, thirteen trace colonisations came from patients who underwent a subsequent screen with negative results (and five underwent several such screens, which never occurred at all for non-trace colonisations). Previous studies have found transient carriage to be quite common in both patients (Bradley et al., 1991) and HCWs (Cookson et al., 1989). However, it remains impossible to rule out the possibility that any individual example was the result of a contamination event leading to the source individual being wrongly identified, especially given the very short times from hospital admission to a positive swab that are implied for six subjects. This leads us to recommend that in future phylogenetic studies, positive swab results should be verified by undertaking multiple sampling wherever possible.

The concordance of the order of colonisation given by the phylogenetic topology and the known dates of hospital stays was generally good; with the exception of one case (C271a) which plausibly involved contamination, those transmissions that contradicted the reported timings of stays and MRSA screens did not achieve posterior support above 0.76 and a threshold of 0.95 would comfortably eliminate them.

The role of variable sample counts per colonisation in an analysis of this sort is complex. The relative number of tips derived from one colonisation is information that the parsimony reconstruction will use in determining the source and recipient in a putative transmission pair; the colonisation with more tips is more likely to be reconstructed as the source in situations of phylogenetic uncertainty. While this may appear at first glance to be a bias, we see here (Figure 5—figure supplement 2) that removing the effect by equalising counts leads to a much poorer reconstruction. This is because sample count is associated with a variable, ICU stay length, which is itself associated with source status. Adjusting for it is hence adjusting for a variable on the causal pathway between source status and identification as a source, which is inappropriate. As methodologies of this sort are likely to become more common in future, this issue is a candidate for further investigation.

Clear from the between-subject relationship diagram (Figure 6) is the pivotal role played by the two long-term ICU residents, T012 and T126, as sources of colonisation. The former appears to be the originator of colonisations in at least two other subjects (and perhaps as many as six), and the latter three (up to five). When treating separate body sites as separate hosts (Figure 7), the direction of transmission was usually away from nasal colonisations. This is consistent with the role of the anterior nares as the ecological niche of S. aureus (Kluytmans et al., 1997). However, in our study the denser sampling from nasal sites likely also introduced some bias.

In over half of the subjects who screened positive for MRSA on more than one body site, there was no evidence for any site/phylogeny correlation. Some of the individual patient datasets involved were small and contained very few non-nasal isolates, in which cases the lack of detected structure may be due to lack of diversity. However, the correlation was absent even in subject T012, who was in the ICU throughout almost the entire study period, providing 223 isolates from three sites (and at least seven from each). Even for patient T126, where the phylogeny (see Figure 8) shows distinct axilla and endotracheal suction tube clades, other tips from those same sites appear distributed amongst nasal sequences. While further investigation of between-site colonisation using a sampling protocol designed specifically for the task would be advisable, these results suggest that the establishment of a colony due to the transmission of a single lineage between sites, or the colonisation of distinct sites with distinct lineages, is quite rare, at least in a setting of this sort. It has been observed that eradication of nasal S. aureus frequently results in the disappearance of the bacterium from other sites (Parras et al., 1995; Reagan et al., 1991), and also that extranasal colonies are significantly less genetically diverse than nasal colonies (Harkins et al., 2018). Both observations suggest that the organism is readily and regularly spread from the anterior nares to other parts of the body, but that the resulting extranasal colonisations are frequently transient. Our results are consistent with this scenario.

While the bottleneck in colonisation of one body site from another is more often than not wide, that involved in subject-to-subject transmission was very variable amongst the five pairs examined. In one case only a single lineage appeared to be transmitted, while in others it was very wide indeed. This is perhaps unsurprising in a bacterial infection spread by contact, as the circumstances under which colonisation occurs, and the quantity of bacteria displaced, may vary greatly. There did not appear, in this small sample, to be any suggestion of an association between bottleneck width and the proximity of patient beds in an ICU. It appears that the size of the bottleneck can be very variable even in a single setting. This suggests that the shared genomic variant methodologies outlined by Worby et al. (2017) may be of limited use because they reconstruct sources of infection only infrequently when the bottleneck is tight, and that phylogenetic approaches that analyse multiple isolates per individual, of the type implemented in phyloscanner, are to be preferred as they enable reconstruction regardless of the bottleneck size. Short-read deep sequencing, however, is unsuitable for the reconstruction of within-host phylogenies in bacteria as their slow mutation rates do not provide sufficient phylogenetic resolution in segments of the genome of such a short length, while the approach of sequencing multiple colony picks that we employ here may be unsuitable for routine deployment for cost reasons. Ideally, a full bacterial genome could be sequenced without the intermediate step of growing the organism in culture, an approach which could be possible using single-cell genomics.

The SNP cut-off approach to identifying pairs has already been identified as potentially inadvisable; Köser et al. (2012), in investigating a British neonatal outbreak, identified one outbreak strain of seven as a ‘hypermutator’ whose genetic distance from the remainder was an outlier. Nevertheless, not every investigation will have access to the means to do anything else. If sensitivity and specificity are considered equally important, then we recommend 39 SNPs as a threshold, which was suggested by both our gold standard sets. We would caution, however, that positive results are liable to lack predictive power at any SNP threshold in virtually any population survey unless the probability of a false positive test is extremely low. This is because, of the set of all pairs of subjects, the great majority will not be transmission pairs. In the ideal situation of a completely sampled transmission chain of n individuals, only n-1 of the n(n-1)/2 pairs are transmission pairs, and for a positive test to be even 50% predictive of a positive result requires a false positive rate of at most 2/(n-2). In large studies picking a threshold to fulfil this may in turn require unacceptably low sensitivity, and it should be noted that this is the best case scenario, in which every transmission pair is sampled. Ideally, sampling of within-host diversity and interrogation of the phylogenetic topology would be preferred to the use of a crude threshold, if this is feasible in a particular setting and using available resources.
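The arithmetic behind this bound can be checked with a short sketch. In a fully sampled chain of n individuals there are n-1 true pairs and (n-1)(n-2)/2 non-pairs, so with perfect sensitivity a false positive rate of 2/(n-2) yields as many false positives as true positives. With sensitivity s below 1 the same reasoning gives a stricter bound of 2s/(n-2). The function name and the choice of n = 52 are illustrative assumptions.

```python
def positive_predictive_value(n, sensitivity, false_positive_rate):
    """PPV of a pairwise test in a fully sampled transmission chain of n
    individuals, where n - 1 of the n(n - 1)/2 pairs are true pairs."""
    true_pairs = n - 1
    non_pairs = n * (n - 1) // 2 - true_pairs
    tp = sensitivity * true_pairs
    fp = false_positive_rate * non_pairs
    return tp / (tp + fp)

n = 52                                    # hypothetical chain length
bound = 2 / (n - 2)                       # false positive rate of 0.04
ppv_at_bound = positive_predictive_value(n, 1.0, bound)
ppv_above = positive_predictive_value(n, 1.0, 0.10)
print(round(ppv_at_bound, 3), round(ppv_above, 3))
```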

A clear limitation of these results is their applicability to other settings, particularly to hospitals with greater (or poorer) resources for infection control. Better procedures would be expected to decrease the size of within-hospital transmission clusters, and also decrease the size of the bottleneck between both patients and body sites. In addition, our sampling was highly biased towards nasal isolates. We also lacked access to isolates from the general, extra-hospital, population that might have been used for phylogenetic comparison purposes. Finally, the sequencing protocol used here is expensive and replicating it will not be feasible in many settings.

This study demonstrates the additional insights into bacterial transmission that may be gleaned when sequencing multiple isolates per host. This is obviously essential when considering questions of bottleneck size and phyloanatomy, but also offers advantages in investigating the direction of between-host transmission. As costs decrease, it may become increasingly feasible to design and conduct studies with sampling frames that accommodate multiple sampling, and also to use it for investigations in clinical practice.

Materials and methods

Sequencing

DNA was extracted, libraries prepared and 100 bp paired end sequences determined for 2320 isolates on an Illumina HiSeq2000, as previously described (Reuter et al., 2016).

Tagged genomic library preparation and DNA sequencing

Illumina reads were mapped onto the TW20 (accession number FN433596) reference sequence using SMALT (http://www.sanger.ac.uk/resources/software/smalt/) as previously described (Tong et al., 2015). A minimum of 30x depth of coverage for more than 92% of the reference genomes was achieved for both references (see Supplemental Table 1 of Tong et al., 2015). The default mapping parameters recommended for reads were employed, but with the minimum score required for mapping increased to 30 to make the mapping more conservative. Candidate SNPs were identified using samtools mpileup (Li et al., 2009) with SNPs filtered to remove those at sites with a mapping depth less than five reads and a SNP score below 60. SNPs at sites with heterogeneous mappings were filtered out if the SNP was present in less than 75% of reads at that site (Harris et al., 2010). Identification of the core genomes was performed as previously described (Holden et al., 2013; Harris et al., 2010). The coordinates of the accessory regions of the TW20 chromosome, which were removed from the alignment for all later analyses, are described in Supplementary file 2. Recombination was detected in the genomes using Gubbins (http://sanger-pathogens.github.io/gubbins/) using the default parameters (Croucher et al., 2015). The predicted recombination regions of the TW20 chromosome are described in Supplementary file 3. Regions identified as the location of recombination were also removed from the alignment for all later analyses. De novo assembly of genomes of all isolates was performed using Velvet v0.7.03 (Zerbino, 2010).
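The three SNP filters quoted above (depth of at least five reads, SNP score of at least 60, variant present in at least 75% of reads) can be expressed as a simple predicate. This is only a sketch of the thresholds, not the pipeline that was run: the record type, field names and example values are hypothetical, and treating a failure of either the depth or the score threshold as grounds for removal is an assumption.

```python
from dataclasses import dataclass

@dataclass
class CandidateSNP:
    position: int
    depth: int          # reads mapped at the site
    score: float        # SNP quality score
    alt_reads: int      # reads supporting the variant

def passes_filters(snp, min_depth=5, min_score=60, min_fraction=0.75):
    """Apply the three thresholds quoted in the text. Removing a SNP that
    fails either the depth or the score threshold is an assumption."""
    if snp.depth < min_depth or snp.score < min_score:
        return False
    return snp.alt_reads / snp.depth >= min_fraction

# Invented candidates, one per filtering outcome.
candidates = [
    CandidateSNP(1200, depth=40, score=90, alt_reads=39),   # kept
    CandidateSNP(5310, depth=4, score=90, alt_reads=4),     # too shallow
    CandidateSNP(7788, depth=30, score=45, alt_reads=30),   # low score
    CandidateSNP(9001, depth=20, score=80, alt_reads=12),   # only 60% of reads
]
kept = [s.position for s in candidates if passes_filters(s)]
print(kept)  # [1200]
```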

Phylogenetic reconstruction

The posterior set of phylogenies for the full alignment was generated using ExaBayes 1.5 (Aberer et al., 2014). The TW20 reference sequence was included as an outgroup. Due to computational memory limits, the alignment was divided into six sequential partitions of around 500,000 bp each, but in ExaBayes all parameters and the phylogenetic topology were linked across the six. Four MCMC runs, each in turn consisting of four coupled chains, were run for 5,000,000 generations with chain swaps every five generations and the heat factor set to 0.016. 50% of states in each run were discarded as burn-in, and the topologies forming output for all four runs were combined into a single tree set. The concatenated parameter trace for all four was examined to verify that the effective sample size (ESS) for the prior, likelihood and all numerical parameters was at least 200. The estimated ESS for the topology for the unified set of trees was calculated using RWTY 1.0.1 (Warren et al., 2017) and similarly verified to ensure a value of at least 200. The consensus tree was a 50% majority rule tree, with branch lengths, constructed using the sumt command in MrBayes 3.2.6 (Ronquist et al., 2012).

Input trees for the BaTS body site analysis were constructed by making separate alignments of all the sequences from each subject along with the TW20 outgroup. ExaBayes was run again on each with configurations varying depending on the number of sequences involved. Once again the output was checked to ensure all ESSs, including the estimated ESS for the topology, were at least 200, and consensus trees were built using sumt.

Phyloscanner

A random sample of 100 trees from the ExaBayes posterior was used as input for phyloscanner v.1.4.2 (Wymant et al., 2017). Identification of subjects who experienced clear multiple colonisation was performed in a pre-processing step using phyloscanner’s dual infection detection utility with a k value of 30,432.1. For a single tree, this will divide the tips from a single subject into separate colonisations if the patristic distance between the MRCA nodes of each colonisation is equal to or greater than 100 SNPs. For each subject, the most common division when this procedure was applied to all 100 trees was used to define the number of colonisations and the sequences making up each one; this is equivalent to dividing a host’s tips into two groups in cases where the median patristic distance between these MRCAs across the posterior was at least 100 SNPs.
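The step of taking the most common division across the sampled trees can be sketched as a simple vote. The per-tree divisions would come from phyloscanner's dual infection utility; here they are invented, and the function name and isolate labels are hypothetical.

```python
from collections import Counter

def most_common_division(divisions):
    """divisions: one entry per posterior tree, each a collection of groups of
    tip names. Returns the division seen most often across trees."""
    # Convert to nested frozensets so that group and tip order do not matter.
    canonical = [frozenset(frozenset(group) for group in d) for d in divisions]
    winner, _ = Counter(canonical).most_common(1)[0]
    return sorted(sorted(group) for group in winner)

# Toy input: 70 trees split the subject's tips in two, 30 keep them together.
split = [["iso1", "iso2"], ["iso3"]]
single = [["iso1", "iso2", "iso3"]]
per_tree = [split] * 70 + [single] * 30
division = most_common_division(per_tree)
print(division)  # [['iso1', 'iso2'], ['iso3']]
```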

Phyloscanner was then run in full on both the consensus phylogeny and the posterior set of trees. Four tips were identified by manual inspection as probable contaminants, due to suspicious genetic similarity to sequences from isolates from other subjects which were processed at the same time, and blacklisted. The tips coming from subject T126 were randomly reduced so that only ten per site per sample date were used, with the remainder also being blacklisted. The ancestral state reconstruction used phyloscanner's modified Sankoff algorithm, with the k parameter again set to 30,432.1. In the main analysis all tips from the same colonisation were given the same ‘host’ state in the reconstruction, but it was also repeated, further separating the set of tips by body site of origin. The collapsed tree from the results of phyloscanner applied to the consensus phylogeny was used to identify transmission events and plot these on the timeline of patient ICU stays and sampling events (Figure 4).

In the secondary analysis to investigate the effects of reducing the variation in tip counts between subjects on the results, phyloscanner’s downsampling tool was used to randomly reduce the tip counts for each subject to a maximum of five, and the analysis run again.

The cluster diagrams were created by defining two colonisations to be related in a single tree if they had at least one pair of adjacent subgraphs and all subgraphs from the pair were arranged in the ‘ancestry’, ‘multiple ancestry’ or ‘complex’ configurations (see Wymant et al., 2017). Departing slightly from default phyloscanner settings, which are designed primarily for HIV, we did not define relationships if the configuration was ‘no ancestry’ (this occurs where the reads from the two colonisations form sibling clades) and also did not use a patristic distance threshold. Edges were drawn between colonisations if they were related in at least 50% of posterior trees, and these edges used to identify the clusters.
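The clustering rule described above can be sketched in two steps: keep an edge between two colonisations if they are related in at least 50% of posterior trees, then read off clusters as connected components of the resulting graph. The per-tree relationship calls would come from phyloscanner; the input format, function names and colonisation labels below are hypothetical.

```python
from collections import Counter, defaultdict

def clusters_from_trees(relations_per_tree, min_support=0.5):
    """relations_per_tree: for each posterior tree, a set of frozenset pairs of
    colonisations judged related in that tree. Returns sorted clusters."""
    n_trees = len(relations_per_tree)
    counts = Counter(pair for rel in relations_per_tree for pair in rel)
    graph = defaultdict(set)
    for pair, count in counts.items():
        if count / n_trees >= min_support:     # edge supported often enough
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    clusters, seen = [], set()
    for start in graph:                        # depth-first component search
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return sorted(clusters)

def rel(a, b):
    return frozenset((a, b))

# Ten toy trees: A-B related in 8, B-C in 5, D-E in 6, C-D in only 2.
trees = []
for i in range(10):
    related = set()
    if i < 8:
        related.add(rel("A", "B"))
    if i < 5:
        related.add(rel("B", "C"))
    if i < 6:
        related.add(rel("D", "E"))
    if i < 2:
        related.add(rel("C", "D"))
    trees.append(related)
clusters = clusters_from_trees(trees)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E']]
```

Colonisations with no supported edge do not appear in the output of this sketch; in practice they would be reported as singletons.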

When defining gold standard transmission pairs and non-transmission pairs for the SNP threshold analysis, both versions used posterior support of less than 0.05 for adjacency and a topological relationship of ‘ancestry’, ‘multiple ancestry’ or ‘complex’ to define the latter. The first version used posterior support of 0.95 of the same relationships to define pairs, whereas the second used support of 0.95 for adjacency and either ‘multiple ancestry’ or ‘complex’.

Phylogeny/body site association

A separate BaTS analysis (Parker et al., 2008) was performed on the posterior tree set constructed from the sequences from each patient whose isolates were obtained from more than one body site. In each such set, the TW20 outgroup was pruned from every tree before BaTS was run. 1000 replicates of state randomisation were used to estimate the null distribution of the AI, PS and MC statistics.
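The state randomisation procedure can be illustrated for the MC statistic with a minimal sketch on a single toy tree. Site labels are shuffled among the tips 1000 times and the observed statistic is compared with the shuffled values. The nested-tuple tree, the function names, the fixed random seed and the convention of counting null replicates at least as large as the observed value are assumptions of this sketch; BaTS works over a posterior set of trees rather than one tree.

```python
import random

def largest_monophyletic_clade(tree, site_of, site):
    """Return (n_tips, all_from_site, best) for a nested-tuple tree, where best
    is the size of the largest clade whose tips all come from `site`."""
    if isinstance(tree, str):
        hit = site_of[tree] == site
        return 1, hit, int(hit)
    n, pure, best = 0, True, 0
    for child in tree:
        c_n, c_pure, c_best = largest_monophyletic_clade(child, site_of, site)
        n += c_n
        pure = pure and c_pure
        best = max(best, c_best)
    if pure:                                   # the whole clade is one site
        best = n
    return n, pure, best

def permutation_p_value(tree, site_of, site, replicates=1000, seed=1):
    """Shuffle site labels among tips to estimate the null distribution."""
    rng = random.Random(seed)
    observed = largest_monophyletic_clade(tree, site_of, site)[2]
    tips = list(site_of)
    labels = [site_of[t] for t in tips]
    at_least = 0
    for _ in range(replicates):
        rng.shuffle(labels)
        shuffled = dict(zip(tips, labels))
        if largest_monophyletic_clade(tree, shuffled, site)[2] >= observed:
            at_least += 1
    return observed, at_least / replicates

tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
clustered = {t: "throat" for t in "abcd"}
clustered.update({t: "nose" for t in "efgh"})
observed, p = permutation_p_value(tree, clustered, "throat")

# A site with a single tip always has MC = 1, so its p-value is 1.
singleton = {t: "nose" for t in "abcdefgh"}
singleton["a"] = "throat"
single_obs, single_p = permutation_p_value(tree, singleton, "throat")
print(observed, p, single_obs, single_p)
```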

Visualisation

Phylogeny diagrams were created using ggtree 1.8.2 (Yu et al., 2017).

References

  5. Cookson B, Peters B, Webster M, Phillips I, Rahman M, Noble W (1989) Staff carriage of epidemic Methicillin-Resistant Staphylococcus aureus. Journal of Clinical Microbiology 27:1471–1476.
  32. Wymant C, Hall M, Ratmann O, Bonsall D, Golubchik T, de Cesare M, Gall A, Cornelissen M, Fraser C, STOP-HCV Consortium, The Maela Pneumococcal Collaboration, and The BEEHIVE Collaboration (2017) PHYLOSCANNER: inferring transmission from within- and Between-Host pathogen genetic diversity. Molecular Biology and Evolution, doi: 10.1093/molbev/msx304, PMID: 29186559.
  35. Zerbino DR (2010) Using the velvet de novo assembler for short-read sequencing technologies. Current Protocols in Bioinformatics, 11, doi: 10.1002/0471250953.bi1105s31, PMID: 20836074.

Decision letter

  1. Mark Jit
    Reviewing Editor; London School of Hygiene & Tropical Medicine, and Public Health England, United Kingdom
  2. Neil M Ferguson
    Senior Editor; Imperial College London, United Kingdom
  3. Don Klinkenberg
    Reviewer
  4. Taj Azarian
    Reviewer; University of Central Florida, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Thank you for submitting your article "Improved characterisation of MRSA transmission using within-host bacterial sequence diversity" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Neil Ferguson as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Don Klinkenberg (Reviewer #1); Taj Azarian (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The reviewers agreed that this study is a step forward compared to earlier such analyses in that it makes use of multiple sequences from multiple samples per patient, from various body sites, and that it addresses questions that can only be addressed because of this intensive sampling scheme: direction of transmission based on the phylogeny only, width of the bottleneck at transmission, and transmission between body sites within patients. In addition, based on the identified transmission pairs and clusters, the paper investigates the use of a SNP threshold to detect pairs and clusters in settings with fewer sequences available.

As a whole, it is a valuable contribution to the field of outbreak analyses with pathogen sequence data. The major strengths of the analysis are:

1) The application of a novel analytical approach to an unmatched dataset (995 isolates from 55 patients), which allowed them to move beyond "proof of concept" pilot analyses or simulations.

2) In doing so, the paper is able to address (with supporting data/results) three major unknowns about S. aureus genomic epidemiology: (i) the range of transmission bottleneck sizes (at least as it relates to the healthcare setting), (ii) the sensitivity and specificity of using SNP cut-offs to determine transmission events and (iii) the compartmentalization/migration of S. aureus to various body sites during carriage.

3) The utility of the approach (and accompanying software) to investigating transmission. This level of dense within-host sampling has never been attempted. Considering this and the results, this study is a significant advance from the original analysis by Tong et al., 2015 as well as the PHYLOSCANNER manuscript by Wymant et al., 2017.

4) The characterisation of aspects of MRSA hospital dynamics that were previously unknown or only suspected based on other types of evidence, and as a showcase of methods that can be used with such data.

5) The manuscript is clearly written, and we very much appreciate the efforts made to conceptually explain some of the methods, notably Figure 3 and Supplementary file 1.

Essential revisions:

1) "median patristic distance" (first, please repeat the use of median in the Materials and methods section). How sensitive is the result to this choice? For instance, can't a simple hierarchical clustering algorithm be used? Also, if additional samples are taken from a patient, the distance of these samples to the sequence(s) at infection will increase; consequently, the median distance to another colonisation should increase with larger clusters. Wouldn't it therefore be better to use the minimum distance?

2) Subsection “Trace colonisations”: Were the 9 single-swab colonisations with only one examination similar to the 24, e.g. in Figure 2? These could have been true colonisations, after all.

3) Figure 2 displays the median number of SNPs, but as explained above in comment 1, just increasing the number of samples is expected to increase this median. That would definitely cause bias in the comparison in Figure 2 and the Mann-Whitney U test.

4) Figure 2 is visually a bit misleading because of the 6 samples that are not in the Figure, four of which should be blue. Maybe add them at the right, below a ">100" mark on the X-axis?

5) The argument for the “six trace colonisations” is strong, but there are 27 more. Do the data for the other 27 not suggest the same, or is it impossible to say for the other 27?

6) In the end, it seems reasonable to focus on the non-trace colonisations for the cluster and bottleneck analyses. However, for the SNP threshold analysis it should be realised that more than half of the cases were left out, so that identified transmission pairs could very well have been separated by intermediate cases. That affects the "gold standard" transmission pairs set.

7) Subsection “Reconstructed transmission events”, last paragraph: The nature of multiple arrows between cases should be explained. From the caption it appears that each arrow is associated with a unique subclade, but is it possible to conclude that multiple arrows between hosts indicate multiple lineages being transmitted, and thus a wide bottleneck?

8) Figure 4 legend: One aspect of the analysis that was not clear was if or how the time of transmission was estimated. In at least one instance, it appears that multiple transmission events occurred between two patients. From the text it states that "the timings represent […] the earliest time of sampling of all the sequences involved in the recipient subgraphs." Is this an assumption made for simplicity or is this method able to determine more accurately the timing of the event? This would be good to clarify because it has implications for the generalizability of the method to settings where the exposure period (in this case from admission to discharge) is not as well defined.

9) Subsection “Epidemiologically plausible transmissions have high posterior support”: It's difficult to understand why downsampling results in more impossible events. Wouldn't fewer samples per patient result in fewer "ancestry" configurations and more "no ancestry" configurations, so more external sources rather than (incorrect) internal sources? Is it possible to explain this phenomenon in one or two sentences?

10) Subsection “Transmission clusters”: A transmission cluster would be a set of cases that is linked by direct transmission, and separated from other clusters by unobserved transmission links. You define transmission clusters with phyloscanner based on "ancestry", "multiple ancestry" and "complex" configurations of the phylogenetic tree, and exclude the "no ancestry" configuration. From this it appears that you would exclude the relation between A and D in Figure 3 but it's unclear why: Figure 3 shows the singleton D is linked to A and not to B and C, which is a transmission link, even more so because transmission links are already defined with the possibility of intermediary patients. The text states "colonisations not closely related", which can't be concluded about A and D in Figure 3. Isn't it possible to connect the clusters of Figure 6 based on being siblings in the phylogenetic tree? Or is the variation between the 100 sampled trees too large to make this useful?

The problem is not so much in this part of the paper, which just describes the results following from the definitions. The problem is more in the SNP threshold part, where these defined clusters are used as gold standards for defining clusters with SNP thresholds. For that, it should be made clear why these are indeed separate clusters, and why they are set apart from one another.

11) Absence/presence of arrows in Figures 6 and 7: when comparing Figures 6 and 7, absence and presence of arrows is often not the same. For instance, in Figure 6 there are no arrows between C183-C249 and between C194-C249, but in Figure 7 they both point towards C249. In general, Figure 7 seems to have many more directions resolved than Figure 6. Suggestions: first, explain one example to provide intuition for the difference; second, make directions bidirectional (or draw two arrows) if there is evidence for transmission either way (back and forth).

12) Subsection "Between-subject transmission bottlenecks are of very variable size":

The transmission bottleneck size results are extremely valuable as these results are of great interest to multiple fields. Though, it is hard to judge how generalizable the bottleneck size findings are to other settings. While the authors are cautious in stating this, it would be beneficial to put their findings in the context of previous estimates. Further, they currently present the bottleneck size as the estimated number of transferred lineages, which will depend on the number of sequenced pathogens. Can this also be presented in terms of nucleotide diversity of the transmitted population?

13) "narrow bottleneck", but still two lineages at least, so why is that not wide?

14) Subsection “Between-subject transmission bottlenecks are of very variable size”: in discussing T183 and T194, the link with T178 is included, but considering Figures 6 and 7, C249 appears even more interesting, as that forms a triangle with both C183 and C194 (but with arrows in Figure 7).

15) Subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”, last paragraph: subject T232 is mentioned with evidence for separate colonisation events, but in Figure 7 there is an arrow with equal support for the direction as for the link itself. That is not intuitive.

16) Subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”, last paragraph: Figure 7: C178 would be expected to have multiple colonisation events, because the nose and throat samples are at completely different parts of the graph. That it is missing here (and from Table 2) is also not intuitive.

17) Subsection “Identification of transmission pairs from single genomes”: The problem addressed is very relevant, but the gold standard definitions are not convincing.

First, transmission pairs (true positives) are defined by a posterior support of 0.95, but what about the set of no-transmission pairs? Is that the complete matrix of all other possible pairs in the dataset? What about the lower-support pairs, and the siblings (A and D in Figure 3)? If included into the set of no-transmission pairs, this set will contain transmission pairs as well.

Second, transmission pairs from the phyloscanner analysis can be through intermediary hosts, especially because more than half of the positives (the trace colonisations) are left out. So, a transmission pair may not be a true transmission pair.

Third, the definition of a cluster consists only of proven ancestral relations. As argued above, siblings in the phylogenetic tree may also be related through direct transmission, only without evidence of the direction. In the current analysis, these are treated as separate clusters.

18) Regarding the single genome sensitivity and specificity – The authors select one pair of genomes at random from all the possible pairs. It seems more practical to select the first isolate identified during sampling as this would be consistent with practice – most facilities conducting either active or passive surveillance usually don't re-swab after a positive surveillance swab is identified.

19) Discussion, eighth paragraph: As you state, it is currently not feasible to sequence 223 isolates from a single individual. In their analysis, they pool the samples from multiple time points and also down-sample to five isolates. It would be of great interest for the field to know what number of isolates is required to achieve this level of transmission/cluster prediction using this approach. What we imagine is presenting the sensitivity/specificity with progressive down-sampling from the entire sample and/or from different epochs. Are five random isolates from a single collection time point sufficient to replicate the accuracy of their analysis? Are multiple isolates longitudinally collected more important?

https://doi.org/10.7554/eLife.46402.sa1

Author response

Essential revisions:

1) "median patristic distance" (first, please repeat the use of median in the Materials and methods section).

We have added text to the Materials and methods to clarify this (subsection “Phyloscanner”).

How sensitive is the result to this choice?

This threshold was selected as a number that split up what seemed obvious instances of multiple colonisation by eye, and is not at all sensitive to variations within a reasonable range.

For instance, can't a simple hierarchical clustering algorithm be used?

Absolutely – this tool was used just because it is already included in phyloscanner and would give similar results to many other simple approaches.

Also, if additional samples are taken from a patient, the distance of these samples to the sequence(s) at infection will increase; consequently, the median distance to another colonisation should increase with larger clusters. Wouldn't it therefore be better to use the minimum distance?

We apologise; the text was somewhat misleading. The median distance is, as per phyloscanner convention, between the MRCAs of each clade, not between tips. It thus should be at most the minimum distance between tips. (The median is across the ExaBayes posterior, not the tip set.) This has been clarified (subsection “Identification of potential multiple colonisation events”).

2) Subsection “Trace colonisations”: Were the 9 single-swab colonisations with only one examination similar to the 24, e.g. in Figure 2? These could have been true colonisations, after all.

The number is actually 8 rather than 9, and we apologise for the typo. There is scant evidence of a difference in distributions between those 8 and the rest according to the metric in Figure 2 (p=0.164) or indeed between the 13 trace colonisations with a subsequent negative swab and those without (p=0.327). We acknowledge that some trace colonisations may indeed be genuine and the choice to exclude them all is a cautious one (subsection “Trace colonisations”).

3) Figure 2 displays the median number of SNPs, but as explained above in comment 1, just increasing the number of samples is expected to increase this median. That would definitely cause bias in the comparison in Figure 2 and the Mann-Whitney U test.

The reason for the use of the median here is that the null hypothesis is that an isolate from a trace colonisation is no different from a typical isolate from a non-trace colonisation. It would represent a single draw from the same distribution that produces the many samples from the latter. Hence we take the isolates from non-trace colonisations with the median genetic distance as representing “typical” examples. We have clarified the text (subsection “Trace colonisations”).

We do, however, acknowledge that there is a potential temporal source of bias with this approach. Isolates from trace colonisations cannot be acquired at anything but the first possible opportunity, whereas some from non-trace colonisations may have undergone further days or even weeks of mutation. If instead we take the median distance to the most similar earlier sample from amongst the isolates acquired on the first sample date, we see a similar tendency for shorter distances amongst trace colonisations, but it is not significant at the p=0.05 level (p=0.119). This can be seen in the new Figure 2—figure supplement 1.
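The comparison described above can be illustrated with a minimal sketch of the Mann-Whitney U statistic. This is not the authors' code: the function name `mann_whitney_u` and the distance values are hypothetical, chosen only to show how two sets of SNP distances (trace versus non-trace colonisations) would be compared. No p-value is computed here; in practice a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would supply one.

```python
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two samples.

    U_x counts the pairs (xi, yj) in which xi > yj, with ties counted as 0.5.
    U_y is the complementary count, so U_x + U_y == len(x) * len(y).
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical SNP distances to the most similar earlier sample (not study data).
trace_distances = [2, 5, 7]
non_trace_distances = [6, 9, 12, 15]

u_trace, u_non_trace = mann_whitney_u(trace_distances, non_trace_distances)
print(u_trace, u_non_trace)
```

A small `u_trace` relative to the product of the sample sizes indicates that the trace distances tend to be shorter than the non-trace distances, which is the direction of effect discussed in the response.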

4) Figure 2 is visually a bit misleading because of the 6 samples that are not in the Figure, four of which should be blue. Maybe add them at the right, below a ">100" mark on the X-axis?

This figure has been revised to switch to a log scale at distances beyond 50bp.

5) The argument for the “six trace colonisations” is strong, but there are 27 more. Do the data for the other 27 not suggest the same, or is it impossible to say for the other 27?

This is not suggested by the data in the other cases. However, while such a pattern implies transient colonisation, not all transient colonisations would be expected to show it; such an event need not happen in the early part of a patient’s ICU stay. The only information we really have that strongly argues against transient colonisation (or indeed contamination) is repeated isolation of the same strain. We choose to emphasise examples where we have this, out of an abundance of caution. We have revised the text somewhat to clarify this point (subsection “Trace colonisations”).

6) In the end, it seems reasonable to focus on the non-trace colonisations for the cluster and bottleneck analyses. However, for the SNP threshold analysis it should be realised that more than half of the cases were left out, so that identified transmission pairs could very well have been separated by intermediate cases. That affects the "gold standard" transmission pairs set.

The trace colonisations are now present in one of the two new versions of the SNP analysis; they had to be excluded from the other for reasons that are outlined below.

7) Subsection “Reconstructed transmission events”, last paragraph: The nature of multiple arrows between cases should be explained. From the caption it appears that each arrow is associated with a unique subclade, but is it possible to conclude that multiple arrows between hosts indicate multiple lineages being transmitted, and thus a wide bottleneck?

This configuration does suggest transmission through a wide bottleneck, which we define as also encompassing a scenario involving multiple separate transmission events between the pairs (subsection “Between-subject transmission bottlenecks are of very variable size”). Text has been added to clarify this (subsection “Reconstructed transmission events.”).

8) Figure 4 legend: One aspect of the analysis that was not clear was if or how the time of transmission was estimated. In at least one instance, it appears that multiple transmission events occurred between two patients. From the text it states that "the timings represent […] the earliest time of sampling of all the sequences involved in the recipient subgraphs." Is this an assumption made for simplicity or is this method able to determine more accurately the timing of the event? This would be good to clarify because it has implications for the generalizability of the method to settings where the exposure period (in this case from admission to discharge) is not as well defined.

It was indeed merely for simplicity as we do not have a means to estimate such timings from the genetic information. This has been clarified (Figure 4 legend).

9) Subsection “Epidemiologically plausible transmissions have high posterior support”: It's difficult to understand why downsampling results in more impossible events. Wouldn't fewer samples per patient result in fewer "ancestry" configurations and more "no ancestry" configurations, so more external sources rather than (incorrect) internal sources? Is it possible to explain this phenomenon in one or two sentences?

The reason for this is that there is a tendency to infer the more frequently-sampled individual as the source in a likely transmission pair. Equalising counts reduces the topological signal in one direction and makes reconstructions of the opposite direction more frequent in the posterior set. This might seem at first glance to be simply a bias but, as we see here, matters are actually more complex. In our case, removing this bias by downsampling results in a reconstruction which is plainly less accurate. High sample counts are not irrelevant information, as they indicate something (a long ICU stay here) that in and of itself makes an individual a more probable source. A high sample count lies at least partially on the causal pathway from a factor making source status more likely to detection as a source, and a variable lying on the causal pathway is not considered confounding. There is, nonetheless, undoubtedly scope for further investigation of this issue. An explanation of this has been added to the Discussion (fourth paragraph).

(This is a phenomenon which we have also seen in HIV, where viral load is associated with both high sample counts and more readily transmitting the virus.)

10) Subsection “Transmission clusters”: A transmission cluster would be a set of cases that is linked by direct transmission, and separated from other clusters by unobserved transmission links. You define transmission clusters with phyloscanner based on "ancestry", "multiple ancestry" and "complex" configurations of the phylogenetic tree, and exclude the "no ancestry" configuration. From this it appears that you would exclude the relation between A and D in Figure 3 but it's unclear why: Figure 3 shows the singleton D is linked to A and not to B and C, which is a transmission link, even more so because transmission links are already defined with the possibility of intermediary patients. The text states "colonisations not closely related", which can't be concluded about A and D in Figure 3. Isn't it possible to connect the clusters of Figure 6 based on being siblings in the phylogenetic tree? Or is the variation between the 100 sampled trees too large to make this useful?

The problem is not so much in this part of the paper, which just describes the results following from the definitions. The problem is more in the SNP threshold part, where these defined clusters are used as gold standards for defining clusters with SNP thresholds. For that, it should be made clear why these are indeed separate clusters, and why they are set apart from one another.

See below for a full description of the changes to this analysis.

11) Absence/presence of arrows in Figures 6 and 7: when comparing Figures 6 and 7, absence and presence of arrows is often not the same. For instance, in Figure 6 there are no arrows between C183-C249 and between C194-C249, but in Figure 7 they both point towards C249. In general, Figure 7 seems to have many more directions resolved than Figure 6. Suggestions: first, explain one example to provide intuition for the difference; second, make directions bidirectional (or draw two arrows) if there is evidence for transmission either way (back and forth).

The updated versions of the network figures now have bidirectional arrows, and text has been added to explain the case of C249, which is the most striking example of the improved resolution in Figure 7 (subsection “Transmission clusters”).

12) Subsection "Between-subject transmission bottlenecks are of very variable size":

The transmission bottleneck size results are extremely valuable as these results are of great interest to multiple fields. Though, it is hard to judge how generalizable the bottleneck size findings are to other settings. While the authors are cautious in stating this, it would be beneficial to put their findings in the context of previous estimates.

To our knowledge, there are no previous estimates of transmission bottleneck size between human subjects in the literature.

Further, they currently present the bottleneck size as the estimated number of transferred lineages, which will depend on the number of sequenced pathogens. Can this also be presented in terms of nucleotide diversity of the transmitted population?

We have added these numbers to Table 1 and discuss them (subsection “Between-subject transmission bottlenecks are of very variable size”).

13) "narrow bottleneck", but still two lineages at least, so why is that not wide?

We have revised the text here (subsection “Between-subject transmission bottlenecks are of very variable size”) in order to talk about “narrow” and “wide” in more relative terms.

14) Subsection “Between-subject transmission bottlenecks are of very variable size”: in discussing T183 and T194, the link with T178 is included, but considering Figures 6 and 7, C249 appears even more interesting, as that forms a triangle with both C183 and C194 (but with arrows in Figure 7).

This was overlooked because of the lack of significant overlap with the other patients in the same bed. However, T183 did briefly occupy a bed that had previously been occupied by T249, which we now mention (subsection “Between-subject transmission bottlenecks are of very variable size”).

15) Subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”, last paragraph: subject T232 is mentioned with evidence for separate colonisation events, but in Figure 7 there is an arrow with equal support for the direction as for the link itself. That is not intuitive.

This part of the manuscript was too cursory and relied entirely on examining the consensus trees. We have changed it to a description of the patterns over the full posterior (subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”). T232 has a quite poorly resolved topology which can support multiple scenarios, and in the main analysis the direction depicted in the original version of Figure 7 was the most well supported by the rules used to construct the diagram.

16) Subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”, last paragraph: Figure 7: C178 would be expected to have multiple colonisation events, because the nose and throat samples are at completely different parts of the graph. That it is missing here (and from Table 2) is also not intuitive.

We neglected to mention that T178, while sampled from two body sites, provided only a single sequence from each, which makes investigation using BATS impossible. The same is, in fact, true of T322. We have added text to this effect (subsection “The majority of subjects showed no evidence of phylogeny/body site correlation”).

17) Subsection “Identification of transmission pairs from single genomes”: The problem addressed is very relevant, but the gold standard definitions are not convincing.

First, transmission pairs (true positives) are defined by a posterior support of 0.95, but what about the set of no-transmission pairs? Is that the complete matrix of all other possible pairs in the dataset? What about the lower-support pairs, and the siblings (A and D in Figure 3)? If included into the set of no-transmission pairs, this set will contain transmission pairs as well.

Second, transmission pairs from the phyloscanner analysis can be through intermediary hosts, especially because more than half of the positives (the trace colonisations) are left out. So, a transmission pair may not be a true transmission pair.

Third, the definition of a cluster consists only of proven ancestral relations. As argued above, siblings in the phylogenetic tree may also be related through direct transmission, only without evidence of the direction. In the current analysis, these are treated as separate clusters.

18) Regarding the single genome sensitivity and specificity – The authors select one pair of genomes at random from all the possible pairs. It seems more practical to select the first isolate identified during sampling as this would be consistent with practice – most facilities conducting either active or passive surveillance usually don't re-swab after a positive surveillance swab is identified.

For the SNP analysis:

1) We have concluded that the definition of a transmission cluster that was used probably cannot be made entirely coherent, and have entirely removed that part of the paper.

2) Sequences for SNP analysis are now taken from just the first positive swab for each subject, rather than the pool of all sequences.

3) We now use a cut-off of 0.05 posterior support for any ancestry to define non-transmission pairs in the gold standard. Any pairs lying between 0.05 and 0.95 are not used.

4) As a result of modification 3, the PPV and NPV statistics become meaningless as these are setting-dependent and the study “population” is no longer even the complete set of pairs in our sample. We have removed these from the results. Nevertheless, the point that in most circumstances most pairs of subjects cannot be transmission pairs, and thus the PPV will be low for even medium-sized study populations unless the false positive rate is also low, is still valid, and we now make it in the Discussion by means of a mathematical sketch (Discussion, eighth paragraph).

5) When using phyloscanner as a gold standard for SNP distance thresholds, we could not use patristic distance thresholds as the reasoning becomes circular. In particular, the relationship between A and D in Figure 3 cannot be established as that of a transmission pair without reference to branch lengths, and branch lengths are themselves obviously very closely related to SNP distance. We used only the topological signal of transmission instead, and sibling clades are not a definitive signal of transmission.

6) The point about missing intermediaries is valid. To address this, we created a second gold standard set for which membership required posterior support of 0.95 for the “multiple ancestry” or “complex” topological relationships, which are equivalent to the “PP” configuration in Romero-Severson et al. That paper showed that PP tends to imply direct transmission. While the simulations in that paper were designed to be similar to HIV, the reasoning, namely that it is much less likely that two distinct lineages will be transmitted twice in succession, remains valid here (subsection “Identification of transmission pairs from single genomes”).

7) We agree that there is no particular reason to exclude trace colonisations from this analysis, but anything involving a trace colonisation cannot be used in the new gold standard of point 6 because it is impossible to identify multiple lineage transmission to a subject with a single tip in the phylogeny. As a result, we present both – the original version with trace colonisations included, and the new one with them excluded.
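The revised gold standard and the point about PPV can be sketched as follows. This is an illustrative sketch only, not the analysis code: the function names, the example pairs, the threshold of 40 SNPs, and the choice to treat the 0.05 and 0.95 cut-offs and the SNP threshold as inclusive are all assumptions. Only the cut-off values themselves and the rule that intermediate pairs are not used come from the points above.

```python
from typing import Iterable, Optional, Tuple


def gold_standard_label(support: float,
                        upper: float = 0.95,
                        lower: float = 0.05) -> Optional[bool]:
    """Label a pair from its posterior support for any ancestry.

    True  -> transmission pair, False -> non-transmission pair,
    None  -> intermediate support, not used in the gold standard.
    Inclusive boundaries are an assumption of this sketch.
    """
    if support >= upper:
        return True
    if support <= lower:
        return False
    return None


def sensitivity_specificity(pairs: Iterable[Tuple[int, float]],
                            snp_threshold: int) -> Tuple[float, float]:
    """pairs holds (snp_distance, posterior_support) for each pair of subjects.

    Assumes at least one gold-standard positive and one negative are present.
    """
    tp = fn = tn = fp = 0
    for snp_distance, support in pairs:
        label = gold_standard_label(support)
        if label is None:
            continue  # pairs between the two cut-offs are excluded
        called_pair = snp_distance <= snp_threshold
        if label and called_pair:
            tp += 1
        elif label:
            fn += 1
        elif called_pair:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value when a fraction `prevalence` of pairs are true pairs."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical (snp_distance, posterior_support) values, not study data.
example_pairs = [(3, 0.99), (10, 0.97), (60, 0.96), (45, 0.50),
                 (20, 0.01), (80, 0.02), (150, 0.00), (200, 0.03)]

sens, spec = sensitivity_specificity(example_pairs, snp_threshold=40)
print(sens, spec)
print(ppv(0.9, 0.9, 0.01))
```

With a sensitivity and specificity of 0.9 each, and only 1% of all possible pairs being genuine transmission pairs, the PPV is about 8%. This is the setting dependence that makes PPV and NPV uninformative once intermediate pairs are dropped from the study population.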

19) Discussion, eighth paragraph: As you state, it is currently not feasible to sequence 223 isolates from a single individual. In their analysis, they pool the samples from multiple time points and also down-sample to five isolates. It would be of great interest for the field to know what number of isolates is required to achieve this level of transmission/cluster prediction using this approach. What we imagine is presenting the sensitivity/specificity with progressive down-sampling from the entire sample and/or from different epochs. Are five random isolates from a single collection time point sufficient to replicate the accuracy of their analysis? Are multiple isolates longitudinally collected more important?

While this would be a very interesting addition, we feel that the paper is quite long as it is and would prefer to leave it for future work.

https://doi.org/10.7554/eLife.46402.sa2

Article and author information

Author details

  1. Matthew D Hall

    Big Data Institute, University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Software, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    matthew.hall@bdi.ox.ac.uk
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-2671-3864
  2. Matthew TG Holden

    School of Medicine, University of St Andrews, St Andrews, United Kingdom
    Contribution
    Resources, Data curation, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4958-2166
  3. Pramot Srisomang

    Department of Pediatrics, Sunpasitthiprasong Hospital, Ubon Ratchathani, Thailand
    Contribution
    Investigation, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Weera Mahavanakul

    Department of Medicine, Sunpasitthiprasong Hospital, Ubon Ratchathani, Thailand
    Contribution
    Investigation, Project administration
    Competing interests
    No competing interests declared
  5. Vanaporn Wuthiekanun

    Mahidol Oxford Tropical Medicine Research Unit, Mahidol University, Salaya, Thailand
    Contribution
    Resources, Writing—review and editing
    Competing interests
    No competing interests declared
  6. Direk Limmathurotsakul

    Mahidol Oxford Tropical Medicine Research Unit, Mahidol University, Salaya, Thailand
    Contribution
    Resources, Data curation, Supervision, Investigation, Methodology, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-7240-5320
  7. Kay Fountain

    Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
    Contribution
    Writing—review and editing, Comments and consultation when this work was first presented at a meeting resulted in very significant modifications to the interpretation of the analysis
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9984-5702
  8. Julian Parkhill

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration
    Competing interests
    No competing interests declared
  9. Emma K Nickerson

    Department of Infectious Diseases, Cambridge University Hospitals NHS Foundation, Cambridge, United Kingdom
    Contribution
    Data curation, Writing—review and editing
    Competing interests
    No competing interests declared
  10. Sharon J Peacock

    Department of Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
  11. Christophe Fraser

    Big Data Institute, University of Oxford, Oxford, United Kingdom
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Methodology, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared

Funding

Wellcome (098051)

  • Matthew TG Holden
  • Sharon J Peacock

Chief Scientist Office (SIRN10)

  • Matthew TG Holden

Wellcome (106698/Z/14/Z)

  • Vanaporn Wuthiekanun

Medical Research Council (G1000803)

  • Sharon J Peacock

European Research Council (PBDR-339251)

  • Matthew D Hall
  • Christophe Fraser

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Thibaut Jombart and Anne Cori for assistance in designing the sampling frame for the study, and Alice Ledda and Xavier Didelot for preliminary analyses.

Ethics

Human subjects: Ethical approval was obtained from the Ethical and Scientific Review subcommittee of the Royal Thai Government Ministry of Public Health (85/2550), and the Oxford Tropical Research Ethics Committee (024 07). All patients admitted to the two ICUs were eligible for inclusion and were enrolled after written informed consent, and consent to publish, was obtained.

Senior Editor

  1. Neil M Ferguson, Imperial College London, United Kingdom

Reviewing Editor

  1. Mark Jit, London School of Hygiene & Tropical Medicine, and Public Health England, United Kingdom

Reviewers

  1. Don Klinkenberg
  2. Taj Azarian, University of Central Florida, United States

Publication history

  1. Received: February 26, 2019
  2. Accepted: October 1, 2019
  3. Accepted Manuscript published: October 8, 2019 (version 1)
  4. Accepted Manuscript updated: October 11, 2019 (version 2)
  5. Version of Record published: January 10, 2020 (version 3)
  6. Version of Record updated: January 14, 2020 (version 4)

Copyright

© 2019, Hall et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 1,119
  • Downloads: 221
  • Citations: 0

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

