Dynamics of immune memory and learning in bacterial communities

  1. Madeleine Bonsma-Fisher
  2. Sidhartha Goyal  Is a corresponding author
  1. Department of Physics, University of Toronto, Canada
  2. Institute of Biomaterials & Biomedical Engineering, University of Toronto, Canada

Abstract

From bacteria to humans, adaptive immune systems provide learned memories of past infections. Despite their vast biological differences, adaptive immunity shares features from microbes to vertebrates such as emergent immune diversity, long-term coexistence of hosts and pathogens, and fitness pressures from evolving pathogens and adapting hosts, yet there is no conceptual model that addresses all of these together. To this end, we propose and solve a simple phenomenological model of CRISPR-based adaptive immunity in microbes. We show that in coexisting phage and bacteria populations, immune diversity in both populations is coupled and emerges spontaneously, that bacteria track phage evolution with a context-dependent lag, and that high levels of diversity are paradoxically linked to low overall CRISPR immunity. We define average immunity, an important summary parameter predicted by our model, and use it to perform synthetic time-shift analyses on available experimental data to reveal different modalities of coevolution. Finally, immune cross-reactivity in our model leads to qualitatively different states of evolutionary dynamics, including an influenza-like traveling wave regime that resembles a similar state in models of vertebrate adaptive immunity. Our results show that CRISPR immunity provides a tractable model, both theoretically and experimentally, to understand general features of adaptive immunity.

Editor's evaluation

In this important work, the authors develop a theory for the coevolutionary dynamics of bacteria and phages, where the major evolutionary pressure comes from CRISPR-Cas adaptive immunity in bacteria. Through extensive stochastic numerical simulations and analytical calculations, the article presents a compelling analysis of the emergent properties of immune interactions, in the regime of a single proto-spacer and a single spacer. Some of the trends highlighted by the model are recovered from experimental data. The main results concern how diversity in both phage and bacteria population is linked and is shaped by immunity, and should be of broad interest in immunology.

https://doi.org/10.7554/eLife.81692.sa0

Introduction

Adaptive immunity equips organisms to survive changing pathogen attacks across their lifetime. Many diverse organisms from bacteria to humans possess adaptive immune systems, and their presence shapes the survival, diversity, and evolution of both hosts and pathogens. How adaptive immunity changes the landscape of host-pathogen coexistence, how immune diversity emerges and evolves, and how the pressures of evolving pathogens and adaptive immunity are coupled to produce unique evolutionary outcomes: all of these factors are of fundamental importance to understanding the role of adaptive immunity in populations.

These questions have naturally been explored in the vertebrate adaptive immune system, which protects humans and other vertebrates from evolving pathogens. In these organisms, a diverse repertoire of T cell and B cell receptors can rapidly recognize and respond to a wide range of threats. Immune specificity is determined by the unique genetic sequence of each cell’s receptor, and individuals may harbour millions to billions of unique sequences distributed across four or more orders of magnitude of abundance (Desponds et al., 2016; Mora and Walczak, 2019; de Greef et al., 2020). Quantitative frameworks to model immune diversity and clone abundance have revealed that simple low-level interactions can give rise to complex outcomes including broad distributions of clone abundance (Desponds et al., 2016; Mora and Walczak, 2019; de Greef et al., 2020; Mayer et al., 2015; Gaimann et al., 2020; Dessalles et al., 2022), long-lived biologically realistic transient states (Yan et al., 2019; Gaimann et al., 2020), and clonal restructuring following immune challenges (Childs et al., 2015; Puelma Touzel et al., 2020; Sachdeva et al., 2020; Molari et al., 2020; Gaimann et al., 2020). Phenomenological models of pathogen coevolution with the immune system have accelerated our understanding of how the fitness landscape generated by the immune system constrains pathogen evolution (Luksza and Lässig, 2014; Marchi et al., 2019; Yan et al., 2019; Schnaack and Nourmohammad, 2021; Chardès et al., 2022), how the adaptive immune system responds to rapid pathogen evolution (Wang et al., 2015; Nourmohammad et al., 2019; Schnaack and Nourmohammad, 2021; Chardès et al., 2022), and what drives pathogen extinction (Yan et al., 2019; Marchi et al., 2019 or the extinction of particular clonal cell lineages Nourmohammad et al., 2019; Sachdeva et al., 2020). These models have also explored trade-offs such as between immune receptor specificity and cross-reactivity (Mayer et al., 2015; Nourmohammad et al., 2016), between the specificity of host-pathogen discrimination and sensitivity to pathogens (Childs et al., 2015; Downie et al., 2021; Metcalf et al., 2017), between the speed of an immune response and the efficiency of that response (Schnaack and Nourmohammad, 2021), or between metabolic resource use and immune coverage (Chardès et al., 2022). All of these models have shown rich dynamics and qualitatively different states of diversity and evolution arising from simple rules. However, experiments in vertebrates are difficult: vertebrate immunity depends on a complex interplay of many cell types and experiments are time-consuming because of long generation times (Altan-Bonnet et al., 2020).

Adaptive immunity in microbes is realized through the CRISPR system, conceptually related to the vertebrate adaptive immune system. The CRISPR system is functionally simple, yet it is incredibly powerful, as indicated by its widespread presence in many diverse bacteria and archaea Koonin and Makarova, 2019 and its experimentally demonstrated ability to provide strong immunity against phages (Paez-Espino et al., 2013; Paez-Espino et al., 2015; van Houte et al., 2016; Bondy-Denomy et al., 2013). Attacking phages expose their DNA to bacteria, and bacteria with a CRISPR immune system acquire small segments of phage DNA, called spacers. They store spacers in their genome and use them to recognize and destroy matching phage sequences in future infections: spacers are transcribed into RNA and guide DNA-cleaving CRISPR-associated proteins to recognize and cut re-infecting phages. Spacers provide a highly specific immune memory of infecting phages, preventing recognized phages from reproducing. In turn, phages can acquire mutations in the protospacer regions of their genome that are targeted by spacers. These features of the CRISPR immune system mean that (a) phage genetic evolution occurs by selection for escape mutants, and (b) the network of CRISPR immune interactions between bacteria and phages can be inferred by sequencing the genomes of co-living bacteria and phages. Spacer acquisition and phage mutation are rare random events, and many such events must be observed in order to understand their impact on populations. Bacteria and phages have short life cycles and can reach large population size, making it possible to build a statistical picture of the impacts of adaptive immunity.

The kinetics and interactions of phages and bacteria with CRISPR systems have been the subject of numerous experiments (van Houte et al., 2016; Common et al., 2019; Common et al., 2020; Chabas et al., 2021; Dimitriu et al., 2022; Guillemet et al., 2021). Some themes have emerged from experimental studies of CRISPR immunity: (a) high spacer diversity relative to phage diversity increases the likelihood of phage extinction (van Houte et al., 2016; Common et al., 2020; Guillemet et al., 2021), (b) bacteria become more immune to phages over time (Laanto et al., 2017; Morley et al., 2017; Common et al., 2019; Pyenson and Marraffini, 2020), and (c) phages readily gain mutations (Weinberger et al., 2012a; Paez-Espino et al., 2013; Levin et al., 2013; Pyenson et al., 2017; Watson et al., 2019; Pyenson and Marraffini, 2020; Guillemet et al., 2021; Guerrero et al., 2021a) and sometimes genome rearrangements (Paez-Espino et al., 2015) to escape CRISPR targeting. Explorations of CRISPR immunity in natural environments have also documented ongoing spacer acquisition and phage escape mutations (Weinberger et al., 2012a; Guerrero et al., 2021a). Likewise, previous theoretical work has addressed the impact of parameters such as spacer acquisition rate and phage mutation rate on spacer diversity (Childs et al., 2012; Han et al., 2013; Han and Deem, 2017) and population survival and extinction (Weinberger et al., 2012b), how costs of CRISPR immunity impact bacteria-phage coexistence (Skanata and Kussell, 2021) and the maintenance of CRISPR immunity (Levin, 2010; Weinberger et al., 2012b; Westra et al., 2015; Gurney et al., 2019), how spacer diversity impacts population outcomes (He and Deem, 2010; Weinberger et al., 2012a; Childs et al., 2012; Haerter and Sneppen, 2012; Han et al., 2013; Childs et al., 2014; Bradde et al., 2017; Han and Deem, 2017), and how stochasticity and initial conditions impact population survival (Bradde et al., 2019; Chabas et al., 2018). Notably, foundational work by Childs et al., 2014; Childs et al., 2012 and Weinberger et al., 2012a; Weinberger et al., 2012b found through simulations that spacer diversity readily emerges in a population of CRISPR-competent bacteria interacting with mutating phages.

However, the majority of both experiments and theory are based on observations and models of transient phenomena and short-term dynamics, while it is at long timescales that natural microbial communities experience bacteria-phage coexistence. Some notable experiments have measured long-term coexistence (Paez-Espino et al., 2015; Wei et al., 2011), and long-term sequential sequencing data from natural populations is becoming more available (Gómez and Buckling, 2011; Burstein et al., 2016), but appropriate theories to understand steady-state coexistence, sequence evolution and turnover, and immune memory in microbial populations remain rare. Because the processes of growth, death, and immune interaction are inherently random, understanding population establishment and extinction requires a fully stochastic analysis, and theoretical models that explore long-term coexistence have been partially deterministic to date (Weinberger et al., 2012a; Weinberger et al., 2012b; Childs et al., 2012; Levin et al., 2013; Childs et al., 2014; Santos et al., 2014; Weissman et al., 2018; Gurney et al., 2019). These models do not accurately capture rare stochastic events, in particular mutation, establishment, and extinction. Notable fully stochastic simulations of CRISPR immunity, on the other hand, have lacked rigorous analytic results (Han et al., 2013; Han and Deem, 2017).

To understand the emergent properties of immune memory and diversity in microbial populations and how phages and bacteria coexist long-term, we developed a simple theoretical model of bacteria and phages interacting with adaptive immunity. We model a population of bacteria with CRISPR immune systems interacting with phages that can mutate to escape CRISPR targeting, building on our previous work that assumed a clonal population of phages with multiple protospacers in each phage (Bonsma-Fisher et al., 2018). We model phages with single protospacers in this work to efficiently track mutations in large populations over long timescales. We stochastically simulate thousands of bacteria-phage populations across a range of population sizes, spacer acquisition rates, spacer effectiveness rates, and phage mutation rates, and derive analytic expressions for the probability of establishment for new phage mutants, the time to extinction for phage and bacterial clones, and the dependence of bacterial spacer diversity on spacer acquisition rate, effectiveness, and phage mutation rate. Our simulations are fully stochastic and run for many thousands of generations to accurately capture the dynamics of establishment and extinction, yet the underlying model is simple enough to solve analytically. We show that even with the simplest assumptions of uniform spacer acquisition and effectiveness, complex dynamics and a wide range of outcomes of diversity and population structure are possible. We recover and reinterpret experimentally observed feaures: (a) we find that high diversity is not beneficial for bacteria when phage and bacterial diversity is strongly coupled, (b) we show that bacterial immunity can either track new phage mutations rapidly or keep a memory for a long time, but not both, and (c) we find emergent diversity resulting from selection for phage mutations that evade CRISPR targeting, linking diversity to the dynamical quantities of establishment and extinction. We compute bacterial average immunity in our simulations and in available experimental data and show that our model predicts qualitative trends that are visible in data. Finally, we show that adding immune cross-reactivity leads to qualitatively different states of evolutionary dynamics: (a) a traveling wave regime that resembles a similar state in models of vertebrate adaptive immunity (Yan et al., 2019; Marchi et al., 2019; Marchi et al., 2021) emerges when high cross-reactivity creates a fitness gradient for phage evolution, and (2) a regime of low turnover protected from new establishment by the reduced fitness of new phage mutants.

Results

Bacteria and phages dynamically coexist and coevolve

We model bacteria and phage interacting and coevolving in a well-mixed system (Figure 1A and ‘Model’). Bacteria divide by consuming nutrients and phages reproduce by creating a burst of B new phages after successfully infecting a bacterium. Bacteria can contain a single CRISPR spacer that confers immunity against phages with a matching protospacer. Phages are labelled with a single protospacer type, a binary sequence of length L=30 that can mutate to a new type during a burst with probability μL, where μ is the per-base mutation rate per generation. All simulations begin with a single clonal phage population unless otherwise specified.

Figure 1 with 7 supplements see all
Model description.

(A) We model bacteria and phages interacting in a well-mixed vessel. We track nutrient concentration, phage population size (nV), and bacteria population size (nB). Bacteria can either have no spacer (nB0) or a spacer of type i (nBi, inBi=nBs), and phages can have a single protospacer of type j (nVj). With rate α, a phage interacts with a bacterium. If the bacterium does not have a matching spacer, the phage kills with probability pV and produces a burst of B new phages, while for bacteria with a matching spacer that probability is reduced to pVs=pV(1-e), 0e1. Bacteria without spacers that survive an attack have a chance to acquire a spacer with probability η, and bacteria with spacers lose them at rate r. Lower inset: average immunity is the weighted average pairwise immunity between spacer-containing bacteria and phages, given by 1-i,jnBinVjpV(i,j)pVi,jnBinVj. The probability of a phage with protospacer j successfully infecting a bacterium with spacer i is pV(i,j). (B) Three time points in a typical simulation with C0=104, e=0.95, η=10-4, and μ=10-5. Coloured circles represent unique protospacer or spacer sequences; shared sequences are shown with the same colour. The size of each circle is proportional to clone size, and new mutants are shown radially more distant from the centre. (C) Ten individual clone trajectories vs simulation time for phages (top) and bacteria (bottom). The mean clone size is shown with a horizontal dashed line. (D) Total phage, bacteria, and nutrient concentration as a function of phage success probability pV. Markers show an average over five independent simulations for different values of pV with C0=104,η=10-3,e=0.95, and μ=10-7. Solid lines show theoretical predictions for different constant values of effective e. As pV decreases, phages go extinct at a critical value given by A=1, where A=(BpV-1)(1-f)αfg. (E) Total phage and bacteria population size as a function of average bacterial immunity to phages. Colours indicate the fraction of simulations in which phage or bacteria go extinct before a set endpoint. Solid lines show the mean-field prediction. Error bars are the standard deviation across three or more independent simulations.

Coexistence occurs across a wide range of parameters but is not guaranteed: below a certain success probability pV0=1B(gf(1-f)α+1), phages are not able to reproduce often enough to overcome their base death rate due to outflow and adsorption and are driven extinct (grey area of Figure 1D; Bonsma-Fisher et al., 2018). In this expression, g is the bacterial growth rate, B is the phage burst size, α is the phage adsorption rate, f=F/(gC0) is a normalized outflow rate, and C0 is the inflow nutrient concentration. This is the same extinction threshold reported by Payne et al., 2018 as the cutoff for achieving herd immunity in a well-mixed bacterial population. To a first approximation, phages must successfully infect every 1/B bacteria they encounter, but if bacteria are growing quickly, then phages must do better to overcome bacterial growth, leading to the extra terms in this expression (see ‘Phage extinction threshold’). We write this extinction threshold as A=(BpV-1)(1-f)αfg. Above the phage extinction threshold (A>1), the phage population size increases with increasing pV but eventually decreases again as bacterial numbers are driven too low to support a large phage population (Bonsma-Fisher et al., 2018). A similar non-monotonicity as a function of the probability of naive bacterial resistance (1-pV) was described in theoretical work by Weinberger et al., 2012b. In our model the position of the peak in phage population size as a function of infection success probability is determined by e, the effectiveness of CRISPR spacers against phage; increasing e pushes the peak to higher pV (Figure 1D). While e is a constant parameter that determines the outcome of pairwise interactions between bacteria and phages, the bacterial population as a whole possesses an average immunity to phages that is a weighted average of all the possible pairwise interactions (Figure 1A inset). It is the overall average immunity that determines population outcomes, which we describe in detail in ‘Pathogen and host diversity must be considered together’.

To focus on regimes where bacteria and phages coexist, we select parameters within the deterministic coexistence regime to explore bacteria-phage coevolution. Even in this regime, stochastic extinction will eventually come for one or both populations in simulations (Figure 1E), though the timescale of extinction may be extremely long for large population sizes (Badali and Zilman, 2020). Phages are more susceptible to stochastic extinction than bacteria because of their large burst size B which increases their overall population fluctuations (Appendix 3). The length of coexistence before stochastic extinction depends on population size as well as simulation parameters and initial conditions (see ‘Stochastic population extinction’): phage populations can be rescued from extinction by high mutation rate (Figure 1—figure supplement 3) or high initial protospacer diversity (Figure 1—figure supplement 6), but are more likely to go extinct if spacer effectiveness is high (Figure 1—figure supplement 1). Conversely, bacteria are more likely to go extinct if spacer effectiveness or spacer acquisition rate are low (Figure 1—figure supplement 4). Population survival and persistence in natural populations is impacted by additional factors we do not address in our model, including immigration (Volkov et al., 2003; Chabas et al., 2016, niche partitioning Simek et al., 2010; Weitz et al., 2013; Mills et al., 2013; Badali and Zilman, 2020; Voigt et al., 2021), environmental fluctuations (Abreu et al., 2020; Voigt et al., 2021), and spatial structure (Haerter et al., 2011; Haerter and Sneppen, 2012; Heilmann et al., 2010; Heilmann et al., 2012; Simmons et al., 2018; Skanata and Kussell, 2021).

Across a wide range of coexistence parameters, our simulations show continual phage evolution and bacterial CRISPR adaptation in response (Figure 1B). New phage protospacer clones arise from a single founding clone by mutation, and a small fraction of new mutants grow to a large size and become established. Once phage clones become large, bacteria acquire matching spacers and an immune bacterial subpopulation becomes established. The specific protospacer and spacer types present in the population continually change as old types go extinct and new types are created by phage mutation, but the average total diversity and average overlap between bacteria and phage remains constant at steady state (Figure 8). Both bacteria and phage clones stochastically go extinct, completing the life cycle of a clonal population (Figure 1C).

Phages drive stable emergent sequence diversity

New phage protospacer clones continually arise and go extinct in our simulations, generating turnover in clone identity in the population. Despite constant turnover, however, the total number of clones remains fixed at steady state. We use the mean number of bacterial clones at steady state, designated m, as a measure of system diversity. This choice of diversity measurement is equivalent to the Hartley entropy of the clone size distribution, a special case of the Rényi entropy (Mora and Walczak, 2016; Altan-Bonnet et al., 2020). This definition weights all clones equally regardless of their abundance; such a measurement is not appropriate when clone size distributions are very broad and small clones may be unsampled, but is reasonable when clone size distributions are relatively narrow and all clones are sampled (Mora and Walczak, 2016). In our simulation results, both bacteria and phage populations exhibit relatively narrow clone size distributions across a range of parameters, with the exception of low values of spacer acquisition η (Figure 2A, Figure 2—figure supplement 3, Figure 2—figure supplement 4). Even at low η, however, clone size distributions are approximately exponential, indicating that they are not scale-invariant and that the mean clone size still captures important information about the full clone size distribution.

Figure 2 with 4 supplements see all
Diversity depends sub-linearly on parameters.

(A) Bacteria and phage clone size distributions normalized to the measured mean clone size for C0=105, μ=3×10-7, and e=0.95. As η increases, both clone size distributions become more sharply peaked. (B) The mean number of bacterial clones depends only on a combined parameter in the limit of small average immunity (generally coinciding with high C0). (Inset) The mean number of bacterial clones can be predicted by numerically solving Equation 1 for m. The two lowest values of η are shown with lighter shading. Error bars are the standard deviation across three or more independent simulations.

What determines clonal diversity? Many factors that correlate with transient diversity have been experimentally identified, such as phage extinction and slower phage evolution at high bacterial spacer diversity (van Houte et al., 2016; Common et al., 2020) and maintenance of a diverse bacterial population when exposed to diverse phages (Paez-Espino et al., 2015; Common et al., 2019; Guillemet et al., 2021; Lopatina et al., 2019), but a conceptual framework to understand emergent diversity has remained elusive. For instance, while initial high spacer diversity puts low-diversity phage populations under intense pressure to the point of driving them extinct (van Houte et al., 2016; Common et al., 2020), is the same true for emergent bacterial diversity after an extended period of coexistence? Is observed high bacterial spacer diversity indicative of successful bacterial escape from phage predation or an indicator of increased phage pressure? In our model, phage and bacterial diversity is tightly coupled: the number of large phage clones is approximately the same as the number of bacterial clones (Figure 22). This is also the case in experimental coevolution data from Paez-Espino et al., 2015: the number of phage protospacer types is on the same order of magnitude as the number of bacterial spacer types across most similarity thresholds (Figure 72). There is evidence that this coupling of diversity may also occur in the wild: a recent longitudinal study of Gordonia bacteria interacting with phage in a wastewater treatment plant identified 14 high-coverage phage genotypes and 11 high-coverage bacterial variants based on CRISPR spacer sequence (Guerrero et al., 2021a).

Using the tight correspondence between bacterial diversity and phage diversity in our model, we calculate the overall steady-state diversity by balancing the effective phage clone mutation rate μ¯=1gC0αBpV(1-e-μL)(1-eνm)nVnB, phage clone establishment probability Pest, and the time to extinction for large phage clones Text (details in ‘Measuring diversity):

(1) m=Pestμ¯Text

Equation 1 arises from the simple statement that the number of large clones must be equal to their establishment rate (Pestμ¯) multiplied by their average time to extinction (Text). This relationship successfully predicts the number of bacterial clones at steady state across a wide range of parameters and a wide range of diversity values (Figure 2B inset, Figure 2—figure supplement 1, and Figure 2—figure supplement 2). At the lowest value of η the prediction tends to overestimate the number of clones – in this regime, low acquisition means that phage clones go extinct because of clonal interference before bacteria are able to acquire spacers (Figure 16, ‘Measuring diversity’).

Through approximations (‘Analytic approximations for diversity’), we find that diversity depends on a single combined parameter to the power 1/3 (Figure 2B, Equation 2), and this parameter is proportional to spacer effectiveness e, the probability of bacterial survival followed by spacer acquisition (1-pV)η, and the phage mutation rate μL.

(2) m(4eη(1pV)μL(gC0(1f))3B2α2pV3r)13

Each of these parameters intuitively increases diversity (e.g., a higher phage mutation rate means that phage diversity increases and bacterial diversity follows suit). What is surprising is that their combined effect on diversity is to a power much less than 1: this 1/3 exponent means that if the mutation rate increased tenfold, the diversity would only increase by about a factor of two (Appendix 2—figure 1). In contrast, a simple neutral model of cell division with mutations gives a linearly proportional increase in diversity for the same increase in mutation rate (Appendix 2—figure 2).

To understand where the dependence of diversity on these key parameters come from, we look more closely at each component expression. The effective phage mutation rate μ¯ depends linearly on the parameter μ:μ¯gC0(1f)fμLαPV, while both the probability of establishment and the time to extinction depend inversely on diversity m:pest1mceη(1PV)gC0(1f)BpVr (Equation 5) and Text1m2gC0(1f)fαBpV (Equation 173, Appendix 3). By comparison with Equation 1, we find that m3 depends approximately linearly on mutation rate, resulting in the weak mμ13 dependence on mutation rate.

The dependence of diversity on both e and η comes from the probability of phage establishment since μ¯ depends only very weakly on these parameters through its dependence on total population sizes and Text depends explicitly on m alone, not e or η. The phage probability of establishment is proportional to eη(1-pV)m (Equation 5), and as before, this gives m3eη(1-pV). Bacteria are more successful at high η(1-pV)e, which increases the phage establishment probability. Previous theoretical work has predicted that diversity increases as spacer acquisition rate increases (Childs et al., 2012); here, we provide a quantitative prediction for this dependence. In the following sections, we explore phage establishment in more detail.

What determines the fitness and establishment of new mutants?

We find that diversity emerges in our model from the balance of phage clone establishment and extinction. However, only some phage mutants escape initial stochastic extinction and survive long enough to become established. What determines the fate of a new phage mutant? In our model, a single phage mutation in a protospacer can completely overcome CRISPR targeting, which means that new phage mutants can infect all bacteria equally well and their initial growth rate s0 is independent of CRISPR: s0αnB(BpV-1)-F, where F is the chemostat flow rate (a shared death rate for phages and bacteria). Surprisingly, however, even once bacteria start to acquire matching spacers, the probability of establishment for new phage mutants is still well-described by theory in which CRISPR targeting only influences total average population sizes (Figure 3C); that is, the specific interaction between a phage and its matching clone can be ignored. Intuitively, this is because phage clones must grow to a certain size before bacteria encounter them enough to begin to acquire spacers, and this size turns out to be large enough to avoid stochastic extinction (Figure 15). The probability of phage establishment is 2s0B(s0+δ0), where δ0=F+αnB(1-pV) is the initial phage mutant death rate. Importantly, these rates are independent of the population size of matching CRISPR bacteria clones; the only dependence on bacteria is on the total bacterial population size nB (Appendix 3—figure 1), which is fairly stable at steady state.

Figure 3 with 3 supplements see all
The fate of individual clones.

(A) Phage and bacteria coevolve in two timescale-separated regimes characterized by phage clone fitness. Average phage and bacteria clone size vs. time since phage mutation (right axis), and average clone growth rate vs. time since phage mutation (left axis). Markers show the average over all clone trajectories after steady state from six simulations with the same parameters. (B) Histograms of individual clone fitness grouped by time since phage mutation. Phage clones initially have fitness gt0, but rapidly most clones reach neutral growth (fitness 0). Bacteria clones also follow suit, initially having fitness gt0 and rapidly reaching 0 fitness on average. Because spacer acquisition for a clone only happens after that clone is created by phage mutation, the top-right panel of (B) is empty at the earliest time point following phage mutation. Individual clone trajectories are highly variable. (C) Probability of phage clone establishment vs. average immunity. Clones are considered established in simulations when they reach the mean clone size. Equation 3 with ν=1 is shown in green and with ν given by Equation 4 in blue. In (A, B), C0=104, e=0.95, η=10-3, and μ=3×10-6. Error bars are the standard deviation across three or more independent simulations.

Even though CRISPR targeting does not explicitly affect the establishment probability for new phage mutants, the probability of phage establishment increases as the average bacterial immunity increases (Figure 3C). Average immunity is a measure of the overall effectiveness of CRISPR immunity for the entire bacterial population; it is the average of all pairwise immunities between phage clones and bacterial clones weighted by their population sizes (Figure 1A inset). Because higher average immunity is beneficial for bacteria and leads to larger bacterial population sizes (Figure 3C), higher average immunity also means there is stronger selective advantage for new phage mutants that can escape CRISPR targeting (Figure 3C and Figure 5A).

For insight into the nature of this dependence, we approximated the probability of establishment at two extremes of average immunity (see ‘Dominant balance approximations for ν’). The probability of establishment can also be written

(3) Pest=2eνm(B1)

where ν is the fraction of bacteria that have spacers and m is the average number of bacterial clones at steady state. Low average immunity occurs when e/m is small: in this limit, the fraction of bacteria with spacers is

(4) ν11+rη(1pV)αn~Vem(pVη(1pV)ABpV(A1)(BpV1))

where A=(BpV-1)(1-f)αfg is the extinction threshold for phages (A>1 for phage survival), and n~V is the phage population size calculated with e=0 (Equation 52); this is the extreme limit of low average immunity where total population sizes approach their values in the absence of CRISPR immunity (Figure 26). The quantity rη(1-pV)αn~V can be understood as the ratio of the rate of spacer loss to the rate of spacer acquisition. If spacer loss is high and acquisition is low, the fraction of bacteria with spacers (ν) decreases. This in turn correlates with a lower probability of phage establishment (Equation 3) since phages are less likely to establish when there is less pressure from bacteria defending against them with CRISPR. Equation 4 is plotted in Figure 3C in blue. At very high average immunity, on the other hand, ν becomes close to 1 regardless of η (Figure 3—figure supplement 2), which is why the curves in Figure 3C overlap at high average immunity (ν=1 is plotted in green).

For more insight into the parameter dependence of Pest, we apply the same approximation as for m, expanding Pest in 1/B and η (‘Approximation for m’):

(5) Pest1m2eη(1pV)gC0(1f)BpVr[2(η(1pV)eα)2μLBr2]13

The probability of establishment increases with the probability of bacterial escape and spacer acquisition η(1-pV) and with spacer effectiveness e, but decreases with increasing mutation rate μL. This is consistent with the intuition that more successful bacteria increase the strength of selection for phage mutants (higher η and e), but that higher mutation rate reduces the probability of establishment for any particular mutant.

Cross-reactivity leads to dynamically unique evolutionary states

We next asked how cross-reactivity between spacer and protospacer types impacts population dynamics and outcomes. Adding cross-reactivity was motivated by several experimental observations in CRISPR immunity: (a) In type I and type II CRISPR systems, single mutations in the PAM or protospacer seed regions (approximately 8 nucleotides at the start of a protospacer) can facilitate phage escape, whereas mutations elsewhere in the protospacer are tolerated by the CRISPR system (Deveau et al., 2008; Pyenson and Marraffini, 2020). Even when a phage manages to escape direct targeting, in type I and II systems an imperfect spacer match can facilitate priming: when Cas machinery binds to a protospacer match, even if unable to cleave the target, the likelihood of acquiring a nearby spacer is increased (Westra et al., 2015; Rao et al., 2017; Pyenson and Marraffini, 2020; Weissman et al., 2020). (b) Type III CRISPR targeting in Staphylococcus epidermidis has been shown to be completely tolerant to all single and even double mutations, meaning that these systems are naturally cross-reactive (Pyenson et al., 2017).

We simulated two types of cross-reactivity: one in which phages experience an exponential decrease in CRISPR effectiveness with mutational distance (Equation 7), and a step-function cross-reactivity where phages require additional 1θ3 mutations to perfectly escape CRISPR targeting (Equation 8). Phage success probability without cross-reactivity is given by Equation 6, a special case of 7 and 8 with θ=0.

(6) pV(i,j)={pV(1e)if i=jpVif ij
(7) pV(i,j)=pV(1e exp[dijθ])
(8) pV(i,j)={pV(1e)if dijθpVif dij>θ

The mutational distance dij=|(Vi-Vj)| is the number of mutations between a protospacer and spacer, and θ scales the radius of cross-reactivity, with larger θ meaning more mutations are required to achieve the same immune escape (Yan et al., 2019).

Exponential cross-reactivity has been modelled extensively in vertebrate adaptive immunity (Chao et al., 2005; Mayer et al., 2015; Marchi et al., 2019; Yan et al., 2019; Schnaack and Nourmohammad, 2021; Chardès et al., 2022), and our definition of exponential cross-reactivity as a function of mutational distance is the same as in Yan et al., 2019. Step-function cross-reactivity, on the other hand, is reminiscent of the type III CRISPR system in which multiple point mutations are required to escape CRISPR targeting (Pyenson et al., 2017) and was also modelled theoretically by Han et al., 2013.

We find that cross-reactivity results in strikingly different dynamics of clone establishment and persistence (Figure 4 and ‘Cross-reactivity’), including a travelling wave regime in which genetically neighbouring clones ‘pull’ each other along (Figure 4B–D), giving way to a regime in which new mutants have very low establishment rates in the case of step-function cross-reactivity (Figure 4C). When cross-reactivity is high and phages require multiple mutations to escape targeting, we expect new phage mutants to have a lower fitness on average because they may already be within the immunity range of existing bacterial clones. However, unlike without cross-reactivity, not all new mutant fitnesses are the same because fitness now depends on the distribution of matching spacers in the bacterial population. Cross-reactivity adds a ‘direction’ in the genetic landscape and a fitness gradient for new phage mutants, leading to a series of rapid establishments and a travelling-wave regime. We believe these rapid establishments also suppress instantaneous diversity; diversity emerges longitudinally instead of concurrently in this regime (Figure 4 and Figure 61). This is true for both types of cross-reactivity we studied, but we also observed striking qualitative differences between exponential and step-function cross-reactivity. In the former, all phage mutants are guaranteed a slight fitness advantage, and the travelling wave appears right at the start of simulations, with occasional lineage-splitting events (Figure 4B). In contrast, step-function cross-reactivity means that mutants within the cross-reactivity radius have no fitness advantage at all. In this case, there is some variable initial length of time required to establish the travelling wave pattern: at the start of a simulation, all mutants are within the cross-reactivity radius and evolve purely neutrally. At least d=θ new mutants must establish through neutral dynamics before the next mutant can escape CRISPR targeting and grow with high fitness; this leads to the travelling-wave regime appearing later for θ=2 than for θ=1, and sometimes never appearing at all. Once the traveling wave appears, the dominant phage and bacteria clone types are exactly offset by θ: the most abundant phage clone will in general be θ mutations away from the most abundant concurrent bacteria clone. This is directly visible in the clone trajectories in Figure 4C–D inset and Figure 54: matching-colour bacteria and phage clones are offset in time. Another regime is also possible in the case of step-function cross-reactivity: if multiple large clones establish that are each outside of all the others’ cross-reactivity radii (Figure 63), then new mutants all have zero fitness and the travelling wave comes to an abrupt halt (around 7000 generations in Figure 4C). Establishment of new clones is once again extremely rare, and the existing large clones persist for a long time (Figures 53 and 62).

Figure 4 with 4 supplements see all
Cross-reactivity leads to ‘spindly’ phylogenies and regime switching.

Phage clone phylogenies for four simulations with different cross-reactivities: no cross-reactivity (A), exponential cross-reactivity with θ=4 (B), and step-function cross-reactivity with θ=1 (C) and θ=2 (D). All simulations share all other parameters: C0=104,η=10-4,μ=10-6,e=0.95. Phage clones are plotted at the first time they pass a population size of 2 to remove clutter from many new mutations destined for extinction, and the size of each circle is logarithmically proportional to the maximum size reached by that clone. Colours indicate the time of extinction of each clone. For each simulation with cross-reactivity, the left inset shows phage (top) and bacteria (bottom) clone sizes over time; colours indicate unique clone identities. Coloured rectangles above insets in (C) and (D) correspond to the dominant clone at each time. Dominant clone identities are offset by θ (vertical dashed line for visual aid).

Changes to the fitness landscape of individual clones caused by cross-reactivity also influence population-level outcomes such as diversity. In general, we find that for high cross-reactivity, average diversity is lower than predicted for simulations without cross-reactivity because the probability of phage establishment per mutant decreases (Figure 5A). A decrease in phage mutant fitness as the strength of cross-reactivity increases is in line with the dependence of fitness on cross-reactivity reported by Yan et al., 2019 and Rouzine and Rozhnova, 2018. Even with more complicated underlying dynamics, though, measuring average immunity alone is enough to reproduce total population sizes using our simple deterministic equations developed in the limit of no cross-reactivity (Figure 5C, 'Total population size'). This is because average immunity completely captures the population-level impact of CRISPR immunity, and we can replace the CRISPR effectiveness parameter e with measured average immunity in Equations 13–17 to get very good agreement between measured and predicted population sizes even away from steady state (Figure 5—figure supplement 1). In contrast, inferring average immunity from the number of clones using the approximation that effective ee/m does not give good agreement with simulation results for simulations with cross-reactivity (Figure 5B).

Figure 5 with 1 supplement see all
Average immunity underlies population outcomes.

(A) Probability of phage clone establishment vs. average immunity for different amounts and types of cross-reactivity. No cross-reactivity (θ=0) is shown as black stars, exponential cross-reactivity in red, and step-function cross-reactivity in blue. Simulation averages are shown for η=10-4 and μ=10-6. Error bars are the standard deviation across three or more independent simulations and are shown in both x directions and the positive y direction. (B, C) Total phage (purple) and total bacteria (teal) average population sizes vs. the mean number of bacterial clones m (B) and vs. average bacterial immunity (C) for η=10-4. Each point is an average at steady state over three or more independent simulations with the same parameters; error bars are standard deviation. Total sizes are scaled by the initial nutrient concentration C0. Lighter colours indicate stronger cross-reactivity, marker shapes match legends in (A) and (B). Solid lines are the predicted total population size given by solving Equations 13–17 and using the approximation effective ee/m in (B) and the measured average immunity for effective e in (C).

Figure 5B makes an additional subtle point: higher immune diversity promotes larger population sizes for phages but not for bacteria. This is counterintuitive and appears to conflict with several experimental results that show that more bacterial diversity increases the likelihood of phage extinction (van Houte et al., 2016; Common et al., 2020) and decreases the ability of phages to adapt (Morley et al., 2017). This discrepancy appears at first glance to be a consequence of limiting bacteria and phages to a single spacer and protospacer in our model, but we propose that it is actually because bacteria and phage diversity is decoupled in prior experiments. In the following section, we introduce a toy model to understand the conceptual impact of multiple protospacers and spacers when overall phage and bacterial diversity is coupled.

Pathogen and host diversity must be considered together

Our model restricts bacteria to a single spacer and phages to a single protospacer. This single-spacer assumption is a reasonable approximation for the experimental systems we consider: many liquid culture experiments find that most bacteria acquire a single spacer when exposed to phages, (Heler et al., 2015; Heler et al., 2019; Pyenson and Marraffini, 2020) even after up to 2 weeks (Paez-Espino et al., 2013; Common et al., 2019). Additionally, metagenomic results show that most spacer matches to viral sequences are located near the leader end of the CRISPR array Weinberger et al., 2012a. These results, combined with theory positing that recently acquired spacer provide more immune benefit than older spacers (Childs et al., 2012; Han et al., 2013), suggest that dynamics may be dominated by a very small number of spacers per bacterium.

Nevertheless, many bacteria possess tens to hundreds of spacers (Pavlova et al., 2021) and the question of multiple protospacers remains. Can we make inferences about how a more realistic multiple spacer or multiple protospacer scenario will change the relationship between diversity and average immunity in our model? First, we note that our model with exponential cross-reactivity is a good approximation for a realistic scenario where bacteria are limited to a single spacer but phages can have multiple protospacers. In this situation, multiple bacteria with unique spacers may target different protospacers in the same phage. A mutation in one protospacer will increase phage fitness but will not provide perfect immune escape, just as in the case of exponential cross-reactivity (Figure 4B). Even in this scenario, our qualitative results remain the same: bacteria and phage diversity is coupled and bacteria have higher average immunity at low diversity (Figure 5B and Figure 50).

We explored the consequences of both multiple spacers and multiple protospacers with a toy model. We generated synthetic sets of protospacers and spacers and varied the total diversity by changing the size of their pool and distributing their abundances exponentially to qualitatively match data (Bonsma-Fisher et al., 2018). We randomly assigned protospacers and spacers to individual phages and bacteria, then calculated average immunity. Importantly, we constrain phage and bacterial diversity to be coupled – overall spacer and protospacer diversity is the kept the same in all scenarios.

We find that across all combinations of array sizes and for two of three scenarios of protospacer diversity, average immunity is negatively correlated with diversity (Figure 69). Only when we limited phage diversity to a single mutating protospacer with all other protospacers conserved were bacteria with multiple spacers reliably able to target conserved protospacers regardless of the level of diversity in the variable protospacer position (Figure 69D). We compared our toy model to the experimental setup from Common et al., 2020. In their experiments, bacterial CRISPR spacer diversity was increased while phage diversity was kept constant with a single phage strain able to infect only one of the bacterial clones. This translated to an increasing initial average immunity as bacterial diversity increased and phages were able to infect a smaller proportion of the bacterial population (Figure 69A), which we think is not a realistic situation in a coevolving population of phages and bacteria.

Based on these results, we predict that in many realistic situations average immunity does not increase with diversity if phage and bacterial diversity are coupled. The intuition is this: if diversity of both bacteria and phages increase beyond the functional length of the CRISPR array, bacteria cannot be immune to all phage strains at once and average immunity must go down as diversity increases. The actual benefit of bacterial diversity depends very strongly on the characteristics of phage diversity (Figure 69) and manipulations of diversity should be understood in terms of their impact on average immunity. This has implications for understanding the so-called 'dilution effect' which is typically described as a decrease in the fraction of susceptible hosts as host diversity increases (Chabas et al., 2018; Common et al., 2020). Our model and analysis show that a dilution effect can only occur if host diversity increases out of proportion to pathogen diversity as in Common et al., 2020 (Figure 69A), whereas if both host and pathogen diversity increase in tandem, the dilution effect actually changes direction, decreasing the fraction of pathogens that a host may be immune to as pathogen diversity increases (Figure 69B–D). In the same vein, previous theoretical work has found that CRISPR with a fitness cost would be selected against when viral diversity is high for this reason (Weinberger et al., 2012b; Iranzo et al., 2013), notably with models that allowed for multiple spacers and protospacers. Our results, building on existing theory, show that phage and bacterial diversity must be explored in tandem and that CRISPR immunity may provide less benefit when phage diversity is high.

Dynamics are determined by diversity

How quickly does the phage population evolve? With explicitly modelled spacer and protospacer sequences of length L=30, we quantify the speed of evolution as the ‘mutational distance’ per generation (Rouzine and Rozhnova, 2018): between two time points, how far in genome space has the phage population travelled, or how many mutations have occurred on average (Speed of evolution)?

The speed of evolution is both stable and repeatable between simulations at steady state and highly correlated between bacteria and phage (Figure 38). We find that the speed of evolution is inversely proportional to the time to extinction for large phage clones (Figure 6B, 'Measuring speed of evolution'). Intuitively, if phage clones turn over more quickly (small time to extinction), the population is able to move more quickly to a different genetic state, while if the time to extinction is large, new mutants are seeded close to parent populations that persist for a long time, limiting the mutational distance. Since the phage time to extinction has a simple relationship to bacterial diversity, we can also relate speed to diversity, and we find that the speed of evolution is proportional to diversity and proportional to the same parameter combination to the power 1/3 (Figure 6C). As in the case of diversity, increasing phage mutation rate μ increases the speed of evolution sublinearly, and like diversity, the prediction diverges from simulation results at low η as described in 'Measuring diversity.

Figure 6 with 5 supplements see all
Phage evolution and spacer turnover.

(A) Principal Component Analysis (PCA) decomposition of phage and bacteria clone abundances for a simulation with C0=104, e=0.95, η=10-4, and μ=10-5. Clone abundances are normalized at each time point, then PCA is performed for the entire phage time series over 4000 generations (four times the mean extinction time for phage clones). Bacteria and phage clone abundances are transformed into the PCA coordinates; colours indicate simulation time. Five time points are highlighted in progressively lighter shades of red for emphasis. (B) Phage genomic speed of evolution vs. mean large phage clone time to extinction. The phage speed is the weighted average genomic distance between the phage population at the end of the simulation and the phage population at an earlier time, divided by the time interval. The dashed line is y=1x. (C) The speed of evolution increases as spacer effectiveness e, spacer acquisition probability η, and phage mutation rate μ increase. The dashed line shows an approximate theoretical calculation (assuming speed = 1/time to extinction) which captures the trend across a wide range of parameters. Error bars in (B) and (C) are the standard deviation across three or more independent simulations and are shown in the positive direction only. (D) Spacer turnover as a function of time delay for four simulations with C0=104, e=0.95, and μ=10-5. The fraction of bacterial clones remaining is the fraction of clones that were present at time t that are still present at time t+ delay. Solid lines are an average across steady-state for each value of the time delay; shaded regions are the standard deviation. (E–G) Spacer-type turnover calculated as in (D) using experimental data from Paez-Espino et al., 2015 (E), metagenomic data sampled from groundwater from Burstein et al., 2016 (F), and metagenomic data sampled from a wastewater treatment plant from Guerrero et al., 2021a (G). Experimental time points are interpolated to the minimum sampling interval to allow averaging across the experiment.

Our simulations show continual spacer turnover at steady state (Figure 6D, Figure 6—figure supplement 3, Figure 6—figure supplement 4), a feature that we might also expect to see in actively evolving laboratory or natural populations of bacteria and phages. We analysed data from a long-term in vitro coevolution experiment with Streptococcus thermophilus bacteria and phage (Paez-Espino et al., 2015), data from a time-series sampling of a natural aquifer community of bacteria (Burstein et al., 2016), and data from a time-series sampling of a wastewater treatment plant (Guerrero et al., 2021a) (see 'Materials and methods') and calculated spacer turnover over time (Figure 6E–G). We found that spacer sequences experienced turnover in all cases, indicating ongoing change in the spacer content in these populations, though no experimental system sho wed complete turnover of all spacer types. Even if spacer loss is happening for non-selective reasons (e.g., mixing or flow in the aquifer system), turnover indicates that new spacers are being acquired as well and that CRISPR systems are active. Moreover, the timescale of early spacer turnover in the S. thermophilus experiment is similar to the timescale in our simulations — after 1000 generations, most simulations have between 40 and 60% of clones remaining (Figure 6D) and about 60% of clones remain after 1000 generations in the experimental data (Figure 6E). We modelled all simulation parameters on known parameters for S. thermophilus where possible.

Time-shifted average immunity calculated from data reveals distinctive patterns of turnover

Bacteria acquire spacers in response to phage clones becoming large. The spacer composition of the bacterial population tracks the phage protospacer composition with a lag: Figure 6A shows the first two components of a PCA decomposition of bacteria and phage abundances over time in a simulation, a visual illustration of bacterial tracking. Without cross-reactivity, trajectories in this lower-dimensional space do not travel in a straight line; they are reminiscent of the diffusive coevolutionary trajectories in antigenic space described in a theoretical model of vertebrate virus-host coevolution (Marchi et al., 2019). In the travelling wave regime with cross-reactivity, trajectories are much more ballistic (Figure 6—figure supplement 1). Can we quantify how well and how quickly can bacteria track the evolving phage population with CRISPR?

Average immunity is a simple metric that quantifies the overlap between bacteria and phages. In experiments with both microbes and vertebrates, time-shift infectivity analyses between host and pathogen populations typically show that hosts are more immune to pathogens from the past and less immune to pathogens from the future (Richman et al., 2003; Frost et al., 2005; Moore et al., 2009; Hall et al., 2011; Koskella, 2014; Betts et al., 2018; Common et al., 2019; Dewald-Wang et al., 2022). We conduct a time-shift analysis on our simulation data (Figure 7A and B, Figure 7—figure supplement 11, Figure 7—figure supplement 12) and find that the same pattern holds true, but only for a limited time window: bacteria are indeed more immune to phages from the past, but this past immunity has a peak and then decays as we look further into the past, eventually reaching 0 immunity (Figure 7B). The presence of a peak in past immunity reflects the timescale of spacer turnover: once the bacterial population has lost all spacers from a previous time point, it is no longer immune to contemporaneous phages and time-shifted average immunity falls to 0. In simulations, the position of this peak is dependent on parameters such as η and μ (Figure 7E and F). As μ increases, the peak occurs further in the past since phage are now moving more quickly away from bacteria tracking, while as η increases, the peak moves closer to the present since bacteria are responding more quickly to changes in the phage population. This suggests that there is a tradeoff between immune memory durability and responding quickly to immune threats: bacteria must choose between tracking the phage population closely or keeping past immunity for a long time.

Figure 7 with 12 supplements see all
Quantifying immune memory in data.

(A, B) Average immunity of bacteria against phage for four simulations with different values of η as a function of time shift. Solid lines are an average across steady state for each value of the time shift; shaded regions are the standard deviation. Average immunity peaks in the recent past (A, indicated by τ*) with a negative slope through zero delay (A, black dashed line) and decays to zero at long delays in the past or future (B). For all simulations C0=104, μ=10-5, and e=0.95. (C, D) Average overlap between bacterial spacer and phage protospacer types using data from a lab experiment with S. thermophilus and phage from Paez-Espino et al., 2015 (C) and data from a wastewater treatment plant sampled over 3 years from Guerrero et al., 2021a (D). Spacer types are grouped by 85% similarity, and shaded region is standard deviation across averaged data. Base average immunity values were multiplied by the average number of protospacers corresponding to the S. thermophilus CRISPR system (C) and the Gordonia CRISPR systems (D) to account for multiple potential protospacer targets per phage. In (D), we compared two time shifts with zero delay average immunity using a Wilcoxon signed-rank test: p=0.27 for lower past immunity at 500 days, p=0.008 for lower past immunity at 200 days, p=0.001 for lower future immunity at 500 days, and p=0.003 for lower future immunity at 200 days. (E, F) The position of the peak in past immunity for simulated data vs. spacer acquisition probability η (E) and phage mutation rate μ (F). The peak position is the time shift value for which the curves in (A) are largest, indicated by τ*. Error bars are the standard deviation across three or more independent simulations.

Time-shift analyses are usually performed explicitly by directly combining stored samples from different time points (Betts et al., 2018; Common et al., 2019; Dewald-Wang et al., 2022). However, by sequencing CRISPR spacers and phage genomes, a pseudo-analysis can be done without any time-shifted competition experiments by calculating the overlap between bacterial spacers and phage protospacers at different times delays. We performed such an analysis for two published datasets: a long-term laboratory coevolution experiment with S. thermophilus and phage (Paez-Espino et al., 2015) and time series of metagenomic samples over 3 years from a wastewater treatment plant (Guerrero et al., 2021a). In both experiments, whole-genome shotgun sequencing was performed on bacteria and phage DNA, and CRISPR spacers and protospacers were recovered and reported. We re-analysed these datasets to detect CRISPR spacers by finding sequences adjacent to known CRISPR repeats in raw reads. We also detected protospacers by finding matches to our detected CRISPR spacers in reads that did not match the CRISPR repeats or the bacterial reference genome(s) (see ‘Laboratory coevolution experimental data’; ‘Experimental data from wastewater treatment plant’).

We grouped spacers with an 85% similarity threshold and interpolated counts between sequenced time points. We calculated the average overlap assuming e=1 (perfect immunity from matching spacers) as a function of time delay, averaging over all combinations of interpolated data with the same time delay (Figure 7C and D). We found two qualitatively different trends as a function of time delay. In the laboratory coevolution experiment, we found that bacteria are more immune to past phages and less immune to future phages (Figure 7C, Figure 7—figure supplement 1), consistent with what we see at short time delay in our model (Figure 7A, black dashed line). In contrast, in the wastewater treatment plant data, we found a peak in average immunity that is roughly centred at zero time delay: bacteria are most immune to phages from their same time point, and immunity rapidly decays in both the past and the future (Figure 7D), qualitatively similar to our simulation results on very long timescales (Figure 7B). These different trends may reflect different regimes of CRISPR immune memory: in the laboratory experiment data, we see no evidence of the decay of immune memory, while in the wastewater treatment data we see a suggestion of decay on the timescale of weeks. Bacteria appear to track the phage population closely in the wastewater data, while in the laboratory data bacteria lag behind the phage population.

Because the variability in average immunity between time points was very high in the wastewater treatment plant data, we performed a Wilcoxon signed-rank test between the average immunity at zero time delay and the average immunity at a time delay ±200 and ±500 (see ‘Calculating average immunity for details’). We found that past immunity after 200 days is significantly lower than present (Wilcoxon statistic Z=1241 p-value p=0.008), but that immunity after 500 days is not significantly lower than present (Z=392,p=0.27). For both time delays, we found that future immunity is significantly lower than present (Z=580,p=0.0012 for 500 days and Z=1293,p=0.003 for 200 days). Interestingly, these significance values are not symmetric if we pose the question from the perspective of phage: the overlap between phage and future bacteria at 500 days is lower than the overlap for present phages (Z=507,p=0.024), and the overlap between phage and past bacteria is lower than the overlap at present (Z=388,p=0.027, Figure 88). This asymmetry, that bacteria are generally more immune to all past phages while phage are not more infective against all past bacteria, is qualitatively the same as that reported by Dewald-Wang et al. in an explicit time-shift study of bacteria and phage immunity and infectivity in chestnut trees (Dewald-Wang et al., 2022).

Discussion

Many bacteria and archaea possess CRISPR systems, and a significant fraction of these systems are likely to provide immunity against phages (Brodt et al., 2011; Shmakov et al., 2017; Pourcel et al., 2020). Given that bacteria and phages coexist in natural environments over extremely long timescales, the impact of CRISPR immunity in these steady-state conditions has remained underexplored. We constructed a phenomenological model of CRISPR immunity in a bacterial population interacting with phages to explore the impacts of adaptive immunity on population survival, fitness, and diversity. We found that both phage and bacterial genetic diversity emerged spontaneously with a minimal set of interactions, and we derived approximate analytic predictions for population outcomes. These rigorous analyses of our simple model lay a foundation for theoretical analysis of adaptive immunity in host-pathogen systems.

Our model is mechanistically simple, and we left out many known biological features and interactions by choice in order to gain a deep understanding of the factors that influenced our results. We modelled uniform spacer effectiveness, uniform spacer acquisition probability, and a constant phage mutation rate, all in a well-mixed system. This constitutes a null model of CRISPR immunity that provides a useful comparison point for both more accurate mechanistic models and experimental data. CRISPR is one of many other bacterial antiviral defence systems (Bernheim and Sorek, 2020), and within the CRISPR world, CRISPR systems are highly evolutionarily diverse and there are many known differences in function and effect between CRISPR systems of different bacterial species (Koonin et al., 2017; Hille et al., 2018; Makarova et al., 2020; Koonin and Makarova, 2022). Experiments typically study a particular CRISPR system, and it is unclear which revealed mechanisms are specific to that system or are a more general property of CRISPR systems. For example, the spacer acquisition rate has been demonstrated to vary across particular phage sequences in several experiments (Heler et al., 2019; Modell et al., 2017; Paez-Espino et al., 2013) and is the source of differences in spacer abundance in some experiments (Heler et al., 2019), but whether this is a general principle that causes broad abundance distributions is not known. Many theoretical works also include mechanistic details such as a lag between infection and burst (Levin et al., 2013; Santos et al., 2014), multiple protospacers and spacers (typically with a fixed upper bound) (Weinberger et al., 2012b; Iranzo et al., 2013; Bonsma-Fisher et al., 2018; Childs et al., 2014; Childs et al., 2012; Weissman et al., 2018, spatial structure Payne et al., 2018; Haerter et al., 2011, autoimmunity Weissman et al., 2018; Chabas et al., 2021, and fitness costs of immunity and/or escape mutations Weinberger et al., 2012b; Weissman et al., 2021). Many phages also contain anti-CRISPRs which impact phage evolution and CRISPR immunity (Bondy-Denomy et al., 2013; Hwang and Maxwell, 2019). These details are all biologically important, but stripping them away as we do provides great insight into which population features depend on these details and which may be more general properties of adaptive immunity.

Diversity and average immunity

Experimental manipulations of CRISPR diversity have focused on changing the number of bacterial spacers present in the population (van Houte et al., 2016; Morley et al., 2017; Common et al., 2020) while keeping phage diversity fixed (and low), with the exception of Guillemet et al., 2021 who also explored high vs. low phage diversity. In contrast, emergent phage and bacterial diversity are tightly coupled in our model (Figure 2B). At high diversity, bacteria must ‘choose’ which of many phage clones to gain immunity against, meaning that they are then immune to a smaller fraction of the total phage population compared to when diversity is low. This is not a result of limiting bacteria and phage to a single spacer or protospacer; this trend also holds when there are multiple spacers or protospacers provided phage and bacteria diversity are correlated, as we showed in ‘Pathogen and host diversity must be considered together’.

Our toy model described in ‘Pathogen and host diversity must be considered together’ shows that the framework of average immunity provides a conceptually intuitive way to understand the impacts of different types of diversity and modes of evolution. Future work to explicitly model the population dynamics of multiple spacers and protospacers with this lens will allow us to understand how and when these different evolutionary modes arise. For instance, it remains unknown how the linkage of different spacers and protospacers within genomes would affect individual clone dynamics. Our work effectively assumes that all spacers and protospacers evolve independently and are uncoupled, but in real genomes evolving with CRISPR, spacers and protospacers may hitchhike to prominence through selection acting on a different sequence in the same genome. Models of CRISPR locus spacer addition have shown that trailer-end clonality emerges from selective sweeps of newly added spacers (Weinberger et al., 2012a; Han et al., 2013), and hitchhiking will affect interpretations that can be drawn from the abundance of individual spacer clones. Exploring correlations between spacer abundances at different locus positions can be done in experimental data (such as Paez-Espino et al., 2013), and extending our simulation to explicitly include multiple protospacers and spacers per genome will provide further data.

Our results suggest that immune diversity itself may not be the most informative observable feature but that average immunity is what actually determines population outcomes. Indeed, when we introduced cross-reactivity between spacer sequences, we found that diversity decreased and that other processes such as establishment were impacted, but that average immunity still correctly predicted population sizes (Figure 5C). Average immunity completely summarizes the effect of CRISPR immunity and accurately predicts population outcomes. Experiments that manipulate bacterial diversity only change average immunity if they include a changing proportion of sensitive bacteria as in Common et al., 2020 (Figure 69A).

Our results showed complex outcomes and a wide range of overall spacer diversity despite limiting bacteria and phage to a single protospacer and spacer. We also observed rapid loss of previous spacer types as bacteria returned to the naive state before acquiring new spacers. In reality, bacteria can store multiple spacers in their CRISPR arrays (Pavlova et al., 2021; Pourcel et al., 2020) and phages can have several hundred to several thousand protospacer sequences which are determined by a particular protospacer-adjacent motif (PAM) in type I and II CRISPR systems (Leenay et al., 2016). In S. thermophilus phage 2972, for example, there are about 230 protospacers that can be acquired into the CRISPR1 locus of S. thermophilus (Paez-Espino et al., 2013) and 465 that can be acquired into the CRISPR3 locus. Similarly, many bacteria can store tens to hundreds of spacers in their CRISPR arrays (Pavlova et al., 2021; Chen et al., 2022; Bradde et al., 2020; Pourcel et al., 2020). Our single-spacer assumption is reasonable for the experimental systems we consider. In liquid culture experiments with staphylococci, most bacteria acquire a single spacer (Heler et al., 2015; Heler et al., 2019; Pyenson and Marraffini, 2020), and in short-term experiments with S. thermophilus, most bacteria acquired just one CRISPR spacer over 2 weeks (Paez-Espino et al., 2013 or one to three spacers over 9 days Common et al., 2019). In the experimental data we analysed here (a 200+ day experiment), the average number of newly acquired spacers hovered around 5 throughout the experiment (Paez-Espino et al., 2015). In a long-term study of metagenomic data, most spacer matches to extant viral sequences were located near the leader end of CRISPR loci; conserved trailer-end spacers had very few matches to sampled viruses (Weinberger et al., 2012a). All these results suggest that dynamics may be dominated by just a handful of spacers per bacterium. In addition, theoretical work has found that the most recently acquired spacer provides more immune benefit than older spacers (Childs et al., 2012; Han et al., 2013). Our single-protospacer assumption for phages had a more extreme impact, however: in our model, we assumed in the non-cross-reactive case that a single point mutation allowed phages to escape from all CRISPR targeting, when in reality a phage with hundreds of protospacers may need mutations in many different protospacers to fully escape from the bacterial population. Previous work on the impact of CRISPR diversity takes place in this context: that having more diverse bacterial spacers means phages need mutations in multiple protospacers to escape, and that a single mutation only escapes a single bacterial spacer at most, leaving other bacteria still resistant (Chabas et al., 2018; van Houte et al., 2016; Common et al., 2020). We approximated this scenario of gradually increasing phage fitness per protospacer mutation by adding cross-reactivity to our model and showed in ‘Cross-reactivity leads to dynamically unique evolutionary states’ that the relationship between diversity and average immunity does not qualitatively change. We also observed the same qualitative trend in a toy model exploring the effects of both multiple spacers and multiple protospacers (‘Pathogen and host diversity must be considered together’).

Our model makes quantitative predictions about the relationship of population size and population outcomes to average immunity that can be tested in data. We directly calculated average immunity in experimental data without the need to assemble phage genomes (Figure 7), and we found that average immunity is anticorrelated with phage population size in data in one experimental dataset, as predicted by our model (Figure 77). However, we did not find a correlation between average immunity and read count (a proxy for population size) in wastewater treatment plant data (Figure 90). When calculating average immunity using experimental data, one must make assumptions about the immune benefit of spacers. In our work, we assumed that spacer and protospacer sequences that belong to the same similarity group provide perfect immunity. Other more complex assumptions are possible, and there has been a great deal of work quantifying the efficiency of spacers based on the position of SNPs in the protospacer in vitro (see Sternberg et al., 2015 for an example) and in vivo (Soto-Perez et al., 2019). Additionally, phages regularly acquire mutations in PAMs, and these mutations strongly decrease the targeting efficiency of bacterial spacers (Leenay et al., 2016; Sun et al., 2013).

In certain limits, average immunity can be related to the Morisita–Horn similarity index ΨWolda, 1981 (see Appendix 1), where instead of comparing two populations from different timepoints or different regions, we compared bacteria and phage populations scaled by the immune benefit of CRISPR. By comparing our definition of average immunity to the Morisita–Horn index, we find that the Morisita overlap is constant across all parameters we study, which implies that the population diversity evolves to attain the highest possible overlap. Average immunity is also theoretically similar to the concept of distributed immunity introduced by Childs et al., 2014; Childs et al., 2012, the ‘immunity’ quantity calculated by Han et al., 2013, and the probability of an immune encounter used by Iranzo et al., 2013 to derive mean-field population predictions. The mathematical definition of average immunity is slightly different than these quantities, but the effect on populations is strikingly similar, and it is significant that several different models and different metrics find that a quantification of immunity is the important predictor for population-level outcomes.

In silico time-shift experiments yield qualitative trends

We calculated average immunity in two experimental datasets and performed synthetic time-shift experiments by calculating average immunity between time-shifted populations of bacteria and phage. This effectively replicates explicit time-shift experiments (Richman et al., 2003; Frost et al., 2005; Moore et al., 2009; Blanquart and Gandon, 2013; Koskella, 2014; Chabas et al., 2016; Pyenson and Marraffini, 2020; Laanto et al., 2017; Common et al., 2019; Dewald-Wang et al., 2022). A similar time-shift calculation using sequence data was performed by Guillemet et al., 2021; they approached the question from the phage perspective and found that phages are most infectious against bacteria from the past and least infectious against bacteria from the future. Our results show that powerful insights into the dynamical immune state of a population can be obtained from sequencing data without explicitly performing time-shift experiments. We applied the same simple method to two very different experimental populations, suggesting that this analysis may be feasible in other time-series datasets, including from other natural microbial populations.

Our model predicts that an immune-evolving population in true steady state must eventually experience declining past immunity. CRISPR arrays are not infinite, and spacers must eventually be lost, even though this may happen on very long timescales. This pattern of declining past immunity is also in line with the expectations of fluctuating selection dynamics in which more rare genotypes are more fit (Gaba and Ebert, 2009; Hall et al., 2011; Blanquart and Gandon, 2013; Dewald-Wang et al., 2022). Previous theoretical work applied to vertebrate adaptive immunity also supports the existence of a peak in adaptive benefit in the recent past (Blanquart and Gandon, 2013; Nourmohammad et al., 2016), and some experimental time-shift studies have reported a peak in adaptation for bacteria-phage interactions (Hall et al., 2011; Koskella, 2014; Dewald-Wang et al., 2022 and for HIV-immune system interactions Blanquart and Gandon, 2013; Nourmohammad et al., 2016). We saw two qualitatively different time-shift curves in the experimental data we analysed: in the long-term laboratory coevolution data, bacteria were more immune to all past phages and less immune to all future phages (Figure 7C), while in the wastewater treatment plant data, bacteria were most immune to phages in their present context and less immune to phages in both the past and the future (Figure 7D). The time-shift pattern of the laboratory coevolution data was consistent with experimental time-shift data that is typically said to be indicative of arms-race dynamics (Richman et al., 2003; Frost et al., 2005; Moore et al., 2009; Blanquart and Gandon, 2013; Betts et al., 2018; Common et al., 2019), while the wastewater data may be consistent with a ‘zoomed-out’ view of the eventual decline in immunity in both the distant past and distant future (Nourmohammad et al., 2016; Guillemet et al., 2021). Indeed, the distinction between arms-race dynamics and fluctuating selection dynamics in time-shift experiments may be a question of the timescale being investigated (Gaba and Ebert, 2009; Blanquart and Gandon, 2013; Dewald-Wang et al., 2022). Our simulation results may qualitatively match both of these regimes depending on the timescale observed, which suggests that measuring very long-term coexistence is an area for further work in host-parasite coexistence experiments. In general, we expect that the timescale of memory length is related to the size of CRISPR arrays: if overall rates of spacer gain and loss are balanced, then we expect an individual spacer’s lifetime in the population to be proportional to array length. The factors that affect the timescale of memory length is an interesting area for further work.

One reason why past immunity did not unambiguously decline in our analysis of the laboratory coevolution data may be that spacer loss was not observed in the original experiment (Paez-Espino et al., 2015. This is also true of several other similar experiments with S. thermophilus Paez-Espino et al., 2013; Weissman et al., 2018), though spacer loss was directly observed coincident with spacer acquisition in Deveau et al., 2008. Spacer loss has also been observed in other experimental systems (Jiang et al., 2013; Rao et al., 2017; Deecker and Ensminger, 2020; Garrett, 2021), and indirect evidence of spacer loss through sequence comparisons has been reported for S. thermophilus (Horvath et al., 2008 and metagenomic data Held et al., 2010; Weinberger et al., 2012a). Spacers may also be functionally lost despite remaining in the genome, either from stochastic decoupling of RNA polymerase (Zoephel and Randau, 2013; Martynov et al., 2017; Soto-Perez et al., 2019 or from a dilution effect of competition for limited Cas protein complexes Martynov et al., 2017; Bradde et al., 2020; Garrett, 2021). The ‘loss’ rate in our model could also be taken to be gain and loss of the entire CRISPR system, which is known to happen (Delaney et al., 2012; Rollie et al., 2020), although this would likely happen on much slower timescales than the acquisition and loss we model. The qualitative shape and timescale of time-shift curves like the ones we presented may indirectly contain information about the timescale of spacer loss.

We found in our simulations that memory length decreased as spacer acquisition probability increased and as phage mutation rate decreased: if bacteria followed phages more closely (higher η) or if phages evolved more slowly (lower μ), the peak in past immunity was closer to the present and memory was more quickly lost (Figure 7A, E, and F). A similar correspondence was derived in a theoretical model of optimal immune memory formation in vertebrates: more frequent sampling of the pathogen landscape (corresponding to higher spacer acquisition η in our model) led to faster memory loss (Mayer et al., 2019). Notably, that result is the optimal update scheme for an immune system wishing to minimize the costs of infection; there is no explicit optimization in our model, yet we find a pattern consistent with a potentially optimal solution.

Vertebrate adaptive immunity

CRISPR adaptive immunity is mechanistically very different from the vertebrate adaptive immune system, yet there are several striking conceptual similarities between them and in the way past theoretical works have modelled vertebrate immunity. For example, phenomenological models of binding affinity between antigens and the immune system have used the degree of similarity between strings as a metric for interaction strength (Detours and Perelson, 1999; Detours and Perelson, 2000; Chao et al., 2005; Wang et al., 2015; Nourmohammad et al., 2016; Sachdeva et al., 2020), and more abstract models of the change in immunity after a virus mutation have also used string similarity to track the strength of immune coverage (Luksza and Lässig, 2014; Rouzine and Rozhnova, 2018; Yan et al., 2019). This is a simplification of a complex protein-protein interaction in the case of the vertebrate immune system (Altan-Bonnet et al., 2020), but it is nearly identical to the actual mechanism of CRISPR immunity, suggesting that lessons from models of CRISPR immunity may in turn be applicable to other forms of adaptive immunity.

Pathogen evolution has been extensively explored in models of vertebrate immunity, and here too we can draw conceptual parallels. We found that the speed of phage evolution depended sublinearly on phage population size in our model (Figure 6C), analogous to the weak positive dependence on population size described by Marchi et al., 2021. Adding cross-reactivity between spacer types in our model changed overall population outcomes as well as population dynamics. We observed a traveling wave regime in which new phage mutants ‘pull’ existing bacterial clones along, even if their matching clone is small or extinct (Figure 54). This travelling wave regime is qualitatively similar to that observed in theoretical models of ‘ballistic’ pathogen evolution in effective low-dimensional antigenic space (Yan et al., 2019; Marchi et al., 2019; Marchi et al., 2021). Similar clone population dynamics were observed in a model of B cell activation and memory generation in which existing memory B cells are reactivated in response to infection with a mutated virus that remains within its cross-reactivity radius (Chardès et al., 2022). We also observed splitting of lineages into divergent subtypes in our model, a phenomenon that has been studied in vertebrate pathogen evolution as well (Marchi et al., 2019; Marchi et al., 2021). We found that the average number of distinct clans was proportional to overall diversity but that increasing cross-reactivity decreased the average clan size (‘Number and size of clone clans’). These qualitative similarities highlight the usefulness of CRISPR adaptive immunity to understand population dynamics that also occur in the vertebrate immune system.

Selection in the vertebrate immune system happens at many distinct stages within individuals, including T cell receptor selection in the thymus (Camaglia et al., 2022 and affinity maturation of B cells following an immune challenge Chardès et al., 2022; Nourmohammad et al., 2019; Nourmohammad et al., 2016; Molari et al., 2020). Selection also happens at the population level as pathogens and individuals coevolve over time (Marchi et al., 2019; Yan et al., 2019, for instance, in the evolution of influenza Luksza and Lässig, 2014; Yan et al., 2019). In contrast, within-host dynamics are much simpler in bacteria, and the possibilities for within-host coevolution are especially limited in the case of virulent phages that kill their hosts after a single infection. Similarly, in the vertebrate immune system, extremely high immune diversity is possible in a single individual, whereas in CRISPR immunity, total diversity per individual is many orders of magnitude lower: most bacteria contain tens of spacers with a very few examples of a few hundred spacers (Pavlova et al., 2021). This reintroduces the question of how to think about immunity in microbial populations, whether as playing out on the level of individuals or on the level of the entire population. Finally, unlike the vertebrate adaptive immune system, CRISPR immunity is heritable, and so the optimal immune strategy for bacteria takes place on a timescale potentially much longer than an individual’s lifetime as well as on the level of the entire population Mayer et al., 2016. We have highlighted several analogies between CRISPR adaptive immunity and vertebrate adaptive immunity. Overall, the mechanistic simplicity and experimental tractability of CRISPR immunity mean that abstract ideas from vertebrate immunity can be linked to measurable features of CRISPR immunity, and insights from CRISPR immunity may be useful in understanding vertebrate immunity as well.

Primed acquisition

In some CRISPR systems, an imprecise match between a spacer and protospacer results in a much higher spacer acquisition rate, a phenomenon called priming (Rao et al., 2017). Including cross-reactivity in our model is a way to explore the effect of primed acquisition. However, we modelled cross-reactivity as a change in phage infection success probability, whereas in primed acquisition it is the spacer acquisition rate that changes. Because bacteria must lose their spacer before gaining a new spacer in our model, including primed acquisition directly would require expanding bacterial arrays to at least two spacers. The acquisition probability η could then become a function of protospacer-spacer similarity to directly model primed acquisition.

Selection patterns in CRISPR immunity

The mechanism of selection for immune variants in CRISPR adaptive immunity is a matter of debate. We observed several features that suggest the presence of negative frequency-dependent selection in our model: (a) decreased clone growth rates as clone size increased, (b) cyclic clone size dynamics as immune pairs oscillate in size, and (c) continual turnover in spacer types over time. This is consistent with recent experimental observations of negative frequency-dependent selection in host-pathogen systems (Betts et al., 2018; Guillemet et al., 2021). Our model also contains the base assumption that phages that can successfully infect dominant bacterial strains will experience positive selection, generating ‘kill-the-winner‘ dynamics that lead to negative frequency-dependent selection for bacteria (Weinberger et al., 2012a). Our model does not capture true host-pathogen coevolution because bacteria are not able to evolve beyond ‘following’ the mutational landscape of phages by acquiring spacers. In true coevolution, bacterial genomic mutations, apart from the ability to acquire spacers, are also drivers of evolution and sources of selection pressure on phages.

Phase transitions in immunity

We observed large population size fluctuations when average immunity reached a particular value of approximately 10% for the parameters we used (Figure 31). This critical value is determined by a balance of the rate of spacer acquisition and spacer loss that leads to exactly equal fitness for both bacteria with spacers and bacteria without spacers (see ‘Dominant balance approximations for ν’). Despite the presence of spacers with non-zero effectiveness, near this critical point the population appears to behave as if there is no CRISPR immunity at all. This has two interesting results: first, the probability of phage establishment drops to zero for some simulations near this critical point (Figure 3—figure supplement 2), since if CRISPR has no influence then mutant phages do not experience strong positive selection. Second, population sizes become unstable near the transition: outcomes with quite different total bacteria population size and fraction of bacteria with spacers are possible (Figures 30 and 31). This critical point will be interesting to explore further as a possible phase transition.

Experimentally accessible parameters

We found that diversity in our model depended on spacer acquisition rate, spacer effectiveness, and spacer mutation rate. These parameters are all known to vary in natural systems, and some may also be experimentally manipulated. CRISPR systems display a range of spacer acquisition rates even within the same system (Heler et al., 2019), and both spacer acquisition rate and spacer effectiveness may be controlled by the level of expression of Cas proteins, which has been found to be connected to the quorum sensing pathway (Høyland-Kroghsbo et al., 2017). Additionally, recent work has found that spacer acquisition is more efficient at slow bacterial growth rate because phages also grow more slowly, giving bacteria more time to acquire spacers (Høyland-Kroghsbo et al., 2018; Dimitriu et al., 2022). This change in bacterial growth rate can be controlled by nutrient concentration (Dimitriu et al., 2022; Payne et al., 2018, temperature Høyland-Kroghsbo et al., 2018, or by bacteriostatic antibiotics which slow bacterial growth without killing Dimitriu et al., 2022). Phage mutations rates also vary and may be experimentally modified (Kysela and Turner, 2007). These findings all represent possible experimental manipulations of parameters relevant for CRISPR immunity; for example, given these findings we predict that a bacteria-phage population exposed to bacteriostatic antibiotics would evolve higher CRISPR spacer diversity than one without antibiotics.

Methods

Model

In this section, we describe the details of our phenomenological model of bacteria and phages, providing more detail for Figure 1A. Sections ‘Reactions’ and ‘Master equation’ list the stochastic interactions we simulate, and ‘Clone size master equation’ lists master equations that are used to derive mean-field equations for this dynamical system (Total population size) and clone size distributions for bacteria and phages (Appendix 2).

We model bacteria and phages interacting in a chemostat. The populations we track are nutrient concentration (C), phages (nV), and bacteria (nB). Bacteria can either have no spacer (nB0) or a spacer of type i (nBi), and phages can have a single protospacer of type j (nVj). Nutrients flow in at concentration C0 with rate F, and all species flow out with rate F. The total number of bacteria with a spacer is nBs and the total number of bacteria is nB. Bacteria grow at rate gC. With rate α, a phage interacts with a bacterium. With probability pV, the phage will kill bacteria without a matching spacer and produce a burst of new phages with size B, while for bacteria with a matching spacer that probability is reduced to pVs=(1-e)pV (0e1). Bacteria without spacers that survive an attack have a chance to acquire a spacer with probability η. Bacteria with a spacer lose their spacer at rate r. Phages are limited to a single protospacer, and phages can mutate to a new protospacer type with probability μL, where μ is the per-base mutation rate and L is the protospacer length (L=30 in all simulations). Parameter descriptions and default values are shown in Table 3. See refs Weissman et al., 2021; Gurney et al., 2019 for recent examples of other chemostat-based models.

Reactions

Table 1 lists all the interactions present in our model between individual bacteria (b), phages (V), and nutrients (C).

Table 1
Model reactions.
b0,i+Cg2b0,iBacterium divides
b0,iFBacterium flows out
VjFPhage flows out
FC0CNutrients flow in
CFNutrients flow out
n=0B(b0+VjαpVPn(B-n)Vj+k=1nVm+k)Interaction, phage wins,
Pn probability of n mutuant phages
b0+Vjα(1pV)(1η)b0Interaction, bacterium survives
b0+Vjα(1pV)ηbiInteraction, bacterium survives and acquires a spacer
n=0B(bi+VjαpV(i,j)Pn(Bn)Vj+k=1nVm+k)Interaction, phage wins,
Pn probability of n mutant phages
bi+Vjα(1pV(i,j))biInteraction, bacterium survives
birb0Bacterium loses spacer

In Table 1, pV(i,j) is the probability of a phage with protospacer type j killing a bacteria with spacer type i.

Pn in Table 1 is the probability of n mutant phages in a burst, given by a binomial distribution:

(9) Pn=P(n;B,1P0)=(Bn)(1P0)n(P0)Bn

where P0=e-μL is the probability of no mutations for an individual phage. L is the protospacer length and μ is the mutation probability per base per generation.

In this formulation of the mutation term, new mutants are assumed to be a new, not previously existent, phage type which increases m by n, the number of mutations (+k=1nVm+k). In our simulations, phage mutations can happen to existing types, but these are rare events and we neglect them in our theoretical analysis.

Master equation

The reactions in Table 1 can be formulated as a master equation describing the probability of observing nB0 bacteria without spacers, the set of nBi bacteria with spacers of type i, the set of nVj phages with protospacers of type j, and a nutrient concentration of C at time t (Equation 10).

(10) dP(nB0,{nBi},{nVj},C,t)dt=g(C+1)(nB01)P(nB01,{nBi},{nVj},C+1,t)+k=1mg(C+1)(nBk1)P(nB0,{nBik},nBk1,{nVj},C+1,t)+F(nB0+1)P(nB0+1,{nBi},{nVj},C,t)+k=1mF(nBk+1)P(nB0,{nBik},nBk+1,{nVj},C,t)+=1mF(nV+1)P(nB0,{nBi},{nVj},nV+1,C,t)+F(C+1)P(nB0,{nBi},nV,C+1,t)+FC0P(nB0,{nBi},nV,C1,t)+=1mα(1pV)(1ηm)nB0(nV+1)P(nB0,{nBi},{nVj},nV+1,C,t)+k=1m=1mα(1pV)ηm(nB0+1)(nV+1)P(nB0+1,{nBik},nBk1,{nVj},nV+1,C,t)+k=1m=1mα(1pV(k,))nBk(nV+1)P(nB0,{nBik},nBk,{nVj},nV+1,C,t)+k=1m=1mn=0BαpV(k,)Pn(nV(Bn)+1)(nBk+1)P(nB0,{nBik},nBk+1,{nVj},nV(Bn)+1,C,t)+k=1mn=0BαpVPn(nV(Bn)+1)(nB0+1)P(nB0+1,{nBi},{nVj},nV(Bn)+1,C,t)+k=1mr(nBk+1)P(nB01,{nBik},nBk+1,{nVj},C,t)(F(nB0+k=1mnBk+=1mnV+C+C0)+gC(nB0+k=1mnBk)+α=1mnV(nB0+k=1mnBk)+rk=1mnBk)P(nB0,{nBi},{nVj},C,t)

Each term is included only if all respective population quantities are 0; for instance, the first term is only included if nB0>0, since if nB0=0 then nB01<0 and the entire term is negative and non-physical. This is important to note especially in terms containing nV-(B-n)+1, which are only included if nV>Bn1.

Clone size master equation

We can also write master equations for the total number of clones of size k. These equations describe the population-level distribution of clone sizes instead of the size of a single typical clone as in Equations 123 and 147.

The master equation for bk, the number of bacteria clones of size k, is:

(11) tbk=gC[(k1)bk1kbk]Bacteria growth+(F+r)[(k+1)bk+1kbk]Removal of spacer-containing bacteria by chemostat outflow or spacer loss+αv[(k+1)bk+1pV(k+1,)kbkpV(k,)]Phage predation - reduces clone size by 1 for each successful infection+αηnB0nvk(1pV)[bk1bk]New bacteria clones from spacer acquisition

The last term representing spacer acquisition is complicated: nVk* is the total number of phages which contain protospacers that match bacteria clones of size k-1, multiplied by the probability of getting that protospacer if infected by that phage. In other words, these are the phages that bacteria clones of size k-1 could acquire spacers from to increase to size k, multiplied by the probability of acquiring a particular spacer from those phages. In a model where phages are all identical with a large number of protospacers m, nVk*=nV/m, where nV is the total number of phages and 1/m is the probability of acquiring a particular spacer. If instead phages can only have a single protospacer, nVk* is not known in general, but approximations may be found; for instance, if there are m phage types present and all phage clones are the same size, nVk* is again equal to nV/m.

The corresponding master equation for v, the number of phage clones of size , is:

(12) tv=(F+α(nB0+nBs))[(+1)v+1v]Phage death from chemostat flow and adsorption+αkkbk[n=0BPn((Bn))v(Bn)pV(k,(Bn))vpV(k,)]Phage burst after infecting spacer-containing bacteria (subtracting mutant phages which become new clones)+αnB0pV[n=0BPn((Bn))v(Bn)v]Phage burst after infecting bacteria without spacers+δ,1αB(1eμLp)[k,pV(k,)kbkv+pVnB0v]New phage clones created by mutation from both spacer-containing and naive bacteria

The phage master equation assumes that all new mutants are unique and that they are a type not currently present in the population.

Pn is a binomial distribution giving the probability of n mutant phages per burst (Equation 9).

pV is the probability of phage successfully infecting bacteria, and in general is an unknown function of both k and . All details of immunity are contained in pV; later we discuss certain choices of pV and their implications for the model and population dynamics. pV with no functional arguments is the probability of phage success against bacteria with no spacers, a constant.

Mean-field model results

Total population size

These mean-field equations for total bacteria, bacteria with spacers, phages, and nutrients are the backbone of our analytic model results. Throughout this work, we assume that clone dynamics take place in a background of total populations at steady state; these equations describe the model that we solve to find these steady-state quantities underlying results in ‘Bacteria and phages dynamically coexist and coevolve’, ‘Phages drive stable emergent sequence diversity’, ‘What determines the fitness and establishment of new mutants?’, ‘Cross-reactivity leads to dynamically unique evolutionary states’, and ‘Dynamics are determined by diversity’.

The total number of bacteria with spacers, nBs, is equal to kbk, and the total number of phages, nV, is v and tnV=tv, and summing over k and in Equations 11 and 12 gives mean-field equations:

(13) nBs˙=(gCFr)nBs+α(1pV)ηnB0nVαnBsnVpVa
(14) nV˙=(F+α(nB0+nBs))nV+αB(pVanBsnV+pVnB0nV)

We can also write mean-field equations for the total number of bacteria without spacers, nB0, and the total nutrients, C. The total number of bacteria nB is nBs+nB0.

(15) nB0˙=(gCF)nB0αpVnB0nVα(1pV)ηnB0nV+rnBs
(16) C˙=F(C0C)gCnB
(17) nB˙=(gCF)nBα(pVanBsnV+pVnB0nV)

The probability of phage success against bacteria with spacers, averaged across the whole population, is given by pVa=k,kbkvpV(k,)nBsnV.

pVa is not known in general, but it can be simplified if certain assumptions are made about the population. In particular, we begin by assuming that immunity is all-or-nothing: if a bacterium has a matching spacer to a phage, the phage success probability is reduced to pV(1-e) (where e is the ‘spacer effectiveness’), but if a bacterium has a spacer that is not exactly matching, that spacer confers no immunity and the phage probability of success is pV, the same as against naive bacteria. This amounts to defining pV in terms of the spacer type i and protospacer type j as follows:

(18) pV(i,j)={pV(1e)if i=jpVif ij

If each bacterium and phage can have only one spacer or protospacer, then the number of bacteria or phage with a particular spacer type i is nBi and nVi, respectively, and

pVa=pV(1enBsnVinVinBi)

Equations 13–17 can be solved analytically at steady state if we further assume pVa=pV(1-e/m), where m is the average number of bacterial clones at steady state. This assumption is described in detail in ‘Measuring diversity’. The solution is exactly the mean-field analytic solution described in Bonsma-Fisher et al., 2018 with the simple replacement of the parameter e with e/m.

Phage extinction threshold

This section relates to Figure 1E in ‘Bacteria and phages dynamically coexist and coevolve’.

We reported in our previous work (Bonsma-Fisher et al., 2018) that Equations 13–15–17 experience a change in fixed point at a critical threshold of the phage infection success probability pV: below pV0=1B(gf(1-f)α+1), phages are driven extinct (Figure 1E). This is the same extinction threshold reported by Payne et al., 2018 as the cutoff for achieving herd immunity in a well-mixed bacterial population. To a first approximation, phages must successfully infect every 1/B bacteria they encounter, but if bacteria are growing quickly, then phage must do better to overcome bacterial growth. The correction gives 1B(1+bacterial birth ratephage birth rate) =1B(1+phage generation timebacterial generation time) as reported in Payne et al., 2018. To see this, we note that the bacterial birth rate in our model is gCgC0f when phage population sizes are small, and the phage birth rate in our model is αBpvnBαnB since BpV1. When phage population sizes are small, the total number of bacteria is nBC0(1-f). Combining these expressions, we find bacterial birth ratephage birth ratefgC0α(1-f)C0=fgα(1-f) as in our original expression.

Steady state

Simulation results are calculated and presented at steady state unless otherwise specified. We define steady state to be the simulation run-time divided by 5, choosing simulation run-times so that mean-field quantities have equilibrated by the steady-state time. We run large population simulations longer than small population simulations: these simulations have less frequent interactions on average because of decreased α and take longer to equilibrate (Figure 8).

Total phage, total bacteria (left), and mean number of bacterial clones (right) vs. simulation time for five simulations with μ=10-6, e=0.5, η=0.001, and C0 ranging from 300 (top row) to 30,000 (bottom row).

Total population sizes equilibrate very quickly, but the total number of clones can take longer at large population sizes (high C0). The time constants inset are a measure of how quickly we expect each mean-field quantity to equilibrate: nV time constant is the inverse growth rate of the total phage population (1/(-F-αnB(BpV-1)-αBpVnBse/m)) and the extinction time constant is the mean time to extinction for large phage clones (Equation 171), a measure of the rate of turnover of the number of clones.

We set the total bacterial generations to be 10000(logC0-3), rounded to the nearest thousand, with a minimum of 10,000 generations (see Table 2). The following table lists simulation length and assumed steady-state time tss for different values of C0.

Table 2
Simulation length.
C0Simulation length (bacterial generations)Steady-state start time (tss)
30010,0002000
100010,0002000
300010,0002000
10,00010,0002000
30,00015,0003000
100,00020,0004000
300,00025,0005000
1,000,00030,0006000

Figure 9 shows the mean number of bacterial clones (m) at steady state as a function of the initial number of phage clones. For most parameters, the steady-state m is independent of the initial m. There is a slight dependence on initial m at high C0.

Mean number of bacterial clones after t=tss bacterial generations vs. initial number of phage clones for e=0.8, η=10-3.

The mean is an average of 15 evenly spaced points from t=tss to t=5tss bacterial generations. Error bars are the standard deviation across three or more independent simulations.

Single-clone mean-field dynamics

This section presents mean-field equations for the population size of single bacteria and phage clones. The steady-state bacteria and phage clone sizes (Equations 25 and 26) are, like the mean-field results in ‘Total population size’, used extensively in further analytic results. In particular, predicted mean clone sizes are used as a cutoff to measure the time to extinction for large clones (Appendix 3) and to calculate diversity in ‘Analytic approximations for diversity’ (main text ‘Phages drive stable emergent sequence diversity’). These expressions are also used to calculate the fitness of new phage mutants in Appendix 3.

We can write mean-field equations for individual clones nBi and nVj as well:

(19) nVj˙=(F+α(nB0+nBs))nVj+αBP0pVnB0nVj+αBP0ipV(i,j)nBinVj+αBjkipV(j,k)μΔiknBjnVk
(20) nBi˙=(gCFr)nBiαjpV(i,j)nBinVj+αηnB0nVi(1pV)

If pV(i,j) is binary and all spacers are equally effective (Equation 18), nVj˙ and nBi˙ can be simplified:

(21) nVj˙=(F+αnB)nVj+αBP0pVnVj(nBenBj)+αBjkipV(j,k)μΔiknBjnVk
(22) nBi˙=(gCFr)nBiαpVnBi(nVenVi)+αηnB0nVi(1pV)

In Equations 19 and 21, mutations that arise from phage i decrease nVi, which is captured in that B becomes BP0=BeμL for type i is effectively lower, since mutations go to different phage types.

The last term in Equations 19 and 21 is for mutations that happen in all other phages besides type i that convert type k to type i. Any particular phage k has a certain mutational distance from phage i, Δik, which means Δik-specific mutations must happen to get from type k to type i. This happens with probability μΔik. Going forward we assume that this term is small, which is true if the overall mutation rate αBpVaμnVnB is sufficiently low and/or the space of protospacer types is sufficiently large so that mutations are almost always to new types not present in the population.

We solve the coupled time-dependent system given by Equations 21 and 22 numerically, assuming all other populations (nB, nB0, nV, and C) are at their deterministic steady-state value and ignoring the last term of Equation 21. These solutions are plotted alongside mean clone sizes from a simulation in Figure 10 (‘Numerical deterministic prediction’). The deterministic solution matches the mean clone size well at early times but does not capture the effects of clone extinction at later times. By including only surviving clones in the simulation mean and normalizing the deterministic solution by the predicted fraction of surviving clones (‘Numerical deterministic prediction normalized to survival probability’), the theoretical prediction matches well at early times and can be piecewise-combined with the steady-state mean clone size to give good agreement at all times (see ‘Predicted clone size’ dashed lines in Figure 3—figure supplement 1). We estimate the predicted fraction of surviving phage clones by numerically solving Equation 153 which gives the phage clone probability of extinction in the absence of matching bacterial clones. The fraction of surviving clones is 1-P0(t) where P0(t) is the probability of extinction. Normalizing by 1-P0(t) does not give good agreement at long times because P0(t) goes to 1 as t goes to infinity, causing the solution to diverge at long times. While individual trajectories do eventually go extinct, the mean clone size conditioned on survival reaches the deterministic steady state at long times.

Phage (A) and bacteria (B) mean clone size in a simulation, either conditioned on survival (blue circles) or including extinct clones (orange circles).

Theoretical predictions are plotted as solid lines: the time-dependent numerical solution to Equations 21 and 22 in green, the same solution divided by the phage clone probability of survival in red, and a one-dimensional solution to Equation 21 in (A) and 22 in (B). Equations 25 and 26 are black dashed lines. An alternate simulation mean clone size is plotted for bacteria (brown circles) in which each clone trajectory is stacked based on the bacterial acquisition time and averaged across trajectories, conditioned on survival. Simulation parameters are C0=10000.0, μ=10-5, η=0.001, and e=0.95.

We consider bacteria clones in the background of phage clones; that is, time 0 for a bacterial clone starts when the matching phage clone arises by mutation. This is the case in the numerical solution for the coupled bacteria-phage clone system given by Equations 21 and 22. For this reason, normalizing the bacterial clone size prediction by the phage clone probability of extinction also gives good agreement to the simulation mean conditioned on bacterial clone survival. (If a bacterial clone size is 0 but its corresponding phage clone has not yet gone extinct, that 0 will count in the mean clone size, but after the phage goes extinct, a 0 in the bacterial clone size will not be included.)

We can also solve Equations 21 and 22 analytically by assuming nBi and nVi are constant, respectively; this amounts to removing the dependence of phage clone dynamics on bacteria clone dynamics and vice versa.

If we assume nBi=0, we get the following solution for nVi(t):

(23) nVi(t)=nVi(0)es0t

where s0=αBpVnBe-μL-F-αnB is the average growth rate of phage clones and t is in minutes. This is the same growth rate as is in Equation 156 with a small correction: BBe-μL; the burst of an individual phage clone is reduced by phages that mutate to a new type.

Likewise if we assume that nVi is constant, we get the following solution for nBi(t):

(24) nBi(t)=nBi(0)esBt+δsB(esBt1)

where sB=gC-F-r-αpV(nV-nVi) is the average growth rate for bacterial clone i and δ=αηnB0nVi(1-pV) is the rate of spacer acquisition for that clone.

Equations 23 and 24 are plotted in Figure 10 (‘Analytic 1D prediction’). Equation 23 matches the phage clone size not conditioned on survival at early times, but continues to grow past the steady-state phage clone size without growing bacterial clones to reign it in. Equation 24 is not comparable to the simulation mean with time zero as the time of phage clone birth; rather it should be compared to bacterial clone growth independent of what the phages are doing. Using the steady-state mean phage clone size in Equation 24 (purple line), we obtain a rough correspondence with the simulation mean (brown circles). The prediction does not match at intermediate times, likely because this solution assumes that phage clones are at their mean size when in fact they are likely still smaller.

Equations 21 and 22 can be solved at steady state in terms of the total mean-field variables (ignoring the last term of Equation 21). These solutions (Equations 25 and 26) are indicated by horizontal dashed lines in Figure 10.

(25) nBi=1e(nBF+αnBαBP0pV)
(26) nVi=nBi(αpVnV(gCFr))αηnB0(1pV)+αpVenBi

Simulation methods

This section describes how simulations were performed, shows basic simulation results, describes parameter choices (‘Parameter values’), and explores the parameter dependence of stochastic population extinction (‘Stochastic population extinction’) that relates to results in ‘Bacteria and phages dynamically coexist and coevolve’.

Simulations were written in Python and performed on the Béluga and Niagara supercomputers at the SciNet HPC Consortium (Ponce et al., 2019) and on a local server in our group. Our simulation code can be found at https://github.com/mbonsma/CRISPR-dynamics-model (copy archived at Bonsma-Fisher and Goyal, 2023). We used the tau-leaping method, a fast approximation for Gillespie simulation Cao et al., 2006 and compared with full Gillespie simulations for some cases. Gillespie and tau-leaping methods showed similar dynamics and good agreement for total population sizes (Figure 11), total spacer diversity (Figure 12), mean phage and bacteria clone sizes (Figure 13), and produced the same qualitative behaviour for individual spacer types (Figure 14).

Total phage (nV) and total bacteria (nB) as a function of time for a Gillespie simulation and a tau-leaping simulation with the same parameters.

Total phage is shown at early times (A) and late times (B), and total bacteria at early times (C) and late times (D). The simulation parameters are C0=104, μ=10-6, η=0.001, and e=0.95. Early time dynamics differ slightly in this example, but the long-time behaviour and steady-state values are similar.

Number of bacterial clones (m) vs. simulation time for three sets of simulations, each beginning with 1, 10, or 50 phage clones.

Gillespie simulations are dashed blue lines, and tau-leaping simulations are solid orange and red lines. The simulation parameters are C0=104, μ=10-6, η=0.001, and e=0.95.

Average population size of phage clones (solid lines) and bacterial clones (dashed lines) in a Gillespie and tau-leaping simulation with the same parameters.

The simulation parameters are C0=104, μ=10-6, η=0.001, and e=0.95.

10 spacer trajectories for a Gillespie simulation (A) and tau-leaping simulation (B).

The first 10 trajectories that surpass nVi=1000 are shown. The simulation parameters are C0=104, μ=10-6, η=0.001, and e=0.95.

We initialized five simulations for each parameter combination for a total of approximately 19,800 simulations, not including simulations with cross-reactivity. A small subset of simulation parameters were run six times instead of five. Not all simulations were successfully completed either due to errors while running or because their running time was exceedingly long. Of the initialized simulations, approximately 10,000 were completed and included in analysis. Simulations with very low μ tended to either have no phage establishments (at low C0) or very long running times (at high C0), and simulations with very high μ tended to have higher diversity and longer running times in general. Unless otherwise noted, plots with simulation averages are an average across three to six simulations with the same parameters. We set simulations to run for a fixed number of bacterial generations intended to be long enough to allow the system to reach steady state and remain there for a long time to generate statistics (‘Steady state’).

We model spacers and protospacers as a binary sequence of length L as in Han et al., 2013; Han and Deem, 2017. When a phage reproduces and creates a burst of B phages, we draw BL numbers from a binomial distribution with probability μ. Successes in this draw designate bits that will mutate (flip) in the newly created phages. It is possible but very rare for more than one mutation to happen in the same new phage: this occurs with probability 1-e-μL-μLe-μL(μL)2 whereas a single mutation occurs with probability μLe-μLμL. Multiple mutations occur approximately μL times as often as single mutations; for the largest value of μ we use, they represent 0.3% of events.

Our simulation code can be found on GitHub.

Parameter values

Parameter values are as above unless otherwise indicated. Representative values estimated for S. thermophilus bacteria in lab conditions.

Parameter descriptions and values are listed in Table 3. The burst size for phage that target S. thermophilus has been measured at between 140 and 200 (Lucchini, 1999) and 80 for phage 2972 (Levin et al., 2013). We use a burst size of 170. (Vaningelgem et al., 2004) measured the maximum growth rate of S. thermophilus in milk at 42°C to be 2.4×10-2 min-1. This corresponds to gC0 in our model; we choose g based on C0 so that gC0=2.4×10-2 min-1.

Table 3
Model parameters.
ParameterDescriptionValue
1gC0Bacterial doubling time41.7 min
C0Inflow nutrient concentration in102 to 10-6
Units of bacterial cell density
αPhage adsorption rate2×10-2/C0
BPhage burst size170
FChemostat flow rate0.3gC0
pVProbability of phage success0.02
for bacteria without spacers
eSpacer effectiveness0.1 to 0.95
rRate of spacer loss0.04gC0
ηProbability of spacer acquisition10-5 to 10-2
μPhage mutation rate per base per generation10-8 to 10-4
LPhage protospacer length in nucleotides30
  1. Parameter values are as above unless otherwise indicated. Representative values estimated for Streptococcus thermophilus bacteria in lab conditions.

The parameter α in our model is the phage adsorption rate constant divided by the culture volume. The rate of adsorption for phage is of the order of 10-8 min-1 ml Delbrück, 1940; this is similar to the values used in Levin et al., 2013 and Weissman et al., 2018. In order to explore regimes of different total population sizes, we decrease α as we increase C0 in order to maintain stable coexistence between bacteria and phage; this is equivalent to decreasing the culture volume as C0 decreases. For example, if C0=105, we set α=2×10-2/C0=2×10-7, implying a culture volume of approximately 50μl.

Levin et al., 2013 estimated the frequency of phage mutants that escape CRISPR targeting by S. thermophilus to be between 5×10-7 to 5×10-5. These measurements are the fractions of phages from lysate that can evade CRISPR targeting of different unique spacers. This is analogous to μL in our model, which is the probability of a mutation occurring in a newly burst phage. We use values of μ between 10-8 and 10-4 corresponding to μL between 3×10-7 and 3×10-3 to encompass the physiological range of phage mutation rates as well as unusually high mutation rates.

We generally fix the probability of successful phage infection against naive bacteria at pV=0.02 (though pV is varied in Figure 1E and in Bonsma-Fisher et al., 2018). This is consistent with the value of 10-2 used in Doekes et al., 2021 and Berngruber et al., 2013. Several other models of CRISPR immunity have assumed phage success probability to be much higher, typically close to 1 (Childs et al., 2014; Childs et al., 2012; Weissman et al., 2018). Increasing pV does not qualitatively alter our results beyond changing the relative population sizes of phage and bacteria; at high values of pV bacteria population sizes are small and they become more likely to experience stochastic extinction (see Figure 1E). A low value of pV can be taken to reflect the presence or effectiveness of other anti-phage defence systems.

The rate of spacer loss in our model, r, can be thought of as a phenomenological parameter since the true rate of spacer loss is not well understood (Weissman et al., 2018). For comparison, Jiang et al. estimate the rate of loss of function of the entire CRISPR system in S. epidermis at 10-4 to 10-3 per individual per generation (Jiang et al., 2013). We rescale the parameter r as a function of bacterial growth rate, which means that the rescaled parameter R=r/(gC0)=0.04 is constant per bacterial generation. Our loss rate is an order of magnitude higher than the rate in Jiang et al., 2013.

We vary the parameters η and e to explore different 'strengths' of CRISPR immunity. The parameter e is spacer effectiveness: when e=0, spacers provide no immunity and bacteria with spacers are functionally no different from bacteria without spacers. When e=1, a spacer-containing bacterium that encounters a phage with a matching protospacer is guaranteed to survive. We vary e between 0.1 and 0.5 to explore different regimes of CRISPR effectiveness. The parameter η is the probability that a naive bacterium will acquire a spacer if it is infected but not killed by a phage. Rates of naive spacer acquisition vary widely, and we vary η between 10-5 and 10-2, with the value of η constant within a simulation. For comparison with measured acquisition rates, Pyenson et al. measured naive acquisition in Staphylococcus aureus to be approximately 10-6 to 10-7 (Pyenson and Marraffini, 2020). Acquisition rates may hundreds of times higher in primed acquisition (Staals et al., 2016), and Heler et al. measured four orders of magnitude difference in spacer abundances shortly after infection, likely a result of differences in acquisition rate (Heler et al., 2019).

The flow rate F was picked in order to get a stable fixed point where phage and bacteria coexist. Stability conditions approximately correspond to those derived in our previous work (Bonsma-Fisher et al., 2018).

Stochastic population extinction

This section relates to results in 'Bacteria and phages dynamically coexist and coevolve'.

Even when bacteria and phage are within the deterministic coexistence regime, populations may experience stochastic extinction (main text Figure 1F). Simulations are run for a fixed number of generations (see 'Steady state') or until bacteria or phage go extinct. Population extinction on the timescale of our simulation run-time is restricted to low values of C0 (no phage population extinctions happen for C0>3000, and no bacteria extinctions happen for C0>300). Phages are prone to population extinction at higher population sizes than bacteria because of their large burst size which makes their dynamics more noisy (see 'Long-time approximation for P0(t)').

Extinction probability is strongly related to total population size, with smaller populations being more likely to go extinct across all parameters (Figure 1—figure supplement 1 and Figure 1—figure supplement 2). However, there are differences that depend on parameters and initial conditions. First, high spacer effectiveness increases the likelihood of phage extinction and decreases the likelihood of bacterial extinction (Figure 1—figure supplement 1). High spacer effectiveness correspondingly increases the time to extinction for bacteria (Figure 1—figure supplement 4) and decreases the time to extinction for phages (Figure 1—figure supplement 3). Bacteria also appear to be less likely to go extinct at high values of η (Figure 1—figure supplement 4). Phages have longer times to extinction and lower extinction probability at high values of μ (Figure 1—figure supplement 3). And finally, phages have a longer time to extinction if the initial number of phage clones (minit) is high (Figure 1—figure supplement 6), but this effect disappears at low η (Figure 1—figure supplement 5), perhaps because bacteria do not acquire spacers quickly enough for the initial phage diversity to make a difference before the number of clones equilibrates, which happens rapidly at low total population size (see upper rows of Figure 8).

Simulation results

Measuring diversity

This section describes in detail our method for measuring diversity in simulations, relating to results in 'Phages drive stable emergent sequence diversity'. It provides justification for matching the number of large phage clones to the number of bacteria clones and discusses the assumptions made and where they may break down.

A measure for the overall sequence diversity in the population is the total number of unique clones. In general, the bacterial diversity and phage diversity need not be the same, but it turns out that the number of bacterial clones (which we call m) closely tracks the number of 'large' phage clones across a wide range of parameters.

To measure the number of large phage clones in a simulation, we scale the observed phage clone size distribution by the probability of extinction for each clone size below a size cutoff given by Equation 115. We multiply each observed number of clones of size n by the probability of extinction (Equation 159) to the power of the clone size n (Equation 27): this gives a scaling factor for each clone size that predicts how many clones of that size will survive at long times (Equation 28). Pnlarge is plotted alongside the full normalized histogram of clone sizes for one simulation in Figure 15.

(27) 1P0(n)=1(12s0B(s0+δ0))n
(28) Pnlarge={Pn(1P0(n))if n<NestPnif nNest
Phage clone size distribution from 15 combined time points for a simulation with the parameters C0=104, e=0.95, η=0.01, and μ=3×10-6.

The blue points are the values of the full normalized phage clone size histogram with a bin width of 1500. The orange line is given by Pnlarge in Equation 28 smoothed with a running average of window size 3000. Both distributions are scaled by the total number of phage clones.

We then calculate the number of large phage clones at steady state as Ntotal×nPnlarge, the total number of phage clones multiplied by the fraction of clones that survive for long times. Note that Pnlarge is not normalized; it is a quantity 1 representing the proportion of the clone size distribution that corresponds to large clones. The predicted number of large phage clones is plotted against the observed mean number of bacterial clones in Figure 22. There is extremely good agreement at all parameters except the lowest value of η, η=10-5, where the estimated number of large phage clones tends to be larger than m. In this regime, phage clones go extinct because of clonal interference before bacteria are able to acquire spacers (Figure 16 and Figure 17). At low η and high μ, phage clone fitness declines more rapidly with less influence from bacterial spacer acquisition than at high η and moderate μ: Figure 18 shows the ratio of phage clone initial fitness to phage clone fitness at the mean bacteria spacer acquisition time, measured from simulations, vs. η/μ. At most parameters, however, phage clone fitness has not changed much by the time bacteria are acquiring spacers.

Mean phage clone size at the time of first spacer acquisition vs. the deterministic mean phage clone size.

For each phage clone trajectory, the clone size at the time of first spacer acquisition is recorded and these are averaged across each simulation. Error bars are the standard deviation across three or more independent simulations.

Mean time to extinction for phage clones vs. the timescale of bacteria spacer acquisition given by 1/D where D=αη(1-pV)nVi*nB0gC0.

Points outlined in red are simulations where the ratio of large phage clones to bacterial clones exceeds 1.2. Phage clones experience clonal interference at low η: they go extinct faster than bacteria acquire spacers.

Ratio of average phage clone initial fitness to the average phage clone fitness at the mean bacterial spacer acquisition time vs η/μ.

Points outlined in red are simulations where the ratio of large phage clones to bacterial clones exceeds 1.2. The phage fitness is the average per capita growth rate of phage clones conditioned on survival. Phage clones experience clonal interference at low η and high μ.

There is an η-dependent ‘floor’ on the minimum size a phage clone must be before bacteria begin to acquire spacers (Figure 16). We can estimate the size of a phage clone at the time of first spacer acquisition by calculating the mean time of acquisition from an exponentially growing population. Let n(t) be the size of a phage clone relative to the time of mutation. We assume phage clones grow exponentially with rate s0 at early times, where s0=αnB(BpV-1)-F is the average initial growth rate of phage clones: n(t)=es0t.

The probability that no acquisitions have happened by time t is given by P0 in a Poisson process. Let a=αη(1-pV)nB0 be the rate of spacer acquisition (see Equation 15), then the probability of no acquisitions by time t is P0(t)=e-a0tn(t)dt=e-as0(es0t-1). Now, the probability that an acquisition happens between time t and t+dt is then

(29) P(t)=P0(t)an(t)=aes0teas0(es0t1)

Equation 29 is shown as a function of t for four simulations with different values of η in Figure 19. The probability has a sharp peak: there is a particular time related to phage clone growth at which the first acquisition is most likely — intuitively, this is what leads to the appearance of a sharp floor in phage clone size at first acquisition. Interestingly, the mean time of first acquisition is non-monotonic in η: the mean time is smallest for the lowest and highest values of η.

Probability of the first spacer acquisition happening at time t for four simulations with different values of η and C0=104,μ=10-5, and e=0.95.

The mean of each distribution is shown as a vertical dashed line.

We calculate the mean time from Equation 29:

(30) t=0aes0teas0(es0t1)tdt
(31) t=1s0eas0Γ(0,as0)

where Γ(0,as0)=as0e-ttdt is the incomplete gamma function. We can plug this time into n(t)=es0t to estimate the mean phage clone size at first acquisition; the measured clone size is plotted as a function of this prediction in Figures 20 and 21. The clone size depends on the phage growth rate and bacterial acquisition rate, which are themselves primarily dependent on the total bacteria population size and the acquisition probability parameter η. If a<<s0 (true for our parameters), then eas01 and Γ(0,as0)>1, meaning that the time of first acquisition is larger than 1/s0. (Γ(0,as0)>1 for as00.25.) Since 1/s0 is the approximate time at which phages are safe from stochastic extinction due to drift (see ‘Clone fitness’), this means that phage clones are out of the stochastic extinction regime by the time bacteria begin to acquire spacers.

Mean phage clone size at time of first spacer acquisition for simulation data, the predicted with Equation 31, and the prediction with the mode of the distribution given by Equation 29 for four simulations with different values of η and C0=104,μ=10-5, and e=0.95.
Measured mean phage clone size at the time of first spacer acquisition vs. the prediction given by es0t of Equation 31.

Error bars are the standard deviation across three or more independent simulations.

The definition of large phage clones we just described depends on the measured simulation distribution of phage clone sizes. We can instead approximate the number of large phage clones as the product of the rate at which phage clones become established and the rate at which large phage clones go extinct. This statement is summarized in Equation 32. Numerically solving this equation for m gives a prediction for the total number of bacterial clones and total number of large phage clones.

(32) m=[2s0B(s0+δ0)]phage establishment fractionαB(1eμL)pVnVnB(1eνm)1gC0phage mutation rate2nVi(1lnnVinV)gC0(B1)2β+δlarge phage clone time to extinction

s0=β0(B-1)-δ0=αnB(BpV-1)-F

nVi= deterministic mean phage clone size at steady state

β0=nBαpV

δ0=F+αnB(1-pV)

β=nBαpV-αpVenBi

δ=F+αnB(1-pV)+αpVenBi

We now describe each of the three terms in Equation 32. The phage establishment rate is the phage mutation rate multiplied by the fraction of new phage mutants which become established. Derivation of the phage establishment fraction and phage time to extinction can be found in ‘Long-time approximation for P0(t)’ (Equation 158). Derivation of the mean time to extinction for large phage clones can be found in ‘Neutral time to extinction from backward master equation’ (Equation 171).

The phage mutation rate is the mean-field phage reproduction rate αBnV(pVanBs+pVnB0) multiplied by the probability of one or more mutations per burst (1-e-μL).

We assume that pVa=pV(1-e/m), which is true if all clones are equal in size (i.e., nVi=nV/m and nBi=nBs/m) or if the deviations from equal size are uncorrelated between matching bacteria and phage clones. This assumption means that the average immunity in the population is approximately e/m which is accurate across a wide range of parameters: Figure 22 and Figure 23 compare the full average immunity (Equation 33) with e/m, where m is the number of bacterial clones present at a given time point in a simulation. The assumption breaks down at low η and high μ (points below the line in the upper right). Intuitively this happens when the sizes of matching clones become anti-correlated because bacteria acquire few spacers while phages acquire many mutations. The resulting matching pairs have more mismatched clone sizes than the mean number of bacterial clones would suggest. (In this particular case, phage clones are smaller than nV/m.) Conversely, at large population sizes and large spacer acquisition rates, matching clone sizes become correlated, leading to average immunity >e/m. Figure 24 shows clone size distributions for four simulations with increasing η, showing that as η increases the phage clone distribution becomes more narrow and the matching clone pairs become more correlated.

(33) Average immunity=1i,jnBinVjpV(i,j)pVi,jnBinVj=enBsnVinVinBi
Mean number of large phage clones vs. mean number of bacterial clones in simulations.

For each simulation, we take a subset of 15 evenly spaced timepoints at steady state and calculate the size and number of phage clones present. We scale the observed clone sized distribution with Equation 28 and calculate the mean number of large phage clones by multiplying the total number of clones with the fraction of large phage clones given by nPnlarge. We use the simulation mean total population sizes to calculate s0 and δ0 in Equation 27. We obtain the mean number of bacterial clones by averaging the number of clones present at 15 evenly spaced timepoints at steady state. Error bars are the standard deviation across three or more independent simulations.

Effective e=enBsnVinVinBi vs e/m across all simulations where m1 on average.

Error bars are the standard deviation across three or more independent simulations. The solid black line is y=x.

Four simulations with C0=10000, μ=10-5, and e=0.95.

From top to bottom, η increases by a factor of 10 in each row, from η=10-5 in the top row to η=10-2 in the bottom row. The first two columns show clone size distributions combined from 15 time points between 2000 and 10,000 bacterial generations. Bacteria are in the left column and phages in the middle column. The third column shows the pairwise clone sizes of matching clones at the last sampled time point (9467 generations). The expected large phage clone size is the total phage population divided by the mean number of bacterial clones. The inverse overlap is eeeff=nBsnVinVinBi, which we assume is m as shown in Figure 23. The dashed line in the third column indicates the line that clone size pairs would fall on if they were perfectly correlated, and the red star indicates the mean large clone size for phages and the mean clone size for bacteria.

For several parameter combinations, there is no m that satisfies Equation 32. This generally occurs for parameters resulting in small total population sizes. In these cases, no solution is found because the predicted mutation rate and/or the predicted establishment fraction and/or the predicted mean time to extinction for large clones are too low (Figure 25).

Measured simulation phage clone mutation rate, establishment fraction, and mean time to extinction as a function of the theoretical prediction for each.

Highlighted in grey are parameter combinations for which no theoretically predicted m could be determined; the predicted quantity is instead calculated with the simulation mean m. Error bar are the stnadard deviation across three or more independent simulations.

Analytic approximations for diversity

This section describes how we approximate our measurement of diversity to arrive at Equation 87, an analytic approximation for the number of bacterial clones at steady state. The first sections describe approximations for each of the components of diversity; the combination is summarized in ‘Approximation for m’. This is the mathematical background for results in ‘Phages drive stable emergent sequence diversity’.

We want to understand more intuitively how diversity depends on system parameters. Here we describe analytic approximations for diversity m and its components. Equation 32 contains m both explicitly and implicitly, so we approximate each of the three components to arrive at an analytic approximation for m.

We can make some initial simplifications by cancelling terms (note that s0+δ0=β0(B-1) and β0=nBαpV):

(34) m=(2s0B1)(1eμL)nV(1eνm)2nVi(1lnnVinV)(B1)2β+δ

Now expanding s0 and the denominator and cancelling and collecting terms:

(35) m=4(αnB(BpV1)F)(1eμL)nVnVi(1lnnVinV)(B1)(B(B2)β+αnB+F)(1eνm)

In theory, nVinV/m and nBinBs/m. To see this, we start with the solutions for the deterministic mean bacteria and phage clone sizes (Equations 25 and 26, reprinted here):

(36) nBi=1e(nBF+αnBαBP0pV)
(37) nVi=nBi(αpVnV(gCFr))αηnB0(1pV)+αpVenBi

From the deterministic mean-field equation for nV (Equation 14), we have that F+αnB=αBpVnB(1-eνm). Substituting this in Equation 36, we get

(38) nBi=1e(nBαBpVnB(1eνm)αBP0pV)=nBe(1(1eνm)P0)

We can neglect P0, both because it is always close to 1 and because, at steady state, the effect of mutants leaving type i is partially balanced by mutants entering type i.

(39) nBinBνm=nBsm

Similarly, from the deterministic mean-field equation for nBs (Equation 13), we have gC-F-r=αnVpV(1-em)-αnV(1-pV)ηnB0nBs. Substituting this in Equation 37, we get

(40) nVi=nBi(αpVnV(αnVpV(1em)αnV(1pV)ηnB0nBs))αηnB0(1pV)+αpVenBi

Substituting nB0=nB(1-ν) and nBs=nBν, we find nVi*=nV/m.

Replacing nVi and nBi with nV/m and nBs/m, respectively:

(41) m=4(αnB(BpV1)F)(1eμL)nV2m(1ln1m)(B1)(B(B2)αpVnB(1eνm)+αnB+F)(1eνm)

Now we want to solve Equation 41 for m. This is difficult because the total population sizes nB, nV, C, and ν also depend on m. To find the m dependence of nV, nB, C, and ν, we note that we can approximate the steady-state solutions to Equations 13–17 as the corresponding solutions described in Bonsma-Fisher et al., 2018 with the simple replacement of the parameter e with e/m. This is because e/m is a good approximation for the full average immunity at most parameters (Figure 23), and average immunity enters these equations in place of e in the model without phage mutations. This is analogous to average adaptive immunity replacing the rate of innate immunity in the Lotka-Volterra system described by Iranzo et al., 2013.

We write the steady-state solutions for Equations 14 and 17 below, where we have defined pV(i,j) as in Equation 18 and approximated average immunity as e/m. For simplicity we rescale nB and nV by C0: nV=C0y* and nB=C0x*. Here p=pVα/g. These steady-state solutions are shown in Figure 26.

Total phage, total bacteria, and fraction of bacteria with spacers as a function of average immunity (effective e).

Blue points are simulation results. Error bars are standard deviation across three or more independent simulations. The solid black line is the solution given by Equations 42–44 (from Bonsma-Fisher et al., 2018) with the parameter e replaced by effective e. The horizontal grey dashed line corresponds to the no-CRISPR (e=0) mean-field solution (derived in Bonsma-Fisher et al., 2018).

At low average immunity (high diversity), the population behaves as if there were no CRISPR system at all. Figure 26 shows total phage, total bacteria, and the fraction of bacteria with spacers as a function of average immunity (effective e). At low average immunity, total population sizes are the same as in a population without CRISPR entirely. Interestingly, the effective no-CRISPR case does not necessarily correspond to no CRISPR spacers: ν>0 even when effective e0 for these parameters. This is because spacer acquisition and growth of that clone can still happen even if the spacer confers no fitness benefit.

(42) x=fpVp1BpV(1eνm)1
(43) y=(f1)p(BpV(eνm1)+1)fpVp(eνm1)(p(BpV(eνm1)+1)pV)

We have defined x* and y* in terms of ν*, which is given in the following implicit cubic equation, where R=r/(gC0):

(44) 0=(1ν)[pVeνmη(1pV)][(1f)p(pVB(1eνm)1)fpV]+RνpV(1eνm)(BppV(1eνm)p+pV)

This cubic equation is analytically solvable (ignoring the dependence of m on ν), but the full solutions in terms of all parameters are cumbersome.

Only one of the three solutions of Equation 44 is physical in the parameter range we use (real-valued and properly bounded):

(45) ν=(1+i3)(27a2d+9abc2b3)2+4(3acb2)327a2d+9abc2b33623a+(1i3)(3acb2)3 22/3a(27a2d+9abc2b3)2+4(3acb2)327a2d+9abc2b33b3a

where the coefficients are

(46) a=B(em)2fppV2(f+R1)
(47) b=emfpV(p(f(B(pV(em+η+1)η)1)+B(ηpV(em+η2R+1))R+1)+pV(f+R))
(48) c=fp[BpV2(em(f1)(η+1)+(f1)η+R)(em1)(f1)pV(Bη+1)(2B+2)(f1)ηpV+(f1)ηpV(f+R1)]+fpV(emfpVfη+pV(fη+R))
(49) d=fη(pV1)((f1)p(BpV1)+fpV)

Dominant balance approximations for ν

The cubic equation for ν given by Equation 44 can be simplified in certain parameter regimes using dominant balance. We write Equation 44 as aν4+bν2+cν+d=0 with the coefficients as written above. Comparing the numerical values of the coefficients in different parameter regimes, we arrive at Table 4 which outlines three different approximations for ν obtained by dropping coefficients of the original cubic equation. Broadly speaking, at low η and low e, both the cubic and quadratic coefficients (a and b) are small, so ν*-d/c, while at high η and low e the cubic coefficient (a) can be dropped, and at high e and low η the constant d can be dropped. Figures 2729 show total population sizes as a function of the three approximations in Table 4.

Table 4
ν approximations.
ηeeffective
≤0.010.01 to 0.050.05 to 0.10.1 to 0.50.5 to 1
10-5-dc-dc-dc-b+b2-4ac2a-b+b2-4ac2a
10-5 to10-4-dc-dc-c+c2-4bd2b-b+b2-4ac2a
10-4 to 10-2-dc-dc-c+c2-4bd2b
10-2-dc-c+c2-4bd2b-c+c2-4bd2b
  1. Note: the drop-a solution also works for below e=0.1 for all values of η (since 2bdc20), but the -d/c solution is simpler and so preferred for the very low e range.

nV, nB, and ν vs. η for e=0.05, approximating ν-d/c.
nV, nB, and ν vs η for e=0.15, approximating ν-c+c2-4bd2b.
nV, nB, and ν vs. η for e=0.8, approximating ν-b+b2-4ac2a.

For small effective e and small η, ν-d/c:

(50) νη(1pV)(α(1f)(BpV1)fg)η(1pV)(α(1f)(B(em+1)pV1)fg)+αpV(BpV1)(Rem(1f))+gpV(emf+R)

We define two combined parameters: A=(BpV-1)(1-f)αfg, and η^=η(1-pV). Each of these has an intuitive meaning: A>1 is the stability condition for phage existence, and η^ is the probability of spacer acquisition following escape from naive phage predation, so it can be thought of as the overall spacer acquisition probability at the start of an infection. With these variable substitutions, we get

(51) νη^(A1)pVR(1f+Aff(1f))+η^(A1+ABpVem(A1)(BpV1))empV(A1)

Now we notice that the deterministic total phage population size when e=0 is given by

(52) nVno CRISPR=n~V=gC0f(1f)(A1)αpV(1f+Af)

We also include nB for e=0 for completeness:

(53) nBno CRISPR=n~B=C0(1f)A

We can replace some terms in ν with n~V by comparison with Equation 52:

(54) ν11+rη^αn~Vem(pVη^ABpV(A1)(BpV1))

where r=RgC0 is the spacer loss rate per minute. The second term in the denominator is the balance of spacer acquisition and spacer loss per naive bacterium: η^αn~V is the rate of spacer acquisition per naive bacterium in the low average immunity limit.

The third term in the denominator is negative for the parameters we use, but this is not necessarily true in general. It is negative if η^AB<(A1)(BpV1), which for the values of A and BpV we use (fixed across all simulations) is true for η<0.0113.

From this expression for ν we learn the following:

  • ν decreases if r goes up. More spacer loss means a smaller fraction of bacteria with spacers.

  • ν increases if spacer acquisition goes up (provided e/m is small).

  • ν increases if em increases: higher CRISPR effectiveness means higher ν.

For e/m0, we define ν~:

(55) ν~=11+rη^αn~V

Equation 54 expression breaks down at high η where the true value of ν is largely independent of e. In this case, when both em is small and η^ is large, we can drop the third term in the denominator in Equation 54.

There is a discontinuity in Equation 54 at the value of e/m where the denominator equals zero. This critical value of e/m, em*, is given by Equation 56.

(56) em=(A1)(BpV1)(αη^n~V+r)αn~VpV((A1)(BpV1)ABη^)

Taking a series expansion in η^ and keeping the first two terms:

(57) emrαn~VpV+η^pV+ABrη^αn~VpV(A1)(BpV1)

At high η, the middle term dominates, and at small η, the first two terms dominate (since the third term is 9.8η^ and 1/pV=50). This transition governs the change between the low e and high e regimes; Equation 58 is plotted in Figure 3—figure supplement 2 and Figure 3—figure supplement 3.

(58) emrαn~VpV+η^pV

If we substitute this critical value of e/m into the mean-field equations and take η to 0, we recover the no-CRISPR mean-field solutions for nV, nB, and C. Since this also corresponds to ν=0, this seems to explain why several simulation values of the phage establishment probability go to 0 around effective e=0.1 for small η and small C0 (Figure 30). Despite the presence of spacers with non-zero effectiveness, the population appears to behave as if there is no CRISPR near this point, which removes the fitness advantage of new phage mutants and lowers their establishment probability. We also see large fluctuations in total bacteria population size near this critical point (Figure 31).

Fraction of bacteria with spacers ν vs. effective e.

The solid line is the full numerical theoretical solution. The dashed black line is given by Equation 58. Error bars are the standard deviation across three or more independent simulations.

Population standard deviation divided by population mean as a function of the distance from the average immunity critical point given by Equation 58.

Insets show a smaller x-axis range for the same quantities. Total phage (top left), total bacteria (top right), fraction of bacteria with spacers (bottom left) and total nutrients (bottom right) are plotted. X error bars are the standard deviation across three or more independent simulations.

Another way to understand the critical point is to look at the difference in relative fitness between bacteria with and without spacers, nBs and nB0:

(59) nBs˙nBsnB0˙nB0=r(1+nBsnB0)+αnVpVeeff+α(1pV)ηnV(nB0nBs1)

Now if effective e is given by Equation 58:

(60) nBs˙nBsnB0˙nB0=rnBsnB0+α(1pV)ηnVnB0nBs

Equation 60 indicates that at the critical value of effective e, the fitness difference is exactly given by the balance of spacer acquisition and spacer loss. Below this critical point, there are additional constant negative terms, and for the same values of nBs and nB0, the relative fitness of spacer-containing bacteria is lower than naive bacteria; above this point, it is higher. At the critical point, the immunity boost from spacers exactly cancels out spacer loss. It is interesting that for the same reasons that many parameters collapse onto the same curves as a function of average immunity, the value of this critical point is largely independent of the parameters of CRISPR immunity and occurs at a value of effective e10-1 regardless of μ, e, or C0.

Large e approximation for ν

In the limit of e1 and small η, we find the following approximate solution for ν by setting e=1 and taking a series expansion in η.

(61) να(BpV1)(f+R1)+g(f+R)αBpV(f+R1)+gRη^pV(f+R1)(α(BpV1)(f+R1)+g(f+R))

Equation 61 is plotted in Figure 3—figure supplement 2 in red. The second term proportional to η^ is tiny compared to the first for all values of η we use.

Approximation for Text

If we evaluate the mean phage time to extinction (‘Neutral time to extinction from backward master equation’) at the deterministic mean phage clone size, we can approximate nVinVm and nBinBsm as before:

(62) Text2nVm(1+lnm)gC0B(B2)αpVnB(1νem)+F+αnB

We expand nV and nB in powers of e/m:

(63) nV=gC0f(1f)(A1)αpV(1f+Af)+eνmC0[fgαpV1fpV(1f+Af)+ABf(1f)(1f+Af)2]+O(e/m)2
(64) nB=C0(1f)A+eνm[BFpVα(BpV1)2]+O(e/m)2

Note that ν never appears without e/m beside it in these expressions, so we will use the zeroth-order expression for ν (Equation 55) which still allows us to keep e/m to first order elsewhere.

(65) ν~=11+rη^αn~V+O(e/m)

Substituting Equations 63–65 into Text:

(66) Text2(1+lnm)f(B1)n~Vm(11BpV)+em2(1+lnm)[2C02fgη^(1A)((BpV2)(gA2fg1f)+2gA1f(1+f(BpV2)))αpV2B(B1)(1+Af1f)2(η^gfC0(1A)pVr(1+Af1f))]

The second term is order em2, which we drop, giving the following approximation for the mean phage time to extinction (plotted in Figure 32):

(67) Text2(1+lnm)f(B1)n~Vm(11BpV)
Approximate phage time to extinction vs. numerically calculated theoretical time to extinction, where the approximate time to extinction is given by (Koskella, 2014).

The full theoretical predicted value of m is used in the approximate expression.

Here again we are using the phage population size independent of CRISPR (n~V). Notably this is independent of η and e (except implicitly in m) but is still extremely close to the full theoretical quantity.

From Equation 67 we learn:

  • Text is insensitive to the changes in total phage population size caused by changing CRISPR immunity.

  • Text increases if the phage clone size n~Vm increases.

  • BpV is the effective burst size: for every phage adsorption, BpV new phages emerge on average. This can be thought of as the phage growth rate. 1-1BpV lies between 0 and 1; if BpV is larger, Text is also larger.

  • The burst size B also appears alone in the denominator (B-1). A larger burst size means larger fluctuations: for larger B, more deaths tend to happen between birth events at steady-state, so extinction happens sooner on average (shorter Text from B-1 in the denominator).

  • The normalized chemostat flow rate f lies between 0 and 1 and is a death rate for phages; if f is larger Text is smaller.

The 1+lnm term is more difficult to parse. A hand-wavy intuition is that while larger m sometimes means smaller clone size and shorter extinction times (n~Vm), it is also related to increased phage success in the population, and so the 1+lnm in the numerator softens the straight inverse dependence on m through clone size.

Approximation for Pest

Now we approximate the probability of phage clone establishment.

(68) Pest=2s0B(s0+δ0)

Recall that s0=β0(B-1)-δ0=αnB(BpV-1)-F, β0=nBαpV, and δ0=F+αnB(1-pV). These all depend on nB, so plugging in Equation 42 for nB, we get

(69) Pest=2eνm(B1)

Equation 69 with ν given by Equation 54 is plotted in blue in Figure 3—figure supplement 2. The same expression with e=0 is plotted in green. There is a discontinuity in ν~ when the third term in the denominator becomes large, which is why the blue lines do not extend to large effective e. These two approximations bound the full solution at low effective e.

Approximation for phage mutation rate

The effective phage mutation rate (new phages per minute, μ¯) is given by Equation 70. In general, μ¯ depends both explicitly on e and m and implicitly through nV, nB, and ν. We do the same small e/m approximation for μ¯ as above, where we expand nV, nB, and ν in e/m (nVn~V+emnV1, etc.). The zeroth order approximation (e/m=0) does not change μ¯ much at all across the whole range of parameters we investigated (Equation 71 and Figure 33).

(70) μ¯=αB(1eμL)pVnVnB(1eνm)1gC0
(71) μ¯=αB(1eμL)pVgC0[n~Vn~B+em(nV1n~B+n~VnB1ν~n~Vn~B)]+O(e/m)2
Approximate phage mutation rate for e=0 vs theoretical phage mutation rate.

Approximation for m

Finally, we do the same approximation for the complete expression for m, expanding all variables in powers of e/m. The result turns out to be the same as if we combined the individual approximations shown above.

(72) m=2eνm(B1)phage establishment fractionαB(1eμL)pVnVnB(1νem)phage mutation rate2nVm(1+lnm)B(B2)αpVnB(1νem)+F+αnBlarge phage clone time to extinction
(73) m=4(1eμL)ν~n~V2(B1)2e(1+lnm)m2+4n~V(1eμL)(BC0fgpV(2(B1)ν~nV1+(B1)ν1n~V+ν~2(n~V))+αnB1ν~n~V(BpV1)2)(B1)3BC0fgpVe2(1+ln(m))m3+O(e/m)3

Let’s look at the first term by itself, neglecting (e/m)2 and higher. Rearranging to collect m terms:

(74) m3(1+lnm)=4e(1eμL)ν~n~V2(B1)2

Let us let the right-hand side of Equation 74 equal a, a new parameter that is independent of m. Now we are approximating:

(75) m3=a(1+lnm)

Changing variables to

(76) z=1+lna3+lnz3

We solve this perturbatively. Let z0=1+lna3, then let z=z0(1+δ) where we assume δ is small. Now we are solving for the perturbation

(77) z0(1+δ)=z0+13ln(z0(1+δ))

We approximate ln(1+δ)δ for small δ. This gives

(78) δlnz03z01
(79) m(az0(1+lnz03z01))13
(80) m[a(ln(a)+3)(ln(a)+ln(13(ln(a)+3))+2)3(ln(a)+2)]13

If a is large, the leading order contribution to m is a1/3.

(81) a=4eν~n~V2(1eμL)(B1)2

Measured m vs. a is shown in Figure 34. To get more insight into the dependence of a and m on parameters, we make some approximations. First, since μL<<1, we approximate (1-e-μL)μL. Then substituting ν~ in a:

(82) a4eμLη^αn~Vη^αn~V+rn~V2(B1)2
Measured mean m in simulations vs. a as given by Equation 81.

The solid line is m vs. a solved numerically using Equation 113. The right panel shows the same but with the two lowest values of η removed. Error bars are the standard deviation across three or more independent simulations.

Substituting in n~V:

(83) n~V=C0αpV(BpV1)(1f)αfg1+(BpV1)α/g
(84) a4eμLη^α(C0αpV(BpV1)(1f)αfg1+(BpV1)α/g)η^α(C0αpV(BpV1)(1f)αfg1+(BpV1)α/g)+r(C0αpV(BpV1)(1f)αfg1+(BpV1)α/g)2(B1)2

Now we do an expansion assuming large burst size, B>>1. Expanding in 1/B:

(85) a4g3C03(1f)3eημL(1pV)α2B2pV2(gC0(1f)η(1pV)+pVr)+O(1B3)

Now if η is small, specifically if pVr>η(1pV)gC0(1f), we can expand in η. This assumption roughly amounts to assuming bacterial survival followed by spacer acquisition is rare compared to phage predation and spacer loss.

(86) a4g3C03(1f)3eμLα2B2pV3r[η(1pV)gC0(1f)pVrη2(1pV)2+...]

The 0th-order term captures the trend across a wide range of parameters and gives some insight into parameter dependence (Figure 2C). To summarize: For μL<<1, B>>1 and pVr>η^gC0(1f):

(87) a4eμLη^(gC0(1f))3B2α2pV3r

Equation 87 implies that m goes like (eμη)1/3, a non-intuitive dependence.

  • m increases as the overall phage mutation rate increases.

  • m increases as ν increases.

  • m increases for the same reasons that phage time to extinction increases, except that it depends on the total phage population size (instead of the phage clone size).

Speed of evolution

This section provides more detail on our calculation of the speed of phage evolution presented in ‘Dynamics are determined by diversity’.

The speed of evolution of a population is often taken as the rate at which its genetic structure changes over time. For example, Betts et al. measure the rate of evolution in a bacteria-phage coevolution experiment by calculating the Euclidean genetic distance between populations at different time points (Betts et al., 2018), and Rouzine and Rozhnova calculate the rate of substitution (Rouzine and Rozhnova, 2018). Speed of evolution is sometimes defined more functionally as the rate of change of a population’s fitness over time (see Desai and Fisher, 2007; Desai et al., 2007; Rouzine and Rozhnova, 2018).

In our context, the average population fitness does not change over time at steady state (more like a Red Queen scenario than an arms-race scenario) and so we use a measure of genetic distance to calculate the speed of evolution. Instead of Euclidean distance (L2 norm), we use Hamming distance or edit distance (L1 norm). This distance metric is more representative of the process involved in changing sequences since the sequence [1,1] is mutationally twice as far from [0,0] as [0,1]. Our sequence space has as many dimensions as the number of genetic sites, and a population’s 'centre of mass' can be calculated by weighting each sequence ('position') in this space by the frequency of each subpopulation with that sequence. Our effective space is dimension L=30 since each protospacer has 30 sites that can mutate. Each site can have the value 0 or 1 meaning there are 230 possible sequences, but a sequence can differ by at most 30 changes from another sequence. The maximum distance possible in our model using the L1 norm is 30.

For example: if L=2 and there are 20 phages with the sequence [0,1] and 10 phages with the sequence [1,1], then the centre of mass in the two-dimensional space is [13,1]:

Rx=130(20×0+10×1)=1/3x
Ry=130(20×1+10×1)=1y

If the ancestor phage is [0,0], then the centre of mass distance is 1+1/3=43. If the ancestor sequence is at one of the corners (i.e. only values of 0 or 1), calculating the centre of mass in this manner is the same as calculating the population-weighted average distance from the ancestor: 20 phages are 1 mutation away, 10 are 2 mutations away, therefore the average distance is (20+20)/30=43. However, if the ancestor sequence is not at a corner (i.e. if comparing two centres of mass), the sum of distances gives larger values than the difference of centres of mass. The sum of distances is a measure of the spread, while the change in centre of mass is a measure of the speed of evolution.

We can look at the distance of the phage and bacteria population in simulations from the ancestor sequence over time to get an idea of how the population moves through sequence space. Figures 35 and 36 show the mutational distance for phage and bacteria over time from either the ancestor phage sequence or a centre of mass starting point later in the simulation, for low (Figure 35) and high (Figure 36) values of η.

Phage and bacteria centre of mass distance from the original phage and bacteria sequences.

The centre of mass distance is plotted in blue for phage (left) and bacteria (centre). Grey circles represent the size of clonal subpopulations at each distance from the ancestor sequence (arbitrary scale). The third panel shows the weighted average distance of the population from the centre of mass at that timepoint, a measure of the spread in sequences present at any time. In this simulation C0=104, η=10-5, μ=10-6, e=0.95, and mean m=2.8.

Phage and bacteria centre of mass distance from the original phage and bacteria sequences.

The centre of mass distance is plotted in blue for phage (left) and bacteria (centre). Grey circles represent the size of clonal subpopulations at each distance from the ancestor sequence (arbitrary scale). The third panel shows the weighted average distance of the population from the centre of mass at that timepoint, a measure of the spread in sequences present at any time. In this simulation C0=104, η=10-2, μ=10-6, e=0.95, and mean m=14.5.

Since the maximum distance in our model is 30, there comes a point in distance where mutations back in the direction of the ancestor become likely. The fraction of available mutations fm that move away from the ancestor decreases as the distance increases: fm=L-distanceL. For L=30, this means that at a distance of 15 the population is equally likely to move towards or away from the ancestor. This is apparent in Figure 35 – the distance from the ancestor phage reaches 15 and then begins to decrease. In other simulations, the distance continues to increase and does not come close to 15 (Figure 36).

Measuring speed of evolution

The speed of evolution in this framework is the distance the population travels in spacer space in a certain amount of time. In principle, this is straightforward to calculate, but we find that the distance between populations does not increase linearly with the time interval used to calculate the distance (Figure 37 left panel). This means that measured speed will depend on the time interval used to calculate it (Figure 37 right panel), and there is no clear way to choose one particular time interval over another. We go to the long Δt limit and measure the maximum distance from the ancestor reached by a population in a set number of bacterial generations. The maximum distance reached by bacteria and phage populations is highly correlated with each other within a simulation and is also repeatable across independent simulations (Figure 38). Speed in this definintion increases with phage mutant initial fitness (Figure 39).

Phage and bacteria centre of mass distance from the centre of mass at time t-Δt (left) and the distance divided by the time interval Δt (right).

Distances are averaged over the entire simulation at steady state; error bars are standard deviation. Simulation parameters are C0=104, η=10-2, μ=10-6, e=0.95, and mean m=14.5.

Maximum distance from ancestor population for bacteria vs. phage.

The maximum distance is highly correlated, indicating that the bacteria population tracks the phage population closely. Error bars are the standard deviation across three or more independent simulations.

Phage mutational distance per generation vs. initial phage mutant fitness for simulations with e=0.95, η=10-3, and μ=3×10-7.

Error bars are the standard deviation across multiple independent simulations and are shown in the positive direction only.

The total number of phage establishments in a simulation determines the maximum distance that the population can move away from the ancestor, but if diversity is high, many of those establishments will not actually move the population further away. A simple approximate scaling for this process is that total distance is proportional to 1m: if there are m established clones and one of them is the furthest away from the ancestor, there is a 1m probability that the next establishment will come from the furthest-away clone. This is similar to Childs et al., 2012 where if they have v protospacers per phage (and one dominant bacterial species with a spacer), there is only a 1/v chance that a random mutation will actually be an escape mutation.

Since m is a product of mutation rate, establishment probability, and extinction time, plotting distance per establishment vs. m is equivalent to plotting speed vs. time to extinction (Equation 88). We have m=Pestμ¯T, where Pest is the probability of establishment for a new phage clone, μ¯ is the effective phage mutation rate, and T is the mean time to extinction for large phage clones. Let the mutational distance from the ancestor be Δ=vτ, where v is the 'speed' and τ is a fixed time interval used to measure distance.

(88) ΔPestμ¯τ1mvτPestμ¯τ1Pestμ¯Tv1T

Figure 6—figure supplement 2 shows distance per establishment vs. m and speed vs. T. Relating distance and speed to the time to extinction gives further insight into the underlying parameter dependencies: we know from other approximations ('Analytic approximations for diversity') that Tn~V1+lnmm, and for small m we can substitute our approximation for m, giving T1eμη13 and veμη13. Applying the same approximation here as for m above (Equation 87), with Text2n~Vf(B-1)m(1-1BpV) (Figure 6):

(89) vαBf(eμηL(1pV)2α2B2r)13

Spread in sequence space

We can ask how spread out the population is in sequence space – how far is the average clone from the centre of mass? When there are more clones (larger m), the population spread is larger (Figures 40 and 41). The spread also depends on initial conditions, at least on the timescale of our simulations: simulations that start with one phage clone (Figure 40) have lower spread than simulations that start with 50 phage clones (Figure 41). In principle, waiting for a very long time in simulations that begin with 1 clone should produce results in line with simulations that start with 50 clones as different clans diverge from each other over time. This suggests that there are multiple features with which we could define steady state, and they do not all reach true steady state at the same time. Total population size equilibrates fastest, followed by diversity (Figure 8), and then spread in sequence space.

Average population distance from the centre of mass at steady state for bacteria (left) and phages (right) vs. mean m for simulations with one original phage clone ancestor.

The dashed line is m-1, a purely phenomenological choice. Error bars are the standard deviation across three or more independent simulations.

Average population distance from the centre of mass for bacteria (left) and phages (right) vs. mean m for simulations with 50 original phage clones.

The dashed line is m-1 and the solid line is m-1. Error bars are the standard deviation across three or more independent simulations.

Number and size of clone clans

Watching a simulation visualization of spacer and protospacer types, it is apparent that over time several different lineages may appear and become separated from each other in genome space (Figure 42). What determines how many separate 'clans' there are at steady state? What is the characteristic size of a clan? How does this depend on parameters? To address this question, we perform agglomerative clustering on protospacer and spacer sequences to identify separated groups of sequences. We use the L1 norm to define distance, which as described above is the most natural distance metric for sequence changes by mutation. Clusters are grouped using the ‘single’ linkage criterion in scikit-learn AgglomerativeClustering: the cluster distance is the minimum of the distances between all observations of the two sets. In other words, a collection of sequences that differ from any other sequence in the cluster by one mutation are all grouped together, even if some of the sequences differ by more than one from each other. This is a reasonable criterion for identifying groups of sequences that are related by descent and are actively feeding each other with mutations.

A frame from a simulation movie at 5000 generations with C0=104, η=10-3, μ=10-5, e=0.95, initial m=1.

Phages are on the left, bacteria with spacers on the right.

Figures 43 and 44 show the resulting dendrogram from this clustering process for phages and bacteria at one time point for one initial phage clone and 10 initial phage clones. To determine the number of clusters, we use a distance threshold of 2. This separates groups that have no members closer than two mutations away. In reality, these groups may still be linked by descent more recently than other groups (apparent in the dendrograms), but a distance of 2 or more means these groups are likely to remain separate going forward in time since they lack a connecting clone that can be accessed by a single mutation.

Dendrogram resulting from agglomerative clustering with the L1 norm and linking clusters using the minimum distance between members.

The number of clusters is determined with a cutoff at a distance of 2. C0=104, η=10-3, μ=10-5, e=0.95, initial m=1.

Dendrogram resulting from agglomerative clustering with the L1 norm and linking clusters using the minimum distance between members.

The number of clusters is determined with a cutoff at a distance of 2. C0=104, η=10-3, μ=10-5, e=0.95, initial m=10.

Simulations with an initial number of clones greater than 1 converge to a steady-state number of clusters at a different rate than beginning with one clone since clones from a single ancestor remain more related on average than clones from multiple different ancestors (Figures 45 and 46).

Clan number and size over time in a simulation with C0=104, η=10-3, μ=10-5, e=0.95, initial m=1.
Clan number and size over time in a simulation with C0=104, η=10-3, μ=10-5, e=0.95, initial m=10.

We calculate the average number of clans at steady state and the average clan size for bacteria and phages across all simulations. The number of clans is proportional to m; at high mutation rates the number of clans is approximately m/2. Low mutation rates mean that it takes a long time for the number of clans to equilibrate, which can be seen by contrasting the mutation rate dependence of 10 initial clones (Figure 47) with 1 initial clone (Figure 48).

Average clan number vs. m for all simulations that begin with 10 clones.

The dashed lines are m divided by the mean bacterial clan size (1.3) and m/2. Error bars are the standard deviation across three or more independent simulations.

Average clan number vs. m for all simulations that begin with one clone with μ10-6.

The dashed lines are m divided by the mean bacterial clan size and m/2. Error bars are the standard deviation across three or more independent simulations.

The regimes of low and high clan number in our model are similar to regimes of virus-host coevolution identified in Marchi et al., 2019: at low virus mutation rates and low mutational jump distances, trajectories move in a straight line in antigenic space, while at higher jump distances, trajectories become more diffusive, the way ours look. At even higher mutation rates and jump sizes, trajectories split into multiple stable coexisting lineages, which is what we see at high diversity.

Based on their parameters and regimes, we expect that increasing cross-reactivity relative to mutation rate in our model should bring our results closer to the ballistic regime.

Cross-reactivity

In this section, we present additional methods and results for simulations with two types of cross-reactivity relating to results in 'Cross-reactivity leads to dynamically unique evolutionary states'.

In our standard simulations and theory, we assume that pV(i,j) is binary (Equation 6): any mismatch between protospacers and spacers means bacteria no longer have any immune advantage against phage. Here, we discuss results of simulations where pV includes some cross-reactivity so that the more mutations a protospacer contains, the less immunity bacteria have against it.

We explore two types of cross-reactivity: one in which we define pV(i,j) as an exponential function of the mutational distance between a spacer and protospacer (Equation 7), and one in which pV(i,j) is a θ-function of mutational distance (Equation 8). The exponential definition of cross-reactivity is the same as that used in Marchi et al., 2019; Yan et al., 2019 to model viruses evolving in abstract antigen space (Marchi et al., 2019) and in genomic distance space (Yan et al., 2019). Note that our usual binary definition of pV is a special case of Equations 7 and 8 with θ=0. Figure 49 shows these different definitions of pV(i,j) as a function of dij.

Phage infection success probability pV as a function of mutational distance between spacers and protospacers for different definitions and degrees of cross-reactivity.

Definitions are plotted for pV=0.02, e=0.95.

Cross-reactivity leads to a higher average bacterial immunity against phages for the same parameters (Figures 50 and 51). This means that our assumption that effective ee/m, valid for nearly all parameters without cross-reactivity, breaks down when we introduce cross-reactivity. Importantly, this is a quasi-steady-state effect: if we start simulations with more initial clones, clone clans are genetically separated and both mean m and average immunity converge to the binary pV case (Figure 50 right panel).

Average immunity vs. diversity with different degrees of cross-reactivity for simulations with e=0.95, μ=10-6.

Dashed lines are simulations with cross-reactivity, solid line is simulations without cross-reactivity. Error bars are the standard deviation across three or more independent simulations.

Mean phage clone size (top) and mean bacteria clone size (bottom) relative to time of phage mutation for different definitions of pV.

These simulations begin with one initial phage clone and parameters C0=104, μ=10-6, η=0.0001, e=0.95.

Phages are less likely to establish when there is cross-reactivity because phage escape mutations are incomplete (Figure 5A). However, the clones that do survive grow quicker: simulations with high cross-reactivity appear to have a faster mean growth rate for clones, regardless of conditioning on survival (Figure 51).

Adding cross-reactivity causes some unexpected dynamical behaviours (Figure 52, Figure 53). We observed a travelling wave regime in some simulations with large spikes in phage clone size (Figures 54 and 55). This happens because cross-reactivity creates a fitness gradient for new phage mutants such that some mutants are much more fit and establish much more quickly.

Number of bacterial clones (top) and average immunity (bottom) for simulations beginning with either 1 phage clone (left) or 10 phage clones (right).

These simulations have parameters C0=104, μ=10-6, η=0.0001, e=0.95.

Phage clone size (top) and bacteria clone size (bottom) in a simulation with cross-reactivity (step function CRISPR effectiveness with θ=1).

This simulation has one initial phage clone and parameters C0=104, μ=10-6, η=10-4, and e=0.95. Later times in this simulation are shown in Figure 55, and earlier times in this simulation are shown in Figure 54.

Phage clone size (top) and bacteria clone size (bottom) in a simulation with cross-reactivity (step function CRISPR effectiveness with θ=1).

This simulation has one initial phage clone and parameters C0=104, μ=10-6, η=10-4, and e=0.95. Later times in this simulation are shown in Figure 55.

Phage clone size (top) and bacteria clone size (bottom) in a simulation with cross-reactivity (step function CRISPR effectiveness with θ=1) showing the switch between a travelling wave and low turnover regime.

This simulation has one initial phage clone and parameters C0=104, μ=10-6, η=10-4, and e=0.95. Earlier times in this simulation are shown in Figure 54.

To see this happening, let us look closely at the simulation in Figure 54 at time 3565 (Figure 56). There are three large phage clones and two large bacteria clones (soon to be three). Phage clone 0 is the largest and phage clone 2 is the smallest. Phage clone 0 and phage clone 1 are both either 0 or 1 mutation away from bacteria clones 0 and 1 (Figure 57). Because θ=1, they are all equally fit; effectively they could be considered one large clone. Phage 2, however, is 1 mutation from bacteria clone 1 but 2 mutations from bacteria clone 0, so it now has a much higher fitness than the other two phage clones and grows quickly. But now bacteria clone 1 is also fit against this new mutant, so it too continues growing. This push-pull keeps going with new phage mutants experiencing runaway fitness for a while. This also leads to an asymmetry in clone identity between the populations: because a bacterial clone may be immune to a phage clone with a different but related sequence, we see bacteria clones becoming large after their matching phage clone has gone extinct because they are still resistant to an existing phage clone that is genetically related (Figures 58 and 59).

Phage clone size (top) and bacteria clone size (bottom) for a short time window of the simulation shown in Figure 54.

Large phage and bacteria clones are numbered in the legend; these numbers correspond to the numbers in Figure 57.

Phage infection success probability matrix for each clone shown in Figure 56.

Dark blue is low infection success, light blue is high infection success.

Mean phage clone size (top) and mean bacteria clone size (bottom) relative to the time of phage clone mutation, either normalized to surviving clones (left) or averaged over all trajectories (right) for the simulation shown in Figure 54 (C0=104, μ=10-6, η=10-4).

Clones in the travelling wave regime (4000–6200 generations, orange) grow much more quickly than clones in the initial low-turnover regime (1000–3200 generations, blue) or the final low turnover regime (7000–10,000 generations, green). The black dashed line is the mean clone size for a simulation with the same parameters but without cross-reactivity.

A slow-switching cross-reactivity regime for the simulation shown in Figure 54.

(Left) A subset of clone trajectories for phages (top) and bacteria (bottom) in a simulation with cross-reactivity (step function CRISPR effectiveness with θ=1). Three trajectories are highlighted and coloured to show increasing time. The population is in a regime where matching clones are offset: because clones that are one mutation apart have the same complete immune overlap, the highlighted clones have large bacterial clone size well after the matching phage clone goes extinct. (Right) The clone size of the highlighted trajectories shown as a function of the bacterial marginal immunity against a particular phage clone (top) or the marginal immunity of a particular bacteria clone (bottom).

This fitness and growth pattern can be seen as a function of the marginal immunity of bacterial clones to the phage population and of the bacterial population to individual phage clones (Figure 59). Average immunity for individual clones, which we call marginal immunity in analogy with Nourmohammad et al., 2016, captures the fitness differences between individual clones that determine their dynamical behaviour. Without cross-reactivity, bacteria clones are only immune to at most one phage clone at a time, though their overall marginal immunity still changes smoothly as the matching phage clone changes size (Figure 60B). When cross-reactivity is added, bacteria may be immune or partially immune to multiple phage clones, and calculating marginal immunity summarizes all of these combined effects on the fitness of a particular clone. Bacteria clones grow larger once their marginal immunity is high, and phage clones grow large once bacterial marginal immunity against them is low. These opposing forces generate Lotka-Volterra oscillations even in the case with no cross-reactivity, though these oscillations are more rapid and persistent when cross-reactivity is added (Figure 60C and D). Phage clones grow quickly when bacteria have low marginal immunity to their sequence, while bacteria clones grow quickly when they have high marginal immunity.

Cross-reactivity leads to persistent oscillations.

(A) A subset of clone trajectories for phages (top) and bacteria (bottom) in a simulation with no cross-reactivity. Transient oscillations occur. One trajectory is highlighted and coloured to show increasing time. (B) The highlighted trajectory in (A) is shown as a function of the marginal immunity for phages (top) and bacteria (bottom). Clones experience an oscillating fitness that depends on their overlap from the other population. Arrows indicate the direction of increasing time in the oscillation. (C) A subset of clone trajectories for phages (top) and bacteria (bottom) in a simulation with cross-reactivity (step function CRISPR effectiveness with θ=1). The population is in a regime where several clones experience persistent and rapid oscillations. One trajectory is highlighted and coloured to show increasing time. (D) The highlighted trajectory in (C) is shown as a function of the marginal immunity for phages (top) and bacteria (bottom). Clones experience an oscillating fitness that depends on their overlap from the other population. Arrows indicate the direction of increasing time in the oscillation. For all simulations C0=104,e=0.95,η=10-4, and μ=10-5.

Clones grow more quickly on average in the travelling wave regime than in other regimes in the same simulation (Figure 58).

In Figure 4, we show phylogenies of four simulations with different amounts and types of cross-reactivity. We used DendroPy Sukumaran and Holder, 2010 and Toytree Eaton and Matschiner, 2020 to construct and plot phylogenies for different simulations. Figure 61 shows simulations for the same parameters as in the main text but a tenfold higher phage mutation rate: most simulations have multiple coexisting lineages, and the travelling wave regime is less long-lived before periods of low turnover. In the low-turnover regime, a relatively small number of large clones that are all outside of each other’s cross-reactivity radius can coexist for a very long time, oscillating out of phase from each other (Figures 62 and 63).

Phage clone phylogenies for four simulations with different cross-reactivities and a higher mutation rate than shown in main text Figure 4: no cross-reactivity (A) and step-function cross-reactivity with θ=1 (B, θ=2) (C), and θ=3 (D).

All simulations share all other parameters: C0=104,η=10-4,μ=10-5,e=0.95. Phage clones are plotted at the first time they pass a population size of 2 (to remove clutter from many new mutations destined for extinction), and the size of each circle is logarithmically proportional to the maximum size reached by that clone. Colours indicate the time of extinction of each clone. For each simulation with cross-reactivity, the left inset shows phage (top) and bacteria (bottom) clone sizes over time; colours indicate unique clone identities.

Phage clone size (top) and bacteria clone size (bottom) for a short time window of the θ=2 simulation shown in Figure 61C.

Each of the large trajectories oscillating out of phase between 5000 and 7000 generations is at least three mutations away from all of the others; they are all outside each other’s cross-reactivity radius.

Matrix of mutational distance between each of the four largest phage clones shown in Figure 62; colours of those trajectories are labelled on the y axis.

Each clone is at least three mutations away from all other large clones.

The durability and turnover of immune memory is qualitatively different in the different regimes of the simulation with cross-reactivity shown in Figure 54. We calculated time-shifted average immunity for each of the three regimes (initial regime with no turnover, travelling wave regime, and persistent oscillation regime) and found that turnover was extremely low in all but the travelling wave regime, which had a rapid decay of immune memory to zero (Figure 64). In contrast, the immune memory for the same parameters without cross-reactivity had a much more gradual long-term decay towards zero immunity (Figure 65). In the travelling wave regime, different amounts of cross-reactivity result in different time-shifted immunity patterns. Turnover is slow but smooth with exponential cross-reactivity, while turnover is much more rapid with step-function cross-reactivity (Figure 66). The differences in rate of turnover can also be seen by comparing timescale of sequence change in Figure 4.

Time-shifted average immunity for three regimes of the simulation shown in Figure 54 (C0=104, μ=10-6, η=10-4, and e=0.95, step-function cross-reactivity with θ=1).

The initial low-diversity regime (1000–3200 generations) and the low turnover, high diversity regime (7000–10,000 generations) had extremely low turnover, while the travelling wave regime (4000–6200 generations) had high average immunity near 0 delay that rapidly decayed to zero both in the past and future.

Time-shifted average immunity for the corresponding simulation to Figure 64 without cross-reactivity (C0=104, μ=10-6, η=10-4, and e=0.95). Peak average immunity is low because of high diversity, and immunity decays very gradually to zero in both the past and future.
Time-shifted average immunity for four simulations with the same parameters but different types of cross-reactivity: no cross-reactivity (top), exponential cross-reactivity (middle top), step-function cross-reactivity with θ=1 (middle bottom), and θ=2 (bottom).

Shared parameters are C0=104, μ=10-6, η=10-4, and e=0.95. Only the travelling-wave regime of each simulation with cross-reactivity was used to compare turnover in this regime. The grey shaded region is the standard deviation across averaged data.

Adding cross-reactivity appears to slightly decrease average clan sizes (Figure 67). This is what we intuitively expect since cross-reactivity means phages are under pressure to get as far away from other clones as possible.

Average clan size across simulations with different parameters for different degrees of cross-reactivity with minit=10.

Error bars are the standard deviation across three or more independent simulations.

Data analysis

This section contains details of our methods for analysing sequencing data from three published papers – groundwater metagenome data used in calculating spacer turnover (Metagenomic CRISPR spacer analysis from Burstein et al., 2016), a laboratory long-term coevolution experiment used for spacer turnover and time-shifted average immunity (‘Laboratory coevolution experimental data’), and wastewater treatment plant metagenome data used for spacer turnover and time-shifted average immunity (‘Experimental data from wastewater treatment plant’). Methods and toy models relating to calculating average immunity in data are in ‘Theoretical considerations when calculating average immunity from data’.

Metagenomic CRISPR spacer analysis from Burstein et al., 2016

This dataset is the source for Figure 6F in ‘Dynamics are determined by diversity’.

We detected spacer sequences in metagenomic data from Burstein et al., 2016. We analysed the 0.1 μm filter dataset which contains metagenomic reads at six time points, found under accession PRJNA268031. We searched for matches to the CRISPR repeats identified in the study using blast; we accessed a list of repeats from the study’s Supplementary data 2. There were 144 unique repeat sequences. Reads are 150 bp long, and we kept only blast results where a repeat had two or more matches to a read in order to accurately detect spacers and to reduce spurious matches to repeats. To further ensure match quality, we kept only alignments that were at least 85% similar to repeat queries, unless another match to the same repeat was present on the read. This 85% threshold was chosen for consistency with previous spacer identification studies (Paez-Espino et al., 2013; Paez-Espino et al., 2015; Skennerton et al., 2013).

It is possible that multiple unique repeats match the same read. For each matched read, we kept the repeat match that had the highest alignment identity (fractional alignment length times percent identity). We detected spacers between repeats, then used the detected spacer length to extract spacers on either side of matched repeats. Because we required at least two repeat matches, at most three spacers could be detected from a read, and at minimum two spacers were detected. We detected a total of 37,963 spacers of which there were 17,491 unique sequences. The most abundant sequence was present 572 times. After grouping within each unique repeat by spacer sequence similarity with an 85% average similarity threshold, there were 12022 unique types with the largest type having an abundance of 616.

Theoretical considerations when calculating average immunity from data

The methods described here are used to scale the y-axis of Figure 7C, D in ‘Time-shifted average immunity calculated from data reveals distinctive patterns of turnover’.

In our simulations and model, we assume that each phage has a single protospacer and each bacterium has a single spacer. In reality, phages can have hundreds to thousands of possible protospacers, and bacteria can also acquire tens to hundreds of spacers (Pavlova et al., 2021). If we assume that average immunity plays out at the organism level; that is, if a bacterium that contains one or more spacers matching one or more phage protospacers is immune to that phage, then calculating average immunity in practice requires knowing something about the typical numbers of protospacers and spacers in organisms.

First, let us consider the effect of changing the number of protospacers while keeping spacer array length constant at 1. This is a reasonable model for experimental data at short timescales when most bacteria acquire only one new spacer. For a set i of observed spacers and a set j of observed protospacers with abundances ni and nj, if each spacer and protospacer are assumed to belong to one organism only, then the average immunity is given by Equation 90. We assume for simplicity that any matching spacer provides perfect immunity, that is, pV(i,j)=pV(1-δij).

(90) 1i,jninjpV(i,j)pVi,jninj=1pVi,jninj(1δij)pVi,jninj=i,jninjδijNiNj

where Ni denotes ini.

If instead the same set of protospacers is divided among fewer phages so that each phage has a protospacers on average, then the total number of phages is Nj/a. The numerator of average immunity remains the same since each bacterium still contains only a single spacer. Average immunity is exactly the same as before but multiplied by the average number of protospacers per phage (Equation 91).

(91) i,jninjδijNiNja=ai,jninjδijNiNj

This agrees with the intuition that more spacer targets per phage increases the immune potential of the CRISPR system.

In data from Paez-Espino et al., 2015, we find quantitative agreement between the range of average immunity values in their data and the types of values in our simulations once we apply this simple transformation with a=696, since there are 231 possible CRISPR1 protospacers and 465 possible CRISPR3 protospacers in the phage genome (determined by searching for the canonical PAM for each CRISPR locus).

If bacteria contain multiple spacers and phages contain multiple protospacers, the combinatorics of average immunity gets more interesting. Now, if a bacterium contains multiple spacers that target the same phage, we assume this does not increase its immunity (although in reality there may be a benefit to multiple matching spacers).

We simulated sets of spacers and protospacers, randomly divided them into arrays of different sizes, and calculated the organism-level average immunity in each case. Figure 68 shows the average immunity values for several array lengths with different assumptions for how the set of spacer sequences and array lengths are distributed. We chose a set of spacer ‘sequences’ where each sequence is a letter of the alphabet, then constructed phage protospacer arrays by randomly drawing 5 unique letters 50 times to create 50 phages. We then sampled letters from the alphabet either uniformly or from an exponential distribution (where ‘A’ is the number 1, etc.). The exponential sampling captures the fact that in practice spacer abundance distributions are highly non-uniform. To create bacteria spacer arrays, we sampled from the set of spacers without replacement, either creating arrays of constant size or arrays with the same mean size but with their length either normally or exponentially distributed.

Average immunity vs. bacterial spacer array length for simulated distributions of protospacers and spacers.

We simulate 50 phages, each with 5 protospacers represented by letters from the alphabet, uniformly sampled. We simulate 420 bacterial spacers, drawn from the alphabet either uniformly (top row) or following an exponential distribution with mean 6 (bottom row). We construct arrays by sampling from the set of 420 spacers without replacement, either creating arrays of constant length (left column), or of variable length with a gaussian distribution (middle column) or exponential distribution (right column). Average immunity is calculated as in Equation 90 except the indices run over all arrays and not over individual sequences, and the presence of any matching pair gives perfect immunity. Blue points are average results over 50 simulations; error bars are standard deviation. The black dashed curve is given by an=1-Cn=1-(1-a1)n.

We developed a simple theoretical model for the change in average immunity as array length increases. The baseline average immunity when bacteria are assumed to have single spacers we denote a1, and define a constant C=1-a1. Then as array length increases, the total number of bacteria decreases and the overlap increases, but by a smaller factor than the population size decrease: the remaining average immunity that could be gained is reduced with the same fraction as the original average immunity so that the average immunity for an array of length n+1 is an+1=1-(1-an)C. Plotting 1-an vs. n on a log plot gives a straight line. Plugging in a1=1-C gives an=1-Cn, plotted as a black dashed line in Figure 68. The theory agrees well with the simulated results except for when array length is exponentially distributed; in this case, we believe the presence of several very long and very short arrays means that short-array bacteria do not get the coverage of multiple spacers while the long-array bacteria have redundant spacers. This functional form for average immunity as a function of array length is very similar to a quantity derived by Iranzo et al., 2013 — they calculated that the probability that a random bacterium is immune to a random phage, each with a random set of spacers and protospacers is pc=1-(1-αnNt)Ns, where n is the average bacterial array length, Nt is the total number of unique protospacers, and Ns is the number of protospacers per phage. The constant α is a scaling factor that measures the degree of correlation between matching spacer and protospacer abundances. Their scaling is as a function of the phage array length instead of bacterial, but the same intuition applies.

An interesting result of this analysis is that the relative immune benefit from gaining a spacer decreases as immunity increases: a CRISPR array twice as long does not provide double the immunity. A similar diminishing-returns result has also been observed in theoretical models of vertebrate immunity where the relative decrease in immune susceptibility and increase in immune memory both decrease as the number of infections increases over an organism’s lifetime (Mayer et al., 2019). Long CRISPR arrays are also subject to a dilution effect: spacers compete to form complexes with limited Cas protein machinery, and if the number of spacers is high, there is a higher chance that the needed spacer is not available in high enough numbers during an infection (Martynov et al., 2017; Bradde et al., 2020).

Average immunity negatively correlates with diversity regardless of array size

This section describes the toy model we present in results ‘Pathogen and host diversity must be considered together.

In our simulations, we observed an inverse proportionality between average immunity and bacterial diversity (Figure 50). We wondered if this trend would hold for array sizes and numbers of protospacers larger than 1. Our intuition is that this trend should hold for any arrangement of spacers and protospacers into arrays, provided the following two statements are true: (a) bacteria and phage diversity is coupled; that is, phage protospacer diversity matches bacterial spacer diversity, and (b) diversity is larger than CRISPR array length.

We tested this with a toy model, described in ‘Pathogen and host diversity must be considered together’. Conceptually, there are two regimes of CRISPR immunity and phage evolution to consider. In an ‘evolutionarily late’ regime, bacteria are exposed to many diverse phages for which any one spacer may target only a few phage genotypes. In an ‘evolutionarily early’ regime, most phages are clonally similar and protospacer diversity is generated through escape mutations. In this scenario, we can imagine a set of phages that share most of their protospacers but have a small number of recently mutated protospacers. This regime can further be divided into a slow-mutation ‘successive sweeps’ regime in which each new mutation takes place in the background of the previous mutant and a fast-mutation ‘clonal interference’ regime in which many unique mutations coexist in the same background.

We explored these three scenarios by generating synthetic sets of protospacers and spacers and varying the total diversity by changing the size of the pool of available sequences, distributing sequences exponentially in abundance to qualitatively match data Bonsma-Fisher et al., 2018. We randomly assigned sequences to individual phages and bacteria, then calculated average immunity in three scenarios of protospacer origin. First, we divided protospacers randomly among 100 phages so that each phage had x unique protospacers to mimic an evolutionarily diverse population. In the second scenario, we created a set of phages that mimic a population undergoing successive sweeps by changing the first protospacer, then the first and second, and so on. In the third scenario, the set of phages share the first x-1 protospacers and only the last protospacer is varied; this mimics an initially clonal population undergoing rapid mutations. In all scenarios, we sampled uniformly from the pool of protospacers to generate 1000 spacer sequences that were divided into bacterial arrays with either a constant array size (Figure 69B–D), or a Gaussian-distributed array size (Figure 70). We repeated the bacterial array assortment process 20 times for each total spacer diversity and combination of array sizes.

Average immunity decreases with coupled CRISPR diversity.

(A) Inferred initial average immunity as a function of diversity in the experiment of Common et al., 2020. Bacterial diversity was manipulated in the experiment by combining different numbers of bacterial clones defined by unique CRISPR spacers with a single phage strain that was able to infect only one of the clones. We calculated the expected initial average immunity based on the number of clones. (B–D) Results of a toy model of sorting a set of protospacers and spacers into arrays of variable length, either assigning protospacers fully randomly (B), as sequential mutations (C), or changing the last protospacer in the array only (D).

Average immunity vs. diversity (number of unique types) for simulated distributions of spacers and protospacers with different bacterial array sizes (increasing top to bottom) and different numbers of protospacers per phage (increasing left to right).

Bacterial array sizes are drawn from a Gaussian distribution with mean given by the array mean for each row and a standard deviation of 2. Points are averages across 50 independent runs.

We calculated average immunity in these synthetic scenarios, assuming for simplicity that any matching spacer provides perfect immunity, that is, e=1, pV(i,j)=pV(1-δij), or pV(I,J)=pV[IJ=]. For the set of bacteria nB and phages nV, the average immunity for the population is

(92) 1I,JnBInVJpV(I,J)pVI,JnBInVJ=1pVI,JnBInVJ[IJ=]pVI,JnBInVJ=I,JnBInVJ[IJ]NBINVJ

where NBI=InBI and NVI=InVJ. In our base model, bacteria and phages are each limited to a single spacer or protospacer, whereas now I and J label the full set of spacers or protospacers in an organism. nBI is the number of bacteria with the same unique set of spacers I and the Iverson bracket [IJ] gives 1 if any of the spacers in I matches any of the protospacers in J and 0 if there is no overlap between sets I and J.

The 1-spacer 1-protospacer curve in Figure 69B represents the same situation as our simulations: one spacer and one protospacer (but with a random distribution of sequences), and the trend matches our simulations exactly with average immunity inversely proportional to diversity. This strict inverse dependence changes shape as the number of spacers and protospacers per organism changes.

We compared these toy model results to the experimental setup of Common et al., 2020. In their experiments, they combined equal proportions of bacterial strains with a different CRISPR spacer with one phage strain that is either targeted by all spacers or escaped from one of the spacers. There is then one susceptible bacterial strain per mixture when the escape phage is used. All experiments also contained one surface mutant bacterial strain that is always immune to the phage.

We can directly calculate the expected initial average immunity based on this setup. For m bacterial CRISPR strains, one surface mutant strain, and one escape phage, total bacteria Nb, total phage Nv, each bacterial strain has abundance Nb/m and vj=Nv.

We calculate average immunity assuming e=1:i,jbiviδijNiNj

We can include the surface mutant or not; it slightly shifts the total diversity but the trend is unaffected. For instance, for m=3 CRISPR clones, the phage is able to infect one of them, so average immunity is NvNb/m+NvNb/mNbNv=2m=23. If we included the surface mutant in equal proportions, we would get 3/4 instead.

Laboratory coevolution experimental data

This dataset is the source for Figure 6E in ‘Dynamics are determined by diversity’ and Figure 7C in ‘Time-shifted average immunity calculated from data reveals distinctive patterns of turnover’.

We analysed experimental data from Paez-Espino et al., 2015. In this experiment, S. thermophilus bacteria were mixed with phage 2972 and allowed to coevolve until phage extinction, up to 232 days in one replicate. Bacteria and phage whole-genome shotgun sequencing was performed at irregular intervals (13 time points for series MOI-2B). This data is publicly available in the NCBI Sequence Read Archive under the accession PRJNA275232.

We analysed data from the MOI-2B series and detected spacers in the CRISPR1 and CRISPR3 loci by searching raw reads for matches to the S. thermophilus repeats (GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC for CRISPR1 and GTTTTAGAGCTGTGTTGTTTCGAATGGTTCCAAAAC for CRISPR3) using BLAST.

The expected structure of the CRISPR locus is 36 nt repeats interspaced with 30 nt spacers. Reads are 100 nt long (Illumina), and so at most two complete spacers can be detected per read (one full repeat match near the centre of the read). To maximize the number of genuine spacers detected while removing low-quality matches, we kept only full-length alignments to the repeat (length 36), unless a shorter alignment was also present on the same read as a full-length alignment (for instance, if the repeat is partially present at the start or end of a read). If a single repeat match was present on a read, we extracted 30 nt on either side and labelled these spacers. We set a minimum spacer length to be 26 nt (0.85×30) and did not keep spacer sequences at the start or end of a read that were shorter than this. If two repeat matches were present, we extracted the spacer sequence between the matches.

To detect wild-type spacers, we searched for matches to the CRISPR repeats on the S. thermophilus DGCC7710 reference genome (accession NZ_CP025216) and extracted the sequences between repeat matches. We also searched raw reads from the day 1 data of the control replicate (no phages) for matches to the repeats and performed the same spacer extraction procedure.

We grouped all extracted spacers using the AgglomerativeClustering method from SciPy with an 85% similarity threshold and the ‘average’ linkage criterion. We performed this grouping separately for CRISPR1 and CRISPR3 spacers. Each spacer sequence was then labelled with a group type identifier. Figure 71 shows our detected spacer counts after grouping by 85% average similarity and eliminating single counts and matches to wild-type spacers, compared with the reported results from Paez-Espino et al., 2015; there is good agreement between our counts and the original authors’ counts.

Unique spacer types detected in our analysis of data from Paez-Espino et al., 2015 after grouping by 85% average similarity and removing single spacer counts (blue bars).

Counts reported in Paez-Espino et al., 2015 are red points.

To detect protospacers, we first blasted all reads against the S. thermophilus DGCC7710 reference genome and the phage 2972 reference genome (accession NC_007019) (Figure 72). We next blasted all unique detected spacer sequences from the previous step against all the reads, removing any query sequences that are a perfect subset of another sequence and any sequences that were more than 30% N nucleotide. This detects both potential protospacer sequences and the original spacers themselves. To isolate protospacers, we kept only results that were on reads that did not match the bacterial genome and that did not match the CRISPR1 repeat (CRISPR3 repeats were not checked; however, they represent a very small number of additional reads, less than 0.1%). We kept only results that were 26 nt or larger. If there were multiple hits on a read from the same spacer type (but different sequences), we kept only the match with the lowest e-value. We extracted the matched sequence from the read and 10 nucleotides after it to check for a PAM sequence (all spacers were stored oriented in the same direction relative to the repeat to facilitate comparison and PAM detection). Since reads are paired-end, some reads will overlap, and some spacers may be double-counted in the overlap. We decremented the total spacer or protospacer count for a sequence by 1 if a spacer sequence was present on both ends of a paired read. We also removed sequences that contained long strings of 11 or more of the same nucleotide, assuming these to be sequencing errors. This removed around 1% of sequences.

Total reads per time point that match phage (blue) or bacteria (orange).

Top: total reads. Middle: total reads divided by phage and bacteria genome sizes. Bottom: fraction of total reads matching bacteria or phages.

We grouped all spacer and protospacer sequences together (separated by CRISPR locus) by several different similarity thresholds between 85 and 99% to assign type labels based on each similarity threshold.

We checked the 10 nt region downstream from each potential protospacer and removed any sequences that did not have a perfect match to the PAM: AGAAW for CRISPR1 (Paez-Espino et al., 2013; Paez-Espino et al., 2015; Shah et al., 2013; Garneau et al., 2010 and GGNG for CRISPR3 Paez-Espino et al., 2015; Shah et al., 2013). Since targeting is highly sensitive to PAM mutations (Shah et al., 2013), we assumed that any deviation from the perfect PAM meant that the protospacer would not be successfully targeted. There are cases where the PAM sequence is incomplete because of hitting the start or end of a read; these were also assumed to be imperfect PAMs (though some would be genuine). Changing this assumption to include partial PAMs that were subsets of perfect PAMs caused total average immunity numbers to increase and the slope of each curve to decrease. Results including incomplete PAMs were qualitatively similar to the results including all protospacer matches regardless of PAM, highlighting the importance of a perfect PAM match for functional immunity (Figure 7—figure supplement 2 and Figure 7—figure supplement 3; Figure 73).

Number of shared spacer types between bacteria and phage (top left), ratio of the number of phage types to the number of bacteria types (top right), number of bacteria types (bottom left), and number of phage types (bottom right) as a function of the sampling date for data we analysed from Paez-Espino et al., 2015.

Colours indicate different similarity grouping thresholds. Spacer counts include wild-type spacers, and protospacers are included only if they possess a perfect PAM. Data is summed over CRISPR1 and CRISPR3. All sequences are included regardless of abundance.

Calculating average immunity

To calculate average immunity between bacteria and phage, we interpolated spacer and protospacer counts between sequenced time points using the shortest interval between experimental time points as the sampling frequency from the interpolated data. We removed the first time point and last two time points from the data because the first time point has low bacterial spacer counts and a very large phage population size, and at the last two time points, phage counts are very low because phages are about to go extinct (Figure 74 and Figure 73). Results including all time points are given in Figure 7—figure supplement 4; they are qualitatively similar except for large changes at the first and last time points where only the first and last phage population are included in the average, respectively.

Total phage and bacteria population size for the MOI2B experiment in Paez-Espino et al.Paez-Espino et al., 2015.

Circles are points digitized from Figure 1A in Paez-Espino et al., 2015; squares are the population size interpolated to match the sequencing dates in the experiment.

We calculated the time-shifted average overlap between bacteria and phage spacer types by comparing bacteria and phage types separated by a time delay and averaging over all points with the same time delay (Figure 7—figure supplement 4). As described in ‘Theoretical considerations when calculating average immunity from data’, we multiplied raw average immunity values by the total number of protospacers to account for multiple protospacers on each phage genome. The number of possible protospacers in phage 2972 for the CRISPR1 locus in S. thermophilus (AGAAW PAM) is 231 and the number for CRISPR3 is 465 (696 total). Note that because PAM mutations are common, the true number of protospacers may be less than this hypothetical amount, which would cause us to slightly overestimate average immunity. However, we are also assuming that each bacterium has one effective spacer, and this is likely an underestimate based on estimates of locus length from Paez-Espino et al., 2015.

As the similarity threshold increases, the overall overlap goes down slightly (Figure 7—figure supplement 4 and following) because there are more total types (Figure 72). Interestingly, the number of shared types across the whole dataset is very stable regardless of the clustering threshold: there is a tradeoff between the total number of types (which increases as the similarity threshold increases) and the likelihood that types are shared (which increases as the similarity threshold decreases). At high similarity thresholds, we expect that many spurious types will be created from sequencing errors in spacers, while at low similarity thresholds, genuine escape mutants will be grouped with spacers meaning that the immunity information contained in the overlap is not accurate. There are about 700 unique protospacer sequences in the wild-type phage genome, so we expect the number of unique types to be on the order of 700; they are for bacteria, but the number of phage types in our analysis can be quite a bit higher. For all similarity thresholds, we see that bacterial immunity is higher to phages from the past than phages from the future.

We experimented with different trimming thresholds for the data, removing time points from the beginning and end of the data. The early and late parts of the experiment experienced large fluctuations in population sizes, so if we are interested in steady-state behaviour, it makes sense to remove some time from the beginning and end. But, how much to remove? We can remove the points that look more different from the others either in terms of population size, total reads, or number of types: this could lead us to remove the first three and last three time points. Looking just at population size alone, it makes sense to remove the first time point and last two time points. Figure 7—figure supplement 4 shows results with different sets of points removed to compare the resulting overlap.

There were no protospacer matches to the reference genome wild-type spacers for CR1 and only a single protospacer match to CR3, even with a lenient 85% similarity threshold. This means the phage has effectively escaped all the wild-type spacers and only the new spacers matter. However, our results differ when wild-type spacers are removed since we also included spacer sequences from the control experiment as wild-type spacers. There are many sequence variants in the control experiment wild-type spacers that do cluster with other spacers, and there are protospacer sequences that cluster with supposed wild-type spacers at lenient grouping thresholds (Figure 75). This means that average immunity results with and without wild-type spacers are more similar at high similarity thresholds than at low thresholds.

Total number of unique protospacer sequences after removing all sequences that cluster with wild-type sequences at different similarity thresholds.

Only protospacers with a perfect PAM are included.

Our model predicts that phage population sizes should decrease as average immunity increases and that bacterial population sizes should increase as average immunity increases. We calculated average immunity at each experimental time point and compared it to the reported phage and bacterial population size at each time point. Phage and bacteria population sizes were digitized from Figure 1A in Paez-Espino et al., 2015. We found that for all similarity groupings, the log-transformed phage population size was significantly negatively correlated with average immunity, while there was no significant correlation with bacterial population size (Figure 76). This suggests that the mechanism of phage extinction is increasing bacterial immunity over time, and indeed the average immunity just before phage extinction is very high (Figure 77).

Phage (top) and bacteria (bottom) population size vs. average immunity calculated with data from Paez-Espino et al., 2015.

Protospacers are included if they have a perfect PAM, and all wild-type spacers are included. Spacers and protospacers are grouped with an 85% similarity threshold. Colours from blue to yellow indicate increasing time.

Average immunity vs time in the MOI-2B experiment.

Protospacers are included if they have a perfect PAM, and all wild-type spacers are included. Spacers and protospacers are grouped with an 85% similarity threshold.

Experimental data from wastewater treatment plant

This dataset is the source for Figure 6G in ‘Dynamics are determined by diversity’ and Figure 7D in ‘Time-shifted average immunity calculated from data’ reveals distinctive patterns of turnover.

We analysed metagenomic sequencing data from Guerrero et al., 2021a. This is a longitudinal study of a natural community of bacteria and phages in a wastewater treatment plant. Sixty samples were collected approximately biweekly over a 3-year period from a municipal sewage treatment plant, and whole-genome sequencing was performed on extracted DNA (Illumina platform, 250 bp paired-end reads). They focused on bacteria from the genus Gordonia, which is one of the most abundant taxa and was consistently present in samples. They detected CRISPR loci in Gordonia genome assemblies, then searched for matches to the two identified CRISPR repeats in all reads to detect other spacers. They constructed graphs of all the linked spacers based on spacer adjacency within reads. They searched for matches to spacers in the metagenomic data, then used matches to identify contigs that might be phages, resulting in two draft phage genomes (DC-56 and DS-92) that they followed over time. They searched for protospacers within these phage genomes using blast. They did not find matches between the Gordonia spacers and any other contigs from the dataset, suggesting that those two phages are the only two from which Gordonia is acquiring spacers. They used Crass and metaCRT to detect other possible CRISPR arrays within the unassembled data and other contigs; they did not find and/or do not discuss any other detected CRISPR loci. They found a broad fan-like network of CRISPR arrays within Gordonia that reflect new spacer acquisition, many of which matched the phage genomes. Older, conserved spacers largely did not match the phage genomes. They found that peaks in phage genome coverage (a proxy for abundance) over time were highly correlated with bacterial coverage, suggesting that their population dynamics are tightly coupled. They identified a PAM sequence, GTT (5’) for the CRISPR1 locus, but did not find that mutations occurred more frequently in the protospacer regions or PAM regions of the phage genomes than elsewhere in the genome.

Following the same analysis procedure we used for data from Paez-Espino et al., 2015 (‘Laboratory coevolution experimental data’), we used the two reported Gordonia CRISPR repeats (GTGCTCCCCGCGCGAGCGGGGATGATCCC for CRISPR1 (type I) and ATCAAGAGCCATGTCTCGCTGAACAGCGAATTGAAAC for CRISPR2 (type III)) to search for spacers in the raw read data using BLAST.

To detect wild-type spacers, we searched for matches to the CRISPR repeats on the Gordonia metagenome-assembled genome (MAG) provided on the study’s GitHub repository and extracted the sequences between repeat matches. We determined that spacers from the CRISPR1 locus had an average length of 32 nt and spacers from the CRISPR2 locus had an average length of 35 nt.

We detected spacers as sequences between or adjacent to repeat sequences. To maximize the number of genuine spacers detected while removing low-quality matches, we kept only full-length alignments to the repeat, unless a shorter alignment was also present on the same read as a full-length alignment (for instance, if the repeat is partially present at the start or end of a read). If a single repeat match was present on a read, we extracted 32 (CRISPR1) or 35 (CRISPR2) nt on either side and labelled these spacers. We set a minimum spacer length of 85% of the expected spacer length and did not keep spacer sequences at the start or end of a read that were shorter than this. If multiple repeat matches were present, we extracted intervening sequences as spacers.

We grouped all extracted spacers using the AgglomerativeClustering method from SciPy with an 85% similarity threshold and the ‘average’ linkage criterion. We performed this grouping separately for CRISPR1 and CRISPR2 spacers. Each spacer sequence was then labelled with a group type identifier.

To detect protospacers, we first blasted all reads against the Gordonia MAG and the two reported phage genomes (DC-56 and DS-92). We next blasted all unique detected spacer sequences from the previous step against all the reads, removing any query sequences that are a perfect subset of another sequence and any sequences that were more than 30% N nucleotide. This detects both potential protospacer sequences and the original spacers themselves. To isolate protospacers from spacers, we kept only results that were on reads that did not match the CRISPR1 or CRISPR2 repeat. Some reads matched both the bacterial genomes and the phage genomes; we removed matches for which the log-difference in e-value between phage genome match and bacteria genome match was less than 25, that is, the phage genome e-value must be lower than the bacteria e-value by a factor of 1025 to be considered not a bacteria match. Figure 78 shows the e-value differences for reads that matched both phage and bacteria for one time point.

Histogram of reads that matched both the Gordonia MAG and either phage reference genome vs the base-10 log difference in e-value between the matches for accession SRR9260993.

A positive value means the bacteria match has a lower e-value than the phage match. The vertical dashed line indicates the -25 cutoff; matches to the left were considered phage matches, matches to the right were considered bacteria matches.

We kept only potential protospacers that were 85% of the expected spacer length or larger. If there were multiple hits on a read from the same spacer type (but different sequences), we kept only the match with the lowest e-value. We extracted the matched sequence from the read and 10 nucleotides on either side to check for a PAM sequence (all spacers were stored oriented in the same direction relative to the repeat to facilitate comparison and PAM detection). Consistent with the PAM reported by Guerrero et al., we identified a GTT PAM at the 5’ end of protospacer sequences (Figure 79, reverse-complement).

Probability logo for the four nucleotides at the 5’ end of potential protospacer sequences, generated with WebLogo; the spacer sequence starts at the right edge of the logo meaning that the reverse-complement PAM is GTT.

Since reads are paired-end, some reads will overlap, and some spacers may be double-counted in the overlap. We decremented the total spacer or protospacer count for a sequence by 1 if a spacer sequence was present on both ends of a paired read.

We grouped all spacer and protospacer sequences together (separated by CRISPR locus) by several different similarity thresholds between 85 and 99% to assign type labels based on each similarity threshold.

We checked the 10 nt region upstream from each potential protospacer and removed any sequences that did not have a perfect match to the GTT PAM. Since targeting is highly sensitive to PAM mutations Shah et al., 2013, we assumed that any deviation from the perfect PAM meant that the protospacer would not be successfully targeted. There are cases where the PAM sequence is incomplete because of hitting the start or end of a read; these were also assumed to be imperfect PAMs (though some would be genuine). We detected 627,555 potential protospacers for the CRISPR1 locus and only 61 potential protospacers for the CRISPR2 locus. Since the CRISPR2 locus is a type III CRISPR system, we expect that there is no PAM for this system and so included all CRISPR2 protospacers; because there are so few, this choice does not affect results. After removing all CRISPR1 protospacers that did not have an adjacent match to the perfect PAM, we were left with 10,2761 protospacers in total representing 19,292 unique sequences and 537 unique types after grouping with an 85% similarity threshold. The total number of unique spacers detected from the CRISPR1 locus over the experiment was 5289 (2078 unique types at 85% similarity), while the total number from the CRISPR2 locus was an order of magnitude lower at 972 (357 unique types at 85% similarity). In total, 28,386 CRISPR1 spacers and 3353 CRISPR2 spacers were detected.

Our detection of reads that matched the Gordonia MAG and phage genomes is closely correlated with the reported coverage from the study (Figure 80).

Phage genome (top) and bacteria genome (bottom) coverage, digitized from Figure 2D of Guerrero et al., 2021a vs. number of reads assigned to phage or bacteria from our analysis.

Each marker is a separate time point.

The number of unique protospacer and spacer types we detected fluctuated considerably over the course of the study (Figure 81, Figure 82). The number of shared types as well as the ratio of phage types to bacteria types also fluctuated considerably.

Abundance over time for the 20 largest bacteria clones (top) and phage clones (bottom) over time for data from Guerrero et al., 2021a.

The left panels show absolute counts, and the right panels show fractional abundance for the included types. None of the top types are shared between phage and bacteria.

Number of shared spacer types between bacteria and phage (top left), ratio of the number of phage types to the number of bacteria types (top right), number of bacteria types (bottom left), and number of phage types (bottom right) as a function of the sampling date for data we analysed from Guerrero et al., 2021a.

Colours indicate different similarity grouping thresholds. Spacer counts include wild-type spacers, and protospacers are included only if they possess a perfect PAM. Data is summed over CRISPR1 and CRISPR2.

About 40% of spacer and protospacer types remained throughout the time series without going extinct at an 85% similarity threshold (Figure 83). In data from Paez-Espino et al., 2015, about 20% of original types remained at the end of the experiment (Figure 6E); however, the fraction of types remaining decreases steadily in data from Paez-Espino et al., 2015 unlike in this dataset.

Fraction of spacer types (left) and protospacer types (right) remaining as a function of time delay, averaged over the entire time series from Guerrero et al., 2021a.

Types are grouped with an 85% similarity threshold. The grey shaded region is the standard deviation across averaged data.

Calculating average immunity

To calculate average immunity between bacteria and phage, we interpolated counts between sequenced time points using a 14-day spacing (the most common sampling interval in the data). We calculated the time-shifted average overlap between bacteria and phage spacer types by comparing bacteria and phage types separated by a time delay and averaging over all points with the same time delay (Figure 7—figure supplement 5). As described in ‘Theoretical considerations when calculating average immunity from data’, we multiplied raw average immunity values by the average number of protospacers with the GTT PAM from the phage DC-56 and DS-92 genomes (1956 protospacers) to account for multiple protospacers on each phage genome. Note that because PAM mutations are common, the true number of protospacers may be less than this hypothetical amount, which would cause us to slightly overestimate average immunity. However, we are also assuming that each bacterium has one effective spacer, and this is likely an underestimate. As the similarity threshold increases, the overall overlap goes down slightly because there are more total types. As in the data from Paez-Espino et al., 2015, the number of shared types across the whole dataset is insensitive to the clustering threshold: there is a tradeoff between the total number of types (which increases as the similarity threshold increases) and the likelihood that types are shared (which increases as the similarity threshold decreases) (Figure 82).

We experimented with different trimming thresholds for the data, removing time points from the beginning and end of the data series before calculating time-shifted average immunity (Figure 7—figure supplement 8, Figure 7—figure supplement 9 and Figure 7—figure supplement 10). For all similarity thresholds and most trimming choices, there is a peak in average immunity at zero time shift that decays in both the past and future directions (Figure 7—figure supplement 5). However, if enough data is trimmed from the beginning to remove the large spikes in average immunity between days 200 and 400 (Figure 84), the central peak is quite diminished or removed altogether (Figure 7—figure supplement 9 and Figure 7—figure supplement 10). The standard deviation of average immunity is large near the zero time shift (Figure 7—figure supplement 5), which is because average immunity is highly variable across the time series: there is a large spike in average immunity near the middle of the time series (Figure 84).

Average immunity at each time point in the time series.

Raw values cannot be larger than 1, but plotted values are multiplied by 1956, the average number of protospacers with the GTT PAM from the phage DC-56 and DS-92 genomes, yielding some values larger than 1.

Because the variability in average immunity between time points was very high, we performed a Wilcoxon signed-rank test between the average immunity at zero time delay and the average immunity at a time delay ±200 and ±500 (arbitrarily chosen). We paired each shifted time point with its corresponding average immunity at zero delay for each bacterial abundance (Figure 85). Using a paired test answers the following question for each time point: is the overlap between bacteria and past phages significantly lower than the overlap in the present, and is the overlap between bacteria and future phages significantly lower than the overlap in the present? The test returns a statistic that is the sum of the ranks of differences greater than zero; a larger number indicates more asymmetry between the two compared datasets. We found that past immunity after 200 days is significantly lower than present (Wilcoxon statistic Z=1241, p-value p=0.008), but that immunity after 500 days is not significantly lower than present (Z=392,p=0.27). When comparing all time points with present, only a small range of past immunity values were significantly lower than present (Figure 86). For both time delays, we found that future immunity is significantly lower than present (Z=580,p=0.0012 for 500 days and Z=1293,p=0.003 for 200 days). Interestingly, these significance values are not symmetric if we pose the question from the perspective of phage: the overlap between phage and future bacteria at 500 days is lower than the overlap for present phages (Z=507,p=0.024), and the overlap between phage and past bacteria is lower than the overlap at present (Z=388,p=0.027). This asymmetry, that bacteria are generally more immune to all past phages while phage are not more infective against all past bacteria, is qualitatively the same as that reported by Dewald-Wang et al. in an explicit time shift study of bacteria and phage immunity and infectivity in chestnut trees Dewald-Wang et al., 2022.

Distribution of average immunity values at zero time shift (blue) and at 500 days time shift for future phages (orange) and 500 days time shift for past phages (green).
p-values for the Wilcoxon signed-rank test comparing the average immunity at zero time shift with all other time shifts.

Bacteria immunity against past phages is generally not significantly lower than bacterial immunity against current phages (blue), while bacterial immunity against future phages is significantly lower than immunity against current phages for almost all time shifts (orange). The time shifts are not symmetric; bacterial overlap with past phages is not necessarily the same as phage overlap with future bacteria and vice versa. The number of points that are available to compare decreases as the time shifts get larger (black dashed line).

We confirmed that the past and future shifted average immunity becomes less correlated with the zero-shift average immunity as the time shift increases. Figure 87 shows the time-shifted average immunity as a function of the zero-shift average immunity for three time shift values, and we see that average immunity is highly correlated at small time shifts but becomes less correlated at larger shifts.

Time-shifted average immunity for each time point as a function of the zero-shift average immunity of that data point for shifts comparing past phages (top row) and future phages (bottom row).

We randomly shuffled time point labels for bacteria and phages separately (Figure 7—figure supplement 6) and together (Figure 7—figure supplement 7) and calculated average immunity with the bootstrapped data. The average immunity trend as a function of time shift disappears in the bootstrapped data even if bacteria and phage clone abundances remain matched at each time point.

Unlike in the data from Paez-Espino et al., 2015, we did not find a correlation between phage or bacteria coverage and average immunity (Figure 88), though Guerrero et al., 2021a did report that bacteria and phage coverage was correlated and that periods of high phage coverage coincided with higher abundance of Gordonia strains lacking matching spacers, suggesting that the expected negative correlation between phage abundance and bacterial immunity may be playing out at some level but may be confounded by generally higher detected average immunity when coverage is high.

Phage (top) and bacteria (bottom) population size vs. average immunity calculated with data from Guerrero et al., 2021a.

Protospacers are included if they have a perfect PAM, and all wild-type spacers are included. Spacers and protospacers are grouped with an 85% similarity threshold. Colours from blue to yellow indicate increasing time.

Spacer turnover

For all sets of experimentally observed spacers (Paez-Espino et al., 2015, Burstein et al., 2016, and Guerrero et al., 2021a), we calculated spacer turnover over time. We interpolated spacer abundances grouped by 85% similarity over time using the smallest time delay between samples, then calculated the average fraction of spacer types remaining as a function of time delay across the entire dataset.

Appendix 1

Relationship of average immunity to Morisita–Horn index

This appendix is presented to contextualize our work with the broader language of diversity in ecology.

In certain limits, average immunity can be related to the Morisita–Horn similarity index Ψ (Wolda, 1981; Childs et al., 2012), where instead of comparing two populations from different timepoints or different regions, we are comparing bacteria and phage populations scaled by the immune benefit of CRISPR. If we assume that a matching spacer provides perfect immunity (e=1) and there is no cross-reactivity, then pV(i,j)=pV(1-δij) and average immunity is given by

(93) 1i,jnBinVjpV(i,j)pVi,jnBinVj=1pVi,jnBinVj(1δij)pVi,jnBinVj=inBinVinBsnV

This is the Morisita overlap index between bacteria and phage without the factor of 2DB+DV, where DB and DV are the Simpson’s diversity indices for bacteria with spacers and phages, respectively. In the limit that all phage and bacteria clones are equal sizes (nBinBs/m, nVinV/m), the factor 2DB+DVm, so we have Ψmeeff. We found in most simulations that effective ee/m, which implies that the Morisita overlap between bacteria and phage is constant across all parameters we study. A constant Morisita overlap implies that the population diversity evolves to attain the highest possible overlap. The full average immunity includes pairwise comparisons between all species and may be quite different from the Morisita overlap index.

Appendix 2

A neutral model of non-coevolving bacteria

This section provides the mathematical background for the statement that neutral diversity depends linearly on mutation rate made in ‘Phages drive stable emergent sequence diversity’.

In our results for the diversity in a coevolving population of bacteria and phage, we find that diversity approximately scales like meμη13 (Approximation for m). Intuitively this dependence seems quite weak, but we would like to know what to compare this to: what is the expectation for diversity in a population with mutations but without coevolution? An appropriate comparison point is a neutral model of a bacterial population that grows and dies; the birth and death rates are exactly matched to give a constant size at steady state. We also include the possibility of mutations: a bacterium may mutate to a new type with probability μ per division, and we assume that each mutation is to a new, never-before-seen type (infinite alleles model).

Let us consider a population of bacteria that divides with rate g, dies with rate F, and mutates with rate μ. We can write a master equation for the number of bacterial clones of size k. New mutants enter at clone size 1, and we assume that the total population size is constant at Nb=kkbk. Mutations effectively lower the growth rate of a particular clone.

(94) tbk=(gμ)[(k1)bk1kbk]+F[(k+1)bk+1kbk]+δk,1μNb

Generating function solution

The generating function for the probability distribution bk(t) is G(z,t)=kzkbk(t). Let us replace the birth and death rates by β and δ, respectively. β=g-μ, δ=F. Multiplying Equation 94 with kzk and noting that zG(z,t)=kkzk-1bk(t), we get the following differential equation:

(95) tG(z,t)=zG(z,t)(z2βz(β+δ)+δ)+Dz

The term Dz comes from the source term with D=μNb.

Equation 95 can be solved with the method of characteristics (Van Kampen, 1981). We parameterize the function G(z,t) with a new variable s. Applying the chain rule:

(96) sG(z(s),t(s))=Gzzs+Gtts

And by comparison with Equation 95, the characteristic equations are

(97) ts=1
(98) zs=(1z)(βzδ)
(99) Gs=Dz

From Equation 97 we see t=s+C, so we can choose t0=C=0 and replace s with t going forward.

Solving the characteristic equation for z by integrating both sides gives Equation 100.

(100) 1zδβze(βδ)t=C

At t=0, z will pass through some point z0, so we have the initial condition z(0)=z0. With z0 in Equation 100 at t=0, we get Equation 101, where C is given by Equation 100.

(101) z0=Cδ1Cβ1

The variation of G along this z-t curve is given by

(102) zG=Dzz2βz(β+δ)+δ

which has the solution

(103) G(z)=Dβδ[ln(1z)+δβln(δβz)]+Ω(C)

The constant Ω is a function of the characteristic z-t curve (Equation 100). To find the particular form of Ω(C), we apply the initial condition G(z,0)=zNb, meaning we start with one clone of size Nb at t=0.

(104) zNb=Dβδ[ln(1z)+δβln(δβz)]+Ω[1zδβz]

Let us use the temporary variable ξ=1-zδ-βz. Then z=ξδ-1ξβ-1, and the solution for Ω(ξ) is

(105) Ω(ξ)=ξδ1ξβ1NbDβδ[ln(1ξδ1ξβ1)+δβln(δβξδ1ξβ1)]

Now we can write the full solution for G(z,t), replacing ξ with ξϵ, where ϵ=e(β-δ)t.

(106) G(z,t)=(ϵδ(1z)δ+βzϵβ(1z)δ+βz)Nb+Dβδ[ln(1z)+δβln(δβz)]Dβδ[ln(1ϵδ(1z)δ+βzϵβ(1z)δ+βz)+δβln(δβϵδ(1z)δ+βzϵβ(1z)δ+βz)]

bk can be found by Taylor-expanding G(z) about z=0. The full distribution is cumbersome for this initial condition, but the limit as t, if β<δ, is independent of the initial condition:

(107) b(k)=Dkβk1δk=Dβkekln(δ/β)

Now we have required constant Nb=kkbk, which means

(108) Nb=kDβk1δk=Dδβ=μNbFg+μ

So to maintain constant population size, F=g, as expected. The diversity is the total number of clones, defined as kbk. We represent diversity as m.

This gives the following result for steady-state diversity:

(109) kbk=m=Dβln(δβδ)=μNbgμln(μg)

In our definition of the birth and death rates, it is implied that μ<g, and under that condition, the diversity is positive. If we let μg=u, we can write the diversity as

(110) m=Nbu1ulnu

For small u (i.e. mutation rate much smaller than division rate, usually true), m-Nbulnu. Equation 107 is equivalent to Fisher’s logseries with α=Dβ and x=βδ. This is also equivalent to Hubbell’s unified neutral theory in the limit of large sample size. This result for diversity (Equation 110) is also approximately equivalent to the result obtained in Vallade et al., 2003 and also in He and Hu, 2005 when the mutation rate is within a few orders of magnitude of g.

Comparison to coevolution model

In our bacteria-phage coevolution model, we find that diversity scales approximately like ma13, where

(111) a4eμLη^αn~Vη^αn~V+rn~V2(B1)2

For μL<<1, BpV1>>1 and pVr>η^gC0(1f), the dependence of diversity on a is given by Equation 112. Note that this expression correctly captures the fold change in diversity as a function of parameters, but the value itself is off by about a factor of 10.

(112) a4eμLη^(gC0(1f))3(BpV1)2α2pVr

Is this dependence quite different from what we get under neutral evolution without coevolution? Since we are measuring bacterial diversity in the coevolution model, the spacer acquisition probability η is a good analog to bacterial mutation rate in the non-coevolving model. Let us look at the change in diversity for a given change in spacer acquisition probability.

In the coevolution model, a kfold change in spacer acquisition probability gives approximately a k1/3 change in diversity, while under the non-coevolution model, a kfold change in mutation rate u gives a k(1u)1ku(1+lnklnu)fold change in diversity. For small u, this is approximately k(1+lnklnu). The change is dependent on the mutation rate, but if k is not very large and u<<1, then a kfold change in u gives approximately a kfold change in diversity. A similar correspondence can be reached with a different approximation. If a is large, the leading order contribution to m is a1/3, but a better approximation is given by Equation 113.

(113) m(a(1+lna3))13

If a is large, which it typically is, then the leading term is more accurately alna31/3. This is directly comparable to the ulnu dependence of the non-coevolving model: diversity in the coevolution model goes like ulnu1/3, and in the non-coevolving model it goes like ulnu. Either way, there’s a 1/3 power in the coevolution model that isn’t there in the simple model.

Appendix 2—figure 2 compares the predicted diversity and fold change in diversity in each model under the simple approximation assumptions here. Diversity increases more rapidly with mutation rate in the simple model than in the coevolution model.

Appendix 2—figure 1
Diversity vs. η (left), μ (centre), and e (right) for C0=104.

The η dependence of diversity is not very well predicted even by the full numerical solution, but for mutation rate and spacer effectiveness the approximate solutions do pretty well in this regime. The simulation data increase in diversity as a function of spacer acquisition probability actually goes more like mlnη.

Appendix 2—figure 2
Approximate predictions for diversity in our coevolution model (blue) or a simple model with mutation but no coevolution (orange).

(A) Predicted diversity as a function of spacer acquisition probability in our coevolution model as given by Equation 111 for C0=104, e=0.95, and μ=10-6 (blue). Predicted diversity in the non-coevolving model as given by m=-Nbu1-ulnu with Nb=1000 (orange). The low-μ limit is m=-Nbulnu. The high-μ limit is a series expansion in ϵ for μ=1-ϵ giving mNb(13+5μ6-μ26). (B) Fold change in diversity as a function of fold change in mutation rate or spacer acquisition probability under the simple approximation that mη1/3 (blue) and mu (orange).

Appendix 2—figure 3
Fold change in diversity (number of species) as a function of mutation rate and k, the fold change in mutation rate.

(A) Fold change in diversity as a function of fold change in mutation rate in the coevolution model: diversity increases by approximately a factor of k13, independent of mutation rate (solid line). (B) Fold change in diversity as a function of both mutation rate and fold change in mutation rate in the model without coevolution.

Appendix 3

Stochastic clone dynamics

In this appendix, we calculate clone size distributions for bacteria, the probability of extinction/establishment for phage clones, and the mean time to extinction for bacteria and phage clones. These results underlie our result for diversity; in particular the phage establishment probability is a core component of our calculation of diversity in ‘Phages drive stable emergent sequence diversity’. Phage establishment is also covered in ‘What determines the fitness and establishment of new mutants?’.

In this section we define pV(i,j) as given by Equation 18.

Clone fitness

Here we use mean-field expressions for phage and bacteria clone size from ‘Single-clone mean-field dynamics to calculate the fitness of new phage mutants’. These results are used in ‘Measuring diversity to justify our assumptions about the timescale of spacer acquisition relative to phage establishment.

We investigated the fitness of new phage mutants to understand the effect of bacterial spacer acquisition on phage mutant growth. We define the fitness of phage clones to be their per-capita average growth rate: their average growth rate in bacterial generations divided by their average size. We calculate the fitness from simulation data by calculating the mean phage clone size conditioned on survival (as in Figure 10, ‘Single-clone mean-field dynamics’), then taking the time derivative, then dividing by the mean phage clone size. (In principle, this is equivalent to first taking the time derivative of each individual clone trajectory and then averaging across all trajectories, but we found that edge effects from trajectories that go extinct skewed the result, so we first average across all trajectories before taking a derivative.) Phage clone fitness over time in a single simulation is plotted in Figure 3—figure supplement 1 (orange markers).

We calculate the theoretical phage clone fitness by taking the time derivative of the predicted mean phage clone size and dividing by the predicted mean clone size. The predicted mean phage clone size is piecewise-defined at short times as the numerical solution of the system of Equations 21 and 22, and at long times as the numerical solution steady-state clone size (Equation 26). This prediction is plotted as a solid black line in Figure 3—figure supplement 1.

New phage mutants have a positive growth rate on average (initial fitness >0), and their growth rate drops to zero on average as bacteria acquire matching spacers and gain immunity to new mutants. We can define the time or size at which phage clones become ‘established’, after which they are safe from rapid stochastic extinction and behave like neutral clones (having a fitness that is close to 0). We define one minus the long-time limit of the probability of phage clone extinction (Equation 159) as the probability of establishment for new phage clones:

(114) Pest=1P0Nest=1(12s0B(s0+δ0))Nest

where s0=αBpV-αnB-F is the average initial growth rate of phage clones and δ0=F+αnB(1-pV) is the average initial death rate of phage clones (more in ‘Phage clone probability of extinction’). The establishment clone size can then be defined as the value of Nest for which Pest1. Since N02s0B(s0+δ0) is not necessarily small, we approximate Pest as an exponential function and set Nest as the scale of the exponent:

(12s0B(s0+δ0))NesteN02s0B(s0+δ0)=eN0Nest
(115) NestB(s0+δ0)2s0

Equation 115 is the size at which phage clones become established on average (where P01/e). This is plotted as a horizontal dashed line for one simulation in Figure 3—figure supplement 1. If B=2 (birth-death with no bursts and positive selection), then Nest=s0+δ0s0δ0/s0. This is the expected result for a simple birth-death process with selection as given in Desai and Fisher, 2007.

We can also find the time at which phage clones reach the establishment size. In the absence of matching bacterial spacers, new phage mutants grow deterministically as nVi(t)=es0t. We condition on survival by dividing by the probability of establishment; for this calculation we use the long-time probability of establishment given by Equation 156 with s=s0 and δ=δ0. This curve does not match the measured growth at short times, but in the region of phages reaching their establishment size it agrees well (solid pink line in Figure 10).

With these assumptions, the growth curve for phage clones is

(116) nVi(t)es0t1(B(δ0+s0)2s0)(es0t1)2s0+B(es0t1)(δ0+s0)=1+B(es0t1)(δ0+s0)2s0

Replacing nVi with the establishment clone size (115) and solving for t, we find

(117) test=1s0ln(2B(s0+δ0)2s0B(s0+δ0))=1s0[ln2+ln(1s0B(δ0+s0))]

Now s0/(B(δ0+s0))<<1, so the second logarithm can be approximated as -s0/(B(δ0+s0)).

(118) test1s0[ln2s0B(δ0+s0)]

We can further approximate by dropping the second term entirely since ln2>>s0/(B(δ0+s0)).

(119) testln2s0

We get the same approximate value of test if we take B=2 directly and assume s0<<1. Equation 119 is also numerically close to the mean establishment time calculated by Desai and Fisher for positive selection, τest=γs (where γ0.577216) Desai and Fisher, 2007. This implies that the presence of a burst size B>2 does not dramatically change the dependence of the time to establishment on phage fitness. The phage growth rate s0 does still depend on B, however. (Note that Desai and Fisher define establishment time differently than we do, so our cases are not directly comparable.) Equation 119 is plotted as a vertical dashed line in Figure 3—figure supplement 1.

The initial fitness f0 of a new phage mutant can be computed analytically by using a different early-time approximation for nVi(t), this time conditioning on survival using the short-time approximation for P0(t) given by Equation 163.

(120) nVi(t)es0t1δ0β0+δ0(1e(β0+δ0)t)=es0t(β0+δ0)β0+δ0e(β0+δ0)t

To estimate the initial fitness, we differentiate 120 with respect to t, divide by nVi(t) to get the per-capita growth rate, and evaluate at t=0:

(121) f0=1nVi(t)dnVi(t)dt|t=0=s0+δ0(β0+δ0)e(β0+δ0)tβ0+δ0e(β0+δ0)t|t=0=s0+δ0

Interestingly, if we had not conditioned on survival, the initial per-capita growth rate would simply be s0: conditioning on survival increases the apparent growth rate of phage clones by effectively ignoring their death rate.

We can evaluate s0+δ0 in terms of our original parameters for insight. We replace B with Be-μL to capture the decrease in phage growth clone growth due to mutations away from a clone.

(122) f0=s0+δ0=αpVnB(BeμL1)

The initial fitness does not directly depend on characteristics of CRISPR immunity such as η and e because new phage mutants see the bacterial population as if it did not have any CRISPR immunity; f0 depends only on the total bacterial population size (Appendix 3—figure 1).

Appendix 3—figure 1
Phage clone initial growth rate vs. total bacteria normalized by the initial nutrient concentration C0.

Phage clone growth rate is as defined in Figure 3—figure supplement 1; for each simulation, the average phage clone growth rate is the derivative of the average phage clone size, averaged across all trajectories after steady state; plotted points and error bars are the average across three or more simulations. The phage initial fitness depends slightly on the phage mutation rate (mutants decrease the growth rate of a particular phage clone), but this dependence is slight enough that all mutation rates collapse onto the theoretical line. Here we plot data with μ=10-6. The phage clone initial growth rate also does not depend on e or η because new phage mutants see the bacteria population as if it did not have any CRISPR immunity. The theoretical initial phage clone growth rate is given by Equation 122. The effective lower bound of nB/C0 is set by the steady-state population size without CRISPR immunity: nB/C0=fgα(BpV-1)0.15.

Bacteria clone dynamics

In this section, we calculate clone size distributions for bacteria and the time to extinction for bacterial clones. Clone size distributions from simulations are shown in Figure 2A.

To solve for the dynamics of individual bacteria clones, we write a one-dimensional master equation just for nBj (Equation 123).

(123) dPndt=(n+1)Pn+1[F+r+αpV(nVenVi)]+(n1)Pn1[gC]+Pn1[αηnB0nVi(1pV)]nPn[F+r+αpV(nVenVi)+gC]Pn[αηnB0nVi(1pV)]

For brevity we write Pn=PnBi(t|N0), the probability of having n bacteria of type i at time t given N0 bacteria of type i at t=0.

Bacteria clone growth (gC), phage predation (αpV(nV-enVi)), outflow (F), and spacer loss (r) all depend on the number of bacteria n, but spacer acquisition (αηnB0nVi(1-pV)) adds new bacteria independent of the current size of the clone.

We assume that the total population is in steady state so that the total population sizes nV, C, and nB0 are constant. In general, nVi is time-dependent and varies for each clone i, but we will assume that it is also constant at steady state.

This equation is very nearly identical in form to the bacteria clone size equation solved in our previous work Bonsma-Fisher et al., 2018 as well as the clone size equation described in Dessalles et al., 2022, and we repeat our derivation of the solution here in brief.

We solve Equation 123 using a generating function approach: G(z,t)=nznPn(t). Multiplying Equation 123 by nzn, we get the corresponding generating function partial differential equation:

(124) tG(z,t)=zG(z,t)(d+bz2(b+d)z)+DG(z1)

Here b and d are the birth and death rates for bacterial clones: b=gC and d=F+r+αpV(nVenVi). D is the rate of spacer acquisition from naive bacteria: D=αηnB0nVi(1pV).

We solve Equation 124 using the method of characteristics Van Kampen, 1981. We parameterize the function G(z,t) with a new variable x. Applying the chain rule:

(125) xG(z(x),t(x))=Gzzx+Gttx

Comparing Equation 124 with Equation 125 gives the following characteristic equations:

(126) tx=1
(127) zx=(1z)(bzd)
(128) Gx=DG(z1)

From Equation 126, we see t=x+c1, so we can choose t0=c1=0 and replace x with t going forward. Solving the characteristic equation for z by integrating both sides gives Equation 129.

(129) 1zdbze(bd)t=c2

At t=0, z will pass through some point z0, so we have the initial condition z(0)=z0. With z0 in Equation 129 at t=0, we get Equation 130, where c2 is given by Equation 129.

(130) z0=c2d1c2b1

The variation of G along the z-t curve is

(131) Gz=DG(z1)(1z)(bzd)=DG(bzd)

Integrating both sides, we get

G(z)=Ω(c2)(bzd)Db

The constant Ω is a function of the characteristic z-t curve (Equation 129). To find the particular form of Ω(c2), we apply the initial condition PN0(0)=1 which gives G(z,0)=zN0, meaning that the clone starts at size N0 at time t=0.

G(z,0)=zN0=Ω(1zdbz)(bzd)Db

Let ξ=1-zd-bz, therefore z=ξd-1ξb-1.

Ω(ξ)(b(ξd1ξb1)d)Db=(ξd1ξb1)N0

Solving for Ω(ξ):

Ω(ξ)=(ξd1ξb1)N0(b(ξd1ξb1)d)Db

The full solution for G(z,t) can be written by replacing the constant Ω(c2) with the expression for Ω(ξ) and replacing ξ with ξϵ, where ϵ=e(b-d)t is the time-dependent part of the z-t curve.

G(z,t)=(bzd)Db(ξϵd1ξϵb1)N0(b(ξϵd1ξϵb1)d)Db

Finally, replacing ξ with 1-zd-bz, we get

G(z,t)=(bzd)Db((1z)ϵd+bzd(1z)ϵb+bzd)N0(b((1z)ϵd+bzd(1z)ϵb+bzd)d)Db

G(1,t)=nPn(t)=1, meaning that the total probability is conserved. Assuming d>b, the limit as t of G(z,t) is

G(z)=(bzdbd)Db

The limit is independent of the initial clone size N0 as we expect. We can construct Pn at steady state by taking successive derivatives of G(z):Pn=1n!nGzn|z=0

(132) Pn=1n!dn(dbd)D/bi=1n[D+(i1)b]
P0=(dbd)D/b

This is a negative binomial distribution with parameters D/b and b/d. We can re-write this expression using Stirling’s approximation for n! to facilitate evaluation at large n.

(133) Pn=12πnexp[Dbln(dbd)+i=1nln(end(D+(i1)b))]

Equation 133 is an analytic expression describing the steady-state spacer abundance distribution that results from our simulations. To compare this prediction with our simulations, we assume that nVi on average is equal to nV/m, where nV is the predicted total phage population size and m is the number of large phage clones approximated by the predicted bacterial diversity.

Appendix 3—figure 2 compares the analytic distribution to the steady-state spacer clone size distribution from our simulations at several values of the spacer acquisition probability η. The theoretical prediction captures the qualitative impact of increasing η fairly well: as η increases, the clone size distribution gains a more pronounced peak.

The discrepancy between the theoretical prediction and simulation data at high η results in part from the predicted large phage clone size being larger than the measured large phage clone size in simulations. This can happen when the predicted number of clones m is smaller than the simulation result. To assess whether this influenced our prediction, we also used maximum likelihood estimation to calculate the value of nVi that gave the best fit between Equation 133 and the data; for the two largest values of η, this does return a smaller value of nVi and hence a distribution peak further to the left.

The previous two calculations assumed that the phage clone size is single-valued and constant in time. In reality, however, the phage clone size is both broadly distributed (Figure 24) and changing in time (Figure 3—figure supplement 1). We relax the first assumption by calculating an average bacterial distribution using the observed distribution of phage clones: we solve Equation 133 at each observed large phage clone size, then average across the distribution of clone sizes to calculate P(nBi)=P(nBi|nVi)P(nVi). (The large phage clone distribution is given by Equation 28.) This average distribution more accurately predicts the presence of small bacterial clones at high η, but it actually behaves worse than the single-nVi solution at small η. This is related to the deviation of the number of large phage clones from the number of bacterial clones at small η (Figure 22): at small η, bacteria do not acquire spacers as readily and so phage clones experience clonal interference largely without bacterial influence. The average distribution then predicts more small bacterial clones than there are because the large phage clone distribution includes intermediate phage clone sizes that the bacteria do not end up acquiring spacers from. Essentially, the bacteria ‘see’ fewer phage clones than the theory predicts, so the observed bacteria clone size distribution has fewer smaller clones than predicted.

Equation 133 can be approximated for large n as a gamma distribution:

(134) Pn(1bd)DbΓ(Db)eln(d/b)n(1n)1Db
Appendix 3—figure 2
Clone size histograms (left) and cumulative distributions (right) for four different values of the spacer acquisition probability η.

In all simulations, C0=104, e=0.95, and μ=3×10-6. We sample 30 evenly spaced time points between 2000 and 10,000 bacterial generations and combine the clone sizes at each of the sampled points to create the clone size distributions plotted. Solid lines show three different theoretical solutions. The solid blue line is given by Equation 133 with all population quantities predicted from solving the system of Equations 13–17 with m given by Equation 32 and nVi=nV/m. The solid orange line is given by Equation 133, with the value nVi determined by maximum likelihood estimation (MLE) to give the best fit to the data. For the two largest values of η, the value of nVi returned by the MLE fit is smaller than the theoretical value of nVi*, while for the two smallest values of η the MLE value of nVi is larger. For large enough values of nVi, the bacteria clone death rate d is smaller than the birth rate b which violates the assumptions used to derive Equation 133. This happens for the MLE fit at the two smallest η values and hence no MLE solution is plotted. The solid green line is an average distribution calculated by solving Equation 133 at each observed large phage clone size and averaging across the distribution of clone sizes; i.e. P(nBi)=P(nBi|nVi)P(nVi). The large phage clone distribution is given by Equation 28.

This is a gamma distribution with shape parameter Db and rate parameter ln(d/b). Note that (1-bd)Dbln(d/b)Db, consistent with the canonical form of the gamma distribution. The shape parameter D/b describes the relative balance between spacer acquisition and growth: if D/b>1, then spacer acquisition is the dominant means by which bacterial clones grow. This often also means that the clone size distribution has a peak at clone size gt1 (provided db). Specifically, the mode of the distribution is greater than 1 if Dbb>dbb. The rate parameter describes the decay constant of the exponential distribution resulting if the shape parameter equals 1.

Bacteria clone extinction

When a matching phage clone exists in the population, bacteria with a particular spacer have a fitness advantage if they encounter that phage. Once the matching phage goes extinct, bacteria tend to quickly go extinct as well; in fact, bacteria often go extinct before their matching phage clone (Appendix 3—figure 3). To understand why this might be, we derive a theoretical prediction for the mean time to extinction under the assumption that bacterial clones evolve neutrally once they become large. This prediction does describe the extinction time distribution well in some regimes, but it underestimates the time to extinction at large total population sizes. If the neutral assumption is valid, this means that bacterial clones go extinct stochastically and are not necessarily driven to extinction by the extinction of their matching phage clone. This is true at small-to-medium total population sizes (C0104). On the other hand, when population sizes are large, bacteria clones go extinct more slowly than neutral theory would predict, likely because they are still being challenged by a matching phage and are also able to acquire spacers from that phage clone.

Appendix 3—figure 3
Bacteria and phage clone trajectories aligned to the time at which bacteria trajectories go extinct.

Bacteria trajectories are included if they reach size nBi* given by Equation 25 and all corresponding phage trajectories are plotted. In this simulation C0=3×104, e=0.8, η=10-3, and μ=10-6.

We calculate the time to extinction using the backward master equation corresponding to Equation 123. The backward equation is an equation for the time to extinction Tn from a given state. Instead of working with frequencies, we write this in terms of the number of bacteria belonging to a clone, n. Here b=gC, d=F+r+αpV(nV-enVi), and D=αηnB0nVi(1-pV).

(135) TnΔt=bnΔtTn+1+DΔtTn+1+dnΔtTn1+(1bnΔtDΔtdnΔt)Tn

Notice that the rate terms in the backward equation depend only on n, not on n-1 or n+1 like in the forward master equation. The forward equation is a sum of all the ways in which the system could end up at state n at time t from where it could have been at time t-Δt, so those rates depend on the other states. The backward equation goes in the other direction, looking backwards: it is a sum of all the ways in which the system could have been in state n at time T now that time Δt has elapsed.

Rearranging Equation 135, we arrive at Equation 136.

(136) 1=(bn+D)Tn+1+dnTn1(bn+dn+D)Tn

For boundary conditions, we have T(n=0)=0 (time to extinction is 0 when already extinct) and dTdn|n=nBs=0 (reflecting boundary at n=nBs). The reflecting boundary is harder to justify because in reality nBs is a flexible upper limit on clone size, but in steady state when nBs is approximately constant, it is true that no single clone will grow larger than nBs.

To solve Equation 136, we expand about n and keep terms up to 2nd order to get the Fokker–Planck equation:

(137) 1=dTdn(bn+Ddn)+12d2Tdn2(bn+D+dn)

To get an approximate solution, we drop the drift term, assuming that when clones are large their net growth rate is approximately zero so bn+D-dn0 (this is the same as setting nBi˙=0 in Equation 22). This gives the following differential equation:

(138) 112d2Tdn2(bn+D+dn)

The solution to Equation 138 with the boundary conditions described above is

(139) T(n)=2(b+d)2[DlnD(D+(b+d)n)ln(D+(b+d)n)+(b+d)n(1+ln(D+(b+d)nBs))]

Equation 139 with n=nBi* is plotted in Appendix 3—figure 4, Appendix 3—figures 57 and Appendix 3—figure 10. We also solved the full Fokker–Planck equation numerically without assuming the drift term is 0 (Equation 137). In this numerical solution, we change the value of nVi for different values of n=nBi to reflect the fact that at small bacteria clone sizes phage clones also tend to be smaller; we use the numerical solutions for average nvi(t) and nBi(t) shown as dashed lines in Figure 3—figure supplement 1. This solution is shown in Appendix 3—figure 3.

Appendix 3—figure 4
Mean time to extinction for bacterial clones after reaching size nBi* as a function of η for C0=104, e=0.95, and μ=3×10-6.

The solid line is given by Equation 139 with n=nBi*, and the dashed line is given by numerically solving Equation 137.

Approximate time to extinction

Since we are interested in the time to extinction once bacterial clones reach a large size, we can substitute n=nBi* in Equation 139 to gain insight into the time to extinction for bacteria. The average clone size nBi*=nBsm.

(140) T(nBi)=2(b+d)2[nBs(b+d)m(1+ln(nBs(b+d)+D))(nBs(b+d)m+D)ln(nBs(b+d)m+D)+Dln(D)]

We can substitute values for b+d using the steady-state deterministic solution for nBs and assuming nVi=nV/m:

(141) b+d=gC+F+r+αpVnV(1em)=2F+2r+2αpVnV(1em)α(1pV)ηnV1νν

To approximate the extinction time expression, we decompose nB, nV, and ν into series expansions in e/m, i.e. nVnV0+emnV1+em2nV2. We substitute these expressions into Equation 140 and perform an overall series expansion in e/m. The zeroth-order term is shown here and plotted in Appendix 3—figure 3B, and the solution to first order is plotted in Appendix 3—figure 3A.

(142) T[(2C02f2g2η(α(BpV1)+g)2(α(f1)(BpV1)fg1)(rln(C02fg2ηr(α+f(α(BpV1)+g)+α(B)pV)αm(BpV1)(g(αBC0(f1)ηpV2+pV(rα(B+1)C0(f1)η)+αC0(f1)η)+αpVr(BpV1)+C0fηg2(pV1)))((g(2αC0(pV1)(BpV1)+(3pV4)r)+α(3pV4)r(BpV1))ln(C02fg2η(α+f(α(BpV1)+g)+α(B)pV)(g(2αC0(pV1)(BpV1)+(3pV4)r)+α(3pV4)r(BpV1))αm(BpV1)(α(BpV1)+g)(g(αBC0(f1)ηpV2+pV(rα(B+1)C0(f1)η)+αC0(f1)η)+αpVr(BpV1)+C0fηg2(pV1)))(pV1)(g(2αC0(BpV1)+3r)+3αr(BpV1))(ln(C02fg2η(α+f(α+αBpV+g)+α(B)pV)(g(2αC0m(pV1)(BpV1)+r(3m(pV1)1))+αr(BpV1)(3m(pV1)1))αm(BpV1)(α(BpV1)+g)(g(αBC0(f1)ηpV2+pV(rα(B+1)C0(f1)η)+αC0(f1)η)+αpVr(BpV1)+C0fηg2(pV1)))+1))/(α(BpV1)+g)))÷(αm(BpV1)(g(2αC0(BpV1)+3r)+3αr(BpV1))2(C0η(pV1)(α+f(α(BpV1)+g)+α(B)pV)+pVr(α(BpV1)+g)g))]

This expression remains unwieldy, so we also do a series expansion in r and keep the zeroth-order term (equivalent to setting r=0). Spacer loss is not the dominant death rate for spacer-containing clones; the rate of spacer loss we use is an order of magnitude lower than the rate of cell death due to outflow (R=0.04,f=0.3). This expression becomes very simple:

(143) T(nBi)f(1+lnm)(α(BpV1)+g)α2m(BpV1)2

We can write Equation 143 in terms of total population sizes without CRISPR (n~B and n~V) to gain insight.

(144) T(nBi)n~BF+pVαn~V(1+lnm)m

If average immunity is low, then F+αpVn~VgC (from the mean-field equation for nB).

(145) T(nBi)n~BgC~(1+lnm)m=n~Bb(1+lnm)m

Equation 143 is compared to the measured time to extinction in Appendix 3—figure 3C. It is not a good approximation for low η, but at the two highest values of η it captures the trend reasonably well. The dependence of bacterial time to extinction on clone size (Equation 145, dropping the 1+lnm term) is shown in Appendix 3—figure 3.

Interestingly the dependence on m in this approximate expression is identical to the dependence on m in the phage extinction approximation (Equation 67). Furthermore, there is also a proportionality to the mean clone size in the form nB/m in the bacteria extinction equation (since nBs=nB if r=0, which is why this approximation works best at high η when ν is closer to 1). This is reasonable considering that the dynamics of bacteria and phage are ultimately matched to each other at steady state. This expression still depends on m, which is an emergent quantity dependent on all other parameters as well. We apply the approximation for m from 'Approximation for m' for more insight.

(146) Tbacteria1mf(α(BpV1)+g)α2(BpV1)2fgC0(1f)[r4αBeη(1pV)μL]13

This simplified expression for time to extinction gives insight into the main drivers of extinction. Perhaps counterintuitively, the bacteria clone time to extinction decreases as the growth rate gC~ increases — a higher growth rate means faster dynamics overall and lower time to extinction. Note, however, that a higher growth rate (for instance, with larger C0) also leads to larger nB.

Bacteria extinction depends inversely on α, B, e, μL, and η(1-pV), with the time to extinction decreasing with all parameters except the spacer loss rate r. The bacteria time to extinction increases with increasing flow rate f.

Phage clone dynamics

In this section, we calculate the probability of establishment for new phage mutants and the time to extinction for large phage clones. The phage establishment probability is a core component of our calculation of diversity in 'Phages drive stable emergent sequence diversity' and is explored on its own in 'What determines the fitness and establishment of new mutants?'. The phage time to extinction is also related to the speed of phage evolution in Figure 6, 'Dynamics are determined by diversity'.

To solve for the dynamics of individual phage clones, we can write a one-dimensional master equation just for nVj (Equation 147). Here we neglect mutations; we assume that mutations are rare and a burst always contributes to the clonal population being tracked. This master equation is a birth-death master equation with bursts – it is in the form of a classic birth-death master equation except that the population grows with a jump of size B1. Pn=PnVi(t|N0) is the probability of having n phages of type i at time t given N0 phages of type i at t=0.

(147) dPndt=(n+1)Pn+1[F+α(1pV)nB+αpVenBi]+(nB+1)PnB+1[αpVnBαpVenBi]nPn[αnB+F]

Phage clone probability of extinction

Equation 159 is the basis for our prediction of phage establishment probability (Equation 114).

We can find the probability of extinction P0 for new phage mutants under the assumption that nB and nBi are constant using Equation 147. We solve this equation using a generating function approach: G(z,t)=nznPn(t). Multiplying Equation 147 by nzn, we get the corresponding generating function partial differential equation:

(148) 1β+δtG(z,t)=zG(z,t)(1p+pzBz)

Here p=ββ+δ where β and δ are the birth and death rates for phage mutants: β=αpVnB-αpVenBi and δ=F+αnB(1-pV)+αpVenBi. We distinguish two cases of constant nBi, valid at early times after phage mutation, and nBi=nBi*, the deterministic steady-state value of nBi given by Equation 25, valid at long times after phage mutation. When we are assuming nBi=0, we use β=β0 and δ=δ0.

We solve Equation 148 using the method of characteristics:

(149) xG(z(x),t(x))=Gzzx+Gttx

This gives the following characteristic equations:

(150) tx=1β+δ
(151) zx=(1p+pzBz)

Integrating these two equations, we get t=xβ+δ, and

(152) x=z(0)z(x)dw1p+pwBw

The initial phage clone size is N0. The initial condition is

G(z(0),x=0)=z(0)N0=G(z(x),t(x))

The last equality is from the method of characteristics: the solution for G at the initial condition gives the full solution which is constant as parameterized by x.

The probability of extinction is given by setting z=0 in G(z,t).

P0(t)=G(z(t)=0,t)=z(0)N0|z(t)=0=ζN0

where ζ solves Equation 153, which we obtain by setting z(x)=0 in Equation 152 and replacing z(0) with ζ, since G(z,t)|z=0=z(0)N0|z=0=P0(t).

(153) x=0ζdw1p+pwBw

Long-time approximation for P0(t)

To approximate P0(t) at long times, we notice that since extinction probability becomes large at t=, then the large-t limit corresponds to ζ1.

The denominator of Equation 153 will be smallest when w is near 1, so the largest contribution to the integral will come from w near 1. Expanding the denominator to second order with w=1-ϵ, where ϵ is small:

1p+pwBw1p+p(1ϵB+12ϵ2B(B1))1+ϵ=ϵpϵB+p2ϵ2B(B1)

Changing back to the variable w and substituting back into Equation 153, the integral becomes

(154) x0ζdw1wpB(1w)+p2(1w)2B(B1)

which when evaluated gives

(155) ζ=(23Bp+B2p)(1est)2Bp(3est)+B2p(1est)
(156) ζ=(B(δ+s)2s)(est1)2s+B(est1)(δ+s)

where t is in units of minutes and we introduce the variable s=β(B-1)-δ=δ(p1-p(B-1)-1) (and p=s+δs+Bδ). The parameter s is the average growth rate of phage clones. When phage mutants first appear and nBi=0, this corresponds to s>0 and we write s=s0. As nBinBi*, s0.

The s=0 limit of 155 is

(157) ζ=Bδt2+Bδt

The long-time limit of 156 is

(158) ζ=23Bp+B2pBp(B1)=12sB(s+δ)

For a phage clone that begins at size N0=1 and for which there are no matching bacterial clones (nBi=0), Equation 159 gives the probability of that clone going extinct.

(159) P0=12s0B(s0+δ0)

If B=2 (a return to a simple birth-death equation where the population grows and dies by increments of 1), Equation 156 becomes

(160) ζ=δ(est1)est(δ+s)δ

This is the expected probability of extinction for a simple birth-death process with selection given in Desai and Fisher, 2007. (Note that time can be rescaled so that δ=1.)

The long-time limit of 160 is

(161) 1pp=δs+δ=δβ

This is the expected limit for the extinction probability for a simple birth-death process with selection. As s0, the probability of extinction goes to 1.

Note that the probability of extinction for B>2 (Equation 158) is larger than for B=2 (Equation 161). Somewhat paradoxically, this implies that one effect of the burst size B>2 is to increase the probability of extinction for new phage mutants. This happens because bursts are infrequent events; on average, B death events must occur for every one birth event, making death a more common stochastic outcome. The extinction probability for B>2 is 1B(2B-1+β(B-2)δ) times greater than for B=2. Expanding in 1/B for large B, this is βδ(B3)+O(1/B).

Small-time approximation for P0(t)

When t is small, P0 will be close to 0, which means ζ will be close to 0 and therefore we can approximate w as being near 0.

The denominator simplifies to 1-p-w for small w, and we get

(162) x0ζdw1pw
(163) ζ=(1p)(1ex)=(1p)(1eβ0t/p)=δ0β0+δ0(1e(β0+δ0)t)

Long-time approximation for P0(t) with no selection

In the above derivations of the probability of phage clone extinction, we generally assumed that phages had a selective advantage because there were no bacteria with matching spacers at the time a phage clone arose by mutation. If instead a phage clone does not have a selective advantage, this corresponds to s=β(B-1)-δ0, or p1/B is exactly 1/B where nB is given by the mean-field steady state with no CRISPR; in other words, phage clones do not have a selective advantage if the population is in steady state and there is no CRISPR immunity to distinguish clones from each other.

Expanding the denominator with w=1-ϵ as before:

1p+pwBwϵpϵB+p2ϵ2B(B1)

When p=1/B, the first two terms cancel out. We get the integral

(164) x2pB(B1)0ζdw(w1)2

which gives

(165) ζ=(B+B2)px2Bpx+B2px=(B1)Bpx2+(B1)Bpx=(B1)Bβt2+(B1)Bβt

Equation 165 is equivalent to Equation 157, assuming β(B-1)=δ. This is true when s=0, since s=0 corresponds to (B-1)β=δ.

Appendix 3—figure 5
Predicted bacteria clone time to extinction with drift.

Measured vs. predicted mean time to extinction for bacterial clones after reaching size nBi. The predicted time to extinction is the solution with drift, given by numerically solving Equation 167.

Neutral time to extinction from backward master equation

The phage time to extinction calculated here is compared to the speed of phage evolution in Figure 6, 'Dynamics are determined by diversity'.

We can calculate the time to extinction using the backward master equation corresponding to Equation 147. The backward equation is an equation for the time to extinction Tn from a given state. Instead of working with frequencies, we write this in terms of the number of phages belonging to a clone, n. Here β=αpVnB-αpVenBi and δ=F+αnB(1-pV)+αpVenBi.

(166) TnΔt=βnΔtTn+B1+δnΔtTn1+(1βnΔtδnΔt)Tn

Rearranging Equation 166, we arrive at Equation 167.

(167) 1=βnTn+B1+δnTn1(βn+δn)Tn

For boundary conditions, we have T(n=0)=0 (time to extinction is 0 when already extinct) and dTdn|n=nV=0 (reflecting boundary at n=nV). As with bacteria clones, we assume here that the upper limit on phage clone size is constant at nV at steady state.

To solve Equation 167, we expand about n and keep terms up to second order to get the Fokker–Planck equation:

(168) 1=dTdn(βn(B1)δn)+12d2Tdn2(βn(B1)2+δn)

At steady state and for large clones, β(B-1)δ, which means clones are approximately neutral and we can drop the drift term:

(169) 112d2Tdn2(βn(B1)2+δn)

This is a straightforward second-order ordinary differential equation, which has the solution

(170) T(n)=c1+nc22n(1+ln[n(β(B1)2+δ)])β(B1)2+δ

Using the boundary condition T(0)=0, the constant c1=0. Using the second boundary condition, c2=2ln[nV(β(B-1)2+δ)]β(B-1)2+δ, and the full solution is

(171) T(n)=2nβ(B1)2+δ(1lnnnV)

Once phage clones become large (by reaching size nVi* as given by Equation 26), they behave neutrally and Equation 171 agrees well with observed extinction times in simulations. We set n=nVi* and use the deterministic steady-state clone size for nBi given by Equation 25 in β and δ. Appendix 3—figure 9 compares observed large clone extinction times in a simulation (blue) to the prediction given by Equation 171 (orange dashed). Predicted and measured times to extinction for many simulations with Equation 171 are shown in Appendix 3—figure 10.

If nVi*nVm, which it is at steady state, we can rewrite Equation 171 as:

(172) T(n)=2nβ(B1)2+δ(1+lnm)
Appendix 3—figure 6
Approximations for bacteria clone extinction.

Measured mean time to extinction for large bacterial clones vs. three successively more aggressive analytic approximations for the mean time to extinction. (A) The predicted time to extinction is given by taking a series expansion in e/m and keeping terms to 1st order. (B) A series expansion in e/m and keeping the 0th order term (e=0) only. (C) Series expansion in e/m and r and keeping the 0th order term only (Equation 143).

Appendix 3—figure 7
Approximations for bacteria clone extinction as a function of mean clone size.

Bacteria extinction time as a function of mean clone size. Large bacteria clone mean time to extinction as a function of nBm, where m is measured from simulations and nB is the total bacteria population size. This parameter combination describes the trend in both phage and bacteria extinction reasonably well at large values of (bottom panels). The solid line is given by Equation 145 without the $1 + \ln m$ term.

The factor 1+lnm comes from the integral of the second-order differential equation and from the reflecting boundary assumption, and interestingly it would happen even if B=2 (no burst): integrating the Fokker–Planck equation gives a factor of 1-lnn; if we start from the steady-state value of nnV/m, this becomes 1+lnm-lnnV. Applying the reflecting boundary at n=nV cancels out the lnnV term. Very simply: the integral we are solving is -1nd2Tdn2, and this gives n(1-lnn) up to some constants. This simplified expression for time to extinction gives insight into the main drivers of extinction. The phage time to extinction decreases as the outflow rate f increases — if phages die at a higher rate, they go extinct more quickly. Interestingly, a larger burst size B decreases the time to extinction for large phage clones, consistent with an overall increase in the size of fluctuations as the burst size increases.

This expression still depends on m, which is an emergent quantity dependent on all other parameters as well. We apply the approximation for m from ‘Approximation for m for more insight.

(173) Tphage1m2gC0(1f)fαBpV1f[2rαBeη(1pV)μL]13

Comparing with Equation 146 for bacteria, both phage and bacteria extinction depend inversely on α, B, e, μL, and η(1-pV), with the time to extinction decreasing with all parameters except the spacer loss rate r. The bacteria time to extinction increases with increasing flow rate f, while the phage time to extinction decreases with f.

The clone size dependence of phage extinction is shown in Appendix 3—figure 11, comparing measured extinction times to the prediction of Equation 172 without the 1+lnm term. This captures the dominant trend in extinction time behaviour.

Appendix 3—figure 8
Phage clone extinction times and theoretical predictions for a simulation with parameters C0=104,η=0.001,e=0.95,μ=10-5.

Time zero for each trajectory is the time at which that clone arose by mutation. All simulation trajectories are plotted in blue, and a subset of trajectories that do not reach a size of nVi*=16170 as given by Equation 26 are plotted in orange. Trajectories that do not become established by this definition go extinct more quickly than ones that do. All other curves are theoretical predictions for the extinction time. The green and red solid lines and purple and brown dashed lines show a numerical solution to Equation 153 with different values for s. The remaining dashed lines are a small time approximation given by Equation 163, and a large time approximation given by Equation 156 with either s=s0 or s=0. All of these predictions agree with the simulation data in different regimes; none accurately captures the entire timecourse of extinctions.

Appendix 3—figure 9
Large clone extinction times from a simulation with parameters C0=104,η=0.001,e=0.95,μ=10-5.

Trajectories are counted as large if the phage clone size passes nVi*=16170, the theoretical deterministic mean phage clone size for these parameters given by Equation 26. Once a trajectory reaches nVi*, we count that point as time zero to measure the extinction time of large clones. The numerical solution is given by solving Equation 153 with s=0 (p=1/B). The large time approximation is given by Equation 157. The orange dashed line gives an exponential decay prediction with the mean time to extinction given by Equation 171 with n=nVi*.

Appendix 3—figure 10
Mean time to extinction for large bacteria (left) and phage (right) clones as a function of the neutral fitness mean time to extinction prediction.

The solid black lines describe approximate analytic expressions for bacteria and phage time to extinction using a neutral fitness assumption. For bacteria, the black line solves Equation 139 using the mean bacteria and phage clone sizes. For phages, the black line solves Equation 171 using the mean phage clone size in place of n.

Appendix 3—figure 11
Phage extinction time as a function of mean clone size.

Large phage clone mean time to extinction as a function of nVm, where m is measured from simulations and nV is the total phage population size. The solid line is given by Equation 172 without the 1+lnm term.

Data availability

We analysed published genetic data from three sources: PRJNA268031 (Burstein et al., 2016), PRJNA275232 (Paez-Espino et al., 2015), and PRJNA484416 (Guerrero et al., 2021a). We also used phage genomes associated with Guerrero et al., 2021a available on GitHub at https://github.com/GuerreroCRISPR/Gordonia-CRISPR (Guerrero et al., 2021b). Processed spacer and protospacer sequences for all datasets, along with source code and data for all main text figures, is available on GitHub at https://github.com/mbonsma/CRISPR-dynamics-model (copy archived at Bonsma-Fisher and Goyal, 2023). Raw simulation data has been uploaded to Dryad: https://doi.org/10.5061/dryad.sn02v6x74.

The following data sets were generated
    1. Bonsma-Fisher M
    2. Goyal S
    (2022) Dryad Digital Repository
    Simulation data from: Dynamics of immune memory and learning in bacterial communities.
    https://doi.org/10.5061/dryad.sn02v6x74
The following previously published data sets were used
    1. Paez-Espino D
    2. Sharon I
    3. Morovic W
    4. Stahl B
    5. Thomas BC
    6. Barrangou R
    7. Banfield JF
    (2015) NCBI BioProject
    ID PRJNA275232. Streptococcus thermophilus infected with lytic phage 2972.
    1. Guerrero LD
    2. Pérez MV
    3. Orellana E
    4. Piuri M
    5. Quiroga C
    6. Erijman L
    (2020) NCBI BioProject
    ID PRJNA484416. Time series genome centric analysis in activated sludge microbial communities.

References

    1. Childs LM
    2. Baskerville EB
    3. Cobey S
    (2015) Trade-Offs in antibody repertoires to complex antigens
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 370:1–10.
    https://doi.org/10.1098/rstb.2014.0245
    1. Koonin EV
    2. Makarova KS
    (2019) Origins and evolution of CRISPR-cas systems
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 374:20180087.
    https://doi.org/10.1098/rstb.2018.0087
  1. Thesis
    1. Lucchini S
    (1999)
    Genetic Diversity of Streptococcus thermophilusPhages and Development of PhD thesis
    Swiss Federal Institute of Technology Zurich.

Article and author information

Author details

  1. Madeleine Bonsma-Fisher

    Department of Physics, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5813-4664
  2. Sidhartha Goyal

    1. Department of Physics, University of Toronto, Toronto, Canada
    2. Institute of Biomaterials & Biomedical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
    For correspondence
    goyal@physics.utoronto.ca
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7452-892X

Funding

Natural Sciences and Engineering Research Council of Canada (Vanier Canada Graduate Scholarship)

  • Madeleine Bonsma-Fisher

Ministry of Colleges and Universities (Queen Elizabeth II Graduate Scholarship in Science & Technology)

  • Madeleine Bonsma-Fisher

Walter C. Sumner Foundation (Walter C. Sumner Memorial Fellowship)

  • Madeleine Bonsma-Fisher

Natural Sciences and Engineering Research Council of Canada (Discovery Grant RGPIN-2015)

  • Sidhartha Goyal

Natural Sciences and Engineering Research Council of Canada (Discovery Grant and RGPIN-2021)

  • Sidhartha Goyal

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge helpful discussions and comments from all the members of the Goyal and Zilman groups. We thank Vijay Kumar for help with extinction probability of phage clones.

Copyright

© 2023, Bonsma-Fisher and Goyal

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 891
    views
  • 204
    downloads
  • 1
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Madeleine Bonsma-Fisher
  2. Sidhartha Goyal
(2023)
Dynamics of immune memory and learning in bacterial communities
eLife 12:e81692.
https://doi.org/10.7554/eLife.81692

Share this article

https://doi.org/10.7554/eLife.81692

Further reading

    1. Computational and Systems Biology
    2. Physics of Living Systems
    Divyoj Singh, Sriram Ramaswamy ... Mohd Suhail Rizvi
    Research Article Updated

    Planar cell polarity (PCP) – tissue-scale alignment of the direction of asymmetric localization of proteins at the cell-cell interface – is essential for embryonic development and physiological functions. Abnormalities in PCP can result in developmental imperfections, including neural tube closure defects and misaligned hair follicles. Decoding the mechanisms responsible for PCP establishment and maintenance remains a fundamental open question. While the roles of various molecules – broadly classified into ‘global’ and ‘local’ modules – have been well-studied, their necessity and sufficiency in explaining PCP and connecting their perturbations to experimentally observed patterns have not been examined. Here, we develop a minimal model that captures the proposed features of PCP establishment – a global tissue-level gradient and local asymmetric distribution of protein complexes. The proposed model suggests that while polarity can emerge without a gradient, the gradient not only acts as a global cue but also increases the robustness of PCP against stochastic perturbations. We also recapitulated and quantified the experimentally observed features of swirling patterns and domineering non-autonomy, using only three free model parameters - rate of protein binding to membrane, the concentration of PCP proteins, and the gradient steepness. We explain how self-stabilizing asymmetric protein localizations in the presence of tissue-level gradient can lead to robust PCP patterns and reveal minimal design principles for a polarized system.

    1. Physics of Living Systems
    Tommaso Amico, Samuel Toluwanimi Dada ... Amos Maritan
    Research Article

    Many proteins have been recently shown to undergo a process of phase separation that leads to the formation of biomolecular condensates. Intriguingly, it has been observed that some of these proteins form dense droplets of sizeable dimensions already below the critical concentration, which is the concentration at which phase separation occurs. To understand this phenomenon, which is not readily compatible with classical nucleation theory, we investigated the properties of the droplet size distributions as a function of protein concentration. We found that these distributions can be described by a scale-invariant log-normal function with an average that increases progressively as the concentration approaches the critical concentration from below. The results of this scaling analysis suggest the existence of a universal behaviour independent of the sequences and structures of the proteins undergoing phase separation. While we refrain from proposing a theoretical model here, we suggest that any model of protein phase separation should predict the scaling exponents that we reported here from the fitting of experimental measurements of droplet size distributions. Furthermore, based on these observations, we show that it is possible to use the scale invariance to estimate the critical concentration for protein phase separation.