Recent reports have suggested that self-propagating CRISPR-based gene drive systems are unlikely to efficiently invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption for population alteration drives. Our models show that although resistance prevents spread to fixation in large populations, even the least effective drive systems reported to date are likely to be highly invasive. Releasing a small number of organisms will often cause invasion of the local population, followed by invasion of additional populations connected by very low rates of gene flow. Hence, initiating contained field trials as tentatively endorsed by the National Academies report on gene drive could potentially result in unintended spread to additional populations. Our mathematical results suggest that self-propagating gene drive is best suited to applications such as malaria prevention that seek to affect all wild populations of the target species.https://doi.org/10.7554/eLife.33423.001
Gene drive is a genetic engineering technology that can spread a particular suite of genes throughout a population. Among the types of gene drive systems, those based on the CRISPR genome editing technology are predicted to be able to spread genes particularly rapidly. This is because components of the CRISPR system can be tailored to replace alternative copies of a particular gene, ensuring that only the desired version is passed on to offspring. In this way, for example, a gene that prevents mosquitoes from carrying or transmitting the malaria parasite could be introduced to a very large wild population to reduce the incidence of the disease among humans.
Gene drives can be “self-propagating” or “self-exhausting”: the former are designed so that they can always spread as long as there are wild organisms around, whereas the latter are expected to lose their ability to spread over time. Self-propagating CRISPR gene drives have been shown to work in controlled populations of fruit flies, mosquitoes and yeast. These experiments happen in a controlled environment in the laboratory, so the organisms edited to have the gene drive elements do not come in contact with susceptible wild organisms. However, if just a few were to escape, the gene drive could theoretically spread quickly outside the laboratory.
Noble, Adlam et al. investigated, using mathematical models, whether or not – and how fast – a self-propagating CRISPR-based gene drive would spread if a number of organisms with the gene-drive elements were released into the wild. The models showed that the release of just a few of the edited organisms would result in the gene drive spreading to most populations that interbreed. This happened regardless of the structure of the wild populations or whether a degree of resistance to the drive emerged. As a result, even the smallest breach of a contained trial could lead to significant gene drive spread in the wild.
The findings suggest that self-propagating gene drive technologies would be most useful where the invasion of most wild populations of the target species is the intended purpose, rather than a risk to be avoided. As a result, a self-propagating CRISPR-based gene drive could be well suited to spreading among mosquitoes to impede the malaria parasite, provided there were strong international agreements in place. The findings also underline the difficulty of carrying out safe field trials of self-propagating gene drives, and the need for very tight control of laboratories carrying out experiments in this field. Lastly, they highlight the importance of developing and testing the evolutionary stability of self-exhausting gene drives, which could be better contained to local populations.https://doi.org/10.7554/eLife.33423.002
CRISPR-based gene drive systems can bias inheritance of desired traits by cutting a wild-type allele and copying the drive system in its place (Esvelt et al., 2014). Following reports of successful CRISPR gene drive systems in yeast (DiCarlo et al., 2015) and fruit flies (Gantz and Bier, 2015), scientists emphasized the need to employ strategies beyond traditional barrier containment as a laboratory safeguard (National Academies of Sciences, Engineering, and Medicine, 2016; Akbari et al., 2015). These precautions were judged necessary to prevent unintended ecological effects, but also because any unauthorized release affecting a wild population could severely damage trust in scientists and governance, significantly delaying or even precluding applications of gene drive and other biotechnologies.
Drive resistance can result from mutations that block cutting by the CRISPR nuclease. Recent examinations of the phenomenon by experiments and deterministic models have generated substantial media attention (Champer et al., 2017; Unckless et al., 2017; Drury et al., 2017; Noble et al., 2017). Resistance can arise from standing genetic variation at the drive locus or because the drive mechanism is not perfectly efficient and is predicted to prevent drive fixation in wild populations unless additional mitigating strategies are employed (Burt, 2003; Deredec et al., 2008; Esvelt et al., 2014; Noble et al., 2017; Marshall et al., 2017). Recent articles highlighting the problem of resistance for self-propagating gene drives have suggested that it might prevent drive invasion in wild populations—with some even implying that resistance could serve as an experimental safeguard. While we agree that resistance should prevent drive fixation, an allele can nonetheless spread to significant frequency without fixing. To clarify this point, we sought to quantify the likelihood and magnitude of spread in the most likely unauthorized release scenario—a small number of engineered individuals released into a wild population.
CRISPR-based gene drive systems function by converting drive-heterozygotes into homozygotes in the late germline or early embryo (Esvelt et al., 2014) (Figure 1A). First, a CRISPR nuclease encoded in the drive construct cuts at the corresponding wild-type allele—its target prescribed by an independently expressed guide RNA (gRNA)—producing a double-strand break (Jinek et al., 2012). This break is then repaired either through homology-directed repair, producing a second copy of the gene drive construct, or through a nonhomologous repair pathway (non-homologous end joining, NHEJ, or microhomology-mediated end joining, MMEJ), which typically introduces a mutation at the target site (Mali et al., 2013; Cong et al., 2013). Because the drive target is determined through sequence homology, such a mutation generally results in resistance to future cutting by the gene drive. Thus, the allele converts from a wild-type to resistant allele if it undergoes repair by a pathway other than homology-directed repair. Moreover, drive-resistant alleles are expected to exist in wild populations simply due to standing genetic variation (Unckless et al., 2017; Drury et al., 2017).
Deterministic models, which assume an infinite, well-mixed population, predict whether an allele is favored to increase in frequency when initially rare in a wild population. Whether gene drives are predicted to invade by deterministic models depends on two key parameters: the homing efficiency (), or the probability of undergoing homology-directed repair instead of nonhomologous repair, and fitness (), or the relative fecundity or death rate the drive and its cargo confer on their organism compared to the wild-type. Mathematically, drives are initially favored by selection if , i.e., if the inheritance bias of the drive exceeds its fitness penalty (Noble et al., 2017; Deredec et al., 2008; Unckless et al., 2015). Given that the homing efficiencies of reported drive systems typically range from 0.37 to 0.99 (Appendix 1—table 1), current drive systems can clearly invade in deterministic models. Although the fitness parameter, , is typically not measured in proof-of-concept studies, a substantial fitness cost is tolerable by all reported CRISPR drive constructs (DiCarlo et al., 2015; Gantz and Bier, 2015; Champer et al., 2017; Gantz et al., 2015; Hammond et al., 2016) (Figure 1B).
However, in finite populations, the fate of initially rare alleles is determined not only by selection but also by stochastic fluctuations (Wright, 1931; Fisher, 1930; Haldane, 1927). Therefore, stochastic models are required to predict the probability that a drive will invade a population upon the introduction of a very small number of individuals, even when deterministic models predict that they are to invade. A previous, and arguably prescient, stochastic model of endonuclease drive containment found that homing-based drives, such as those subsequently developed using CRISPR, were among the likeliest to invade of the various drive alternatives (Marshall, 2009). To determine whether self-propagating homing drives are still able to invade in the presence of resistance, we formulated a finite population, stochastic, Moran-based model that allows us to study small releases in finite and structured populations (Materials and methods).
Our model considers three distinct allelic classes: wild-type (W), gene drive (D), and resistant (R). Consistent with experiments, we assume that the drive invariably cuts the wild-type allele in the germline of a heterozygous WD individual, converting to a drive allele with probability , or a resistant allele with probability . Each genotype, AB, has a relative reproductive rate, , corresponding to its fitness in deterministic models, normalized such that the wild-type homozygote has fitness one (), the drive confers a dominant cost (), and resistance is neutral (). This ordering of the parameters conservatively represents the worst-case scenario for drive spread (Comparison with deterministic model).
At the population level, our basic model considers diploid individuals mating randomly. The process unfolds in discrete steps, during which parents are chosen for reproduction, an offspring is chosen according to the mechanism above, and another individual is replaced by the offspring (Figure 1C and Materials and methods). These steps are repeated until one allele fixes. A generation is time-steps, which corresponds to the mean lifespan of an individual.
Code to perform numerical simulations of this model and all model extensions described below (C++, Matlab), as well as data files, documentation, and code to reproduce all of the figures shown here (Matlab) can be found at GitHub (Noble, 2018; copy archived at https://github.com/elifesciences-publications/drive-invasiveness).
Figure 1D shows typical simulations for drive efficiencies of , and , which correspond respectively to a constitutively active drive system targeting a common insertion site, and conservative and high efficiency systems (based on previous experimental studies, Appendix 1—table 1, Figure 1B, Empirical data supplement). These simulations assume a dominant drive fitness cost of 10%, a population of size 500, and a release of 15 drive-homozygous individuals. (Note that the dynamics are similar for larger population sizes; see Population size and Figure 3). In all three cases, the drive, on average, irreversibly alters a majority of the population, either via invasion of the drive itself or via spread of drive-created resistant alleles. We call the maximum frequency of drive alleles reached during a simulation the peak drive, and we say a drive has invaded if it establishes in the population, ensuring behavior qualitatively similar to deterministic models (Comparison with deterministic model). Notably, for sufficiently large populations, arbitrarily low frequencies meet this standard, as it depends on the absolute number of drive alleles rather than their frequency (Analytic formulae for the escape probability in structured populations). Note also that each of these examples is chosen from the parameter regime in which invasion is predicted by deterministic models, since invasion is very unlikely outside of this regime.
We next calculated the distribution of peak drive while varying the number of organisms released (Figure 1E and F). We find that these distributions are bimodal, with one mode centered around the initial frequency (corresponding to drift leading rapidly to extinction) and one centered roughly around the maximum values observed in the large-release scenarios in Figure 1D. The former mode shrinks rapidly as more organisms are released, and for the parameters studied, a release of individuals nearly guarantees invasion with substantial peak drive (Comparison with deterministic model, Figure 10).
To understand the extent to which isolation might prevent invasion of other populations connected by gene flow, we introduced population structure. Our model consists of five subpopulations (or islands) that are equally connected by migration (Figure 2A and Finite population model with population structure). Typical dynamics are illustrated in Figure 2C. Figure 2B and D show the escape probability, or the probability of the drive invading (arbitrarily defined as attaining a frequency of 0.1) at least one subpopulation other than its originating one, and Figure 2E shows the probability of invading a varying number of subpopulations.
Our results in Figure 2 suggest that if the migration rate is extremely low, then the drive is effectively contained in the initial subpopulation. If the migration rate is high, the drive is almost guaranteed to invade all subpopulations linked to the originating one. For intermediate migration rates—characterized roughly by migration rates on the order of the inverse of the drive extinction time—both outcomes occur. In the scenario studied in Figure 2, a migration rate of , which corresponds to a single migration event every generations on average (Materials and methods), virtually guarantees escape for moderate drive efficiencies (Materials and methods). For further details and analytical formulae allowing rapid estimation of escape probabilities, see Analytic formulae for the escape probability in structured populations.
Finally, we sought to understand the effects of additional mitigating factors that could potentially affect peak drive or invasion. We considered the most prominent factors that have arisen in previous papers, and we studied each by varying parameters in our basic model and developing model extensions. Our results are explored in detail in the Materials and methods.
First, we considered preexisting drive resistance resulting from standing genetic variation (Unckless et al., 2017; Drury et al., 2017) (Standing genetic variation). We find that increasing the proportion of the population that is initially resistant linearly decreases the mean peak drive (). Using the parameters in Figure 1E and considering a release of 15 individuals, more than preexisting resistance is required to contain average peak drive below (Figure 4).
Second, we studied the effect of varying family size, which may be relevant to species such as mosquitoes with large egg batch sizes (Hammond et al., 2016; Yaro et al., 2006). We extended the model so that (adult) offspring are produced from a reproduction event, rather than one. We find that this effect scales the release and population sizes (Hill, 1972) by a factor of . For illustration, we estimated for Anopheles gambiae to be roughly (Offspring number distribution), so that a release of individuals roughly corresponds to a release of individual in our basic model. While this effect somewhat reduces the chance of drive invasion for small release sizes, it does not preclude it.
Third, we varied drive fitness, resistant-individual fitness and homing efficiency across their entire parameter regimes and recorded peak drive (Effect of varying fitness and homing efficiency, Figure 7, Figure 8). While varying drive fitness, we find that peak drive is on average greater than across the majority of the regime and almost always greater than (Figure 7, left)—and, as a technical aside, we find that this is the case whether the fitness cost of the drive manifests itself via a reduction in birth rate or via increase in death rate (Figure 7, right). Moreover, in line with previous deterministic results, we find that peak drive can be substantially increased by associating a fitness cost with resistance (Figure 8), which could be expected for drive constructs intended for large-scale application, utilizing methods such as multiplex targeting of essential genes (Esvelt et al., 2014; Noble et al., 2017; Marshall et al., 2017).
Fourth, we studied the effect of inbreeding, which has been shown in several recent theoretical studies (Bull, 2017; Drury et al., 2017) to impede drive spread (Inbreeding). We extended the model to include a probability of an individual selfing rather than mating with a second individual (Bull, 2017). The model assumes no inbreeding depression and thus considers the worst-case scenario for drive (Bull, 2017). We find that even in this scenario, high selfing probabilities are required to reduce peak drive and the probability of invasion for moderate drive costs.
There are a variety of other phenomena that could affect invasiveness, e.g., density dependence (Deredec et al., 2011), environment (Tanaka et al., 2017), costly resistance (Traulsen and Reed, 2012), local ecology, and even mating incompatibilities between some laboratory strains and wild individuals. Such effects should be carefully studied in subsequent papers. Most importantly, the drive architecture itself should affect invasiveness; we consider here only alteration-type drive systems, while others, e.g., sex-ratio distorters and genetic load drives, would be expected to yield different dynamics. In particular, population suppression drive systems may locally self-extinguish before invading new populations. However, for alteration drives, our key qualitative finding—that peak drive is difficult to reliably contain below a socially tolerable threshold following a very small release of organisms—appears robust to a variety of mitigating factors. Fundamentally, we exercise caution by omitting application-specific phenomena that might aid containment in particular instances but not in general.
Our results suggest that current first-generation CRISPR-based gene drive systems for population alteration are capable of far-reaching—perhaps, for species distributed worldwide, global—spread, even for very small releases. A simple, constitutively expressed CRISPR nuclease and guide RNA cassette targeting the neutral site of insertion—an arrangement that could occur accidentally—may be capable of altering many populations of the target species depending on the homing efficiency of the organism in question. More generally, resistance can be problematic for intentional applications of gene drives, but we find that it is not a major impediment to invasion of unintended populations.
These findings raise two important questions: (1) How likely are unauthorized releases of self-propagating gene drive systems in the first place? (2) How likely are serious negative consequences given the apparently high likelihood of spread to most populations of the target species? Rigorously addressing these questions is an important direction for future work, and we can offer only opinions here. The answer to the first question likely depends on a large number of factors, such as species, application, containment strategies, economic motivations, drive development stages, geography, and the caution of the investigators, so we omit speculation here. However, we consider the answer to the second question to be clearer: although most laboratory gene drive systems are unlikely to cause ecological changes—they are typically predicted to be transient and are not designed to alter traits of the host organism, least of all interactions with other species—the history of genetic engineering offers many examples suggesting that substantial social backlash could be triggered by unauthorized spread of a self-propagating gene drive (Funk and Rainie, 2015; Couzin and Kaiser, 2005). Any such event could significantly reduce public support for interventions against diseases such as malaria that could possibly save millions of lives. We believe it would be profoundly unwise to proceed with anything less than an abundance of caution.
On a more technical note, our findings are specific to population alteration drive and cannot be directly generalized to self-propagating suppression drive, which could potentially self-extinguish before invading other populations. However, our results suggest a method for rough comparison between these scenarios: we find that the primary factor in determining drive spread between adjacent populations is the average number of migrants per generation (Analytic formulae for the escape probability in structured populations), which can, in principle, be compared between models. For example, an earlier model of suppression drive systems (Deredec et al., 2011) predicted a total number of drive-carrying organisms over time which is remarkably similar to our example of an inefficient alteration drive system that is rapidly outcompeted by resistant alleles (Figure 1D, middle). Thus, assuming comparable migration rates, it might not be surprising to see qualitatively similar levels of invasiveness. Accordingly, we urge researchers to exercise caution in developing or advocating for self-propagating suppression drives for applications other than malaria prevention—or similar projects intended to affect an entire species—until explicit models of invasiveness are available.
Additionally, our findings emphasize the importance of the containment strategy known as ‘ecological confinement’, which was proposed previously (Esvelt et al., 2014; Akbari et al., 2015). Given the risk that organisms may escape through accidents or outside intervention, laboratories in regions with endemic wild populations may wish to refrain from constructing self-propagating systems capable of invading those populations and undergoing unwanted spread. Laboratories in regions with endemic wild populations can reliably prevent accidental invasion by employing intrinsic molecular confinement mechanisms such as synthetic site targeting or split drive as recommended by the National Academies’ report on gene drives (National Academies of Sciences, Engineering, and Medicine, 2016).
Perhaps most importantly, any development efforts looking ahead toward field trials, a component of the staged testing strategy outlined by the National Academies report, should be aware that there could be a high likelihood of unwanted spread across international borders, even from ostensibly isolated islands. The development of ‘local’, intrinsically self-exhausting gene drive systems (Chen et al., 2007; Akbari et al., 2014; Noble et al., 2016; Magori and Gould, 2006; Gould et al., 2008), sensitive methods of monitoring population genetics, and strategies for countering self-propagating drive systems and removing all engineered genes from wild populations should be correspondingly high priorities.
To model gene drives in finite populations, we introduce a Moran-type model with sexual reproduction (illustrated in Figure 1C). We consider a population of individuals, each of which is diploid. We focus on a locus with three allelic classes: wild-type (W), CRISPR gene drive element (D) and drive-resistant (R). There are six possible genotypes: WW, WD, WR, DD, DR, and RR. We assign to each genotype a reproductive rate .
The process proceeds in discrete time-steps, during each of which three events occur in succession (Figure 1C). First, two individuals are chosen without replacement for mating with probabilities proportional to their reproductive rates, so that genotype is selected with probability
Here is the number of individuals having genotype , and the sum in the denominator is over all six genotypes. Second, after selecting the two parents, the offspring genotype is chosen randomly based on the genotypes of the two parents. To proceed, we introduce notation to mean that genotype consists of alleles and , and we index these alleles via and . Note that we track only one genotype for each heterozygote, implicitly combining counts for genotypes AB and BA. Using this notation, the probability that an offspring of genotype is chosen given a mating between parents of genotypes and is given by the quantity , which is equal to
Here is a gamete production probability—the probability that a parent with genotype produces a gamete with haplotype —and is the Kronecker delta, defined by if (i.e., if the offspring under consideration is a homozygote), and otherwise. The gamete production probabilities, , are determined by accounting for the gene drive process described above. They are given by: , , , . The remaining values not listed, e.g., , are zero. Third, an individual is chosen uniformly at random for death. Thus, the population size remains constant. The resulting counts become the starting abundances for the next iteration of the process. The process is initialized with a small number, , of drive homozygotes (DD) and the remaining population, , wild-type homozygotes (WW). The process continues as described above either until a specified number of time steps have elapsed or until one of the three alleles has fixed. Any of the alleles can fix, but typically either the wild-type or resistant alleles fix, due to the emergence of resistance.
To study the effects of population structure on drive containment, we extended the well-mixed model from the previous section. We now consider well-mixed subpopulations, each consisting initially of individuals. The process proceeds in discrete time steps, as before. In each time step, we either migrate an individual from one population to another, or we choose a particular subpopulation and proceed through one mating and replacement iteration, as outlined above. More specifically, one step of the process proceeds as follows (illustrated in Figure 11). With probability , we initiate a migration event. In this case, we perform three steps. First, we choose a source population with probability proportional to its size. Second, we choose an individual uniformly at random from the source population for migration. Finally, we move the chosen individual to a linked subpopulation uniformly at random. Or, with probability , we initiate a mating event as described in the well-mixed section. To carry this out, we first choose the population in which the event will occur. We choose this population with probability proportional to the square of its total fitness, since this counts the rate of reproduction for every possible mating pair in the population (as matings occur with rates proportional to the fitness of each parent). We then step through one iteration of the well-mixed mating process within this subpopulation. Note that in this model the migration rate has a simple interpretation. The time between migrations is geometrically distributed with parameter , so the mean time between migrations is time steps. Recall that a ‘generation’ is equal to the mean lifespan of an individual, that is, reproduction events or time steps. Then the typical time between migrations can be expressed with the units as generations:
To compare our stochastic simulations with deterministic results, we use a recently published model (Noble et al., 2017). From that work, we employ the ‘previous drive’ model, as it was designed to agree with the existing proof-of-concept CRISPR drive constructs that we consider here. Specifically, we consider the case of guide RNA ( in that work’s notation), and zero production of costly resistant alleles ().
Above, we present results from simulations which assume populations of size . We claim that is a reasonable approximation for the dynamics in the large-population limit, which is the relevant regime for widespread invasion or for species with very large population sizes, e.g., mosquitoes. Here we briefly evaluate this claim.
Figure 3 recreates Figure 1E from the main text with additional population sizes overlaid: , , , and . The distributions narrow for larger until plateauing at roughly . However, the central tendencies show little change with increasing .
Several recent studies have explored the effect of pre-existing drive resistant alleles in a population brought about by standing genetic variation (SGV) at the target locus (Unckless et al., 2017; Drury et al., 2017). These studies developed deterministic models and showed that pre-existing resistant alleles—presumably neutral—should rapidly outcompete costly drives due to selection, resulting in rapid drive extinction. The study by Drury et al. (Drury et al., 2017) used sequencing to quantify this standing variation in diverse populations of flour beetles and found resistance-conferring mutations to exist at a wide range of frequencies, from to , with an average of roughly .
However, these studies were primarily concerned with long-term outcomes following drive release, in which case resistance certainly outcompetes the drive. For our purposes, however, we are concerned with the intermediate time regime in which the dynamics of resistance are less clear. Moreover, these studies employed deterministic models, whereas our model is stochastic. Here, we seek to understand the effect of SGV in our model.
To incorporate SGV, we simply alter the initial conditions: rather than introducing drive homozygotes into a population of wild-type homozygotes, we introduce drive homozygotes into a population consisting of resistant homozygotes (we choose resistant homozygotes for simplicity, since they rapidly go to Hardy-Weinberg equilibrium following release) and wild-type homozygotes. Figure 4 shows the effect of SGV on peak drive for pre-existing resistance frequencies up to .
We find that the effect of SGV is to linearly decrease the mean peak drive (). Our intuition for this result is as follows. Because the population is well-mixed, the effect of resistance is simply to decrease the size of the population that is susceptible to the effects of the drive. This can be roughly viewed as linearly scaling the drive-frequency axis. For example, if the population has a frequency of resistant alleles immediately prior to release, then the population that is susceptible to drive is roughly of the census population size, and the drive undergoes its usual dynamics within this subpopulation. There are of course complications to this simplistic explanation, e.g., selection increasing the size of the resistant population and diploidy mixing resistant and drive alleles. Furthermore, the linear relationship only holds for sufficiently low levels of SGV. In our example here, the relationship holds to roughly 0.5 initial resistance frequency. However, this is still higher than would be anticipated for drives engineered to spread in the wild.
Overall, our results suggest that a high level of SGV would be required to protect against drive invasion. In our conservative example (Figure 4) assuming homing efficiency, drive fitness, and neutral resistance, pre-existing resistance of greater than frequency is required to contain peak drive to below of the population, compared to in the absence of SGV.
In the model presented above, we assume that each mating produces one offspring. However, a variety of application-relevant species are known to produce many offspring per mating. For example, female Anopheles gambiae mosquitoes can lay hundreds of eggs per lifetime (Hammond et al., 2016). It is not clear, a priori, how varying the offspring number distribution in our model would affect the results presented above. Thus we here analyze a simple extension of the model which allows us to vary the number of offspring following a given mating event.
To begin, recall our model. We consider a population of constant size with the following process: At each time-step, two individuals are chosen for mating; an offspring is sampled according to the parental genotypes; a third individual is chosen for removal from the population, and the parents’ offspring takes its place. (We implicitly assume that these offspring are only the offspring which successfully reach adulthood, i.e., reproductive age). We now add a new parameter, , which determines number of (adult) offspring produced by a mating pair. The process proceeds as before, except now offspring are independently sampled from the parental genotypes, and individuals are chosen uniformly (without replacement) for removal from the population. Clearly the model presented in the main text is the special case .
Note that this parameter is not equivalent to brood size, clutch size, egg batch size, etc.—values often considered in the ecological literature—in that k describes the number of offspring produced per mating which successfully attain reproductive age. This number can of course be much lower than these other parameters due to death during juvenile life stages. We provide an example calculation for this parameter in An. gambiae at the end of this section.
We now argue that increasing the number of offspring per mating, , corresponds to decreasing the effective size of the population, . We omit rigorous proof here, but we provide a formula for the effective population size in our model and present numerical simulations as support. To begin, Hill showed in 1972 that the variance effective population size in the standard Moran model is (Hill, 1972)
Here is the census population size, and is the variance in the distribution of the total number of offspring produced by an individual over the course of its lifetime (i.e., its lifetime reproductive success). It was proven that this formula holds both for the Wright-Fisher model with discrete generations and for the Moran model with overlapping generations, provided that is the same and that the total number of individuals entering the population in each generation is equal (Hill, 1972). Our model meets both of these requirements—indeed, the only difference is that two parents are chosen to sample offspring types, rather than one, and this has no bearing on the number of offspring produced—so we conjecture that Equation (4) holds for our case as well.
To proceed, we calculate for our extended model and employ the variance effective population size given by Equation (4). Consider one particular individual in the population, and let count time-steps. As described, in each step, individuals are uniformly sampled (without replacement) for removal. Thus, an individual has probability of dying in each step. Its lifespan, , is thus geometrically distributed, .
Next, let be a random variable describing the number of offspring an individual produces in its lifetime, so that is the number of such events given that the individual survives time-steps. Because each mating event is independent, . The success probability derives from the fact that two individuals are chosen for mating in each time-step and that the process is neutral. Thus,
Returning to the variance effective population size expression in Equation (4), we obtain for our model:
Note that in the case we recover , which is the variance effective population size for the standard Moran model.
In Figure 5, we present peak drive distributions (as in Figures 1E and 3) for varying values of with the effective population size, , and effective release size, , both determined by Equation (5), held constant. In this case we used and , which correspond to and an initial release of in our standard model with . The peak drive distributions for all values of studied are approximately identical. This suggests that the dynamics for larger can indeed be inferred from the standard model with and population/release sizes appropriately scaled via Equation (5). An immediate consequence of this result is that releases of organisms which have many offspring (e.g., mosquitoes) are effectively smaller than would be expected from simply counting. For example, an organism which typically has offspring that survive to adulthood would need a release size of roughly to surpass the 10-individual initial release threshold we have observed. Note that the 10-individual threshold discussed throughout the text is the census release size; the effective release size is .
In Figure 6, we recalculate the distributions in Figure 5 holding the actual population and release sizes constant, rather than their effective values. Two effects are apparent. First, the decrease in effective population size, , leads to greater variation in peak drive among simulations that invade, i.e., the distribution centered around widens. Second, the decrease in effective release size, , leads to a greater probability of simulations immediately going extinct, i.e., the relative mass of the mode centered around increases. In sufficiently large populations the first effect would be less pronounced—see Figure 3—while the second effect should apply for any small release.
Finally, as an example, we provide an estimate of our model’s parameter for a particularly relevant species, An. gambiae. To do this, we find the typical size, , of egg batches laid by females following a particular mating event; then we estimate the total number of these which survive to adulthood using parameters from the literature.
The first number, , varies according to a variety of environmental and ecological factors (Hammond et al., 2016; Yaro et al., 2006), so we assume a large but reasonable value in order to avoid underestimating our parameter . For this, we assume that , which is roughly the highest value observed by Hammond et al. in the CRISPR drive study (Hammond et al., 2016) and is in line with previous field work (Yaro et al., 2006).
To estimate the survival probability for each egg to adulthood, we employ the method and parameters presented by Deredec et al. (Deredec et al., 2011) Each egg goes through three juvenile stages before reaching adulthood—the egg stage, the larva stage, and the pupae stage. We denote the probabilities of surviving each of these stages by , , and , respectively. The probability of a particular egg reaching adulthood is then . These parameters were estimated to be , , and . Thus we have .
Given this formulation, the number of eggs laid per mating event which reach adulthood is distributed according to . We take the mean of this distribution to obtain:
Therefore, while An. gambiae females exhibit large egg batch sizes, the value of for our model is much lower—indeed, low enough that the central tendency of the peak drive distribution remains roughly unchanged in Figure 6.
Above, we study various values of the homing efficiency, , but we perform less exploration of the parameters governing drive fitness, , and resistance cost, . This is motivated primarily by the abundance of data for the former—see Appendix 1—table 1—and the lack of data for the latter parameters.
In addition, we have assumed throughout that death rates are identical for the various genotypes, while reproductive events occur with probabilities proportional to fitness. On the other hand, some drive constructs might behave the opposite way: reducing fitness by increasing an organism’s death rate, while leaving its birth rate unchanged.
In this section we explore these three effects: (i) varying drive fitness across its entire range, (ii) varying the fitness cost of resistance across its entire range, and (iii) modifying the model so that death rates are affected by fitness, rather than birth rates.
To begin, we consider our standard model for fitness and study drive spread across the entire range of values for drive fitness, , and homing efficiency, . In particular, we consider values of each parameter: and , both evenly spaced, for a total of 2601 parameter pairs. For each pair, the average peak drive is calculated over 100 simulations, and the results are shown in Figure 7, left.
We find that maximum drive frequencies of greater than are common across a wide range of drive fitness values. In particular, for our lower-bound estimate of empirical drive efficiency (), drives can confer fitness costs as high as before the peak drive drops below . For more typical empirical efficiencies (), the peak drive is typically greater than even for costly drives (), and low-cost drives () have peak drive of greater than .
We next modified our standard well-mixed model in the following way. Recall that the model involves choosing two parents to mate, then choosing an individual to die and be replaced by the parents’ offspring. In our standard model, the two parents are chosen to reproduce with probabilities proportional to their fitnesses, and an individual is chosen to die uniformly. In our modified model, we choose the two parents uniformly and then choose the individual to die with probability proportional to the inverse of its fitness. Results from the modified model are shown in Figure 7, right and are nearly identical to the results from the standard model.
In both cases, it is important to note that the peak drive and likelihood of invasion deemed socially acceptable for accidental release would likely be lower than those discussed above. With this in mind, our simulations suggest that if a drive is predicted to invade by deterministic models (i.e., if it lies above the boundary in Figure 7), then it will almost certainly reach a maximum frequency greater than . While acceptable levels of peak drive are as-yet unknown and will likely vary between species, applications, jurisdictions and so on, spread to this extent will likely surpass it.
Finally, we sought to understand the effect of varying the fitness cost associated with drive-resistance. Throughout the text above we have assumed that resistance is neutral, as this presumably represents the best case for containment. However, drive constructs developed for applications are likely to employ resistance-mitigating strategies, such as multiplex targeting of essential genes (Esvelt et al., 2014; Noble et al., 2017), which essentially increase the fitness cost associated with drive-resistance. Thus, we ran simulations varying drive-individual fitness, , in the range , and resistant-individual (RR) fitness in the range , assuming conservative drive efficiency, . In both dimensions we considered 51 parameter values, evenly spaced, for a total of 2601 parameter pairs. For each pair, the average peak drive is calculated over 100 simulations, and the results are shown in Figure 8.
We find qualitatively that there are two regimes, determined by the fitness cost of resistance, (i.e., individuals with genotype RR have fitness ), and the deterministic invasion condition, . In the figure, we assume that , so the deterministic invasion condition is simply . When the fitness cost of resistance, , is sufficiently low (), then the dynamics are determined by the relationship between the fitness of drive individuals and the fitness of resistant individuals: if the fitness of drive individuals is greater than the fitness of resistant individuals, then the spread of the drive is dramatically improved—typically reaching fixation—compared to the baseline neutral-resistance case. However, if the fitness cost of resistance is sufficiently high (), then the improvement in drive spread brought about by increasing the cost of resistance saturates, since the drive can now be less costly than resistance () but also too costly to invade (). That is, for resistance costs higher than , the mean peak drive as a function of drive fitness, , remains essentially unchanged with increasing , since the deterministic invasion condition can no longer be satisfied when the drive has fitness , no matter the cost of resistance.
Since the drive functions only in heterozygotes, inbreeding in a population—which in effect reduces the frequency of heterozygotes—would be expected to impact drive invasiveness. Indeed, this has been shown in recent theoretical studies by Bull, 2017 and Drury et al. (2017) Thus we here extend our well-mixed model to include inbreeding and study its effect.
For simplicity, we consider a partial selfing model. In each update step of our process (see Figure 1C), we typically choose two parents for mating with probabilities proportional to their fitnesses. To include selfing, we instead choose the first parent as usual, with probability proportional to its fitness. We then choose the first parent as the second parent as well with probability ; or, with probability , we choose a second parent from the remaining population, with probability proportional to its fitness. Note that the fitness of each offspring is determined entirely by its genotype and does not account for inbreeding depression. Implicitly, we thus consider the case of zero inbreeding depression. As this effect helps protect against drive invasion, we essentially consider the worst-case scenario for drive containment (Bull, 2017).
Using our extended model, we then computed peak drive distributions for values of between and and for the three values of explored above: . The results are shown in Figure 9. We find that a fairly high degree of selfing is required to impact the peak drive distribution in a meaningful way. For highly effective drive, , the mass of the upper mode in the frequency distribution is larger than the lower mode until roughly . For conservative drive, , this occurs at roughly , and for ineffective drive there is little change, as the maximum frequency begins very near zero. To compare with previous results, we can consider the inbreeding coefficient rather than the selfing probability. In our model, the inbreeding coefficient, , is given by . Thus highly effective drive can tolerate inbreeding of and conservative drive can tolerate .
To show that the deterministic ODE solutions provide reasonable approximations to the typical behavior of our stochastic model, we overlay numerical solutions to the ODEs for the systems studied in Figure 1D of the main text. The results are shown in Figure 10.
Throughout we have assumed that resistance is neutral with respect to the wild-type. This assumption is biologically realizable as resistance is conferred by changing sequence homology to the drive’s gRNA—something that could be achieved with synonymous codon substitutions, for example. In practice, some resistance mutations could be costly and those that are neutral could be rare. However, assuming resistance is always neutral represents the worst-case scenario for drive invasiveness, as resistance can increase in frequency without being selected against with respect to the wild-type.
When resistance is no longer assumed to be neutral, other interesting dynamics can occur (Traulsen and Reed, 2012). In particular, when resistance is costly with respect to the wild-type, but not so costly as the drive and its cargo, the dynamics resemble the Rock-Paper-Scissors game. This allows the drive to avoid extinction indefinitely.
We consider a deme structured population, where each subpopulation has size and there are demes. We define a Moran-type process, where in each time step either a reproduction or migration event takes place (illustrated in Figure 11). A reproduction event occurs with probability and a migration event occurs otherwise. If a reproduction occurs, then a subpopulation is selected proportional to the square of its total fitness. Next, two individuals in the subpopulation are selected proportional to their fitnesses and they produce an offspring according to the mechanism above. Finally, another individual from the subpopulation is chosen uniformly at random for death. If a migration event occurs, then an individual is selected uniformly at random and migrates to a new subpopulation uniformly at random. We denote the proportion of genotype at time in the initial subpopulation by .
The process begins with drive homozygotes and wild-type homozygotes in a single subpopulation. The remaining subpopulations consist only of wild-type homozygotes. Let be the event that the frequency of drive alleles reaches 10% in a subpopulation other than where the drive was released, given that drive homozygotes were released in the initial subpopulation. We assume that is small with respect to .
As an aside, note that the choice of 10% is arbitrary—any other percentage (less than the peak drive in the deterministic model, ) would be equivalent if is large enough. This is clear from Figure 1E, where either the drive does not invade and so peak drive is roughly equal to the initial frequency or the drive does invade and the peak drive is close to . This claim is equivalent to stating that the probability that the drive starting at frequency attains frequency (such that ) before going extinct tends to 1. This behavior is typical of Moran-type models, since the extinction probability of drive homozygotes rapidly approaches 0, even in an infinite population, as increases (Marshall, 2009). Specifically, if we have , then the extinction probability approaches 0 as becomes large, and moreover, if the drive does not go extinct, then it behaves almost deterministically and will reach frequency and thus also .
Returning to approximating the probability of , note that for to take place a drive allele has to migrate from the initial subpopulation and this allele has to survive stochastic fluctuations and avoid extinction in its new subpopulation. The drive alleles do not last indefinitely in the initial population. We denote the random time at which the drive alleles go extinct by . As long as the initial drives do not go extinct due to stochastic fluctuations, the frequency of the drive increases rapidly, as it outcompetes the wild-type. Concurrently, resistant alleles are produced that eventually push the drive to extinction. This means that the drive has a finite time to migrate to other subpopulations. Although this process is stochastic it shows fairly deterministic behavior once there are a sufficient number of drive alleles (see Figure 10)—that is, if the drive avoids immediate extinction. Let , be the probability that the drive survives stochastic fluctuations and avoids immediate extinction when starting with drive homozygotes and heterozygotes. Implicitly, here we = assume that does not depend on whether the heterozygotes are wild-type or resistant heterozygotes. Note that when or are , is approximately 1, so when , we assume that the probability that the drive migrates is approximately 0. Moreover, since the drive will almost certainly go extinct, there is some time where the frequency of drive alleles is again much less than . We also assume here that the probability that the drive migrates is approximately 0.
At each time step, there is a small probability that the drive migrates from the initial population and invades another subpopulation. To calculate, we first condition on the non-extinction of the initial drive homozygotes. Second, we note that if the drive does not migrate and avoid extinction in another subpopulation, then it does not do so at any particular time . Third, we assume that these events for each are approximately independent. Finally, we numerically solve a deterministic ODE system representing the dynamics (Noble et al., 2017) to approximate the probability that the drive does not migrate at time . Thus,
since if the drive avoids extinction it will invade. Now we substitute the ODE solution for in the above expression to find that
Here we approximated the product with an integral and used a change of variables.
Note that if and heuristically we replace in the above expressions with its time average, denoted , then
Thus, when the migration rate is on the order of the inverse of the drive extinction time, the invasion probability is order 1.
In Appendix 1—table 1, we present empirical homing efficiencies for all CRISPR gene drive constructs reported to date. These studies varied in multiple ways: they studied different organisms; they used different methods for counting drive constructs (ranging from direct genetic measurement, such as quantitative PCR, to indirectly observing visible phenotypes), and they sometimes observed differential inheritance rates between sexes, possibly due to differences in male and female gamete characteristics. Given this complexity, we elaborate here on the specific data we selected for review to produce Appendix 1—table 1 and the reasoning for our choices.
To begin, all studies performed some variation of producing drive/wild-type heterozygotes (DW), followed by counting the number which converted their wild-type allele to a drive allele. There were two main approaches.
Some constructs acted in the early embryo, in which case WW and DD individuals were mated to produce offspring which were initially WD. Observations were then made of adult genotypes. DD individuals must have undergone drive conversion, while WD individuals must have avoided conversion. Without drive, all adults are expected to be WD, but with drive, all are expected to be DD.
Other constructs acted in the germline of adults, so that adult WD individuals produce D gametes more often than chance under the effects of drive. To study these constructs, WD individuals were mated with WW individuals. Without drive, half of adults should be WW, and half should be WD. With drive, however, all adults should be WD.
To employ a consistent strategy across the studies, we calculate two numbers for each drive construct: (i) the total number of initial alleles counted which were drives or were subject to drive, , and (ii) the total number of resulting drive alleles, . The homing efficiency can then be calculated in the following way:
Notice that if drive is perfectly efficient (), we have , i.e., there are twice as many drive alleles as starting heterozygotes, while under standard inheritance (), the number of drive alleles is unchanged from the initial heterozygous state, . Below, we explain our calculations of these quantities for Appendix 1—table 1.
The study by DiCarlo et al. studied five distinct gene drive systems in yeast (DiCarlo et al., 2015). We address each distinct system in subsections below.
This is the basic split drive system containing only a guide RNA. Its design is depicted in Figure 2B, and it is described on pp. 1250–1251, with results pictured in Figure 2D and Figure 4. Drive abundances were measured via colony counting (Figure 2D), obtaining absolute colony numbers, and via qPCR (Figure 4), obtaining relative abundances of drive alleles. By the colony counting method, the drive efficiency is measured at 100% (). By the qPCR method, of alleles counted from offspring were drive alleles, so . Therefore:
Strictly speaking, the inequality entails , but we set this to because the qPCR results were indistinguishable from . We make a similar approximation below for systems 4 and 5.
This system aimed to test whether an associated ‘cargo’ gene could be spread with the minimal drive element. Its design is depicted in Figure 3a, and results are shown in Figure 3b. The related experiment measured drive presence via a visible phenotype (red pigment). In total, 60 haploids were red, or , out of 60 total alleles, . Thus:
The sgRNA +ABD1 drive system tested the ability to target a recoded essential gene. Its design is depicted in Figure 3c, and results are discussed in the text (first full paragraph on pp. 1252). The presence of the drive was measured via sequencing of the ABD1 locus. In total, 72 haploids were found to have the drive, , out of counted, .
The first example of an ‘autonomous’ drive in the paper, this system is depicted in Figure 5a. It consisted of a gRNA and cas9 together targeting the ADE2 locus (recoded due to safety/containment considerations). The fractional abundance of drive allele was measured by performing qPCR on diploid offspring from wild-type/drive haploid matings; the corresponding data is found in Figure 5b. The fractional abundance of the drive allele was measured to be , so , as for the first construct above.
This system is DiCarlo et al.’s example of a ‘reversal’ drive, designed to target and overwrite the autonomous drive (cas9 +sgRNA, directly above). The system is depicted in Figure 5c. The drive efficiency was measured in the same way as that for the cas9 +sgRNA drive (qPCR to calculate fractional abundance of the overwriting drive allele in diploid offspring from haploid matings). The fractional abundance was calculated to be , so , as above.
Gantz et al. constructed an X-linked drive construct targeting the (X-linked) yellow locus in Drosophila melanogaster and acting in the early embryo (Gantz and Bier, 2015). The drive functions to knock out the yellow gene, which produces a yellow-body phenotype, denoted , due to lack of black melanin pigment formation. The wild-type phenotype is referred to as . Females with copies of the drive or males with 0 copies should appear , while females with 2 copies of the drive or males with one copy should appear . The related data is found in Figure 2E and Table 1.
Two sets of crosses were performed: (i) drive-males with wild-type females, and (ii) drive-females with wild-type males. To tabulate the allele counts and , we discuss the two crosses separately.
First, cross (i): In this cross, male offspring could not have possibly inherited a drive allele nor received one through conversion. This is because the only allele they could have inherited from the drive-male parent was the Y chromosome, but the drive is X-linked. Thus we do not consider male offspring in the total. As for female offspring, these should inherit exactly one drive allele and one wild-type allele prior to conversion. Then the adult female individuals should appear if and only if drive-mediated conversion was successful. Thus we add exactly two alleles for each female offspring toward the total allele count, while we add one or two drive alleles to the drive allele count if the adults are or , respectively. This yields and . The drive efficiency for this cross is .
Second, cross (ii): In this cross, male offspring are again uninformative, since each should inherit exactly one drive allele from the female parent and one Y allele from the male wild-type parent. Thus we ignore male offspring in our counting. Female offspring, on the other hand, should all begin as WD embryos, with phenotypes. Then adults are if and only if they have undergone drive-mediated conversion. Thus we count two alleles for every female offspring in the total, one drive allele per adult and two drive alleles per adult. This yields , and . The drive efficiency for this cross is thus .
We then consider crosses (i) and (ii) together to calculate the overall drive efficiency. This yields:
Champer et al. constructed two CRISPR gene drive constructs in D. melanogaster (Champer et al., 2017). The first resembled the vasa promoter-driven construct from Gantz et al., discussed in the section immediately above. An important addition, however, was a DsRed fluorescent protein as payload in the drive construct, which allows the drive to be detected in heterozygotes, as its red fluorescent phenotype is dominant. The second construct used the nanos promoter, which has been shown to restrict drive function to the germline and is expected to produce less toxicity (and thus a lower fitness cost associated with the drive construct).
This construct was similar to the one studied by Gantz et al., discussed above. The construct targets the X-linked yellow gene. Disruption of the gene produces a recessive yellow phenotype, while the drive itself carries a DsRed payload, producing a dominant red fluorescent eye phenotype. To assess the construct’s homing efficiency, wild-type males were crossed with heterozygous DW females. In this setup, all progeny should exhibit the red eye phenotype if the drive is perfectly efficient, while roughly 50% of progeny should exhibit the red eye phenotype in the absence of conversion. Here we count toward the total number of drive or susceptible alleles one allele per male offspring and one allele per female offspring, since in either case only one allele is inherited from the drive parent. Toward the number of drive alleles, we count one per offspring if the offspring displays the DsRed phenotype and zero otherwise. This data is shown in Table 2B of the Champer et al., 2017 study. We count as follows: (i.e., the number of drive alleles counted over female offspring), , , . Then we obtain:
This construct is essentially the same as the vasa construct, except that it uses a different promoter and targets a different sequence in the yellow gene (the coding sequence, rather than the promoter as in the previous construct). The data is found in Table 1B of the Champer et al., 2017 study. We count potential drive alleles and total alleles as above. Our count is as follows: , , , . We obtain:
The constructs described above were then tested in a variety of additional D. melanogaster lines, detailed in Table 3 of that work. The authors’ efficiency calculations are detailed in the S1 Dataset. For the vasa construct (two lines), the minimum is , and the maximum is . For the nanos construct (seven lines), the minimum is , and the maximum is .
In this study, Gantz et al. constructed an autonomous CRISPR-based gene drive system in the malaria vector mosquito Anopheles stephensi (Gantz et al., 2015). The construct comprises two effector genes with anti-Plasmodium falciparum activity, a dominant marker gene (DsRed), and the CRISPR components (Cas9 with a single gRNA), spanning roughly 17 kb. The construct targets the kynurenine hydroxylase (kh) locus, which has a recessive white-eye phenotype. The effect of this targeting is that drive/wild-type heterozygotes display a DsRed phenotype, while drive homozygotes display both DsRed and white eyes.
While this one construct was made and studied, it exhibited differential transmission between lines founded by drive males/wild-type females and drive-females/wild-type males. More specifically, lines in which drive alleles are inherited only through male parents display drastically higher drive efficiencies than lines in which the drive allele is inherited at some point via a female parent. To explain this discrepancy, the authors propose a model whereby in crosses between transgenic females and wild-type males, maternal deposition of Cas9 in eggs results in NHEJ-mediated disruption of the paternally derived wild-type chromosome in the early embryo. Crosses between transgenic males and wild-type females, on the other hand, do not see Cas9 deposited in the early embryo, and Cas9 cutting is better contained to the later germline, where HDR is more efficient.
To account for this discrepancy, we choose to consider these two cases separately and report homing efficiencies for each.
Here we consider all offspring (larvae + adults) whose drive alleles (or potentially-inherited drive alleles) have been passed down only through male ancestors. This includes all offspring from the male-founder crosses in Table 1 of the main text (10.1 G and 10.2 G), as well as crosses 6 and 8 in Table 2 (also Figure 3). We choose to compile all alleles from each of these crosses together to calculate an average efficiency across all available data. Because the constructs are on autosomes, we treat male offspring and female offspring identically, and we count toward the total allele count, , one allele from each offspring (since at most one drive allele can be inherited in each cross), and we count toward the drive allele total, , one allele for each DsRed individual observed, since this is a dominant marker for the drive. Finally, we consider both larvae and adults identically, as conversion is anticipated to have occurred before this stage, and results are similar between adults and larvae. Values of and for each cross are displayed in Appendix 1—table 2.
To obtain an average efficiency for the construct, we sum the values of and across all crosses in Appendix 1—table 2. We obtain:
To understand the effect of maternal Cas9 deposition, we count all offspring (larvae + adults) from crosses such that the any (potentially) inherited drive allele has been inherited via a female parent at least once. This includes no G offspring, as the drive alleles present in G parents were inherited from G males. Thus we include only G offspring of G parents, specifically Crosses 1–4, and as for the transgenic male lines, we sum both larval and adult crosses. Values of and for each cross are displayed in Appendix 1—table 3. Summing the values in Appendix 1—table 3 yields:
In this study, the authors construct three CRISPR-based gene drive systems in the malaria vector An. gambiae, each targeting a different gene with a recessive female sterility phenotype upon disruption (Hammond et al., 2016). These are examples of suppression drives whose purpose is to reduce or eradicate wild populations. Each drive construct carries a copy of Cas9, a single guide RNA, and red fluorescent protein (RFP) which has a dominant fluorescent phenotype. Each construct targets one of three female fertility genes, referred to as AGAP011377, AGAP005958, and AGAP007280, but otherwise they are identical.
To determine homing efficiency, drive-heterozygotes were crossed with wild-type homozygotes, and offspring were scored visually for the presence of the dominant marker RFP gene. Thus in our tabulations, we count one allele per individual toward the total, , and we count one allele per RFP individual toward the drive allele count, . Furthermore, the outcrosses were performed over several generations. To obtain average homing efficiencies, we sum drive alleles and total alleles over G, G, G, and G generations, when applicable. (Some constructs were tested over more generations than others.) This data is found in Table 2 in the study. Furthermore, we sum across male- and female-drive parent crosses, since we would expect these to behave identically with respect to homing, given that the female drive parents are capable of producing offspring.
This construct was studied over generations G to G in Table 2. The total number of relevant alleles resulting from crosses between drive-male parents and wild-type females was , while the male drive total was . The female total was , and the female drive total was . The average efficiency is then:
This construct was studied over generations G and G. There were no offspring from female-drive crosses to wild-type due to the low fertility of these individuals. The total was , and the drive total was . The efficiency is thus:
This construct was studied over generations G and G. The male total was , and the male drive total was . The female total was , and the female drive total was . The efficiency is:
OUP: lethal gene drive selects inbreedingEvolution, Medicine, and Public Health 2017:eow030.
Site-specific selfish genes as tools for the control and genetic engineering of natural populationsProceedings of the Royal Society B: Biological Sciences 270:921–928.https://doi.org/10.1098/rspb.2002.2319
Public and scientists’ views on science and societyPew Research Center, 29.
A killer-rescue system for self-limiting gene drive of anti-pathogen constructsProceedings of the Royal Society B: Biological Sciences 275:2823–2829.https://doi.org/10.1098/rspb.2008.0846
A mathematical theory of natural and artificial Selection, part V: selection and mutationMathematical Proceedings of the Cambridge Philosophical Society 23:838–844.https://doi.org/10.1017/S0305004100015644
Effective size of populations with overlapping generationsTheoretical Population Biology 3:278–289.https://doi.org/10.1016/0040-5809(72)90004-4
The effect of gene drive on containment of transgenic mosquitoesJournal of Theoretical Biology 258:250–265.https://doi.org/10.1016/j.jtbi.2009.01.031
Gene Drives on the Horizon: Advancing Science, Navigating Uncertainty, and Aligning Research with Public ValuesNational Academies Press.
From genes to games: cooperation and cyclic dominance in meiotic driveJournal of Theoretical Biology 299:120–125.https://doi.org/10.1016/j.jtbi.2011.04.032
Reproductive output of female Anopheles gambiae (Diptera: Culicidae): comparison of molecular formsJournal of Medical Entomology 43:833–839.https://doi.org/10.1093/jmedent/43.5.833
Michael DoebeliReviewing Editor; University of British Columbia, Canada
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Current CRISPR gene drive systems are likely to be highly invasive in wild populations" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Diethard Tautz as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: James Bull (Reviewer #2); Bernard Dujon (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Your paper shows that although resistance prevents drive systems from spreading to fixation in large populations, even very ineffective systems are highly invasive. Based on this it is argued that standard gene drive systems should not be developed nor field-tested in regions harboring the host organism.
The main concern of the reviewers is that the rather forceful message of the paper is not really substantiated in the sense that it is unclear why a high peak frequency of gene drive systems is necessarily a bad thing. For example, transposable elements have recently risen to high frequency in wild Drosophila populations, and while investigating the mechanism of this rise (and its potential prevention) are very interesting topics, it is unclear that this rise is a bad thing for these Drosophila populations. In other words, a rapid spread does not necessarily imply doom. I think this should be acknowledged in some form by the authors, and the message should be tuned accordingly.
The reviewers also raised a number of technical points, e.g. about the effects of a cost of resistance. These concerns should be carefully addressed.
This is an interesting paper on the dynamics of the CRISPR gene drive system. Contrary to prevalent opinion, the authors conclude that invading gene drive systems may have a lasting impact even if they do not go to fixation. In particular, substantial "peak drive" occurs under many conditions. Based on this the authors caution against being careless with such systems, particularly when it comes to introduction of gene drive systems into wild populations.
The theoretical analysis is comprehensive and appears to be sound. The conclusions regarding the danger of gene drive may be a bit too sweeping.
- The authors start out by pointing out the need for stochastic models of populations with finite size. However, in Figure 9 they effectively show that deterministic models make the same predictions as their stochastic models (if I understand this figure correctly). So why do we need stochastic models after all? This should be better explained, since it is part of the main rationale for the analysis presented in the paper.
- The model assumes that death is random, but births occur according to fitness. What if birth is random, and death is proportional to fitness? (There are some models in which this makes a difference…)
- It seems that in the model, invasion is always initialized with homozygotes? Can this assumption be relaxed?
- I don't really understand this statement, "Invasion is very unlikely when the drive is not initially favoured by selection." I thought the drive allele is never favoured by selection, as measured by the fitness values.
- Why show mean and median in Figure 1E,F.
- I don't really understand Figure 2E: it shows that the probability of invasion into one local population depends on migration rate, but that probably should not depend on migration at all, because as I understand it, the figure presumably refers to invasion of the drive in the local population into which the drive is initially released.
- Choosing populations proportional to the "square of total fitness" seems odd. Shouldn't it be the sum of squares of individual fitness values, or something like that?
Conceptually, I view this paper as having two parts. The first is an analysis of gene drive models to study the spread of drives and consequent evolution of resistance to the drives. The second part is a value judgement on the deployment of drives. The latter occupies little space in the paper, but is extreme and certain to attract attention ("Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism" is the last sentence of the Abstract). For convenience, I will refer to these as Part I and II, even though the paper is not structured that way.
Part I is possibly the most comprehensive analysis to date of drive and resistance evolution under alternative population structure scenarios. The overall message is that drives are highly invasive, albeit only temporary, despite resistance evolution. The quantitative details are not intuitive, but I would say the qualitative conclusion is obvious. Most of the models here assume a drive fitness cost of 10%, and what is not obvious is now far a drive allele would rise before being shut down by resistance evolution – the resistance evolution is assured by the fitness cost of the drive. So one cannot immediately guess how abundant a drive allele will get before being suppressed, and indeed, that answer also depends heavily on the rate of mutations to resistance. But if we reduce that cost to 1% (or even 0, values which I think are not addressed, but I did not really check), then resistance evolution is much less of an issue, and the drive allele will get very high before it goes away. (My intuition says that the drive allele never goes away if it has no fitness cost.)
So anyone broadly familiar with the process will a priori appreciate that low-cost drives get to very high frequencies in the population. And if they get to high frequencies, they will be able to invade new populations and overcome all sorts of barriers – will easily escape many hoped-for containments. I don't mean to trivialize the effort here, but if the point of this work is to propose a halt to gene drive releases on the grounds that gene drives are invasive, I'm not sure the analyses here are necessary. (In contrast, if the point was to identify realms of parameter values in which the authors thought releases were safe, then such analyses would be needed.)
One interesting outcome of this study is the demonstration that resistance will evolve quickly and suppress further spread of the drive allele even with a relatively low drive fitness cost. This result may have relevance to proposed uses of gene drives to extinguish populations.
The results are used to bolster an opinion, expressed at the end of the paper and in the Abstract (as noted above) that gene drives should not be released where there is any possibility of escape. While I might accept some justification for this opinion, it goes far beyond the work presented in the paper. Whether a drive should or should not be released depends on many social factors, including the possible good that might come from the release. It is a decision for societies, and the role of science is to inform those decisions on the possible consequences. I thus think that such an apparently bold statement puts the authors in (what I consider to be) the indefensible position of appearing so arrogant as to claim the right to impose their value judgement on the entire world – when many scientists think that gene drives could ultimately save tens of millions of lives a year. Furthermore, the paper does not actually identify any biologically serious consequence to a drive release – the drive in these models merely spreads and modifies the genome throughout the population (which some people would object to, but which may have almost no fitness consequence).
So I would suggest (= my 'opinion') that the opinion expressed be tempered accordingly and perhaps tied more closely to the findings here: "Our results suggest that drives are highly invasive under many scenarios. If there are negative consequences of drive escape, then.…". But it could certainly be interesting to watch the reaction if the paper maintains its strong, unqualified statement. If nothing else, the authors might at least label their advice as an opinion.
This is a short, dense and interesting article in which the authors quantitatively predict allele frequencies in a variety of theoretical populations submitted to artificial gene drive. The article limits its investigation to sexual populations of diploid individuals and to alteration-type drives. The general conclusion is that the drives have high probabilities of being invasive in wild populations, even highly structured ones, unless the gene flow between subpopulations is very low. This is true even if the homing efficiency is low. Based on their quantitative simulations, the authors conclude that presently available drive systems should not be field-tested, contrary to recent conclusions of the National academies of Sciences, Engineering and Medicine. This issue is serious enough to merit publication of this article.
I found the article well documented on the CRISPR gene drive systems and the recently published laboratory assays. Their analysis presented in Appendix is very useful.
My only concern about this work is that all computations were made with the hypothesis that resistant mutations were neutral. This may be true for the experimental models reported but cannot be considered as universal. A fitness cost of the resistant mutations would immediately alter the results in Figure 1D and in Figure 2, for example, and I would urge the authors to take this parameter in consideration.
Incidentally, in the first natural gene-drive ever reported, the group I intron of yeast mitochondrial DNA discovered nearly 40 years ago, resistant mutations to the homing-endonuclease had a major cost because they fall into the peptidyl-transferase center of rRNA. The only choice left to yeast was between being sensitive to intron invasion or being severely unfit.https://doi.org/10.7554/eLife.33423.020
- Charleston Noble
- Kevin M Esvelt
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank J Wakeley for helpful discussions and M Edgington for helpful comments on the manuscript. CN received support from the NSF Graduate Research Fellowship Program under grant no. DGE1144152. KME was supported by the Burroughs Wellcome Fund (IRSA 73786). MAN, KME and GMC are funded in part by DARPA under the Safe Genes program.
- Michael Doebeli, Reviewing Editor, University of British Columbia, Canada
- Received: November 9, 2017
- Accepted: May 15, 2018
- Version of Record published: June 19, 2018 (version 1)
© 2018, Noble et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.