The success of artificial selection for collective composition hinges on initial and target values
eLife Assessment
This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.
https://doi.org/10.7554/eLife.97461.3.sa0
Abstract
Microbial collectives can perform functions beyond the capability of individual members. Enhancing collective functions through artificial selection is, however, challenging. Here, we explore the ‘rafting-a-waterfall’ metaphor where achieving a target population composition depends on both target and initial compositions. Specifically, collectives comprising fast-growing (F) and slow-growing (S) individuals are grown for a ‘maturation’ time, and the collective with S-frequency closest to the target value is chosen to ‘reproduce’ by inoculating offspring collectives. During collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Using simulations and analytical calculations, we show that intermediate target S-frequencies are the most challenging, akin to a target within the vertical drop of a waterfall, rather than above or below it. This arises because intra-collective selection is the strongest at intermediate S-frequencies, which can overpower inter-collective selection. While achieving a low target S-frequency is consistently feasible, attaining a high target S-frequency requires an initially high S-frequency, much like a raft that can descend but not ascend a waterfall. As Newborn size increases, the region of achievable target frequency is reduced until no frequency is achievable. In contrast, the number of collectives under selection plays a less critical role. In scenarios involving more than two populations, the evolutionary trajectory must navigate entirely away from the metaphorical ‘waterfall drop.’ Our findings illustrate that the strength of intra-collective evolution is frequency-dependent, with implications for experimental planning.
Introduction
Microbial collectives can carry out functions that arise from interactions among member species. These functions, such as waste degradation (Woo et al., 2020; Sun et al., 2022), probiotics (Bober et al., 2018), and vitamin production (Wang et al., 2016), can be useful for human health and biotechnology. To improve collective functions, one can perform artificial selection (directed evolution) on collectives (see Figure 1): low-density ‘Newborn’ collectives are allowed to ‘mature,’ during which cells proliferate and possibly mutate, and community function develops. ‘Adult’ collectives with high functions are then chosen to reproduce, each seeding multiple offspring Newborns. Artificial selection of collectives has been attempted both in experiments (Goodnight, 1990; Swenson et al., 2000b; Swenson et al., 2000a; Blouin et al., 2015; Panke-Buisse et al., 2015; Panke-Buisse et al., 2017; Jochum et al., 2019; Wright et al., 2019; Raynaud et al., 2019; Arora et al., 2020; Chang et al., 2020; Mueller et al., 2021; Jacquiod et al., 2022; Raynaud et al., 2022; Arias-Sánchez et al., 2024) and in simulations (Penn, 2003; Penn and Harvey, 2004; Williams and Lenton, 2007; Xie et al., 2019; Doulcier et al., 2020; Xie and Shou, 2021; Chang et al., 2021; Fraboul et al., 2023; Lalejini et al., 2022; Zaccaria et al., 2023; Vessman et al., 2023), often with unimpressive outcomes.

Schematic for artificial selection on collectives.
Each selection cycle begins with a fixed number of Newborn collectives, each with a fixed total number of cells drawn from the slow-growing S population (light gray dots) and the fast-growing F population (dark gray dots). During maturation, S and F cells divide at their respective rates, with F dividing faster, and S mutates to F at a fixed mutation rate. During inter-collective selection, the Adult collective with F frequency closest to the target composition is chosen to reproduce Newborns for the next cycle. Newborns are sampled from the chosen Adult (yellow star) with a fixed number of cells per Newborn. The selection cycle is then repeated until the F frequency reaches a steady state, which may or may not be the target composition. Variables are indexed by collective, cycle, and time within the cycle, where time runs from zero (Newborns) to the maturation time (Adults).
One of the major challenges in selecting collectives is to ensure the inheritance of a collective function (Xie et al., 2023; Thomas et al., 2024). Inheritance from a parent collective to offspring collectives can be compromised by changes in genotype and species compositions. During maturation of a collective, genotype compositions within each species can change due to intra-collective selection favoring fast-growing individuals (Figure 1, ‘intra-collective’ selection), while species compositions can change due to ecological interactions. Furthermore, during the reproduction of a collective, genotype and species compositions of offspring can vary stochastically from those of the parent (Figure 1, ‘genetic drift’).
Here, we consider the selection of collectives comprising two or three populations with different growth rates, and our goal is to achieve a target composition in the Adult collective. This is a common quest: whenever a collective function depends on both populations, the collective function is maximized, by definition, at an intermediate frequency (e.g. too little of either population will hamper function; Xie et al., 2019). Earlier work has demonstrated that nearly any target species composition can be achieved when selecting communities of two competing species with unequal growth rates (Doulcier et al., 2020; Rainey, 2023), so long as the shared resource is depleted during collective maturation (Doulcier et al., 2020). In this case, initially, both species evolved to grow faster, and the slower-growing species was preserved due to stochastic fluctuations in species composition during collective reproduction. Eventually, both species evolved to grow sufficiently fast to deplete the shared resource during collective maturation, and evolution in competition coefficients then acted to stabilize the species ratio to the target value (Doulcier et al., 2020). Regardless, earlier studies are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space for such nested populations (population of collectives, and populations of variants within a collective).
We mathematically examine the selection of composition in collectives consisting of populations growing at different rates. We made simplifying assumptions so that we can analytically examine the evolutionary tipping point between intra-collective and inter-collective selection. We show that this tipping point creates a ‘waterfall’ effect which restricts not only which target compositions are achievable, but also the initial composition required to achieve the target. We also investigate how the range of achievable target composition is affected by the total population size in Newborns and the total number of collectives under selection. Finally, we show that the waterfall phenomenon extends to systems with more than 2 populations.
Results and discussion
To enable the derivation of an analytical expression, we have made the following simplifying assumptions. First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we initially consider only two populations (genotypes or species): the fast-growing F population and the slow-growing S population. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult. Finally, the single top-functioning community is chosen to reproduce, which allows us to employ the simplest version of the extreme value theory (see section below for further justification).
Our goal is to select for collective composition in terms of F frequency or, equivalently, S frequency. More precisely, we want collectives such that after the maturation time, the F frequency is as close to the target value as possible (Figure 1). Note that even if the target frequency has been achieved, since F frequency will always increase during maturation, inter-collective selection is required in each cycle to maintain the target frequency.
We will start with a complete model where S mutates to F at a nonzero mutation rate. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations. This scenario is encountered in biotechnology: an engineered pathway will slow down cell growth, and breaking the pathway (and thus restoring faster growth) is much easier than the other way around. When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates. We show that intermediate F frequencies or, equivalently, intermediate S frequencies are the hardest targets to achieve. We then show using simulations that similar conclusions hold when selecting for a target composition in collectives of three populations.
Model structure
A selection cycle (Figure 1; Table 1) starts with a fixed total number of Newborn collectives. At the beginning of each cycle, each Newborn collective has a fixed total cell number, the sum of its S and F cell numbers. The initial F cell number in each Newborn is drawn from a binomial distribution whose number of trials is the Newborn size and whose success probability is the average F frequency among the Newborn collectives in that cycle.
Nomenclature.
Variables | Representing |
---|---|
Number of slower-growing (S) cells | |
Number of faster-growing (F) cells | |
Total cell numbers in a collective, | |
Frequency of S cells, | |
Frequency of F cells, | |
F frequency of the selected collective in a cycle | |
Parameters | Representing |
Growth rate of S | |
Growth rate advantage of F over S | |
Mutation rate from S to F | |
Total number of collectives | |
Maturation time | |
Total number of cells in Newborn, or Newborn size | |
Target frequency in or . | |
Low and High thresholds of inaccessible | |
Fold-growth of S cells over time , | |
Fold ratio change of F cells over S cells over time , |
Collectives are allowed to grow for the maturation time (‘Maturation’ in Figure 1). During maturation, S and F grow at their respective rates, with F growing faster. If the maturation time is too short, a matured collective (‘Adult’) does not have enough cells to reproduce all Newborn collectives at the required Newborn size. On the other hand, if the maturation time is too long, fast-growing F will take over. Hence, we set the maturation time to a value that guarantees sufficient cells to produce all Newborn collectives from a single Adult collective. At the end of a cycle, a single Adult with the highest function (with F frequency closest to the target frequency) is chosen to reproduce Newborn collectives, each with the same fixed number of cells (‘Selection’ and ‘Reproduction’ in Figure 1). Note that even though S and F do not compete for nutrients, they compete for space: because the total number of cells transferred to the next cycle is fixed, an overabundance of one population will reduce the likelihood of the other being propagated.
Collective function is dictated by the Adult’s F frequency. Among all Adult collectives, the selected Adult is the one whose F frequency is closest to the target value. In contrast with findings from an earlier study (Xie et al., 2019), choosing the top 1 is more effective than the less stringent ‘choosing top 5%.’ In the earlier study, variation in the collective trait is partly due to nonheritable factors such as random fluctuations in Newborn biomass. In that context, a less stringent selection criterion proved more effective, as it helped retain collectives with favorable genotypes that might have exhibited suboptimal collective traits due to unfavorable nonheritable factors. However, since this study excludes nonheritable variations in collective traits, selecting the top 1 collective is more effective than selecting the top 5% (see Appendix 7—figure 1).
The selected Adult is then used to reproduce offspring collectives, each with the same total number of cells. The number of F cells in a Newborn follows a binomial distribution set by the Newborn size and the selected Adult’s F frequency. By repeating the selection cycle, we aim to achieve and maintain the target composition.
Overall, our model considers mutational stochasticity, as well as demographic stochasticity in terms of stochastic birth and stochastic sampling of a parent collective by offspring collectives. Other types of stochasticity, such as environmental stochasticity and measurement noise, are not considered and require future research.
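The selection cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code (see Appendix 1 for that): mutation from S to F is switched off, each population grows as an independent pure-birth process, and all names (`mature`, `selection_cycle`, `run`) and parameter values other than the Newborn size of 1000 are hypothetical choices made for illustration.

```python
import numpy as np

def mature(S0, F0, r, omega, tau, rng):
    """Grow S and F as independent pure-birth processes for time tau.

    Assumption of this sketch: no S-to-F mutation (mutation rate = 0).
    """
    def grow(n, rate):
        if n == 0:
            return 0
        p = np.exp(-rate * tau)
        # A pure-birth process started from n cells ends with
        # n + NegativeBinomial(n, p) cells.
        return n + int(rng.negative_binomial(n, p))
    return grow(S0, r), grow(F0, r + omega)

def selection_cycle(f_parent, target, n_newborn, n_collectives,
                    r, omega, tau, rng):
    """Sample Newborns, mature them, return the Adult F frequency closest to target."""
    F0 = rng.binomial(n_newborn, f_parent, size=n_collectives)
    adult_f = np.empty(n_collectives)
    for i, f0 in enumerate(F0):
        S, F = mature(n_newborn - int(f0), int(f0), r, omega, tau, rng)
        adult_f[i] = F / (S + F)
    return adult_f[np.argmin(np.abs(adult_f - target))]

def run(f_init, target, cycles, seed=0, **params):
    """Repeat the selection cycle and record the selected F frequency."""
    rng = np.random.default_rng(seed)
    traj = [f_init]
    for _ in range(cycles):
        traj.append(selection_cycle(traj[-1], target, rng=rng, **params))
    return np.array(traj)

# Illustrative (assumed) parameter values; only n_newborn=1000 is from the text.
params = dict(n_newborn=1000, n_collectives=10, r=0.5, omega=0.03, tau=4.8)
traj = run(f_init=0.5, target=0.9, cycles=50, **params)
print(traj[-5:])
```

Setting `n_collectives=1` removes inter-collective selection entirely, which gives a simple control in which the F frequency drifts upward under intra-collective selection alone.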
The success of collective selection is constrained by the target composition, and sometimes also by the initial composition
Since intra-collective selection favors F, we expect that a higher target F frequency (a lower target S frequency) is easier to achieve. By ‘achieve,’ we mean that the absolute error between the target frequency and the selected frequency, averaged among independent simulations, is smaller than 0.05.
We fixed the total population size of a Newborn to 1000, and obtained selection dynamics for various initial and target F frequencies by implementing stochastic simulations (Appendix 1). If the target F frequency is high (e.g. 0.9, Figure 2a magenta), selection is successful (see the computed absolute errors in Appendix 1—figure 4): regardless of the initial frequency, the F frequency of the chosen collective eventually converges to the target and stays around it. In contrast, without collective-level selection (e.g. choosing a random collective to reproduce), F frequency increases until F reaches fixation (Appendix 1—figure 3b).

Initial and target compositions determine the success of artificial selection on collectives.
(a–c) F frequency of the selected Adult collective over cycles at different target values (long dashed lines). Target F frequencies between the lower threshold and the higher threshold (orange dotted and solid line segments, respectively) are inaccessible: selection will fail. (a) A high target F frequency (magenta) can be achieved from any initial frequency (black dots). (b) An intermediate target frequency (green) is never achievable, as all initial conditions converge to the higher threshold. (c) A low target frequency (dark blue) is achievable, but only from initial frequencies below the lower threshold. For initial frequencies at the lower threshold, stochastic outcomes (gray curves) are observed: while some replicates reached the target frequency, others reached the higher threshold. For parameters, we used S growth rate , F growth advantage , mutation rate , maturation time , and . The number of collectives . Each black line is averaged from 300 independent realizations. (d) Inter-collective selection opposes intra-collective selection. We plot probability density distributions of F frequency during two consecutive cycles when selection is successful. Data correspond to cycles 31 and 32 from the second lowest initial point in c. The selection progress within a cycle is defined in Box 1. Black triangle: median. (e) Two accessible regions (gold). Either a high target F frequency (region 2), or a low target F frequency starting from a low initial F frequency (region 1), can be achieved. We theoretically predict (by numerically integrating Equation 1) the higher threshold (orange solid line) and the lower threshold (orange dotted line), which agree with simulation results (gold regions). (f) Example trajectories from initial compositions (black dots) to the target compositions (dashed lines). The gold areas indicate the region of initial frequencies where the target frequency can be achieved. (g) The tension between intra-collective selection and inter-collective selection creates a ‘waterfall’ phenomenon. See the main text for details.
Changes in the distribution of F frequency after one cycle
We consider the case where the F frequency of the selected Adult in a given cycle is above the target value. This case is particularly challenging because intra-collective evolution favors fast-growing F and thus will further increase the F frequency away from the target. Newborns of the next cycle will have F frequencies fluctuating around that of the selected Adult, and after they mature, the Adult with the minimum F frequency is selected. If the selected F frequency in the next cycle is lower than that of the current cycle, the system can evolve to the lower target value.
To find the F frequencies for which such a reduction occurs, we used the median value of the conditional probability distribution of the selected F frequency in the next cycle, given the selected F frequency in the current cycle (mathematical details in Appendix 2). If the median value is smaller than the currently selected F frequency, then selection will likely be successful, since the selected Adult in the next cycle has a more than 50% chance of having a reduced F frequency compared to the current cycle.
There are two points where the median value equals the currently selected F frequency (Figure 3a); these are designated the lower threshold and the higher threshold.
Following the extreme value theory, the conditional probability density function is
Equation 1 is the product of two terms: (i) the probability density that any one of the Adult collectives achieves a given F frequency, conditional on the F frequency selected in the previous cycle, and (ii) the probability that all other collectives achieve higher F frequencies and are thus not selected.
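For orientation, the general form that such a density takes for the minimum of independent draws can be written as follows. The notation here is assumed for illustration and may differ from the article's: $g$ is the number of collectives, $f^*_k$ the F frequency selected in cycle $k$, $\psi(f \mid f^*_k)$ the density of Adult F frequency for a single collective, and $\Psi$ the density of the minimum among the $g$ Adults.

```latex
\Psi\!\left(f^*_{k+1} = f \,\middle|\, f^*_k\right)
  \;=\; \underbrace{g\,\psi\!\left(f \mid f^*_k\right)}_{\text{(i) any one collective attains } f}
  \;\times\;
  \underbrace{\left[\int_{f}^{1} \psi\!\left(f' \mid f^*_k\right)\,\mathrm{d}f'\right]^{g-1}}_{\text{(ii) the other } g-1 \text{ collectives lie above } f}
```

This is the standard order-statistic result for the smallest of $g$ independent, identically distributed values, which matches the two-term description given in the text.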
Since computing the exact formula of the Adults’ distribution in the next cycle is hard, we approximate it as Gaussian. The Gaussian approximation of Equation 1 requires sharp Gaussian distributions of the S and F cell numbers. Compared to a Gaussian, the exact distribution of S cell numbers (negative binomial) and that of F cell numbers (Luria-Delbrück) are right-skewed and heavy-tailed. However, these problems are alleviated when the initial numbers of S and F cells are not small (on the order of 100). Indeed, sufficiently sharp distributions could be achieved (see Appendix 1—figure 1).
To obtain an analytical solution for the change in F frequency over one cycle, we first assume that in a Newborn collective, the numbers of S cells and of F cells are each distributed as Gaussian. From these, we can calculate for Adult collectives the mean and variance of the S and F population sizes (mathematical details in Appendix 1). This task is simplified by the exponential growth of S and F: one factor describes the fold growth of S over the maturation time, and a second factor, set by the fitness advantage of F over S, describes the fold change of F/S over the same time. From these two factors, the mutation rate scaled with the fitness difference, the F frequency in the selected collective of the current cycle, the Newborn size, and the relative fitness advantage, we can calculate the mean and variance of F frequency among the Adults of the next cycle (detailed formulas in Equations 48 and 49).
Selection progress, defined as the difference between the median value of the conditional probability distribution and the selected frequency of the current cycle (Appendix 2), can be expressed as:
Here, the expression involves the inverse cumulative function of the standard normal distribution (see main text for an example). We chose the median because, compared to the mean, it is easier to obtain an analytical expression for it in closed form. Regardless, using the median generated results similar to simulations (Appendix 2—figure 3). As expected, selection progress is governed by both the mean and the variation in F frequency among Adults.
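As a sketch of where such an expression can come from, suppose (as an assumption of this illustration, with notation that may differ from the article's) that the Adult F frequencies of the $g$ collectives are independent Gaussian variables with mean $\bar f$ and standard deviation $\sigma_f$, and that the Adult with the lowest F frequency is selected. The median $m$ of that minimum then follows from setting its cumulative probability to one half:

```latex
\begin{aligned}
1 - \left[1 - \Phi\!\left(\frac{m - \bar f}{\sigma_f}\right)\right]^{g} &= \frac{1}{2}
\;\;\Longrightarrow\;\;
m = \bar f + \sigma_f\,\Phi^{-1}\!\left(1 - 2^{-1/g}\right), \\[4pt]
1 - 2^{-1/g} = 1 - e^{-(\ln 2)/g} &\approx \frac{\ln 2}{g}
\quad (g \text{ large}), \\[4pt]
\Delta f \;=\; m - f^*_k \;&\approx\; \bar f - f^*_k + \sigma_f\,\Phi^{-1}\!\left(\frac{\ln 2}{g}\right).
\end{aligned}
```

Under these assumptions the first part, $\bar f - f^*_k$, is the upward push from intra-collective selection, and the second part is negative whenever the argument of $\Phi^{-1}$ is below 0.5, so a larger spread $\sigma_f$ or a larger $g$ gives inter-collective selection more leverage.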
When the mutation rate , and can be simplified to:
and
In the limit of small , Equation 3 becomes while Equation 4 becomes . Thus, both Newborn size () and fold-change in F/S during maturation () are important determinants of selection progress.
In contrast, an intermediate target frequency (Figure 2b green) is never achievable. High initial F frequencies (e.g. 0.95) decline toward the target but stabilize at the ‘higher threshold’ (∼ 0.7, solid orange line segment in Figure 2a-c) above the target. Low initial F frequencies (e.g. 0) increase toward the target, but then overshoot and stabilize at the same higher-threshold value.
If the target frequency is low (Figure 2c dark blue), artificial selection succeeds when the initial frequency is below the ‘lower threshold’ (dotted orange line segment in Figure 2a-c). Initial F frequencies above the lower threshold (e.g. 0.45 and 0.95) converge to the higher threshold instead. Initial F frequencies near the lower threshold display stochastic trajectories, converging to either the target or the higher threshold.
To achieve the target, inter-collective selection must overcome intra-collective selection. We can visualize the distributions of F frequency over two consecutive cycles (bottom to top, Figure 2d) where the F frequency started above the target. When Newborns matured into Adults, the distribution of F frequency shifted up due to intra-collective selection. The distribution was then shifted down toward the target due to inter-collective selection. If the magnitude of the down-shift exceeded that of the up-shift, progress toward the target was made. During reproduction of collectives, the distribution of F frequency retained the same mean but became broader due to stochastic sampling by the Newborns from their parent.
In summary, two regions of target frequencies are ‘accessible’ (gold in Figure 2e, f; Box 1): (1) target frequencies above the higher threshold, or (2) target frequencies below the lower threshold, provided that the starting average frequency is also below the lower threshold.
Intra-collective evolution is the fastest at intermediate F frequencies, creating the ‘waterfall’ phenomenon
To understand what gives rise to the two accessible regions, we calculated the selection progress in F frequency over two consecutive cycles (Box 1, Equation 2). The solution (Figure 3a, green) has the same shape as results from numerically integrating Equation 1 (Figure 3a, orange) and from stochastic simulations (Figure 3a, blue).

Intra-collective selection and inter-collective selection jointly set the boundaries for selection success.
(a) The change in F frequency over one cycle. When the selected F frequency is sufficiently low or high, inter-collective selection can lower the F frequency to below that of the previous cycle (negative change). The points where the change is zero (on the orange line) are the lower and higher thresholds, corresponding to the boundaries in Figure 2. (b) The distributions of frequency differences obtained from 1000 numerical simulations. The cyan, purple, and black box plots respectively indicate the changes in F frequency after intra-collective selection (the mean frequency among the 100 Adults minus the mean frequency among the 100 Newborns during maturation), after inter-collective selection (the frequency of the 1 selected Adult minus the mean frequency among the 100 Adults), and over one selection cycle (the frequency of the selected Adult of one cycle minus that of the previous cycle). The box ranges from 25% to 75% of the distribution, and the median is indicated by a line across the box. The upper and lower whiskers indicate maximum and minimum values of the distribution. ***p<0.001 in an unpaired t-test.
If the selection progress is negative, then inter-collective selection will succeed in countering intra-collective selection and reducing the F frequency toward the target. Selection progress is negative if the selected F frequency is low or high, but not if it is intermediate between the lower and higher thresholds (Figure 3a). This is because the increase in F frequency during maturation is the most drastic when the Newborn F frequency is intermediate (Figure 3b), for intuitive reasons: when the Newborn F frequency is low, the increase will be minor; when the Newborn F frequency is high, the fitness advantage of F over the population average is small and hence the increase is also minor. Thus, when Newborn F frequency is intermediate, intra-collective selection is the strongest and may overwhelm inter-collective selection (Figure 3b and Appendix 2—figure 2a). Not surprisingly, similar conclusions are derived when S and F are slow-growing and fast-growing species that cannot be converted into each other through mutations (Appendix 4 and Appendix 4—figure 1).
Thus, inter-collective selection is akin to a rafter rowing the raft to a target, while intra-collective selection is akin to a waterfall. This metaphor is best understood in terms of S frequency. The lower threshold in F frequency corresponds to the higher threshold in S frequency. Intra-collective selection drives the S frequency from high to low (Figure 2g). It acts the strongest when the S frequency is intermediate (between the two thresholds), similar to the vertical drop of the fall. It acts weakly at high or low S frequencies, similar to the gently sloped upper and lower pools of the fall (regions 1 and 2 of Figure 2e and g). Thus, an intermediate target frequency can be impossible to achieve: a raft starting from the upper pool will be flushed down to the lower threshold in S frequency, while a raft starting from the lower pool cannot go beyond that threshold. In contrast, a low target S frequency (in the lower pool) is always achievable. Finally, a high target S frequency (in the upper pool) can only be achieved if starting from the upper pool (as the raft cannot jump to the upper pool if starting from below).
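The claim that intra-collective selection is strongest at intermediate frequencies can be checked with a short deterministic calculation. The sketch below assumes purely exponential growth without mutation, so that the F/S ratio is multiplied by a fixed factor `W` during maturation; the function name `adult_f` and the numerical value of `W` are illustrative assumptions, not values taken from the article.

```python
import numpy as np

def adult_f(f0, W):
    """Adult F frequency when the F/S ratio grows by a factor W during maturation."""
    return f0 * W / (1.0 - f0 + f0 * W)

# Assumed fold change in F/S over one maturation period (illustrative only).
W = np.exp(0.03 * 4.8)

f0 = np.linspace(0.0, 1.0, 1001)      # Newborn F frequencies
increase = adult_f(f0, W) - f0        # deterministic rise during maturation

f_peak = f0[np.argmax(increase)]
# Setting the derivative of the increase to zero gives a peak at 1 / (1 + sqrt(W)).
f_peak_analytic = 1.0 / (1.0 + np.sqrt(W))
print(f_peak, f_peak_analytic)
```

The increase vanishes at both ends (no F to amplify at one end, no S left to displace at the other) and peaks slightly below one half, which is the ‘vertical drop’ in the waterfall picture.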
Manipulating experimental setups to expand the achievable target region
In Equation 2 of Box 1, selection progress depends on the total number of collectives under selection. It also depends on the mean and the standard deviation of Adult F frequency. Equations 3 and 4 of Box 1 provide simplified expressions of the mean and the standard deviation when the mutation rate is set to 0. When the mutation rate is not zero (Equations 48 and 49 in Appendix 2), selection progress is additionally influenced by the mutation rate scaled with the fitness difference.
Our goal is to make the selection progress as negative as possible so that any increase in F frequency during collective maturation may be reduced. From Equation 2 in Box 1, a small mean Adult F frequency will facilitate collective-level selection. Additionally, a large standard deviation will also facilitate collective-level selection because the inverse cumulative function term is negative. This term is the number such that the probability of a standard normal random variable being less than or equal to it equals the term’s argument; it is negative because its argument is smaller than 0.5.
From Equation 4 in Box 1, the standard deviation will be large if Newborn size is small. Indeed, as Newborn size declines, the region of achievable target frequency expands (gold area in Figure 4a). If the Newborn size is sufficiently small (e.g. ≤ 700 in our parameter regime), any target frequency can be reached. An analytical approximation of the maximal Newborn size permissible for all target frequencies is given in Appendix 3.
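To illustrate why a smaller Newborn size widens the spread of Adult F frequency, the following sketch isolates one source of variation only: binomial sampling of Newborns from the parent, followed by deterministic maturation. Growth stochasticity and mutation are deliberately ignored here, and the function name `adult_f_spread` and all parameter values are assumptions made for illustration.

```python
import numpy as np

def adult_f_spread(n_newborn, f_parent=0.5, W=1.15, n_samples=20000, seed=0):
    """Standard deviation of Adult F frequency arising from Newborn sampling alone.

    Assumptions of this sketch: Newborn F counts are binomial draws from the
    parent, and maturation multiplies the F/S ratio by a fixed factor W.
    """
    rng = np.random.default_rng(seed)
    f0 = rng.binomial(n_newborn, f_parent, size=n_samples) / n_newborn
    adult = f0 * W / (1.0 - f0 + f0 * W)
    return adult.std()

spreads = {n: adult_f_spread(n) for n in (100, 1000, 10000)}
for n, s in spreads.items():
    print(n, s)
```

In this simplified setting the spread scales roughly as the inverse square root of the Newborn size, so a hundredfold smaller Newborn yields about a tenfold wider distribution for inter-collective selection to act on.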

Expanding the region of success for artificial collective selection.
(a) Reducing the population size in Newborn expands the region of success. In the gold area, the probability that the selected F frequency becomes smaller than that of the previous cycle is more than 50%. We used and . Figures 2–3 correspond to a Newborn size of 1000 in this graph. Black dotted line indicates the critical Newborn size below which all target frequencies can be achieved. (b) Increasing the total number of collectives also expands the region of success, although only slightly. We used a fixed Newborn size . The maturation time is set to be long enough so that an Adult can generate at least 100 Newborns. (c) Increasing the maturation time shrinks the region of success. We used a fixed Newborn size and number of collectives .
From Equations 3 and 4 in Box 1, maturation time affects the mean and the standard deviation of Adult F frequency through the fold change in F/S over the maturation time, and affects the standard deviation additionally through the fold-growth of S over the maturation time. A longer maturation time increases the mean and is thus detrimental to selection progress. The relationship between the standard deviation and the maturation time is not monotonic (Appendix 2—figure 2c), meaning that an intermediate maturation time is the best for achieving a large standard deviation. However, the effect of the mean dominates that of the standard deviation, and therefore the region of success shrinks monotonically with longer maturation time (Figure 4c). Similarly, the mean will be small if the fitness advantage of F over S is small. Indeed, as the fitness advantage becomes larger, the region of success becomes smaller (Appendix 5—figure 1).
The number of collectives under selection also affects selection outcomes. As it increases, the inverse cumulative function term in Equation 2 becomes more negative, and so does the selection progress, meaning collective-level selection will be more effective. Intuitively, with more collectives, the chance of finding an F frequency closer to the target is higher. Thus, a larger number of collectives broadens the region of success (Figure 4b). However, the effect is not dramatic. To see why, we note that the only place where the number of collectives appears is in the inverse cumulative function term of Equation 2. When the number of collectives becomes large, this term has an asymptotic expression (Appendix 2; Philip, 1960) that does not change dramatically as the number of collectives varies.
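The weak dependence on the number of collectives can be made concrete numerically. The sketch below assumes that Adult F frequencies are Gaussian and that the Adult with the lowest F frequency is selected, so that the median of the selected value sits a number of standard deviations below the mean given by the standard normal quantile at `1 - 2**(-1/g)`; the symbol `g` for the number of collectives and the function name are assumptions of this illustration. The square-root-of-logarithm column is the standard leading-order asymptotic for Gaussian extremes, included only for comparison.

```python
import math
from statistics import NormalDist

def z_median_of_min(g):
    """Standard-normal quantile for the median of the minimum of g Gaussian draws."""
    return NormalDist().inv_cdf(1.0 - 2.0 ** (-1.0 / g))

for g in (10, 100, 1000, 10000):
    z_exact = z_median_of_min(g)
    z_approx = NormalDist().inv_cdf(math.log(2) / g)   # large-g approximation
    z_leading = -math.sqrt(2.0 * math.log(g))          # leading-order asymptotic
    print(f"g={g:6d}  exact={z_exact:.3f}  ln2/g form={z_approx:.3f}  "
          f"-sqrt(2 ln g)={z_leading:.3f}")
```

Each tenfold increase in the number of collectives moves the quantile by well under one standard deviation, consistent with the modest widening of the region of success reported for Figure 4b.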
The waterfall phenomenon in a higher dimension
To examine the waterfall effect in a higher dimension, we investigate a three-population system where a faster-growing population (FF) grows faster than the fast-growing population (F) which grows faster than the slow-growing population (S) (Figure 5a and Appendix 8—figure 1). In the three-population case, the evolutionary trajectory travels in a two-dimensional plane. A target population composition can be achieved if inter-collective selection can sufficiently reduce the frequencies of F as well as FF (accessible regions, gold in Figure 5b).

In higher dimensions, the success of artificial selection requires that the entire evolutionary trajectory remain in the accessible region.
(a) During collective maturation, a slow-growing population (S) (with growth rate ; light gray) can mutate to a fast-growing population (F) (with growth rate ; medium gray), which can mutate further into a faster-growing population (FF) (with growth rate ; dark gray). Here, the rates of both mutational steps are , and . (b) Evolutionary trajectories from various initial compositions (open circles) to various targets (filled triangles). Intra-collective evolution favors FF over F (vertical blue arrow) over S (horizontal blue arrow). The accessible regions are marked gold (see Appendix 1). We obtain final compositions starting from several initial compositions while aiming for different target compositions in i, ii, and iii. The evolutionary trajectories are shown in dots with color gradients from initial time (light grey) to final time (dark grey). (i) A target composition with a high FF frequency is always achievable. (ii) A target composition with intermediate FF frequency is never achievable. (iii) A target composition with low FF frequency is achievable only if starting from an appropriate initial composition such that the entire trajectory never meanders away from the accessible region. The figures are drawn using the mpltern package (Ikeda et al., 2019). (c) The accessible region in the three-population problem is interpreted as an extension of the two-population problem. First, the accessible region between FF and S+F is given, and then the S+F region is stretched into S and F.
From numerical simulations, we identified two accessible regions: a small region near FF and a band region spanning from S to F (gold in Figure 5b i). Intuitively, the rate at which FF grows faster than S+F is greater than the rate at which F grows faster than S (see Appendix 8). Thus, the problem can initially be reduced to a two-population problem (i.e. FF versus F+S; Figure 5c left), and then expanded to a three-population problem (Figure 5c right).
Similar to the two-population case, targets in the inaccessible region are never achievable (Figure 5b ii), while those in the FF region are always achievable (Figure 5b i). Strikingly, a target composition in an accessible region may not be achievable even when the initial composition is within the same region: once the composition escapes the accessible region, the trajectory cannot return to it (Figure 5b iii, the leftmost initial condition). However, if the initial position is closer to the target in the accessible region, the target becomes achievable (Figure 5b iii, initial condition near the bottom). Note that here, the selection outcome is path-dependent in the sense of being sensitive to initial conditions. This phenomenon is distinct from hysteresis, where path-dependence results from whether a tuning parameter is increased or decreased.
In conclusion, we have investigated the evolutionary trajectories of population compositions in collectives under selection, which are governed by intra-collective selection (which favors fast-growing populations) and inter-collective selection (which, in our case, strives to counter fast-growing populations). Intra-collective selection has the strongest effect at intermediate frequencies of faster-growing populations, potentially creating an inaccessible region of target frequency analogous to the vertical drop of a waterfall. High and low target frequencies are both accessible, analogous to the lower and the upper pools of a waterfall, respectively. A less challenging target (high ; low ) is achievable from any initial position. In contrast, a more challenging target (low ; high ) is only achievable if the entire trajectory is contained within the region, just as a raft striving to reach a point in the upper pool must start in, and remain in, the upper pool. Our work suggests that the strength of intra-collective selection is not constant, and that strategically choosing an appropriate starting point can be essential for successful collective selection.
Materials and methods
Stochastic simulations
A selection cycle is composed of three steps: maturation, selection, and reproduction. At the beginning of the cycle , a collective has slow-growing cells and fast-growing cells. At the first cycle, the mean F frequency of collectives is set to be , and is sampled from the binomial distribution with mean . Then, S cells are in the collective . In the maturation step, we calculate and by using stochastic simulation. We can simulate the division and mutation of each individual cell stochastically by using the tau-leaping algorithm (Gillespie, 2001; Cao et al., 2006; see Appendix 1—figure 3). However, individual-based simulations require a long computing time. Instead, we randomly sample and from the joint probability density distribution . To obtain , we solve the master equation which describes the time evolution of the probability distribution under the random processes (see Appendix 1). We assume that are independent (as S and F populations grow independently without ecological interactions), and thus is the product of two probability density functions . Each distribution follows a Gaussian distribution, with the mean and variance numerically obtained from ordinary differential equations derived from the master equation (see Appendix 1). We choose the collective with the closest frequency to the target to generate Newborns. The number of F cells is sampled from the binomial distribution with the mean of . Then, the number of S cells in a collective is . We start a new cycle with those Newborn collectives.
Analytical approach to the conditional probability
The conditional probability distribution of observing at a given is calculated by the following procedure. Given the selected collective in cycle with , the collective-level reproduction proceeds by sampling Newborn collectives with cells in cycle . Each Newborn collective contains certain F numbers at the beginning of the cycle , which can be mapped into with the constraint of cells. If the number of cells in the selected collective is large enough, the joint conditional distribution function is well described by the product of independent and identical Gaussian distributions . So we consider the frequencies of Newborn collectives as identical copies of the Gaussian random variable . The mean and variance of are given by and . Then, the conditional probability distribution function of being is given by
After the reproduction step, the Newborn collectives grow for time . The frequency is changed from the given frequency to by division and mutation processes. We assume that the frequency of an Adult is also approximated by a Gaussian random variable . The mean and variance are calculated by using means and variances of and (see Appendix 2). Since and also depend on , the conditional probability distribution function of being is given by
The conditional probability distribution of an Adult collective in cycle () to have frequency at a given is calculated by multiplying two Gaussian distribution functions and integrating over all values, which is given by
Since we select the minimum frequency among identical copies of , the conditional probability distribution function of follows a minimum value distribution, which is given in Equation 1. Here, for the case of , the selected frequency is the minimum frequency . So we have by replacing with .
We assume that the conditional probability distribution in Equation 7 follows a normal distribution, whose mean and variance are described by Equation 48 and Equation 49, respectively. Then, extreme value theory (Gumbel, 1958) estimates the median of the selected Adult by
The selection progress in Equation 2 is obtained by subtracting from Equation 8.
Appendix 1
Stochastic simulation of the selection cycle
In the main text, we design a simple model of artificial selection on collectives. The selection cycle starts with ‘Newborn’ collectives which consist of two populations - slow-growing population (S) and fast-growing population (F). S mutates to F at a rate . The newborns mature for a fixed time . The matured collective (‘Adult’) with the highest function (with F frequency closest to the target ) is chosen to reproduce Newborn collectives, each with cells.
In our selection cycle, variation among collectives results mainly from demographic noise during cell birth, cell mutation, and collective reproduction. In this section, we provide details of the simulation.
Maturation
Here, we calculate the cell numbers during maturation. Each collective () has S cells and F cells where is the cycle number and indicates time (). At the beginning of cycle (), each Newborn collective has a total of cells. The collectives are allowed to ‘mature’ for during which S and F grow at rates and (), respectively. In this subsection, we ignore the cycle number index and the collective index for convenience. That is, we denote and as and , respectively.
We describe cell divisions of S and F cells and mutation from S to F with the following chemical reaction rules:
One can run an individual-based simulation by counting the number of events occurring during collective maturation via the tau-leaping algorithm (Gillespie, 2001; Cao et al., 2006) to generate a sample trajectory of and for each collective. However, the individual-based simulation requires long computing times due to a large number of random events to be counted. Hence, we used a ‘sampling method’ by sampling the numbers of S and F cells in collectives from a joint probability density distribution (jpdf) which denotes the probability density to have number of S cells and number of F cells at time in the cycle. To do so, we require an analytical expression of .
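The individual-based maturation step can be sketched with a fixed-step tau-leaping loop. This is a minimal illustration rather than the authors' code: the parameter names `r`, `omega`, `mu`, `t_end`, and `dt` are assumptions standing in for the S growth rate, the F growth advantage, the mutation rate, the maturation time, and the leap size, and the fixed leap size is a simplification of the adaptive step selection in Cao et al., 2006.

```python
import numpy as np

def mature_tau_leap(s0, f0, r, omega, mu, t_end, dt, rng):
    """Fixed-step tau-leaping for the three reactions described in the text:
    S -> 2S (rate r), F -> 2F (rate r + omega), S -> F (rate mu).
    All argument names are illustrative assumptions."""
    s, f = int(s0), int(f0)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        # Number of firings of each reaction during one leap of length dt
        births_s = rng.poisson(r * s * dt)
        births_f = rng.poisson((r + omega) * f * dt)
        # A mutation converts an existing S cell, so cap it at the S count
        mutations = min(rng.poisson(mu * s * dt), s)
        s += births_s - mutations
        f += births_f + mutations
    return s, f

rng = np.random.default_rng(1)
adult_s, adult_f = mature_tau_leap(900, 100, r=0.5, omega=0.03, mu=1e-4,
                                   t_end=4.0, dt=0.01, rng=rng)
print(adult_s, adult_f, adult_f / (adult_s + adult_f))
```

The numerical values in the usage lines are placeholders chosen only to make the sketch run; smaller `dt` gives a closer match to an exact event-by-event simulation at higher cost.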
First, we assume that the chemical reactions in Equations 9–11 occur independently, and never occur simultaneously within a short time interval . Then, the time derivative of is given by
This master equation describes a probability density ‘flux’ at the state . The first term describes the scenario where a single birth event of an S cell happens during time interval , which changes the collective’s composition from to . Similarly, the second term comes from a birth event of an F cell. The third term indicates the mutation event from to . The last term corresponds to the outflow of probability density by birth and mutation processes, which describes the changes from to any other states.
Calculating the exact form of is not simple. Instead, we assume that the mutation rate is much smaller than the growth rates, and hence the correlation between and is sufficiently small. Additionally, S and F do not interact ecologically. Then, we can express as a product of two probability density functions (pdf) of and , . We assume that each pdf of and can be approximated as Gaussian (), which is supported by the Central Limit Theorem and Appendix 1—figure 1. In more detail, the cell numbers and are mainly determined by growth (Equations 9; 10), and also mutations (Equation 11). Even though the number of events would be different among different realizations, the mean numbers of events will follow Gaussian distributions. So, we can simply assume that the distributions of cell numbers also follow Gaussian distributions. This assumption requires that the distributions have insignificant skewness and no heavy tails, which we will numerically check afterwards. The pdfs of and are given by
and
That is, is written as
Now we need means ( and ) and variances ( and ) of S and F cell numbers to express the distribution analytically.
The means are defined by and . The differential equations for means are obtained by applying the definition to the master equation in Equation 12, as
We assume that the mutation rate is much smaller than and . By solving Equation 16 and Equation 17, the means and are given by
where and are the mean numbers of S and F cells at the beginning of cycle, . Note that the second term of Equation 19 is consistent with previous studies (Zheng, 1999). Now we introduce factors and in Equations 18; 19 in order to simplify the formula. is the multiplying factor by which the S cell number increases after time . is the fold change in . Then, we can rewrite
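The mean dynamics can be checked numerically. The sketch below assumes that the mean equations take the linear form implied by the reaction scheme (S divides, F divides, S converts to F): dS/dt = (r - mu) S and dF/dt = (r + omega) F + mu S. The closed form is the exact solution of that linear system, obtained with an integrating factor; dropping mu relative to r and omega recovers the small-mutation approximation used in the text. The names `r`, `omega`, `mu` and all numerical values are assumptions for illustration.

```python
import math

def mean_cells_closed_form(s0, f0, r, omega, mu, t):
    """Exact solution of dS/dt = (r - mu) S, dF/dt = (r + omega) F + mu S."""
    grow_s = math.exp((r - mu) * t)
    grow_f = math.exp((r + omega) * t)
    s = s0 * grow_s
    # Second term: F cells descended from S cells that mutated during [0, t]
    f = f0 * grow_f + mu * s0 * (grow_f - grow_s) / (omega + mu)
    return s, f

def mean_cells_euler(s0, f0, r, omega, mu, t, n_steps=20000):
    """Forward-Euler integration of the same two mean equations."""
    dt = t / n_steps
    s, f = float(s0), float(f0)
    for _ in range(n_steps):
        ds = (r - mu) * s * dt
        df = ((r + omega) * f + mu * s) * dt
        s += ds
        f += df
    return s, f

print(mean_cells_closed_form(1000, 0, 0.5, 0.03, 1e-4, 4.8))
print(mean_cells_euler(1000, 0, 0.5, 0.03, 1e-4, 4.8))
```

Agreement between the two functions is a quick sanity check on the integrating-factor step before the means are used to build the Gaussian distributions.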
We define the second moments of and as
Then, the corresponding differential equations are given by
The solution of Equation 24 is
where is the second moment of initial values. Thus, the variance is
where is the variance of S cell numbers at . Solving Equation 25 requires . Equation 12 provides a differential equation for as
The solution of Equation 30 is given by
By using Equation 31, the solution of Equation 25 is given by
where is the second moment of initial values. Thus, the variance is given, up to the order of , by
Using and , we rewrite
Using Equations 18; 19; 28; 33, we construct pdfs for and at the end of cycle . Then, we randomly sample a number from for and another number from for . Those two numbers are cell numbers in a single Adult. We repeat this process for each Newborn to get cell numbers of all Adults. Note that the initial values for the Newborn are and . This process only requires two random numbers per collective, while the result is consistent with the individual-based simulation.
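The sampling step described above can be sketched as follows. The function takes the means and variances as inputs rather than computing them, because it only illustrates the "two random numbers per collective" idea; the argument names are hypothetical, and clipping negative draws to zero is an added safeguard that is not stated in the text.

```python
import numpy as np

def sample_adult(mean_s, var_s, mean_f, var_f, rng):
    """Draw one Adult's (S, F) cell numbers from two independent Gaussians.
    mean_* and var_* are assumed to come from the moment equations."""
    s = rng.normal(mean_s, np.sqrt(var_s))
    f = rng.normal(mean_f, np.sqrt(var_f))
    # Assumption: cell numbers cannot be negative, so clip rare negative draws
    return max(float(s), 0.0), max(float(f), 0.0)

rng = np.random.default_rng(2)
adults = [sample_adult(9000.0, 1.0e5, 1200.0, 2.0e4, rng) for _ in range(10)]
print([round(f / (s + f), 4) for s, f in adults])
```

The moment values in the usage lines are placeholders, not results from the paper.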
Now, we check the validity of the Gaussian approximation for probability density functions of S and F populations. If we consider mutation from S to F as death in the S population, then the process in S corresponds to a branching process with death. Also, the birth process in F, including mutation, results in a Luria-Delbrück distribution (Zheng, 1999). Thus, the distributions of Adults’ S and F numbers are more skewed and heavy-tailed than Gaussian. But this problem is alleviated by larger initial S and F numbers and when the maturation time is not very long (see Appendix 1—figure 1). Since we usually consider larger initial cell numbers, we use the Gaussian approximation on S and F populations in further calculations.

Comparison between the calculated Gaussian distribution (‘Gauss,’ with the mean and variances computed from Equations 18; 19; 28; 33) and simulations using tau-leaping (‘tau’).
The simulations were run 3000 times. The initial numbers of cells are , and for each column. The parameters , , , and are used.
Selection
After sampling cell numbers of each Adult in the maturation step, we compute the F frequency in each collective . We denote the F frequency of collective at time in cycle as . Among the Adults, we select the one collective whose F frequency is closest to the target frequency . The selected Adult’s F frequency value is denoted by . Mathematically, the selected frequency is defined by
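The selection rule amounts to an arg-min over the distance to the target. A minimal sketch, with hypothetical function and argument names:

```python
import numpy as np

def select_adult(s_counts, f_counts, target):
    """Return (index, F frequency) of the Adult whose F frequency
    is closest to the target frequency."""
    s = np.asarray(s_counts, dtype=float)
    f = np.asarray(f_counts, dtype=float)
    freqs = f / (s + f)
    idx = int(np.argmin(np.abs(freqs - target)))
    return idx, float(freqs[idx])

print(select_adult([90, 50, 10], [10, 50, 90], target=0.4))
```

When two Adults are equally close to the target, `np.argmin` returns the first one; the text does not specify a tie-breaking rule, so this is an arbitrary choice.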
Reproduction
Using the chosen Adult, we generate Newborn collectives for the next cycle . The most natural way is to consecutively sample cells at random from the selected Adult without replacement. Mathematically, we first randomly sample F cells and draw S cells from the selected Adult. Next, we sample F cells and S cells from the remaining cells in the Adult. We repeat the process times. Then the jpdf to choose F cells, , follows a multivariate hypergeometric distribution.
If we assume that the selected Adult size is large enough compared to Newborn size , the consecutive sampling is well approximated by independent binomial sampling (see Appendix 1—figure 2). Thus, we independently sample the numbers of F cells, , from the binomial distribution. The probability mass function of each is given by
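The two reproduction schemes can be compared side by side. The sketch below draws Newborns consecutively without replacement (one hypergeometric draw per Newborn from the cells that remain) and, alternatively, by independent binomial sampling. Function names, argument names, and the numbers in the usage lines are assumptions for illustration.

```python
import numpy as np

def newborns_consecutive(s_adult, f_adult, n0, g, rng):
    """F counts of g Newborns of n0 cells each, drawn consecutively
    from the Adult without replacement."""
    s_left, f_left = int(s_adult), int(f_adult)
    f_counts = []
    for _ in range(g):
        # Number of F cells among n0 cells taken from the remaining pool
        k = int(rng.hypergeometric(f_left, s_left, n0))
        f_counts.append(k)
        f_left -= k
        s_left -= n0 - k
    return np.array(f_counts)

def newborns_binomial(s_adult, f_adult, n0, g, rng):
    """F counts of g Newborns under the independent binomial approximation."""
    p = f_adult / (s_adult + f_adult)
    return rng.binomial(n0, p, size=g)

rng = np.random.default_rng(3)
print(newborns_consecutive(7000, 3000, n0=100, g=10, rng=rng))
print(newborns_binomial(7000, 3000, n0=100, g=10, rng=rng))
```

The approximation is expected to degrade as the total number of sampled cells approaches the Adult size, because the draws then become noticeably dependent.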
After sampling, the numbers of S cells are set to be for each collective. We can now start cycle with these Newborn collectives. By repeating the above three steps (maturation, selection, and reproduction), we run the simulation until F frequency reaches a stationary state.
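The three steps can be chained into one loop. The sketch below is an illustrative assembly under stated assumptions, not the authors' implementation: maturation uses a vectorized fixed-step tau-leaping scheme instead of the Gaussian sampling method, reproduction uses the binomial approximation, and every parameter name and value (`n0`, `g`, `target`, `f_init`, `r`, `omega`, `mu`, `tau`, `dt`) is a placeholder.

```python
import numpy as np

def run_cycles(n0, g, target, f_init, r, omega, mu, tau, n_cycles, rng, dt=0.05):
    """Return the selected Adult's F frequency at the end of each cycle."""
    f_newborn = rng.binomial(n0, f_init, size=g)
    selected = []
    n_leaps = int(round(tau / dt))
    for _cycle in range(n_cycles):
        s = (n0 - f_newborn).astype(np.int64)
        f = f_newborn.astype(np.int64)
        # Maturation: all g collectives advance together, one leap at a time
        for _leap in range(n_leaps):
            births_s = rng.poisson(r * s * dt)
            births_f = rng.poisson((r + omega) * f * dt)
            mutations = np.minimum(rng.poisson(mu * s * dt), s)
            s = s + births_s - mutations
            f = f + births_f + mutations
        # Selection: the Adult closest to the target frequency
        freqs = f / (s + f)
        best = int(np.argmin(np.abs(freqs - target)))
        selected.append(float(freqs[best]))
        # Reproduction: binomial sampling of g Newborns from the chosen Adult
        f_newborn = rng.binomial(n0, freqs[best], size=g)
    return selected

rng = np.random.default_rng(4)
print(run_cycles(n0=1000, g=10, target=0.2, f_init=0.2, r=0.5, omega=0.03,
                 mu=1e-4, tau=4.8, n_cycles=5, rng=rng))
```

Running the loop for many cycles and comparing the last selected frequency with the target gives the kind of absolute error that is averaged over replicates in the colormap analysis.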

Congruence between consecutive sampling (MHG for multivariate hypergeometric distribution) and independent binomial (BN) sampling.
The initial numbers of cells are and for the left panel, and and for the right panel. 10,000 samples are drawn for each distribution. Here, a parent collective is divided into 10 collectives.
Simulation result
Appendix 1—figure 3 presents the composition trajectories of all collectives using the tau-leaping algorithm in the maturation step. The selected Adults have the composition closest to the target composition . The selected Adult can have a smaller F frequency than its parent Adult, so the F frequency can be lowered over cycles.

Trajectories of F frequency for 10 collectives () over time.
(a) The collective whose frequency is closest to the target value is selected in every cycle (black lines). The gray lines denote the other collectives. For parameters, we used S growth rate , F growth advantage , mutation rate , maturation time , and . (b) Comparison between frequency trajectories with selection (the chosen one Adult producing all offspring; black) and without selection (each Adult producing one offspring; blue) clearly shows the effect of artificial selection. The black line indicates F frequency of the selected collective at each cycle in (a). The blue line indicates the average trajectory without selection (the average of individual lineages without inter-collective selection at the end of each cycle).
In Appendix 1—figure 4, we plot the absolute error between the target frequency and (i.e. ) at the end of simulations (1000 cycles). Since the computing time for the tau-leaping algorithm (individual-based simulation) to reach 1000 cycles is very long, we used the sampling scheme described in the Maturation subsection above. In the colormap, errors higher than 0.15 are marked with gray, which indicates selection failure. The dashed lines indicate the same boundary as in Figure 2e in the main text.

Color map of the absolute error between the average F frequency of the selected collectives at the end of simulations () and the target frequency .
The solid and dashed lines are drawn by the arguments in the main text. For parameters, we used , , and . The result is the average of 300 independent simulations. Compared to Figure 2e, this figure has a higher resolution.
Appendix 2
Conditional probability distribution of the selected collective frequency and selection progress
In the main text, we identify the region of success by using selection progress , which is obtained from the conditional pdf of (the F frequency of the selected Adult at cycle ) given the selected at cycle , written as . We consider the challenging case where is above the target value (), and therefore the Adult with minimum F frequency will be selected. To get an analytical expression of , we first find the conditional pdf of of Adults in cycle given at cycle . Then, we find from the minimum value distribution of F frequencies among Adults. Below, we describe the mathematical details of this process.
Let us start from the reproduction step from the selected Adult in cycle . We reproduce Newborns in the next cycle . Then the probability distribution of the F cell numbers in Newborn collectives is given in Equation 36. If the total number of cells in a Newborn collective is large enough, Equation 36 is approximated by the Gaussian distribution . Then, the probability density function for the F frequency of a Newborn collective to be is
The Newborn collective has initial cell numbers and . From here, we ignore cycle index in subscript and superscript for convenience.
Next, we write the conditional pdf of Adults’ F frequency with given Newborn F frequency . We assume that cell numbers in Adult and follow Gaussian distributions as in Equations 13; 14. Based on Equations 18; 19; 28; 33, we have
where and are random variables following the standard normal distribution . Note that each Gaussian is sharp if Newborn size is sufficiently large ( and ). Then, we can approximately write as
The mean of is given by
and the variance is
where and . The average Adult size is . Thus, the Adult’s F frequency follows the Gaussian distribution whose pdf is given by
Next, we get the conditional pdf of (offspring Adult’s F frequency in cycle ) given . We multiply Equations 37 and 43 and take the integral over :
After maturation in cycle , the Adult with the smallest frequency is selected among Adult collectives, denoted as . The pdf of is obtained by the theory of extreme value statistics (Gumbel, 1958). The cumulative distribution function (cdf) of the minimum value is given by
Since frequencies are independent and identically distributed, . Note that , and Equation 45 becomes
Then, the probability density function is obtained by differentiating Equation 46 with respect to and replacing ,
We compute the probability density function Equation 47 by using numerical integration and compare it with the stochastic simulation results in Appendix 2—figure 1. The two distributions are similar.
To get the analytic approximation of the median of Equation 47, we assume that the Adult’s F frequency distribution is Gaussian. Then we only need to calculate the mean and variance of Adult’s F frequency. Instead of calculating the integral with respect to in Equation 44, we put a set of initial values from Newborn’s F frequency distribution in Equations 20, 21, 28 and 34: , , , , and . Then we have
which give rise to Equation 3 and Equation 4 in the main text, respectively. The functional forms of Equations 48; 49 are plotted in Appendix 2—figure 2a.
The median () of Equation 47 satisfies , which means . If we assume that the distribution Equation 47 is Gaussian, then the inverse function can be written as
where is the inverse cumulative distribution function (CDF) of the normal distribution with the mean in Equation 48 and standard deviation , the square root of Equation 49. Subtracting from Equation 50 gives the selection progress
which is Equation 2 in the main text.
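The median of the minimum can be evaluated directly with an inverse normal CDF. The sketch assumes that the Adult frequencies are independent draws from one normal distribution, so that the CDF of the minimum of g draws is 1 - (1 - Psi)^g and the median solves Psi(x) = 1 - 2^(-1/g). This is the exact quantile under that Gaussian assumption; it is offered as a check and is not claimed to be the same closed form as Equation 2. Function and argument names are hypothetical.

```python
from statistics import NormalDist

def median_of_minimum(mean, sd, g):
    """Median of the smallest of g i.i.d. Normal(mean, sd) draws."""
    p = 1.0 - 0.5 ** (1.0 / g)  # Psi(x) at the median of the minimum
    return NormalDist(mean, sd).inv_cdf(p)

def selection_progress(f_selected_prev, mean_adult, sd_adult, g):
    """Median selected frequency minus the previously selected frequency.
    A negative value means inter-collective selection lowers the F frequency."""
    return median_of_minimum(mean_adult, sd_adult, g) - f_selected_prev

print(selection_progress(0.30, mean_adult=0.32, sd_adult=0.02, g=10))
```

With more collectives (larger g) the quantile `p` shrinks, so the selected minimum sits further below the mean, but only logarithmically slowly in g.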
Furthermore, we get an asymptotic expression of when is large (or with small ). Here, we introduce a method from Phllip, 1960. We start from the CDF of the standard normal distribution, where the function is the complementary error function. To get the expression of , we need an asymptotic expression of the inverse of function () as the inverse CDF . The known asymptotic expansion of for large is . By taking the logarithm of both sides, we have
Replacing on the right-hand side in Equation 52 into the expression itself, we get a continued logarithmic form of
Inserting (square root of Equation 53) into the inverse CDF , we have . So, the asymptotic expression of is given by

The probability density functions of the selected Adult’s F frequency subtracted by .
For simulations (blue), at each , we performed 1000 stochastic simulations. The orange distribution represents Equation 47 computed by numerical integration. The median values of the distributions are shown in Figure 3a in the main text.

Effect of experimental parameters in the distribution of Adult's F frequency.
(a) Mean (Equation 41) and variance (Equation 42) of values of Adult collectives with respect to the Newborn frequency . (b) Scaling relation of F frequency variance (Equation 49) with Newborn collective size . The initial F frequency is 0.5. The parameters are , , , and . (c) Relation of F frequency variance (Equation 49) with maturation time . Other parameters are the same as b.
Appendix 3
Critical newborn size to allow all target frequencies
First, we note that in Equation 49 is proportional to for the following reasons. Variance in Equation 28 scales linearly with since both and scale linearly with . Variance in Equation 33 also scales linearly with because , , , and covariance all scale linearly with . The mean adult size is also proportional to because the average cell numbers in Equations 18; 19 are linear with respect to . Thus, the scaling relation of Equation 49 is given by .
Small makes all target frequencies achievable, as shown in Figure 4a in the main text. That is because small induces large , and thus smaller than a certain critical value makes the selection progress always negative, regardless of the value of (i.e. ). That means the inter-collective selection overcomes intra-collective selection at any target frequency. To get an analytical approximation of the critical newborn size , we simply assume that selection progress is maximum at where the changes in and are fastest. If the maximum value of is zero, all other values of Equation 50 are negative, which implies that all targets are achievable. Putting , Equations 48; 49 become
and
So, by setting with , we get a solution of
Thus, all target frequencies are successfully selected with Newborn size smaller than . If the mutation rate is zero, the critical value becomes
Appendix 4
Selection without mutation
When the mutation rate is zero, two genotypes behave as two distinct species. The compositional change is given by Equation 50 with . The corresponding in Equation 48 and in Equation 49 become
Equations 59; 60 suggest that when a community consists of two competing species, we obtain similar conclusions on the accessible region for target composition. The stochastic simulation results are presented in Appendix 4—figure 1.
Appendix 5
Stronger or weaker advantages
Solving Equation 2 in the main text gives the boundary values as a function of , the fitness advantage of F over S. We calculate the solutions numerically and plot them in Appendix 5—figure 1.
Appendix 6
Deleterious mutation
In the main text, we show that the target composition can be achieved in some ranges of initial and target values when the mutation is beneficial to growth. The same analogy can be applied when the mutation is deleterious. Since the F cells grow slower than the S cells (), the F frequency naturally decreases in the maturation step. Then, the challenging case is selecting a larger F frequency against the intra-collective selection. So the conditional probability distribution that we consider now is a maximum value distribution of Equation 44. Thus, instead of Equation 45, we look for the cumulative distribution function of the maximum value such that
If all frequencies are independent and identically distributed random variables, the cumulative distribution function becomes
As in the previous section, we get the conditional probability density function by differentiating Equation 62 with respect to and replacing as
The distribution in Equation 63 is evaluated for various in Appendix 6—figure 1a with numerical simulations, and the median values of distributions are presented in Appendix 6—figure 1b. In the case of , target frequencies lower than about 0.3 or larger than about 0.7 can be selected. Since the sign of is opposite to the result in the main text, the diagram is reversed from Figure 2e in the main text.
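For the deleterious case, the relevant extreme is the maximum. Assuming independent, identically distributed normal Adult frequencies, the CDF of the maximum of g draws is Psi^g, so its median solves Psi(x) = 2^(-1/g). A minimal sketch with hypothetical names:

```python
from statistics import NormalDist

def median_of_maximum(mean, sd, g):
    """Median of the largest of g i.i.d. Normal(mean, sd) draws."""
    p = 0.5 ** (1.0 / g)  # Psi(x) at the median of the maximum
    return NormalDist(mean, sd).inv_cdf(p)

print(median_of_maximum(0.30, 0.02, g=10))
```

By symmetry of the normal distribution, this value lies as far above the mean as the median of the minimum lies below it.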

Artificial selection also works for deleterious mutation.
(a) Conditional probability density functions of for various values. The left-hand side distribution is obtained from simulations and the right-hand side distribution is numerically obtained by evaluating Equation 63. Small triangles inside indicate the median values of the distributions. (b) The median value of distributions at a given . The points where the shifted median becomes zero, are denoted as and , respectively. (c) The relative error between the target frequency and the ensemble averaged selected frequency is measured after 1000 cycles starting from the initial frequency . Either the lower target frequencies or the higher target frequencies starting from the high initial frequencies can be achieved. The black dashed lines indicate the predicted boundary values and in a.
Appendix 7
Selecting more than one collective
In the main text, we choose one collective which has the closest frequency to the target among collectives. Such a ‘top 1’ strategy allows us to apply extreme value theory. However, ‘top 1’ may be too restrictive (Xie et al., 2019). Thus, we test the ‘top-tier’ strategy by choosing the top five among 100 Adults (Appendix 7—figure 1). The top-tier strategy is shown to be inefficient in our system. This is because in Xie et al., 2019, nonheritable variations – such as stochastic fluctuations in species composition introduced by pipetting – caused nonheritable variations in collective function. Nonheritable variations could potentially mask desired mutations if these mutations happened to occur in an ‘unlucky’ environment that yielded lower collective functions. Hence, lenient selection would allow the preservation of these mutations. In contrast here, stochastic fluctuations in genotype composition are heritable: a parent Newborn with lower F frequency will tend to have offspring Newborns with lower values. Hence, top-1 is more effective in this study.
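The difference between the two strategies is only in how many Adults pass the selection step. A minimal sketch of the ranking, with hypothetical names; how offspring are then divided among the chosen Adults is not specified here and is left out.

```python
import numpy as np

def select_top_k(freqs, target, k):
    """Indices of the k Adults whose F frequencies are closest to the target,
    ordered from closest to farthest. k = 1 gives the 'top 1' strategy."""
    freqs = np.asarray(freqs, dtype=float)
    order = np.argsort(np.abs(freqs - target), kind="stable")
    return order[:k]

print(select_top_k([0.1, 0.5, 0.35, 0.9], target=0.4, k=2))
```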
Appendix 8
Extension to three-population system
We assume that collectives consist of three genotypes with slow-growing (S), fast-growing (F), and faster-growing (FF) types. The growth rate of S is . Each mutation adds to the growth rate. Thus, the F and FF types have growth rates and , respectively. The mutation rate is . So, the birth and mutation events are written by the chemical reactions:
We write a master equation of the processes for which is the probability to have , , and numbers of S, F, and FF cells at time , respectively.
The composition of collective in cycle is now represented with two frequencies where the F frequency is and the FF frequency is . Then, the target composition is set to be . The composition of the selected Adult in cycle is . We apply the processes used in the above Appendix 2 to obtain the conditional probability by using the master Equation 69.
At the reproduction step in cycle , we choose cells from the selected Adult whose composition is (,). Then, newborn collectives are independently sampled from a multinomial distribution. For convenience, we drop the collective index . Then, the conditional joint probability mass function of cells is represented by
where the number of S is automatically set to be . Then, the approximated multivariate normal distribution is where the mean distribution is and covariance matrix is . The diagonal terms of are variances and the off-diagonal terms are covariances . The matrix is given by
Then a Newborn’s composition follows the multivariate Gaussian distribution whose joint probability distribution is given by
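The multinomial reproduction step and its Gaussian approximation can be illustrated with the standard multinomial moments. In the sketch, `f_freq` and `ff_freq` are assumed names for the F and FF frequencies of the selected Adult, and the moments are written for cell counts; dividing the mean by `n0` and the covariance by `n0**2` gives the corresponding moments for frequencies.

```python
import numpy as np

def newborn_count_moments(n0, f_freq, ff_freq):
    """Mean vector and covariance matrix of the (F, FF) cell counts
    of a Newborn drawn multinomially from the selected Adult."""
    mean = np.array([n0 * f_freq, n0 * ff_freq])
    cov = n0 * np.array([
        [f_freq * (1.0 - f_freq), -f_freq * ff_freq],
        [-f_freq * ff_freq, ff_freq * (1.0 - ff_freq)],
    ])
    return mean, cov

def sample_newborns(n0, f_freq, ff_freq, g, rng):
    """Draw g Newborns; columns are the S, F, and FF cell counts."""
    return rng.multinomial(n0, [1.0 - f_freq - ff_freq, f_freq, ff_freq], size=g)

rng = np.random.default_rng(5)
print(newborn_count_moments(1000, 0.2, 0.3))
print(sample_newborns(1000, 0.2, 0.3, g=5, rng=rng))
```

The negative off-diagonal terms reflect the fixed Newborn size: drawing more F cells leaves fewer slots for FF cells.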
At the beginning of cycle , a Newborn collective starts from cells (for convenience, the cycle index is dropped). In terms of , the initial numbers are , and . Their initial covariance matrix is . Using Equation 69, we can write ordinary differential equations for moments up to the second order.
The initial conditions of the system in coupled Equations 73–81 are obtained by the mean and (co)variances of Equation 70. By solving equations numerically, we obtain a set of mean cell numbers and a set of variances as well as covariances . We assume that the covariances are smaller than the variances. We consider and as Gaussian random variables
Then, the F frequency becomes
where and . Similarly, the FF frequency is
where and . The dynamic flow of F and FF frequencies during maturation is shown in Appendix 8—figure 1a. If the covariances are small enough, we can approximate the joint probability distribution of Adult’s composition as
With cycle index , we get the conditional probability of matured collectives by
We select the Adult collective among Adult collectives such that the change in frequencies during maturation could be compensated. During maturation, a frequency distribution moves in different directions in space depending on the initial composition. So, we take different directions to obtain the extreme value distributions. Considering only the sign of the frequency changes in and , we take either the maximum or the minimum. The mean change in is always positive in the whole space since is always positive in Equation 75. Thus, we choose the minimum value in every selection step.
If the mean is larger than , the minimum value among will be chosen in the selection step to compensate for the frequency change in the maturation step. Let us denote the selected values of and as and . We temporarily drop the time index for simplicity. Then, the joint cumulative distribution function is
The probability can be converted as
where is a conditional joint cumulative distribution function of . The marginal cumulative distribution functions are
Similarly, the probabilities and are converted into and . Thus, the joint cumulative distribution function is
Then, the conditional probability of the selected collective is given by
where and .
If the mean is smaller than , the chosen collective is likely to have maximum values among matured collectives. Then, the definition of is written by . We rewrite the joint cumulative distribution function to be a little different from Equation 89 because now we have to utilize the condition instead of ,
The probability is converted as
Thus, the joint cumulative distribution function is given by
In this case, the conditional probability distribution function is given by
By replacing with , we finally obtain the conditional probability distribution ,
Using Equation 100, we get the mean values of and as
We define the accessible region in frequency space where the signs of the changes in both F frequency and FF frequency after a cycle are opposite to that of maturation (see Appendix 8—figure 1),
where and are the mean values of F frequencies after the maturation step in cycle before and after selection, respectively, and values are defined similarly for FF. If the condition is not met, the composition of the selected collective may diverge from the target composition after several cycles. The accessible regions are marked as the gold-colored area in Appendix 8—figure 1b. Similar to the two-population case, the accessible region is shaped by the flow velocity of the composition during the maturation step, as depicted in the flow diagram in Appendix 8—figure 1a. Both F and FF frequencies tend to increase, and the inter-collective selection can compensate for these changes if the composition changes slowly when the F and FF frequencies are small. However, if the changes occur too rapidly when the FF frequency is intermediate, the frequency cannot be stabilized. So the accessible region is limited to the regions where the composition changes slowly.
This can be explained by projecting the three-population problem onto a two-population problem. The selective advantage of FF relative to the rest of the collective mainly determines the accessible region. The growth rate of the rest varies from to depending on the F frequency, so the mean growth rate of the rest is written as where is F frequency in S+F. Then, the corresponding selective advantage of FF is which varies between to . Using and similar to Appendix 2, we obtain the bounds of the accessible region (see dashed line in Appendix 8—figure 1b). The boundary obtained from the projected problem agrees well with that of the original three-population problem.
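The projection described above can be illustrated numerically. The sketch below is a minimal illustration, not the authors' code: it assumes that the mean growth rate of the S+F subpopulation is the frequency-weighted average of the S and F growth rates. The growth-rate values `R_S`, `R_F`, `R_FF` and the function names are hypothetical placeholders chosen for illustration.

```python
def rest_growth_rate(f_in_rest, r_s, r_f):
    """Mean growth rate of the S+F subpopulation.

    f_in_rest is the F frequency within S+F (0 = only S, 1 = only F).
    Assumption: the mean is the frequency-weighted average of r_s and r_f.
    """
    return (1.0 - f_in_rest) * r_s + f_in_rest * r_f


def ff_advantage(f_in_rest, r_s, r_f, r_ff):
    """Effective selective advantage of FF over the rest (S+F)."""
    return r_ff - rest_growth_rate(f_in_rest, r_s, r_f)


# Hypothetical growth rates with r_S < r_F < r_FF (not taken from the paper).
R_S, R_F, R_FF = 0.50, 0.53, 0.56

for f in (0.0, 0.5, 1.0):
    print(f"F frequency in S+F = {f:.1f}: FF advantage = "
          f"{ff_advantage(f, R_S, R_F, R_FF):.3f}")
```

Under these assumptions the effective advantage of FF is largest when the rest consists only of S and smallest when it consists only of F, which gives the two limiting two-population problems that bracket the projected boundary.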

Accessible regions in the three-population system.
(a) The flow of composition change in fast-growing (F) and faster-growing (FF) frequencies at each composition . The top corner corresponds to FF cells being fixed in the collective. The bottom-right corner corresponds to collectives containing only F cells, and the bottom-left corner to collectives containing only S cells. Arrow length indicates the speed of change. (b) The accessible regions are marked in gold. A composition is accessible if the signs of the changes in both F frequency and FF frequency after inter-collective selection are opposite to those during maturation. Otherwise, the composition is not accessible and will change over cycles. Dashed lines mark the boundary of the accessible region obtained by projecting the collective onto a two-population problem (FF vs. S+F). The figures were drawn using the mpltern package (Ikeda et al., 2019).
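The dependence of flow speed on composition can be sketched with a simplified deterministic model. This is an illustration under stated assumptions, not the stochastic simulation used in the paper: it ignores mutation and demographic noise, and treats each type as growing exponentially during maturation. The rates in `RATES`, the maturation time `TAU`, and the function names are hypothetical.

```python
import math


def mature(freqs, rates, tau):
    """Frequencies after deterministic exponential growth for time tau.

    freqs: (f_S, f_F, f_FF), summing to 1; rates: matching growth rates.
    Assumption: no mutation and no stochasticity.
    """
    weights = [f * math.exp(r * tau) for f, r in zip(freqs, rates)]
    total = sum(weights)
    return tuple(w / total for w in weights)


def flow_speed(freqs, rates, tau):
    """Distance moved in (F frequency, FF frequency) space in one maturation."""
    after = mature(freqs, rates, tau)
    return math.hypot(after[1] - freqs[1], after[2] - freqs[2])


RATES = (0.50, 0.53, 0.56)  # hypothetical r_S < r_F < r_FF
TAU = 5.0                   # hypothetical maturation time

# Compare a nearly pure-S collective, an intermediate one, and a nearly pure-FF one.
for comp in [(0.98, 0.01, 0.01), (0.30, 0.30, 0.40), (0.01, 0.01, 0.98)]:
    print(comp, round(flow_speed(comp, RATES, TAU), 4))
```

In this simplified picture the composition moves slowly near the corners of the frequency triangle and faster at intermediate FF frequency, consistent with the qualitative description of the flow diagram.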
Appendix 9
Derivation of equations
In this section, we go over the derivation of Equations 18–42 for readers not equipped with advanced mathematics training.
Assumptions:
Equations 18 and 19
Equation 18 is straightforwardly solved by integrating Equation 16. Equation 19 is obtained from Equation 17 using integration factor :
Integrating both sides, we get . Thus,
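For readers unfamiliar with the integration factor technique, the following worked example shows the steps on a generic linear equation. The symbols $y$, $a$, and $g$ are generic placeholders introduced here for illustration and are not the paper's variables.

```latex
\begin{align}
\frac{dy}{dt} + a\,y &= g(t) \\
e^{at}\frac{dy}{dt} + a\,e^{at}y &= e^{at}g(t)
  && \text{(multiply both sides by the factor } e^{at}\text{)} \\
\frac{d}{dt}\left(e^{at}y\right) &= e^{at}g(t)
  && \text{(product rule in reverse)} \\
e^{at}y(t) - y(0) &= \int_0^t e^{as}g(s)\,ds
  && \text{(integrate both sides from } 0 \text{ to } t\text{)} \\
y(t) &= e^{-at}y(0) + \int_0^t e^{-a(t-s)}g(s)\,ds
\end{align}
```

The same sequence of steps (multiply by the factor, recognize a total derivative, integrate, and solve for the unknown) is reused in the later derivations of this appendix.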
Equations 24 and 25
Applying Equation 12, we have
We collect the two purple-colored terms and change the order of summation. Note that the first purple-colored term does not change regardless of whether starts from 0 or 1 because the term is zero for . Thus, the first purple-colored term is equivalent to . Let , and this becomes . We reassign α as , and obtain:
We collect the two blue terms, and similarly obtain:
Finally, we collect the two red terms. For the first red term, the sum is the same regardless of whether we start from or -1. Let start from -1, and we have
Let then the term becomes . We reassign α as , and additionally apply index change on :
Now, add the three parts together, and we have
which is Equation 24. Likewise,
which is Equation 25.
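The index relabelling used in these derivations can be checked numerically on a simpler case. The sketch below is an illustrative assumption rather than the paper's Equation 12: it uses a pure-birth master equation, dP_n/dt = r(n-1)P_{n-1} - r n P_n, for which shifting the summation index in the gain term gives d<n>/dt = r<n>. The function names, the binomial test distribution, and the rate value are hypothetical.

```python
from math import comb


def binomial_pmf(n_trials, p):
    """Probability mass function of a binomial distribution, as a list."""
    return [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]


def mean_rate_direct(pmf, r):
    """d<n>/dt computed term by term as sum_n n * dP_n/dt."""
    k_max = len(pmf) - 1

    def prob(n):
        # Probabilities outside the support are zero.
        return pmf[n] if 0 <= n <= k_max else 0.0

    # The sum runs to k_max + 1 because the gain term feeds state k_max + 1.
    return sum(n * (r * (n - 1) * prob(n - 1) - r * n * prob(n))
               for n in range(k_max + 2))


def mean_rate_shifted(pmf, r):
    """d<n>/dt after the index shift m = n - 1 in the gain term: r * <n>."""
    return r * sum(m * p for m, p in enumerate(pmf))


pmf = binomial_pmf(20, 0.3)  # hypothetical distribution over cell numbers
print(mean_rate_direct(pmf, 0.5), mean_rate_shifted(pmf, 0.5))
```

The two functions agree up to floating-point error, because relabelling the summation index only renames the terms of the sum without changing its value.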
Equations 26 and 28
Using integration factor and Equation 24, we have:
Since , we have
where is the expected at time 0. For Equation 28,
Equations 30 and 31
Since ,
We can solve this, again using the integration factor technique above:
Thus, we have
which results in
Equation 32
From Equation 25, we have
The right-hand side becomes
Note that we have checked, by comparing the full calculation with this simpler version, that the second and third terms of can be ignored. Integrating both sides:
Then, we have
Equation 33
Equations 40–42
To derive this equation, we use the fact that for small . We will omit for simplicity. Also, note that we are considering relatively large populations so that the standard deviation is much smaller than the mean.
Recall that if , then . Thus, is distributed as a Gaussian with the mean of
Note that the initial value of mean is equal to the mean of the binomial distribution Equation 37, . The variance is
Data availability
Data and source code for the stochastic simulations are available at https://github.com/schwarzg/artificial_selection_collective_composition (copy archived at Lee, 2025).
References
- Levels and limits in artificial selection of communities. Ecology Letters 18:1040–1048. https://doi.org/10.1111/ele.12486
- Synthetic biology approaches to engineer probiotics and members of the human microbiota for biomedical applications. Annual Review of Biomedical Engineering 20:277–300. https://doi.org/10.1146/annurev-bioeng-062117-121019
- Efficient step size selection for the tau-leaping simulation method. The Journal of Chemical Physics 124:044109. https://doi.org/10.1063/1.2159468
- Artificially selecting bacterial communities using propagule strategies. Evolution; International Journal of Organic Evolution 74:2392–2403. https://doi.org/10.1111/evo.14092
- Engineering complex communities by directed evolution. Nature Ecology & Evolution 5:1011–1023. https://doi.org/10.1038/s41559-021-01457-5
- Artificial selection of communities drives the emergence of structured interactions. Journal of Theoretical Biology 571:111557. https://doi.org/10.1016/j.jtbi.2023.111557
- Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics 115:1716–1733. https://doi.org/10.1063/1.1378322
- Experimental studies of community evolution I: the response to selection at the community level. Evolution; International Journal of Organic Evolution 44:1614–1624. https://doi.org/10.1111/j.1558-5646.1990.tb03850.x
- Software: Artificial_selection_collective_composition, version swh:1:rev:b87606f2397f1b9f97fb39dd7d103245ddcf4a6e. Software Heritage.
- Book: Modelling artificial ecosystem selection: a preliminary investigation. In: Banzhaf W, Ziegler J, Christaller T, Dittrich P, Kim JT, editors. Advances in Artificial Life. Berlin, Heidelberg: Springer. pp. 659–666. https://doi.org/10.1007/978-3-540-39432-7_71
- Conference: The role of non-genetic change in the heritability, variation, and response to selection of artificially selected ecosystems. Artificial Life IX: Proceedings of the Ninth International Conference on the Simulation and Synthesis of Artificial Life. https://doi.org/10.7551/mitpress/1429.003.0059
- Major evolutionary transitions in individuality between humans and AI. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 378:20210408. https://doi.org/10.1098/rstb.2021.0408
- Effect of the reproduction method in an artificial selection experiment at the community level. Frontiers in Ecology and Evolution 7:416. https://doi.org/10.3389/fevo.2019.00416
- Community diversity determines the evolution of synthetic bacterial communities under artificial selection. Evolution; International Journal of Organic Evolution 76:1883–1895. https://doi.org/10.1111/evo.14558
- Artificial selection of microbial ecosystems for 3-chloroaniline biodegradation. Environmental Microbiology 2:564–571. https://doi.org/10.1046/j.1462-2920.2000.00140.x
- Artificial selection of microbial communities: what have we learnt and how can we improve? Current Opinion in Microbiology 77:102400. https://doi.org/10.1016/j.mib.2023.102400
- Fast and facile biodegradation of polystyrene by the gut microbial flora of Plesiophthalmus davidis larvae. Applied and Environmental Microbiology 86:e01361-20. https://doi.org/10.1128/AEM.01361-20
- Progress of a half century in the study of the Luria-Delbrück distribution. Mathematical Biosciences 162:1–32. https://doi.org/10.1016/s0025-5564(99)00045-0
Article and author information
Author details
Funding
National Research Foundation of Korea (RS-2023-00214071)
- Juhee Lee
- Hye Jin Park
National Research Foundation of Korea (RS-2024-00460958)
- Juhee Lee
- Hye Jin Park
Asia Pacific Center for Theoretical Physics (JRG Program)
- Juhee Lee
- Hye Jin Park
Academy of Medical Sciences (Professorship)
- Wenying Shou
Royal Society (Wolfson Fellowship)
- Wenying Shou
Inha University (Research grant)
- Hye Jin Park
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
J Lee and HJ Park were supported by the National Research Foundation of Korea grant funded by the Korean government (MSIT), Grant Nos. RS-2023-00214071 and RS-2024-00460958, and by an appointment to the JRG Program at the APCTP through the Science and Technology Promotion Fund and the Lottery Fund of the Korean Government. W Shou was supported by the Academy of Medical Sciences Professorship and a Royal Society Wolfson Fellowship. This work was also supported by the Korean Local Governments (Gyeongsangbuk-do Province and Pohang City) and an Inha University Research Grant. We thank Su-Chan Park, Li Xie, Alex Yuan, and Botond Major for constructive comments and discussions.
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.97461. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2024, Lee et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.