The success of artificial selection for collective composition hinges on initial and target values

  1. Juhee Lee
  2. Wenying Shou  Is a corresponding author
  3. Hye Jin Park  Is a corresponding author
  1. Department of Physics, Inha University, Republic of Korea
  2. Asia Pacific Center for Theoretical Physics, Republic of Korea
  3. Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, United Kingdom

eLife Assessment

This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

https://doi.org/10.7554/eLife.97461.3.sa0

Abstract

Microbial collectives can perform functions beyond the capability of individual members. Enhancing collective functions through artificial selection is, however, challenging. Here, we explore the ‘rafting-a-waterfall’ metaphor where achieving a target population composition depends on both target and initial compositions. Specifically, collectives comprising fast-growing (F) and slow-growing (S) individuals were grown for ‘maturation’ time, and the collective with S-frequency closest to the target value is chosen to ‘reproduce’ via inoculating offspring collectives. During collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Using simulations and analytical calculations, we show that intermediate target S frequencies are the most challenging, akin to a target within the vertical drop of a waterfall, rather than above or below it. This arises because intra-collective selection is the strongest at intermediate S-frequencies, which can overpower inter-collective selection. While achieving a low target S frequencies is consistently feasible, attaining high target S-frequencies requires an initially high S-frequency — much like a raft that can descend but not ascend a waterfall. As Newborn size increases, the region of achievable target frequency is reduced until no frequency is achievable. In contrast, the number of collectives under selection plays a less critical role. In scenarios involving more than two populations, the evolutionary trajectory must navigate entirely away from the metaphorical ‘waterfall drop.’ Our findings illustrate that the strength of intra-collective evolution is frequency-dependent, with implications in experimental planning.

Introduction

Microbial collectives can carry out functions that arise from interactions among member species. These functions, such as waste degradation (Woo et al., 2020; Sun et al., 2022), probiotics (Bober et al., 2018), and vitamin production (Wang et al., 2016), can be useful for human health and biotechnology. To improve collective functions, one can perform artificial selection (directed evolution) on collectives (see Figure 1): Low-density ‘Newborn’ collectives are allowed to ‘mature’ during which cells proliferate and possibly mutate, and community function develops. ‘Adult’ collectives with high functions are then chosen to reproduce, each seeding multiple offspring Newborns. Artificial selection of collectives have been attempted both in experiments (Goodnight, 1990; Swenson et al., 2000b; Swenson et al., 2000a; Blouin et al., 2015; Panke-Buisse et al., 2015; Panke-Buisse et al., 2017; Jochum et al., 2019; Wright et al., 2019; Raynaud et al., 2019; Arora et al., 2020; Chang et al., 2020; Mueller et al., 2021; Jacquiod et al., 2022; Raynaud et al., 2022; Arias-Sánchez et al., 2024) and in simulations (Penn, 2003; Penn and Harvey, 2004; Williams and Lenton, 2007; Xie et al., 2019; Doulcier et al., 2020; Xie and Shou, 2021; Chang et al., 2021; Fraboul et al., 2023; Lalejini et al., 2022; Zaccaria et al., 2023; Vessman et al., 2023), often with unimpressive outcomes.

Schematic for artificial selection on collectives.

Each selection cycle begins with a total of g Newborn collectives, each with N0 total cells of slow-growing S population (light gray dots) and fast-growing F population (dark gray dots). During maturation (over time τ), S and F cells divide at rates rS and rS+ω (ω>0), respectively, and S mutates to F at rate μ. During inter-collective selection, the Adult collective with F frequency f closest to the target composition f^ is chosen to reproduce g Newborns for the next cycle. Newborns are sampled from the chosen Adult (yellow star) with N0 cells per Newborn. The selection cycle is then repeated until the F frequency reaches a steady state, which may or may not be the target composition. To denote a variable x of i-th collective in cycle k at time t (0tτ), we use notation xk,t(i) where x{S,F,s, f}. Note that time t=0 is for Newborns and t=τ is for Adults.

One of the major challenges in selecting collectives is to ensure the inheritance of a collective function (Xie et al., 2023; Thomas et al., 2024). Inheritance from a parent collective to offspring collectives can be compromised by changes in genotype and species compositions. During maturation of a collective, genotype compositions within each species can change due to intra-collective selection favoring fast-growing individuals (Figure 1, ‘intra-collective’ selection), while species compositions can change due to ecological interactions. Furthermore, during the reproduction of a collective, genotype and species compositions of offspring can vary stochastically from those of the parent (Figure 1, ‘genetic drift’).

Here, we consider the selection of collectives comprising two or three populations with different growth rates, and our goal is to achieve a target composition in the Adult collective. This is a common quest: whenever a collective function depends on both populations, the collective function is maximized, by definition, at an intermediate frequency (e.g. too little of either population will hamper function; Xie et al., 2019). Earlier work has demonstrated that nearly any target species composition can be achieved when selecting communities of two competing species with unequal growth rates (Doulcier et al., 2020; Rainey, 2023), so long as the shared resource is depleted during collective maturation (Doulcier et al., 2020). In this case, initially, both species evolved to grow faster, and the slower-growing species was preserved due to stochastic fluctuations in species composition during collective reproduction. Eventually, both species evolved to grow sufficiently fast to deplete the shared resource during collective maturation, and evolution in competition coefficients then acted to stabilize the species ratio to the target value (Doulcier et al., 2020). Regardless, earlier studies are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space for such nested populations (population of collectives, and populations of variants within a collective).

We mathematically examine the selection of composition in collectives consisting of populations growing at different rates. We made simplifying assumptions so that we can analytically examine the evolutionary tipping point between intra-collective and inter-collective selection. We show that this tipping point creates a ‘waterfall’ effect which restricts not only which target compositions are achievable, but also the initial composition required to achieve the target. We also investigate how the range of achievable target composition is affected by the total population size in Newborns and the total number of collectives under selection. Finally, we show that the waterfall phenomenon extends to systems with more than 2 populations.

Results and discussion

To enable the derivation of an analytical expression, we have made the following simplifying assumptions. First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we initially consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult. Finally, the single top-functioning community is chosen to reproduce, which allows us to employ the simplest version of the extreme value theory (see section below for further justification).

Our goal is to select for collective composition in terms of F frequency f=F/(S+F), or equivalently, S frequency s=1f. More precisely, we want collectives such that after maturation time τ, f(τ) is as close to the target value f^ as possible (Figure 1). Note that even if the target frequency has been achieved, since F frequency will always increase during maturation, inter-collective selection is required in each cycle to maintain the target frequency.

We will start with a complete model where S mutates to F at a nonzero mutation rate μ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations. This scenario is encountered in biotechnology: an engineered pathway will slow down cell growth, and breaking the pathway (and thus faster growth) is much easier than the other way around. When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates. We show that intermediate F frequencies or equivalently, intermediate S frequencies, are the hardest targets to achieve. We then show using simulations that similar conclusions hold when selecting for a target composition in collectives of three populations.

Model structure

A selection cycle (Figure 1; Table 1) starts with a total of g Newborn collectives. At the beginning of cycle k (t=0), each Newborn collective has a fixed total cell number N0=Sk,0(i)+Fk,0(i) where Sk,t(i) and Fk,t(i) denote the numbers of S and F cells in collective i (1ig) at time t (0tτ) of cycle k. The average F frequency among the g Newborn collectives in cycle k is f¯k,0, such that the initial F cell number in each Newborn is drawn from the binomial distribution Binom(N0,f¯k,0).

Table 1
Nomenclature.
VariablesRepresenting
SNumber of slower-growing (S) cells
FNumber of faster-growing (F) cells
NTotal cell numbers in a collective, N=S+F
sFrequency of S cells, s=S/(S+F)
fFrequency of F cells, f=F/(S+F)=1s
fF frequency of the selected collective in a cycle
ParametersRepresenting
rSGrowth rate of S
ω>0Growth rate advantage of F over S
μMutation rate from S to F
gTotal number of collectives
τMaturation time
N0Total number of cells in Newborn, or Newborn size
Target frequency in s or f.
fL,fHLow and High thresholds of inaccessible f^
RτFold-growth of S cells over time τ, Rτ=erSτ
WτFold ratio change of F cells over S cells over time τ, Wτ=eωτ

Collectives are allowed to grow for time τ (‘Maturation’ in Figure 1). During maturation, S and F grow at rates rS and rS+ω (ω>0), respectively. If maturation time τ is too small, a matured collective (‘Adult’) does not have enough cells to reproduce g Newborn collectives with N0 cells. On the other hand, if maturation time τ is too long, fast-growing F will take over. Hence, we set the maturation time τ=ln(g+1)/rS, which guarantees sufficient cells to produce g Newborn collectives from a single Adult collective. At the end of a cycle, a single Adult with the highest function (with F frequency f closest to the target frequency f^) is chosen to reproduce g Newborn collectives, each with N0 cells (‘Selection’ and ’Reproduction’ in Figure 1). Note that even though S and F do not compete for nutrients, they compete for space: because the total number of cells transferred to the next cycle is fixed, an overabundance of one population will reduce the likelihood of the other being propagated.

Collective function is dictated by the Adult’s F frequency f. Among all Adult collectives, the selected Adult is the one whose F frequency is closest to the target value, f^. In contrast with findings from an earlier study (Xie et al., 2019), choosing top 1 is more effective than the less stringent ‘choosing top 5%.’ In the earlier study, variation in the collective trait is partly due to nonheritable factors such as random fluctuations in Newborn biomass. In that context, a less stringent selection criterion proved more effective, as it helped retain collectives with favorable genotypes that might have exhibited suboptimal collective traits due to unfavorable non-heritable factors. However, since this study excludes non-heritable variations in collective traits, selecting the top 1 collective is more effective than selecting the top 5% (see Appendix 7—figure 1).

The selected Adult, with F frequency denoted as f, is then used to reproduce g offspring collectives, each with N0 total cells. The number of F cells in a newborn follows a binomial distribution B(N0,f). By repeating the selection cycle, we aim to achieve and maintain the target composition f^.

Overall, our model considers mutational stochasticity, as well as demographic stochasticity in terms of stochastic birth and stochastic sampling of a parent collective by offspring collectives. Other types of stochasticity, such as environmental stochasticity and measurement noise, are not considered and require future research.

The success of collective selection is constrained by the target composition, and sometimes also by the initial composition

Since intra-collective selection favors F, we expect that a higher target f^ (a lower target s^) is easier to achieve. By ‘achieve,’ we mean that the absolute error d between the target frequency f^ and the selected frequency averaged among independent simulations f is smaller than 0.05 (i.e.d=|ff^|0.05).

We fixed N0, the total population size of a Newborn to 1000, and obtained selection dynamics for various initial and target F frequencies by implementing stochastic simulations (Appendix 1). If the target f^ is high (e.g. 0.9, Figure 2a magenta), selection is successful (computed absolute errors Appendix 1—figure 4): regardless of the initial frequency, f of the chosen collective eventually converges to the target f^ and stays around it. In contrast, without collective-level selection (e.g. choosing a random collective to reproduce), F frequency increases until F reaches fixation (Supplementary information Appendix 1—figure 3b).

Initial and target compositions determine the success of artificial selection on collectives.

(a–c) F frequency of the selected Adult collective (f) over cycles at different target f^ values (long dashed lines). f^ between fL and fH (orange dotted and solid line segments) is inaccessible where selection will fail. (a) A high target F frequency (e.g.f^=0.9>fH; magenta) can be achieved from any initial frequency (black dots). (b) An intermediate target frequency (e.g.fL<f^=0.5<fH; green) is never achievable, as all initial conditions converge to fH. (c) A low target frequency (e.g. f^=0.1<fL; dark blue) is achievable, but only from initial frequencies below fL. For initial frequencies at fL, stochastic outcomes (gray curves) are observed: while some replicates reached the target frequency, others reached fH. For parameters, we used S growth rate rS=0.5, F growth advantage ω=0.03, mutation rate μ=0.0001, maturation time τ4.8, and N0=1000. The number of collectives g=10. Each black line is averaged from independent 300 realizations. (d) Inter-collective selection opposes intra-collective selection. We plot probability density distributions of F frequency f during two consecutive cycles when selection is successful. Data correspond to cycles 31 and 32 from the second lowest initial point in c. Δf is the selection progress within a cycle (see Box 1). Black triangle: median. (e) Two accessible regions (gold). Either high f^ (f^>fH; region 2) or low f^ starting from low initial f (f^<fL and f¯1,0<fL; region 1) can be achieved. We theoretically predict (by numerically integrating Equation 1) fH (orange solid line) and fL (orange dotted line), which agree with simulation results (gold regions). (f) Example trajectories from initial compositions (black dots) to the target compositions (dashed lines). The gold areas indicate the region of initial frequencies where the target frequency can be achieved. (g) The tension between intra-collective selection and inter-collective selection creates a ‘waterfall’ phenomenon. See the main text for details.

Box 1

Changes in the distribution of F frequency f after one cycle

We consider the case where fk, the F frequency of the selected Adult at cycle k, is above the target value (fk>f^). This case is particularly challenging because intra-collective evolution favors fast-growing F and thus will further increase f away from the target. From fk, Newborns of cycle k+1 will have f fluctuating around fk, and after they mature, the minimum f is selected (fk+1=min[fk+1,τ(1),fk+1,τ(2),,fk+1,τ(g)]). If the selected composition at cycle k+1 can be reduced compared to that of cycle k (i.e. fk+1<fk), the system can evolve to the lower target value.

To find fk values such that fk+1<fk, we used the median value of the conditional probability distribution Ψ of fk+1 given the selected fk at cycle k (mathematical details in Appendix 2). If the median value (Median[Ψ(fk+1|fk)]) is smaller than fk, then selection will likely be successful since the selected Adult in cycle k+1 has more than 50% chance to have a reduced F frequency compared to cycle k.

There are two points where the median values are the same as fk (Figure 3a), which are assigned as lower-threshold (fL) and higher-threshold (fH).

Following the extreme value theory, the conditional probability density function Ψ(fk+1=f|fk) is

(1) Ψ(fk+1=f|fk)=gPfk+1,τ(f|fk)[10fdfPfk+1,τ(f|fk)]g1.

Equation 1 can be described as the product between two terms related to probability: (i) gPfk+1,τ(f|fk) describes the probability density that any one of the g Adult collectives achieves f given fk, and (ii) [10fdfPfk+1,τ(f|fk)]g1 describes the probability that all other g1 collectives achieve frequencies above f and thus not selected.

Since computing the exact formula of Adults’ f distribution in cycle k+1 is hard, we approximate it as Gaussian with mean f¯(τ) and variance σf2(τ). The Gaussian approximation on Equation 1 requires sharp Gaussian distributions of S(τ) and F(τ) (i.e. S¯(τ)σs(τ) and F¯(τ)σF(τ)). Compared to Gaussian, the exact S(τ) (negative binomial) distribution and F(τ) (Luria-Delbrück) distribution are right-skewed and heavy-tailed. However, these problems are alleviated when the initial numbers of S and F cells are not small (on the order of 100). Indeed, the sharpness of distributions could be achieved (see Appendix 1—figure 1).

To obtain an analytical solution of the change in f over one cycle, we first assume that in a Newborn collective, the number of S cells is distributed as Gaussian with mean S¯0=N0(1fk) and variance σS,02=N0fk(1fk). Then, the number of F cells, F0=N0S0, is distributed as Gaussian with mean F¯0=N0fk and variance σf,02=N0fk(1fk). From these, we can calculate for Adult collectives the mean and variance of population sizes F(τ) (i.e. F¯(τ), σF2(τ)) and S(τ) (i.e. S¯(τ), σS2(τ)) (mathematical details in Appendix 1). This task is simplified by the exponential growth of S and F: Rτ=erSτ describes the fold growth of S over maturation time τ, and since ω is the fitness advantage of F over S, Wτ=eωτ describes the fold change of F/S over time τ. From Rτ, Wτ, μω (mutation rate scaled with the fitness difference), fk (F frequency in the selected collective at cycle k), N0 (Newborn size), ωrS (relative fitness advantage), we can calculate the mean and variance of F frequency among the Adults of k+1 cycle (f¯(τ);σf2(τ), detailed formula in Equations 48 and 49).

Selection progress - the difference between the median value of the conditional probability distribution Ψ(fk+1|fk) and the selected frequency of fk (Appendix 2) - can be expressed as:

(2) f=Median[Ψ(fk+1|fk)]fk=f¯(τ)+[Φ1(ln2g)]σf(τ)fk,

where Φ1() is the inverse cumulative function of standard normal distribution (see main text for an example). We chose the median because compared to the mean, it is easier to get an analytical expression since Φ1() is known in a closed form. Regardless, using median generated results similar to simulations (Appendix 2—figure 3). As expected, selection progress f is governed by both the mean (f¯(τ)) and the variation (σf(τ)) in f among Adults.

When the mutation rate μ=0, f¯(τ) and σf(τ) can be simplified to:

(3) f¯(τ)=fk1fkWτ+fk,

and

(4) σf2(τ)=1N0Wτ2fk(1fk)(22fk+2fk21fkRτWτfkRτ)(1fkWτ+fk)4.

In the limit of small fk, Equation 3 becomes f¯(τ)|fk1fkWτ while Equation 4 becomes σf2(τ)|fk1=(21RτWτ)fkWτ2/N0 . Thus, both Newborn size (N0) and fold-change in F/S during maturation (Wτ) are important determinants of selection progress.

In contrast, an intermediate target frequency (e.g. f^=0.5; Figure 2b green) is never achievable. High initial F frequencies (e.g. 0.95) decline toward the target but stabilize at the ‘high-threshold’ fH (∼ 0.7, solid orange line segment in Figure 2a-c) above the target. Low initial F frequencies (e.g. 0) increase toward the target, but then overshoot and stabilize at the fH value.

If the target frequency is low (e.g. f^=0.1; Figure 2c dark blue), artificial selection succeeds when the initial frequency is below the ‘lower-threshold’ fL (dotted orange line segment in Figure 2a-c). Initial F frequencies above fL (e.g. 0.45 and 0.95) converge to fH instead. Initial F frequencies near fL display stochastic trajectories, converging to either fH or f^.

To achieve target f^, inter-collective selection must overcome intra-collective selection. We can visualize the distributions of f over two consecutive cycles (bottom to top, Figure 2d) where f started above target f^. When newborns matured into adults, the distribution of f up-shifted due to intra-collective selection. The distribution of f was then down-shifted toward the target due to inter-collective selection. If the magnitude of down-shift exceeded that of up-shift, progress toward the target was made. During reproduction of collectives, the distribution of f retained the same mean but became broader due to stochastic sampling by the Newborns from their parent.

In summary, two regions of target frequencies are ‘accessible’ (gold in Figure 2e, f; Box 1): (1) target frequencies above fH (f^>fH) or (2) target frequencies below fL (f^<fL) and starting at an average frequency below fL (f¯1,0<fL).

Intra-collective evolution is the fastest at intermediate F frequencies, creating the ‘waterfall’ phenomenon

To understand what gives rise to the two accessible regions, we calculated f, the selection progress in F frequency over two consecutive cycles (Box 1, Equation 2). The solution (Figure 3a, green) has the same shape as results from numerically integrating Equation 1 (Figure 3a, orange) and from stochastic simulations (Figure 3a, blue).

Intra-collective selection and inter-collective selection jointly set the boundaries for selection success.

(a) The change in F frequency over one cycle. When fk is sufficiently low or high, inter-collective selection can lower the F frequency to below fk (Δf<0). The points where Δf=0 (in the orange line) are denoted as fL and fH, corresponding to the boundaries in Figure 2. (b) The distributions of frequency differences obtained by 1000 numerical simulations. The cyan, purple, and black box plots respectively indicate the changes in F frequency after intra-collective selection (the mean frequency among the 100 Adults minus the mean frequency among the 100 Newborns during maturation), after inter-collective selection (the frequency of the 1 selected Adult minus the mean frequency among the 100 Adults), and over one selection cycle (the frequency of the selected Adult of one cycle minus that of the previous cycle). The box ranges from 25% to 75% of the distribution, and the median is indicated by a line across the box. The upper and lower whiskers indicate maximum and minimum values of the distribution. ***p<0.001 in an unpaired t-test.

If f is negative, then inter-collective selection will succeed in countering intra-collective selection and reducing f toward the target. f is negative if the selected fk is low or high, but not if it is intermediate between fL and fH (Figure 3a). This is because the increase in f during maturation is the most drastic when Newborn f is intermediate (Figure 3b), for intuitive reasons: when Newborn f is low, the increase in f will be minor; when Newborn f is high, the fitness advantage of F over the population average is small and hence the increase is also minor. Thus, when Newborn F frequency is intermediate, intra-collective selection is the strongest and may overwhelm inter-collective selection (Figure 3b and Appendix 2—figure 2a). Not surprisingly, similar conclusions are derived where S and F are slow-growing and fast-growing species which cannot be converted through mutations (Appendix 4 and Appendix 4—figure 1).

Thus, inter-collective selection is akin to a raftman rowing the raft to a target, while intra-collective selection is akin to a waterfall. This metaphor is best understood in terms of S frequency s=1f. The lower-threshold fL corresponds to higher-threshold in sH=1fL. Intra-collective selection is akin to a waterfall, driving the S frequency s from high to low (Figure 2g). Intra-collective selection acts the strongest when s is intermediate (sL<s<sH), similar to the vertical drop of the fall. Intra-collective selection acts weakly at high (>sH) or low (<sL) s , similar to the gentle sloped upper and lower pools of the fall (regions 1 and 2 of Figure 2e and g). Thus, an intermediate target frequency can be impossible to achieve: a raft starting from the upper pool will be flushed down to sL (fH), while a raft starting from the lower pool cannot go beyond sL (fH). In contrast, a low target S frequency (in the lower pool) is always achievable. Finally, a high target S frequency (in the upper pool) can only be achieved if starting from the upper pool (as the raft cannot jump to the upper pool if starting from below).

Manipulating experimental setups to expand the achievable target region

In Equation 2; Box 1, selection progress f depends on the total number of collectives under selection (g). f also depends on the mean and the standard deviation of Adult F frequency — f¯(τ) and σf(τ). Equations 3 and 4 of Box 1 provide simplified expressions of f¯(τ) and σf(τ) when mutation rate μ has been set to 0. When the mutation rate μ is not zero (Equations 48 and 49 in Appendix 2), selection progress is additionally influenced by μω (mutation rate μ scaled with fitness difference ω).

Our goal is to make f as negative as possible so that any increase in f during collective maturation may be reduced. From Equation 2 in Box 1, a small f¯(τ) will facilitate collective-level selection. Additionally, a large σf(τ) will also facilitate collective-level selection due to negative Φ1(ln2g). Note that since ln2g<0.5 for g2, Φ1(ln2g) — corresponding to the number y such that the probability of a standard normal random variable being less than or equal to y is ln2g — is negative.

From Equation 4 in Box 1, σf(τ) will be large if Newborn size N0 is small. Indeed, as Newborn size N0 declines, the region of achievable target frequency expands (gold area in Figure 4a). If the Newborn size N0 is sufficiently small (e.g. ≤ 700 in our parameter regime), any target frequency can be reached. An analytical approximation of the maximal Newborn size permissible for all target frequencies is given in Appendix 3.

Expanding the region of success for artificial collective selection.

(a) Reducing the population size in Newborn N0 expands the region of success. In the gold area, the probability that fk+1 becomes smaller than fk in a cycle is more than 50%. We used g=10 and τ4.8. Figures 23 correspond to N˘0=1000 in this graph. Black dotted line indicates the critical Newborn size below which all target frequencies can be achieved. (b) Increasing the total number of collectives g also expands the region of success, although only slightly. We used a fixed Newborn size N0=1000. The maturation time τ=log(100)/rS9.2 is set to be long enough so that an Adult can generate at least 100 Newborns. (c) Increasing the maturation time shrinks the region of success. We used a fixed Newborn size N0=1000 and number of collectives g=10.

From Equations 3 and 4 in Box 1, maturation time τ affects f¯(τ) and σf(τ) through Wτ=eωτ (the fold change in F/S over τ), and affects σf(τ) additionally through Rτ=erSτ (fold-growth of S over τ). Longer τ increases f¯(τ) and is thus detrimental to selection progress. The relationship between σf(τ) and τ is not monotonic (Appendix 2—figure 2c), meaning that an intermediate value of τ is the best for achieving large σf(τ). However, the effect of f¯(τ) dominates that of σf(τ) and therefore, the region of success monotonically reduces with longer maturation time (Figure 4c). Similarly, f¯(τ) will be small if ω (fitness advantage of F over S) is small. Indeed, as ω becomes larger, the region of success becomes smaller (Appendix 5—figure 1).

g, the number of collectives under selection, also affects selection outcomes. As g increases, the value of Φ1(ln2g) becomes more negative, and so does f — meaning collective-level selection will be more effective. Intuitively, with more collectives, the chance of finding a f closer to the target is more likely. Thus, a larger number of collectives broadens the region of success (Figure 4b). However, the effect of g is not dramatic. To see why, we note that the only place that g appears is Equation 2 in Φ1(1g). When g becomes large, Φ1(1g) is asymptotically expressed as Φ1(1g)2lngln[lng]+ (Appendix 2) (Phllip, 1960), and thus does not change dramatically as g varies.

The waterfall phenomenon in a higher dimension

To examine the waterfall effect in a higher dimension, we investigate a three-population system where a faster-growing population (FF) grows faster than the fast-growing population (F) which grows faster than the slow-growing population (S) (Figure 5a and Appendix 8—figure 1). In the three-population case, the evolutionary trajectory travels in a two-dimensional plane. A target population composition can be achieved if inter-collective selection can sufficiently reduce the frequencies of F as well as FF (accessible regions, gold in Figure 5b).

In higher dimensions, the success of artificial selection requires the entire evolutionary trajectory remaining in the accessible region.

(a) During collective maturation, a slow-growing population (S) (with growth rate rS; light gray) can mutate to a fast-growing population (F) (with growth rate rS+ω; medium gray), which can mutate further into a faster-growing population (FF) (with growth rate rS+2ω; dark gray). Here, the rates of both mutational steps are μ, and ω>0. (b) Evolutionary trajectories from various initial compositions (open circles) to various targets (filled triangles). Intra-collective evolution favors FF over F (vertical blue arrow) over S (horizontal blue arrow). The accessible regions are marked gold (see Appendix 1). We obtain final compositions starting from several initial compositions while aiming for different target compositions in i, ii, and iii. The evolutionary trajectories are shown in dots with color gradients from initial time (light grey) to final time (dark grey). (i) A target composition with a high FF frequency is always achievable. (ii) A target composition with intermediate FF frequency is never achievable. (iii) A target composition with low FF frequency is achievable only if starting from an appropriate initial composition such that the entire trajectory never meanders away from the accessible region. The figures are drawn using the mpltern package (Ikeda et al., 2019). (c) The accessible region in the three-population problem is interpreted as an extension of the two-population problem. First, the accessible region between FF and S+F is given, and then the S+F region is stretched into S and F.

From numerical simulations, we identified two accessible regions: a small region near FF and a band region spanning from S to F (gold in Figure 5b i). Intuitively, the rate at which FF grows faster than S+F is greater than the rate at which F grows faster than S (see Appendix 8). Thus, the problem can initially be reduced to a two-population problem (i.e. FF versus F+S; Figure 5c left), and then expanded to a three-population problem (Figure 5c right).

Similar to the two-population case, targets in the inaccessible region are never achievable (Figure 5b ii), while those in the FF region are always achievable (Figure 5b i). Strikingly, a target composition in an accessible region may not be achievable even when the initial composition is within the same region: once the composition escapes the accessible region, the trajectory cannot return back to the accessible region (Figure 5biii, the leftmost initial condition). However, if the initial position is closer to the target in the accessible region, the target becomes achievable (Figure 5b iii, initial condition near the bottom). Note that here, the selection outcome is path-dependent in the sense of being sensitive to initial conditions. This phenomenon is distinct from hysteresis, where path-dependence results from whether a tuning parameter is increased or decreased.

In conclusion, we have investigated the evolutionary trajectories of population compositions in collectives under selection, which are governed by intra-collective selection (which favors fast-growing populations) and inter-collective selection (which, in our case, strives to counter fast-growing populations). Intra-collective selection has the strongest effect at intermediate frequencies of faster-growing populations, potentially creating an inaccessible region of target frequency analogous to the vertical drop of a waterfall. High and low target frequencies are both accessible, analogous to the lower and the upper pools of a waterfall, respectively. A less challenging target (high f^; low s^) is achievable from any initial position. In contrast, a more challenging target (low f^; high s^) is only achievable if the entire trajectory is contained within the region, similar to a raft striving to reach a point in the upper pool must start at and remain in the upper pool. Our work suggests that the strength of intra-collective selection is not constant, and that strategically choosing an appropriate starting point can be essential for successful collective selection.

Materials and methods

Stochastic simulations

Request a detailed protocol

A selection cycle is composed of three steps: maturation, selection, and reproduction. At the beginning of the cycle k, a collective i has Sk,0(i) slow-growing cells and Fk,0(i) fast-growing cells. At the first cycle, the mean F frequency of collectives is set to be f¯1,0. F1,0(i) is sampled from the binomial distribution with mean N0f¯1,0. Then, S1,0(i)(=N0F1,0(i)) S cells are in the collective i. In the maturation step, we calculate Sk,τ(i) and Fk,τ(i) by using stochastic simulation. We can simulate the division and mutation of each individual cell stochastically by using the tau-leaping algorithm (Gillespie, 2001; Cao et al., 2006; see Appendix 1—figure 3). However, individual-based simulations require a long computing time. Instead, we randomly sample Sk,τ(i) and Fk,τ(i) from the joint probability density distribution P(Sk,τ(i),Fk,τ(i)). To obtain P(Sk,τ(i),Fk,τ(i)), we solve the master equation which describes the time evolution of the probability distribution P(Sk,t(i),Fk,t(i)) under the random processes (see Appendix 1). We assumed that Sk,τ(i) and Fk,τ(i) are independent (as S and F populations grow independently without ecological interactions), and thus P(Sk,τ(i),Fk,τ(i)) is product of two probability density functions P(Sk,τ(i)) and P(Fk,τ(i)). Each distribution follows a Gaussian distribution, with the mean and variance numerically obtained from ordinary differential equations derived from the master equation (see Appendix 1). We choose the collective with the closest frequency to the target f^ to generates g Newborns. The number of F cells is sampled from the binomial distribution with the mean of N0fk. We start a new cycle with those Newborn collectives. Then, the number of S cells in a collective i is Sk+1,0(i)=N0Fk+1,0(i).

Analytical approach to the conditional probability

Request a detailed protocol

The conditional probability distribution Ψ(fk+1|fk) of observing fk+1 at a given fk is calculated by the following procedure. Given the selected collective in cycle k with fk, the collective-level reproduction proceeds by sampling g Newborn collectives with N0 cells in cycle k+1. Each Newborn collective contains certain F numbers Fk+1,0(1),,Fk+1,0(g) at the beginning of the cycle k+1, which can be mapped into fk+1,0(1),,fk+1,0(g) with the constraint of N0 cells. If the number of cells in the selected collective is large enough, the joint conditional distribution function P(fk+1,0(1),,fk+1,0(g)|fk) is well described by the product of g independent and identical Gaussian distribution N(μ,σ2). So we consider the frequencies of g Newborn collectives as g identical copies of the Gaussian random variable fk+1,0. The mean and variance of fk+1,0 are given by m=fk and σ2=fk(1fk)/N0. Then, the conditional probability distribution function of fk+1,0 being ζ is given by

(5) Pfk+1,0(ζ|fk)=12πexp((ζm)22σ2).

After the reproduction step, the Newborn collectives grow for time τ. The frequency is changed from the given frequency ζ to f by division and mutation processes. We assume that the frequency f of an Adult is also approximated by a Gaussian random variable N(f¯(τ),σf2(τ)). The mean f¯(τ) and variance σf2(τ) are calculated by using means and variances of S and F (see Appendix 2). Since f¯(τ) and σf2(τ) also depend on ζ, the conditional probability distribution function of fk+1,τ being f is given by

(6) Pfk+1,τ(f|ζ)=12πexp((ff¯(τ))22σf2(τ)).

The conditional probability distribution of an Adult collective in cycle k+1 (fk+1,τ) to have frequency f at a given fk is calculated by multiplying two Gaussian distribution functions and integrating overall ζ values, which is given by

(7) Pfk+1,τ(f|fk)=01dζ Pfk+1,τ(f|ζ) Pfk+1,0(ζ|fk).

Since we select the minimum frequency fk+1min among g identical copies of fk+1,τ, the conditional probability distribution function of fk+1min follows a minimum value distribution, which is given in Equation 1. Here, for the case of f^<fk, the selected frequency fk+1,0 is the minimum frequency fk+1min. So we have Ψ(fk+1|fk) by replacing fk+1min with fk+1.

We assume that the conditional probability distribution in Equation 7 follows a normal distribution, whose mean and variances are described by Equation 48 and Equation 49. Then, the extreme value theory (Gumbel, 1958) estimates the median of the selected Adult by

(8) Median(fk+1)=f¯(τ)+[Φ1(ln2g)]σf(τ).

The selection progress Δf in Equation 2 is obtained by subtracting fk from Equation 8.

Appendix 1

Stochastic simulation of the selection cycle

In the main text, we design a simple model of artificial selection on collectives. The selection cycle starts with g ‘Newborn’ collectives which consist of two populations - slow-growing population (S) and fast-growing population (F). S mutates to F at a rate μ. The newborns mature for a fixed time τ. The matured collective (‘Adult’) with the highest function (with F frequency f closest to the target f^) is chosen to reproduce g Newborn collectives, each with N0 cells.

In our selection cycle, variation among collectives mainly resulted from demographic noises during cell birth, cell mutation, and collective reproduction. In this section, we provide details of the simulation.

Maturation

Here, we calculate the cell numbers during maturation. Each collective i (i=1,,g) has Sk,t(i) S cells and Fk,t(i) F cells where k is the cycle number and t indicates time (0tτ). At the beginning of cycle k (t=0), each Newborn collective has a total of N0=Sk,0(i)+Fk,0(i) cells. The collectives are allowed to ‘mature’ for t=τ during which S and F grow at rates rS and rS+ω (ω>0), respectively. In this subsection, we ignore the cycle number index k and the collective index k for convenience. That is, we denote Sk,t(i) and Fk,t(i) as S(t) and F(t), respectively.

We describe cell divisions of S and F cells and mutation from S to F with the following chemical reaction rules:

(9) SrSS+S,
(10) Fω>0rS+ωF+F,
(11) SμF,

One can run an individual-based simulation by counting the number of events occurring during collective maturation via the tau-leaping algorithm (Gillespie, 2001; Cao et al., 2006) to generate a sample trajectory of S(t) and F(t) for each collective. However, the individual-based simulation requires long computing times due to a large number of random events to be counted. Hence, we used a ‘sampling method’ by sampling the numbers of S and F cells in collectives from a joint probability density distribution (jpdf) P(S,F,t) which denotes the probability density to have S number of S cells and fk+1,0 number of F cells at time t in the cycle. To do so, we require an analytical expression of P(S,F,t).

First, we assume that the chemical reactions in Equations 9–11 occur independently, and never occur simultaneously within a short time interval [t,t+dt). Then, the differential of P(S,F,t) with respect to time is given by

(12) dP(S,F,t)dt=rS(S1)P(S1,F,t)+(rS+ω)(F1)P(S,F1,t)+μ(S+1)P(S+1,F1,t)(rSS+(rS+ω)F+μS)P(S,F,t).

This master equation describes a probability density ‘flux’ at the state (S,F). The first term describes the scenario where a single birth event of a S cell happens during time interval [t,t+dt), which changes the collective’s composition from (S1,F) to (S,F). Similarly, the second term comes from a birth event of an F cell. The third term indicates the mutation event from (S+1,F1) to (S,F). The last term corresponds to the outflow of probability density by birth and mutation processes, which describes the changes from (S,F) to any other states.

Calculating the exact form of P(S,F,t) is not simple. Instead, we assume that the mutation rate is much smaller than the growth rates, and hence the correlation between S and F is sufficiently small. Additionally, S and F do not interact ecologically. Then, we can express P(S,F,t) as a product of two probability density functions (pdf) of S(t) and F(t), P(S,F,t)=P(S,t)P(F,t). We assume that each pdf of S and F can be approximated as Gaussian (N), which is supported by the Central Limit Theorem and Appendix 1—figure 1. In more detail, the cell numbers S and F are mainly determined by growth (Equations 9; 10), and also mutations (Equation 11). Even though the number of events would be different among different realizations, the mean numbers of events will follow Gaussian distributions. So, we can simply assume that the distributions of cell numbers also follow Gaussian distributions. This assumption requires that the distributions have insignificant skewness and no heavy tails, which we will numerically check afterwards. The pdfs of S(t) and F(t) are given by

(13) P(S,t)=12πσS2(t)e(S(t)S¯(t))22σS2(t)

and

(14) P(F,t)=12πσF2(t)e(F(t)F¯(t))22σF2(t).

That is, P(S,F,t) is written as

(15) P(S,F,t)=12πσS2(t)12πσF2(t)e(S(t)S¯(t))22σS2(t)(F(t)F¯(t))22σF2(t).

Now we need means (S¯(t) and F¯(t)) and variances (σs2(t) and σF2(t)) of S and F cell numbers to express the distribution analytically.

The means are defined by S¯(t)=S=0F=0SP(S,F,t) and F¯(t)=S=0F=0FP(S,F,t). The differential equations for means are obtained by applying the definition to the master equation in Equation 12, as

(16) dS¯(t)dt=rSS¯(t)μS¯(t),
(17) dF¯(t)dt=(rS+ω)F¯(t)+μS¯(t).

We assume that the mutation rate μ is much smaller than rS and ω. By solving Equation 16 and Equation 17, the means S¯(t) and F¯(t) are given by

(18) S¯(t)=S¯oe(rSμ)tS¯oerSt,
(19) F¯(t)=F¯oe(rS+ω)t+μS¯oω+μ(e(rS+ω)te(rSμ)t)F¯oe(rS+ω)t+μS¯oω(e(rS+ω)terSt),

where S¯oS¯(0) and F¯oF¯(0) are the mean numbers of S and F cells at the beginning of cycle, t=0. Note that the second term of Equation 19 is consistent with previous studies (Zheng, 1999). Now we introduce factors Rt=erSt and Wt=eωt in Equations 18; 19 in order to simplify the formula. Rt is the multiplying factor by which the S cell number increases after time t. Wt is the fold change in F/S. Then, we can rewrite

(20) S¯(t)=S¯oRt,
(21) F¯(t)=F¯oRtWt+μS¯oωRt(Wt1).

We define the second momenta of S and F as

(22) S2¯=S=0F=0S2P(S,F,t),
(23) F2¯=S=0F=0F2P(S,F,t).

Then, the corresponding differential equations are given by

(24) dS2¯dt=2(rSμ)S2¯+(rS+μ)S¯,
(25) dF2¯dt=2(rS+ω)F2¯+(rS+ω)F¯+2μS¯F¯+μS¯.

The solution of Equation 24 is

(26) S2¯(t)=rS+μμrSS¯o[e(μrS)t1]e2(rSμ)t+S2¯oe2(rSμ)t
(27) So2¯e2rSt+S¯oerSt(erSt1),

where So2¯S2¯(0) is the second moment of initial values. Thus, the variance σS2(t)=S2¯(t)[S¯(t)]2 is

(28) σS2(t)=σS,02e2rSt+S¯oerSt(erSt1)
(29) =σS,02Rt2+S¯oRt(Rt1)

where σS,02σS2(0) is a variance of S cell numbers at t=0. In Equation 25, we require SF¯(t)=S=0F=0SFP(S,F,t) to calculate F2¯(t). Equation 12 provides a differential equation for SF¯(t) as

(30) dSF¯dt=(2rS+ωμ)SF¯+μ(S2¯S¯).

The solution of Equation 30 is given by

(31) SF¯(t)=SF¯oe(2rS+ω)t+μω(S2¯0+S¯0)(e(2rS+ω)te2rSt)2μS¯0rS+ω(e(2rS+ω)terSt).

By using Equation 31, the solution of Equation 25 is given by

(32) F2¯(t)=F2¯0e2(rS+ω)t+F¯0e(rS+ω)t(e(rS+ω)t1)+μS¯oω(2ωr+2ωe2(rS+ω)te(rS+ω)t+rr+2ωerSt)+2μSF¯0ωe(rS+ω)t(e(rS+ω)terSt)+O(μ2),

where Fo2¯F2¯(0) is the second moment of initial values. Thus, the variance σF2(t)=F2¯(t)[F¯(t)]2 is given, up to the order of μ, by

(33) σF2(t)=σF,02e2(rS+ω)t+F¯0e(rS+ω)t(e(rS+ω)t1)+μS¯oω(2ωrS+2ωe2(rS+ω)te(rS+ω)t+rSrS+2ωerSt)+2μ(SF¯0S¯0F¯0)ωe(rS+ω)t(e(rS+ω)terSt)+O(μ2).

Using Rτ=erSτ and Wτ=eωτ, we rewrite

(34) σF2(t)=σF,02Rt2Wt2+F¯0RtWt(RtWt1)+μS¯oRtω(2ωrS+2ωRtWt2Wt+rSrS+2ω)+2μ(SF¯0S¯0F¯0)ωRt2Wt(Wt1)+O(μ2).

Using Equations 18; 19; 28; 33, we construct pdfs for S(t) and F(t) at the end of cycle t=τ. Then, we randomly sample a number from P(S,τ) for S(τ) and another number from P(F,τ) for F(τ). Those two numbers are cell numbers in a single Adult. We repeat this process for each Newborn to get cell numbers of all Adults. Note that the initial values for the Newborn i are S¯0=Sk,0(i), F¯0=Fk,0(i), σS,02=0, and σF,02=0. This process only requires two random numbers per collective, while the result is consistent with the individual-based simulation.

Now, we check the validity of the Gaussian approximation for probability density functions of S and F populations. If we consider mutation from S to F as death in the S population, then the process in S corresponds to a branching process with death. Also, the birth process in F, including mutation, results in a Luria-Delbrück distribution (Zheng, 1999). Thus, the distributions of Adults’ S and F numbers are more skewed and heavy-tailed than Gaussian. But this problem is alleviated by larger initial S and F numbers and when the maturation time τ is not very long (see Appendix 1—figure 1). Since we usually consider larger initial cell numbers, we use the Gaussian approximation on S and F populations in further calculations.

Appendix 1—figure 1
Comparison between the calculated Gaussian distribution (‘Gauss,’ with the mean and variances computed from Equations 18; 19; 28; 33) and simulations using tau-leaping (‘tau’).

The simulations run 3000 times. The initial number of cells are (S0,F0)=(990,10),(500,500), and (10,990) for each column. The parameters r=0.5, ω=0.03, μ=0.0001, and τ=4.8 are used.

Selection

After sampling cell numbers of each Adult in the maturation step, we compute the F frequencies in each collectives {fk,τ(1),,fk,τ(g)}. We denote the F frequency of collective i at time t=τ in cycle k as fk,τ(i). Among the g Adults, we select one collective with the F frequency which is the closest value to the target frequency f^. The selected Adult’s F frequency value is denoted by fk. In mathematical expression, the selected frequency is defined by

(35) fk=argminfk,τ(i),i{1,,g}|fk,τ(i)f^|.

Reproduction

Using the chosen Adult, we generate g Newborn collectives for the next cycle k+1. The most natural way is consecutive random sampling N0 cells from the selected Adult without replacement. In the mathematical expression, we first randomly sample Fk+1,0(1) F cells and draw Sk+1,0(1)=N0Fk+1,0(1) S cells from the selected Adult. Next, we sample Fk+1,0(2) F cells and Sk+1,0(2)=N0Fk+1,0(2) S cells from the remaining cells in the Adult. We repeat the process g times. Then the jpdf to choose Fk+1,0(1),,Fk+1,0(g) F cells, P(Fk+1,0(1),,Fk+1,0(g)), follows a multivariate hypergeometric distribution.

If we assume that the selected Adult size Nk=Sk+Fk is large enough compared to Newborn size N0, the consecutive sampling is well approximated to the independent binomial sampling (see Appendix 1—figure 2). Thus, we independently sample g numbers of F cells, {Fk+1,0(1),,Fk+1,0(g)}, from the binomial distribution. The probability mass function of each Fk+1,0(i) is given by

(36) PFk+1,0(i)(F)=N0!(N0F)!F!(fk)F(1fk)N0F.

After sampling, the numbers of S cells are set to be Sk+1(i)(0)=N0Fk+1,0(i) for each collective. We can now start cycle k+1 with these Newborn collectives. By repeating the above three steps (maturation, selection, and reproduction), we run the simulation until F frequency reaches a stationary state.

Appendix 1—figure 2
Congruence between consecutive sampling (MHG for multivariate hypergeometric distribution) and independent binomial (BN) sampling.

The initial number of cells are S=8000 and F=2000 for the left panel, and S=20 and F=5 for the right panel. 10,000 samples are drawn for each distribution. Here, a parent collective is divided into 10 collectives.

Simulation result

Appendix 1—figure 3 presents the composition trajectories of all collectives using the tau-leaping algorithm in the maturation step. The selected adults have the closest composition to the target composition f^. The selected Adult can have smaller F frequency than its parent Adult, so F frequency can be lowered after cycles.

Appendix 1—figure 3
Trajectories of F frequency for 10 collectives (g=10) over time.

(a) The collective whose frequency is closest to the target value is selected in every cycle (black lines). The gray lines denote the other collectives. For parameters, we used S growth rate rS=0.5, F growth advantage ω=0.03, mutation rate μ=0.0001, maturation time τ4.8, and N0=1000. (b) Comparison between frequency trajectories with selection (the chosen one Adult producing all offspring; black) and without selection (each Adult producing one offspring; blue) clearly shows the effect of artificial selection. The black line indicates F frequency of the selected collective fk at each cycle in (a). The blue line indicates the average trajectory without selection fk+1 (the average of g=10 individual lineages without inter-collective selection at the end of each cycle).

In Appendix 1—figure 4, we plot the absolute error d between the target frequency f^ and f (i.e. d=|ff^|) at the end of simulations (1000 cycles). Since the computing time for the Tau-leaping algorithm (individual-based simulation) to reach 1000 cycles is very long, we used the sampling scheme in the above subsection. In the colormap, errors higher than 0.15 are marked with gray, which indicates selection failure. The dashed lines indicate the same boundary in Figure 2e in the main text.

Appendix 1—figure 4
Color map of the absolute error d=|ff^| averaged selected collectives at the end of simulations (k=1000) and the target frequency f^.

The solid and dashed lines are drawn by the arguments in the main text. For parameters, we used rS=0.5, ω=0.03, μ=0.0001, N0=1000, g=10 and τ4.8. The result is the average of 300 independent simulations. Compared to Figure 2e, this figure has a higher resolution.

Appendix 2

Conditional probability distribution of the selected collective frequency f and selection progress Δf

In the main text, we identify the region of success by using selection progress Δf=Median[Ψ(fk+1|fk)]fk, which is obtained from the conditional pdf of fk+1 (the F frequency of the selected Adult at cycle k+1) given the selected fk at cycle k, written as Ψ(fk+1=f|fk). We consider the challenging case where fk is above the target value (fk>f^), and therefore the Adult with minimum F frequency will be selected. To get an analytical expression of Ψ(fk+1=f|fk), we first find the conditional pdf of f of Adults in cycle k+1 given fk at cycle k. Then, we find Ψ(fk+1|fk) from the minimum value distribution of F frequencies among GS(τ) Adults. Below, we describe the mathematical details of this process.

Let us start from the reproduction step from the selected Adult in cycle k. We reproduce g Newborns in the next cycle k+1. Then the probability distribution of the F cell numbers in Newborn collectives is given in Equation 36. If the total number of cells in a Newborn collective N0 is large enough, Equation 36 is approximated by the Gaussian distribution Fk+1,0(i)N(N0fk,N0fk(1fk)). Then, the probability density function that fk+1,0(i) to be ζ in Newborn collective i is

(37) Pfk+1,0(i)(ζ|fk)=N02πfk(1fk)eN0(ζfk)22fk(1fk).

The Newborn collective i has initial cell numbers Sk+1,0(i)=N0(1ζ) and Fk+1,0(i)=N0ζ. From here, we ignore cycle index k+1 in subscript and i superscript for convenience.

Next, we write the conditional pdf of Adults’ F frequency with given Newborn F frequency ζ. We assume that cell numbers in Adult S(τ) and F(τ) follow Gaussian distributions as in Equations 13; 14. Based on Equations 18; 19; 28; 33, we have

(38) S(τ)S¯(τ)+σS(τ)GS(τ),
(39) F(τ)F¯(τ)+σF(τ)GF(τ),

where GS(τ) and GF(τ) are random variables following the standard distribution N(0,1). Note that each Gaussian is sharp if Newborn size N0 is sufficiently large (S¯(τ)σs(τ) and F¯(τ)σF(τ)). Then, we can approximately write f(τ) as

(40) f(τ)F¯(τ)+σF(τ)GF(τ)S¯(τ)+σS(t)GS(τ)+F¯(τ)+σF(τ)GF(τ)f¯(τ)+σf(t)Gf(τ).

The mean of f is given by

(41) f¯(τ)=F¯(τ)S¯(τ)+F¯(τ)=ζWτ+μω(1ζ)(Wτ1)1ζ+ζWτ+μω(1ζ)(Wτ1),

and the variance is

(42) σf2(τ)=(1f¯(τ))2σF2(τ)+f¯(τ)2σS2(τ)N¯(τ)2=1N0Rτ((1ζ)+ζWτ+μω(1ζ)(Wτ1))4×[(1ζ)2(ζWτ(RτWτ1)+μω(1ζ)(2ω/rS1+2ω/rSRτWτ2Wτ+11+2ω/rS))+(ζWτ+μω(1ζ)(Wτ1))2(1ζ)(Rτ1)].

where Rτ=erSτ and Wτ=eωτ. The average Adult size is N¯(τ)=S¯(τ)+F¯(τ). Thus, the Adult’s F frequency f(τ)=fτ follows the Gaussian distribution N(f¯τ,σfτ2) whose pdf is given by

(43) Pfτ(f|ζ)=12πσfτe(ffτ¯)22σfτ2.

Next, we get the conditional pdf of fk+1,τ (offspring Adult’s F frequency in cycle k+1) given fk . We multiply Equations 37 and 43 and take the integral over ζ:

(44) Pfk+1,τ(f|fk)=01dζ Pfk+1,τ(f|ζ) Pfk+1,0(ζ|fk).

After maturation in cycle k+1, the Adult with the smallest frequency is selected among g Adult collectives, denoted as fk+1min . The pdf of fk+1min is obtained by the theory of extreme value statistics (Gumbel, 1958). The cumulative distribution function (cdf) of the minimum value fk+1min is given by

(45) Cfk+1min(f|fk)=Prob[fk+1minf|fk] =1Prob[fk+1,τminf|fk]=1Prob[(fk+1,τ(1)f)(fk+1,τ(2)f)(fk+1,τ(g)f)|fk].

Since frequencies are independent and identically distributed, Cfk+1min(f|fk)=1[Prob(fk+1,τf)|fk]g. Note that Prob[fk+1,τf|fk] =f1dfPfk+1,τ(f|fk) =10fdfPfk+1,τ(f|fk) =1Cfk+1,τ(f|fk) , and Equation 45 becomes

(46) Cfk+1min(f|fk)=1(1Cfk+1,τ(f|fk))g.

Then, the probability density function Ψ(fk+1|fk) is obtained by differentiating Equation 46 with respect to f and replacing ffk+1,

(47) Ψ(fk+1|fk)=g(1Cfk+1,τ(fk+1|fk))g1Pfk+1,τ(fk+1|fk).

We compute the probability density function Equation 47 by using numerical integration and compare it with the stochastic simulation results in Appendix 2—figure 1. The two distributions are similar.

To get the analytic approximation of the median of Equation 47, we assume that the Adult’s F frequency distribution is Gaussian. Then we only need to calculate the mean f¯(τ) and variance σf2(τ) of Adult’s F frequency. Instead of calculating the integral with respect to ζ in Equation 44, we put a set of initial values from Newborn’s F frequency distribution N(fk,fk(1fk)/N0) in Equations 20, 21, 28 and 34: S¯0=N0(1fk), F¯0=N0fk, σS,02=N0fk(1fk), σF,02=N0fk(1fk), and SF¯0S¯0F¯0=N0fk(1fk). Then we have

(48) f¯(τ)=F¯(τ)S¯(τ)+F¯(τ)=fkWτ+μω(1fk)(Wτ1)1fk+fkWτ+μω(1fk)(Wτ1),
(49) σf2(τ)=(1fk)N0Rτ(1fk+fkWτ+μω(1fk)(Wτ1))4×[fkWτ{(22fk+2fk2)RτWτ(1fk)fkWτ}+μω(1fk){(1fk)((2ωrS+2ω2fk)RτWτ2Wt+rSrS+2ω+2fkRτWτ)+2fk((1+fk)Rτ1)Wτ(Wτ1)}+O(μ2)],

which give rise to Equation 3 and Equation 4 in the main text, respectively. The functional form of Equations 48; 49 are plotted in Appendix 2—figure 2a.

The median (Median[Ψ(fk+1|fk)]f~) of Equation 47 satisfies Cfk+1min(f~|fk)=12, which means f~=Cfk+1,τ1(ln2g). If we assume that the distribution Equation 47 is Gaussian, then the inverse function Cfk+1,τ1(ln2g) can be written as

(50) Median[Ψ(fk+1|fk)]=Cfk+1,τ1(ln2g)=f¯(τ)+[Φ1(ln2g)]σf(τ),

where Φ1(y) is an inverse cumulative density function (CDF) of the normal distribution with mean i in Equation 49 and standard deviation σf(τ), a square root of Equation 49. Subtracting fk from Equation 50 gives the selection progress

(51) Δf=Median[Ψ(fk+1|fk)]fk=f¯(τ)+[Φ1(ln2g)]σf(τ)fk

which is Equation 2 in the main text.

Furthermore, we get an asymptotic expression of Φ1(ln2/g) when g is large (or Φ1(y) with small y). Here, we introduce a method from Phllip, 1960. We start from the CDF of the standard normal distribution, Φ(x)=erfc(x/2)/2 where the function x is the complementary error function. To get the expression of Φ1, we need an asymptotic expression of the inverse of y=erfc(x) function (x=erfc1(y)) as the inverse CDF Φ1(y)=2erfc1(2y). The known asymptotic expansion of y=erfc(x) for large Φ1(ln2/g) is erfc(x)ex2/xπ. By taking the logarithm of both sides, we have

(52) x2lny12lnπx2.

Replacing x2 on the right-hand side in Equation 52 into the expression itself, we get a continued logarithmic form of

(53) x2lny12lnπ(lny12lnπ(lny12ln).

Inserting x=erfc1(y) (square root of Equation 53) into the inverse CDF Φ1(y)=2erfc1(2y), we have Φ1(y)2ln2ylnπ(ln2y). So, the asymptotic expression of N(fk,fk(1fk)/N0) is given by

(54) Φ1(ln2g)2lngln[lng]+.
Appendix 2—figure 1
The probability density functions of the selected Adult’s F frequency fk+1 subtracted by fk.

For simulations (blue), at each fk, we performed 1000 stochastic simulations. The orange distribution represents Equation 47 computed by numerical integration. The median values of the distributions are shown in Figure 3a in the main text.

Appendix 2—figure 2
Effect of experimental parameters in the distribuiton of Adult's F frequency.

(a) Mean (Equation 41) and variance (Equation 42) of f values of Adult collectives with respect to the Newborn frequency f0. (b) Scaling relation of F frequency variance (Equation 49) with Newborn collective size N0. The initial F frequency is 0.5. The parameters are rS=0.5, ω=0.03, μ=0.0001, and τ4.8. (c) Relation of F frequency variance (Equation 49) with maturation time τ. Other parameters are the same as b.

Appendix 2—figure 3
Median (orange) and mean (violet) have similar distributions.

We performed 1000 simulations to get probability density. (a) g=10, (b) g=100, and (c) g=1000. Initial F frequency is fk=0.5. The parameters are rS=0.5, ω=0.03, μ=0.0001 and τ=ln[1000]/rS.

Appendix 3

Critical newborn size N˘0 to allow all target frequencies

First, we note that σf2 in Equation 49 is proportional to N01 for the following reasons. Variance σS2(t) in Equation 28 scales linearly with N0 since both S¯0 and σS,02 scale linearly with N0. Variance σF2(t) in Equation 33 also scales linearly with N0 because σF,02, F¯0, S¯0, and covariance SF¯0S¯0F¯0 all scale linearly with N0. The mean adult size N¯(τ)=S¯(τ)+F¯(τ) is also proportional to N0 because the average cell numbers in Equations 18; 19 are linear with respect to N0. Thus, the scaling relation of Equation 49 is given by σf2(τ) =[(1f¯(τ))2σF2(τ)+f¯(τ)2σS2(τ)]/N¯(t)2 N0/N021/N0.

Small N0 makes all target frequencies achievable, as shown in Figure 4a in the main text. That is because small N0 induces large σf, and thus N0 smaller than a certain critical value N˘0 makes the selection progress Δf always negative, regardless of the value of fk (i.e.Δf =f¯(τ) +[Φ1(ln2g)]σf(τ) fk<0). That means the inter-collective selection overcomes intra-collective selection in any target frequencies. To get an analytical approximation of the critical newborn size N˘0, we simply assume that selection progress Δf is maximum at fk=1fk=12 where the changes in f¯ and σf2 are fastest. If the maximum value of Δf is zero, all other values of Equation 50 are negative, which naturally states that all targets are achievable. Putting fk=12, Equations 48; 49 become

(55) f¯(τ)|fk=12=F¯(τ)S¯(τ)+F¯(τ)=Wτ+μω(Wτ1)1+Wτ+μω(Wτ1),

and

(56) σf2(τ)|fk=122N0Rτ(1+Wτ+μω(Wτ1))4×[Wτ(3RtWτ1Wτ)+μω{((2ωrS+2ω1)RtWτ2Wt+rSrS+2ω+RtWτ)+(3Rt2)Wτ(Wτ1)}]

So, by setting Δf|fk=12=f¯(τ)|fk=12+[Φ1(ln2g)]σf(τ)|fk=1212=0 with N0=N˘0, we get a solution of

(57) N˘0=[Φ1(ln2g)]28Rτ[1+Wτ+μω(Wτ1)]2[1Wτμω(Wτ1)]2×[Wτ(3RτWτ1Wτ)+μω{((2ωrS+2ω1)RτWτ2Wτ+rSrS+2ω+RτWτ)+(3Rτ2)Wτ(Wτ1)}]

Thus, all target frequencies are successfully selected with Newborn size N0 smaller than N˘0. If the mutation rate is zero, the critical value becomes

(58) N˘0=[Φ1(ln2g)]28Wτ(3RτWτWτ1)Rτ(Wτ21)2.

Appendix 4

Selection without mutation μ=0

When the mutation rate is zero, two genotypes behave as two distinct species. The compositional change is provided by Equation 50 with setting μ=0. Corresponding f¯ in Equation 48 and σf2 in Equation 49 become

(59) f¯(τ)=fkWτ1fk+fkWτ,
(60) σf2(τ)=fk(1fk)Wτ[(22fk+2fk2)RτWτ(1fk)fkWτ]N0Rτ(1fk+fkWτ)4.

Equations 59; 60 suggest that when a community consists of two competing species, we obtain similar conclusions on the accessible region for target composition. The stochastic simulation results are presented in Appendix 4—figure 1.

Appendix 4—figure 1
Simulation with zero mutation rate.

Color map of the absolute error d=|ff^| between frequency f of the averaged selected collectives at the end of simulations (k=1000) and the target frequency f^. For parameters, we used rS=0.5, ω=0.03, μ=0, N0=1000, g=10, and τ4.8.

Appendix 5

Stronger or weaker advantages ω

The solution of Equation (2) in main text provides the boundary values with varying the ω, the fitness advantage of F over S. We numerically calculate the solutions and plot in Appendix 5—figure 1.

Appendix 5—figure 1
Change of success region in varying selective advantage ω.rs, ω=0.03, μ=0.0001, N0=1000, g=10, and τ4.8.

Appendix 6

Deleterious mutation ω<0

In the main text, we show that the target composition can be achieved in some ranges of initial and target values when the mutation is beneficial to growth. The same analogy can be applied when the mutation is deleterious. Since the F cells grow slower than the S cells (ω<0), the F frequency naturally decreases in the maturation step. Then, the challenging case is selecting a larger F frequency against the intra-collective selection. So the conditional probability distribution Ψ(fk+1|fk) that we consider now is a maximum value distribution of Equation 44. Thus, instead of Equation 45, we look for the cumulative distribution function of the maximum value fk+1max such that

(61) Cfk+1max(f|fk)=Prob[fk+1maxf|fk]=Prob[(fk+1(1)(τ)f)(fk+1(2)(τ)f)(fk+1(g)(τ)f)|fk].

If all frequencies are independent and identically distributed random variables, the cumulative distribution function becomes

(62) Cfk+1max(f|fk)=(Cfk+1,τ(f|fk))g.

Likewise in the previous section, we get the conditional probability density function by differentiating Equation 62 with respect to f and replacing ffk+1 as

(63) Ψ(fk+1|fk)=g(Cfk+1,τ(fk+1|fk))g1Pfk+1,τ(fk+1|fk).

The distribution in Equation 63 is evaluated for various fk in Appendix 6—figure 1a with numerical simulations, and the median values of distributions are presented in Appendix 6—figure 1b. In the case of ω=0.03, the target frequency is lower than around 0.3 and larger than around 0.7 can be selected. Since the sign of ω is opposite to the result in the main text, the diagram is reversed from Figure 2e in the main text.

Appendix 6—figure 1
Artificial selection also works for deleterious mutation.

(a) Conditional probability density functions of fk+1fk for various fk values. The left-hand side distribution is obtained from simulations and the right-hand side distribution is numerically obtained by evaluating Equation 63. Small triangles inside indicate the median values of the distributions. (b) The median value of distributions at a given fk. The points where the shifted median becomes zero, Median[Ψ(fk+1fk|fk)]=0 are denoted as fL and fU, respectively. (c) The relative error between the target frequency f^ and the ensemble averaged selected frequency fk is measured after 1000 cycles starting from the initial frequency f¯1,0. Either the lower target frequencies or the higher target frequencies starting from the high initial frequencies can be achieved. The black dashed lines indicate the predicted boundary values fU and fL in a.

Appendix 7

Selecting more than one collective

In the main text, we choose one collective which has the closest frequency to the target among g collectives. Such a ‘top 1’ strategy allows us to apply extreme value theory. However, ‘top 1’ may be too restrictive (Xie et al., 2019). Thus, we test the ‘top-tier’ strategy by choosing the top five among 100 Adults (Appendix 7—figure 1). The top-tier strategy is shown to be inefficient in our system. This is because in Xie et al., 2019, nonheritable variations – such as stochastic fluctuations in species composition introduced by pipetting – caused nonheritable variations in collective function. Nonheritable variations could potentially mask desired mutations if these mutations happened to occur in an ‘unlucky’ environment that yielded lower collective functions. Hence, lenient selection would allow the preservation of these mutations. In contrast here, stochastic fluctuations in genotype composition are heritable: a parent Newborn with lower F frequency f will tend to have offspring Newborns with lower f values. Hence, top-1 is more effective in this study.

Appendix 7—figure 1
Selecting top 5% outperforms selecting top 1.

We bred 100 collectives and chose either top-1 collective (solid line) or top-5 collectives (dashed line) with f closest to the target value f^ (black dotted line).

Appendix 8

Extension to three-population system

We assume that collectives consist of three genotypes with slow-growing (S), fast-growing (F), and faster-growing (FF) types. The growth rate of S is rS. Each mutation adds ω to the growth rate. Thus, the F and FF types have growth rates rS+ω and rS+2ω, respectively. The mutation rate is μ. So, the birth and mutation events are written by the chemical reactions:

(64) SrSS+S,
(65) Fω>0rS+ωF+F,
(66) FFω>0rS+2ωFF+FF,
(67) SμF,
(68) FμFF

We write a master equation of the processes for P(S,F,FF,t) which is the probability to have S, F, and FF numbers of S, F, and FF cells at time t, respectively.

(69) dP(S,F,FF,t)dt=rS(S1)P(S1,F,FF,t)rSSP(S,F,FF,t)+(rS+ω)(F1)P(S,F1,FF,t)(rS+ω)FP(S,F,FF,t)+(rS+2ω)(FF1)P(S,F,FF1,t)(rS+2ω)FFP(S,F,FF,t)+μ(S+1)P(S+1,F1,FF,t)μSP(S,F,FF,t)+μ(F+1)P(S,F+1,FF1,t)μFP(S,F,FF,t).

The composition of collective i in cycle k is now represented with two frequencies (fk(i)(t),hk(i)(t))pk(i)(t) where the F frequency is fk(i)(t)=Fk(i)(t)/(Sk(i)(t)+Fk(i)(t)+FFk(i)(t)) and the FF frequency is hk(i)(t)=FFk(i)(t)/(Sk(i)(t)+Fk(i)(t)+FFk(i)(t)). Then, the target composition is set to be (f^,h^). The composition of the selected Adult in cycle k is (fk,hk)pk. We apply the processes used in the above Appendix 2 to obtain the conditional probability Ψ(pk+1|pk) by using the master Equation 69.

At the reproduction step in cycle k, we choose N0 cells from the selected Adult whose composition is (fk,hk)pk. Then, newborn collectives are independently sampled from a multinomial distribution. For convenience, we drop the collective index (i). Then, the conditional joint probability mass function of Fk+1,0,FFk+1,0 cells is represented by

(70) P(Fk+1,0,FFk+1,0|fk,hk)=N0!(N0Fk+1,0FFk+1,0)!Fk+1,0!FFk+1,0!×(1fkhk)N0Fk+1,0FFk+1,0(fk)Fk+1,0(hk)FFk+1,0,

where the number of S Sk+1,0 is automatically set to be Sk+1,0=N0Fk+1,0FFk+1,0. Then, the approximated multivariate normal distribution is N(N0pk,N0Mk+1) where the mean distribution is pk=(fk,hk) and covariance matrix is Mk+1. The diagonal terms of Mk+1 are variances σX2=X2¯[X¯]2 and the off-diagonal terms are covariances σXY=XY¯X¯Y¯. The matrix is given by

(71) Mk+1=[fk(1fk)N02fkhkN02fkhkN02hk(1hk)N02][σfk+1,02σfk+1,0hk+1,0σfk+1,0hk+1,0σhk+1,02].

Then a Newborn’s composition ρ(ζ,η) follows the multivariate Gaussian distribution (ζ,η)N(pk,Mk+1) whose joint probability distribution is given by

(72) Pρk+1,0(ρ|pk)=12π2detMk+1e12(ρpk)Mk+11(ρpk)T.

At the beginning of cycle k, a newborn collective starts from (S0,F0,FF0) cells (for convenience, cycle index k is dropped.) In terms of (ζ,η), each initial numbers are S0=N0(1ζη), F0=N0ζ, and FF0=N0η. Their initial covariance matrix is N02Mk+1. By using Equation 69, we can write ordinary differential equations up to the second moment.

(73) dS¯(t)dt=rSS¯(t)μS¯(t),
(74) dF¯(t)dt=(rS+ω)F¯(t)+μ(S¯(t)F¯(t)),
(75) dFF¯(t)dt=(rS+2ω)FF¯(t)+μF¯(t),
(76) dS2¯dt=2(rSμ)S2¯+(r+μ)S¯,
(77) dF2¯dt=2(rS+ωμ)F2¯+(rS+ω+μ)F¯+μ(2SF¯+F¯),
(78) dFF2¯dt=2(rS+2ω)FF2¯+(rS+2ω)FF¯+μ(2FFF¯+F¯),
(79) dSF¯dt=(2rS+ω2μ)SF¯+μ(S2¯S¯),
(80) dFFF¯dt=(2rS+3ωμ)FFF¯+μ(SFF¯+F2¯F¯),
(81) dSFF¯dt=(2rS+2ωμ)SFF¯+μSF¯.

The initial conditions of the system in coupled Equations 73–81 are obtained by the mean and (co)variances of Equation 70. By solving equations numerically, we obtain a set of mean cell numbers (S¯,F¯,FF¯) and a set of variances (σS2,σF2,σFF2) as well as covariances (σSF,σFFF,σSFF). We assume that the covariances are smaller than the variances. We consider S,F, and FF as Gaussian random variables

(82) S(t)S¯(t)+σS(t)GS(t),
(83) F(t)F¯(t)+σF(t)GF(t),
(84) FF(t)FF¯(t)+σFF(t)GFF(t).

Then, the F frequency becomes

(85) f(t)F¯(t)+σF(t)GF(t)S¯(t)+σS(t)GS(t)+F¯(t)+σF(t)GF(t)+FF¯(t)+σFF(t)GFF(t)f¯(t)+σf(t)Gf(t),

where f¯=F¯/(S¯+F¯+FF¯) and σf2=(f¯2(σS2+σFF2)+(1f¯2)σF2)/(S¯+F¯+FF¯). Similarly, the FF frequency is

(86) h(t)FF¯(t)+σFF(t)GFF(t)S¯(t)+σS(t)GS(t)+F¯(t)+σF(t)GF(t)+FF¯(t)+σFF(t)GFF(t)h¯(t)+σh(t)Gh(t),

where h¯=FF¯/(S¯+F¯+FF¯) and σh2=(h¯2(σS2+σF2)+(1h¯2)σFF2)/(S¯+F¯+FF¯). The dynamic flow of F and FF frequencies during maturation is shown in Appendix 8—figure 1a. If the covariances are small enough, we can approximate the joint probability distribution of Adult’s composition (fτ,hτ)=pτ as

(87) Ppτ(p|ρ)=12πσfτe(ff¯τ)22σfτ212πσhτe(hh¯τ)22σhτ2.

With cycle index k, we get the conditional probability of matured collectives Ppk+1,τ(p|pk) by

(88) Ppk+1,τ(p|pk)=01dζ01dηPpk+1,τ(p|ζ,η)Ppk+1,0(ζ,η|pk).

We select the Adult collective among g Adult collectives such that the change in frequencies during maturation could be compensated. During maturation, a frequency distribution moves in different directions in (f,h) space depending on the initial composition (fk,hk) So, we take different directions to obtain the extreme value distributions. Considering only the sign of the frequency changes in f and h, we take either maximum or minimum. The mean change in h is always positive in the whole (f,h) space since dFF¯/dt is always positive in Equation 75. Thus, we choose the minimum value hk+1min in every selection step.

If the mean fk+1,τ(i)¯=01df01dh fPpk+1,τ(f,h|pk) is larger than fk, the minimum value among fk+1,τ(1),fk+1,τ(2),,fk+1,τ(g) will be chosen in the selection step to compensate for the frequency change in the maturation step. Let us denote the selected valued of f and h as fk+1=min(fk+1,τ(1),fk+1,τ(2),,fk+1,τ(g)) and hk+1=min(hk+1,τ(1),hk+1,τ(2),,hk+1,τ(g)). We temporarily drop the time index τ for simplicity. Then, the joint cumulative distribution function Cpk+1(f,h|pk)=Pr[fk+1<fhk+1<h|pk] is

(89) Cpk+1(f,h|pk)=0fdf0hdhPpk+1(f,h|pk)=(01f1)df(01h1)dhPpk+1(f,h|pk)=01df01dhPpk+1(f,h|pk)h1df01dhPpk+1(f,h|pk)01dfh1dhPpk+1(f,h|pk)+f1dfh1dhPpk+1(f,h|pk)=1Pr[fk+1f|pk]Pr[hk+1h|pk]+Pr[fk+1fhk+1h|pk].

The probability Pr[fk+1fhk+1h|pk] can be converted as

(90) Pr[fk+1fhk+1h|pk]=Pr[fk+1(1)fhk+1(1)hfk+1(2)fhk+1(2)hfk+1(g)fhk+1(g)h|pk]=[Pr[fk+1fhk+1h|pk]]g=[1Pr[fk+1<f|pk]Pr[hk+1<h|pk]+Pr[fk+1<fhk+1<h|pk]]g=[1Cfk+1(fpk)Chk+1(hpk)+Cpk+1(f,hpk)]g,

where Cpk+1(f,h|pk)=0fdf0hdhPpk+1(f,h|pk) is a conditional joint cumulative distribution function of (f(i),h(i)). The marginal cumulative distribution functions are

(91) Cfk+1(f|pk)=0fdf01dhPpk+1(f,h|pk),
(92) Chk+1(h|pk)=01df0hdhPpk+1(f,h|pk).

Similarly, the probabilities Pr[fk+1f|pk] and Pr[hk+1h|pk] are converted into Pr[fk+1f|pk]=[1Cfk+1(f|pk)]g and Pr[hk+1h|pk]=[1Chk+1(h|pk)]g. Thus, the joint cumulative distribution function is

(93) Cpk+1(f,h|pk)=1[1Cfk+1(f|pk)]g[1Chk+1(h|pk)]g+[1Cfk+1(f|pk)Chk+1(h|pk)+Cpk+1(f,h|pk)]g.

Then, the conditional probability of the selected collective is given by

(94) Ppk+1(f,h|pk)=2fhCpk+1(f,h|pk)=g(g1)[1Cfk+1(f|pk)Chk+1(h|pk1)+Cpk+1(f,h|pk)]g2×(Pfk+1(f|pk1)fCpk+1(f,h|pk))(Phk+1(h|pk1)hCpk+1(f,h|pk))+g[1Cfk+1(f|pk)Chk+1(h|pk1)+Cpk+1(f,h|pk)]g1Ppk+1(f,h|pk),

where fCpk+1(f,h|pk) =hCpk+1(f,h|pk) =0hdhPpk+1(f,h|pk) and hCpk+1(f,h|pk) =hCpk+1(f,h|pk) =0fdfPpk+1(f,h|pk).

If the mean fk+1,τ¯ is smaller than fk, the chosen collective is likely to have maximum f values among g matured collectives. Then, the definition of f is written by fk+1=max(fk+1(1),fk+1(2),,fk+1(g)). We rewrite the joint cumulative distribution function Cpk+1(f,h|pk) to be a little different from Equation 89 because now we have to utilize the condition f<f instead of f>f,

(95) Cpk+1(f,h|pk)=0fdf0hdhPpk+1(f,h|pk)=0fdf(01h1)dhPpk+1(f,h|pk)=0fdf01dhPpk+1(f,h|pk)0fdfh1dhPpk+1(f,h|pk).=Pr[fk+1<f|pk]Pr[fk+1<fhk+1h|pk].

The probability Pr[fk+1<fhk+1h|pk] is converted as

(96) Pr[fk+1<fhk+1h|pk]=Pr[fk+1(1)<fhk+1(1)hfk+1(2)<fhk+1(2)hfk+1(2)<fhk+1(2)h|pk]=[Pr[fk+1<fhk+1h|pk]]g=[Pr[fk+1<f|pk]Pr[fk+1<fhk+1<h|pk]]g.
(97) =[Cfk+1(f|pk)Cpk+1(f,h|pk)]g.

Thus, the joint cumulative distribution function is given by

(98) Cpk+1(f,h|pk)=[Cfk+1(f|pk)]g[Cfk+1(f|pk)Cpk+1(f,h|pk)]g.

In this case, the conditional probability distribution function is given by

(99) Ppk+1(f,h|pk)=2fhCpk+1(f,h|pk)=g(g1)[Cfk+1(f|pk)Cpk+1(f,h|pk)]g2×(Pfk+1(f|pk)fCpk+1(f,h|pk))hCpk+1(f,h|pk)+g[Cfk+1(f|pk)Cpk+1(f,h|pk)]g1Ppk+1(f,h|pk).

By replacing (f,h) to pk+1, we finally obtain the conditional probability distribution Ψ(pk+1|pk),

(100) Ψ(pk+1|pk)={g(g1)[1Cfk+1(fk+1|pk)Chk+1(hk+1|pk)+Cpk+1(pk+1|pk)]g2×(Pfk+1(fk+1|pk)fCpk+1(pk+1|pk))(Phk+1(h)hCpk+1(pk+1|pk))+g[1Cfk+1(fk+1|pk)Chk+1(hk+1|pk)+Cpk+1(pk+1|pk)]g1Ppk+1(pk+1|pk) for fk+1,τ¯fk0 and hk+1,τ¯hk0g(g1)[Cfk+1(fk+1|pk)Cpk+1(pk+1|pk)]g2(Pfk+1(fk+1|pk)fCpk+1(pk+1|pk))hCpk+1(pk+1|pk)+g[Cfk+1(fk+1|pk)Cpk+1(pk+1|pk)]g1Ppk+1(pk+1|pk) for fk+1,τ¯fk<0 and hk+1,τ¯hk0.

Using Equation 100, we get the mean values of f and h as

(101) fk+1¯=01df01dh f×Pfk+1,hk+1(f,h|fk,hk),
(102) hk+1¯=01df01dh h×Pfk+1,hk+1(f,h|fk,hk).

We define the accessible region in frequency space where the signs of the changes in both F frequency and FF frequency after a cycle are opposite to that of maturation (see Appendix 8—figure 1),

(103) sign(fk+1¯fk)×sign(fk+1,τ¯fk)0 and sign(hk+1¯hk)×sign(hk+1,τ¯hk)0,

where fk+1,τ¯ and fk+1¯ are the mean values of F frequencies after the maturation step in cycle k+1 before and after selection, respectively, and h values are defined similarly for FF. Or, if the condition is not met, the composition of the selected collective may diverge from the target composition after several cycles. The accessible regions are marked in the gold-colored area in Appendix 8—figure 1b. Similar to the two-population case, the accessible region is shaped by the flow velocity of the composition during the maturation step, as depicted in the flow diagram in Appendix 8—figure 1a. Both F and FF frequencies tend to increase, and the inter-collective selection can compensate for these changes if the composition changes slowly when the F and FF frequencies are small. However, if the changes occur too rapidly when the FF frequency is intermediate, the frequency cannot be stabilized. So the accessible region is limited to the regions where the composition changes slowly.

This is explainable by projecting the three-population problem into the two-population problem. The selective advantage of FF relative to the rest of the collective mainly determines the accessible region. The growth rate of the rest varies from rS to rS+ω according to F frequency, so the mean growth rate of the rest is written by rS¯=rS+fω where f is F frequency in S+F. Then, the corresponding selective advantage of FF is ω¯=(2f)ω which varies between ω to 2ω. Using rS¯ and ω¯ similar to Appendix 2, we get bounds of the accessible region (see dashed line in Appendix 8—figure 1b). The boundary from the projected problem agreed well with the original three-population problem.

Appendix 8—figure 1
Accessible regions in the three-population system.

(a) The flow of composition change in fast-growing (F) and faster-growing (FF) frequencies at each composition (f,h). Top corner indicates that FF cells fix in the collective. Right bottom corner means collectives with only F cells, while collectives contain S cells only at left bottom corner. Arrow length means the speed of change. (b) The accessible regions are marked by the gold area. If the signs of changes in both F frequency and FF frequency after inter-collective selection are opposite to those during maturation, then the given composition is accessible. Otherwise, the composition is not accessible and will change after cycles. Dashed lines are the boundary of the accessible region by projecting the collective into a two-population problem (FF vs. S+F). The figures are drawn using the mpltern package (Ikeda et al., 2019).

Appendix 9

Derivation of equations

In this section, we go over the derivation of Equations 18–42 for readers not equipped with advanced mathematics training.

Assumptions: μω,rS

Equations 18 and 19

Equation 18 is straightforwardly solved by integrating Equation 16. Equation 19 is obtained from Equation 17 using integration factor e(rS+ω)t:

(104) [dF¯(t)dt(rS+ω)F¯(t)]e(rS+ω)t=μS¯oe(rSμ)te(rS+ω)t,
(105) d[F¯(t)e(rS+ω)t]dt=μS¯oe(ω+μ)t.

Integrating both sides, we get F¯(t)e(rS+ω)tF¯(0)=μS¯oω+μ(1e(ω+μ)t). Thus,

(106) F¯(t)=F¯oe(rS+ω)t+μS¯oω+μ(e(rS+ω)te(rSμ)t)F¯oe(rS+ω)t+μS¯oω(e(rS+ω)terSt).

Equations 24 and 25

Applying Equation 12, we have

(107) dS2¯dt=S=0F=0S2dP(S,F,t)dt=S=0F=0[rSS2(S1)P(S1,F,t)+S2(rS+ω)(F1)P(S,F1,t)+μ(S+1)S2P(S+1,F1,t)(rS+(rS+ω)F+μS)S2P(S,F,t)].

We collect the two purple-colored terms and change the order of summation. Note that the first purple-colored term does not change regardless of whether S starts from 0 or 1 because the term is zero for S=0. Thus, the first purple-colored term is equivalent to F=0S=1[rSS2(S1)P(S1,F,t)]. Let α=S1 , and this becomes F=0α=0[rS(α+1)2αP(α,F,t)]. We reassign α as S, and obtain:

(108) F=0S=0[rS(S+1)2SP(S,F,t)]F=0S=0rSS3P(S,F,t)=rSF=0S=0[(2S+1)S]P(S,F,t)=rS(2S2¯+S¯).

We collect the two blue terms, and similarly obtain:

(109) S=0F=0[S2(rS+ω)(F1)P(S,F1,t)(rS+ω)FS2P(S,F,t)]=(rS+ω)S=0F=0[S2FP(S,F,t)FS2P(S,F,t)]=0.

Finally, we collect the two red terms. For the first red term, the sum is the same regardless of whether we start from S=0 or -1. Let S start from -1, and we have

(110) F=0S=1[μ(S+1)S2P(S+1,F1,t)].

Let α=S+1, then the term becomes F=0α=0[μα(α1)2P(α,F1,t)]. We reassign α as S, and additionally apply index change on F1:

(111) μF=0S=0[S(S1)2P(S,F,t)S3P(S,F,t)]=μF=0S=0[(2S2+S)P(S,F,t)]=μ(2S2¯+S¯).

Now, add the three parts together, and we have

(112) dS2¯dt=μ(2S2¯+S¯)+rS(2S2¯+S¯)=2S2¯(rSμ)+S¯(rS+μ),

which is Equation 24. Likewise,

(113) dF2¯dt=S=0F=0F2dP(S,F,t)dt=S=0F=0[rSF2(S1)P(S1,F,t)+F2(rS+ω)(F1)P(S,F1,t)+μ(S+1)F2P(S+1,F1,t)(rSS+(rS+ω)F+μS)F2P(S,F,t)]=S=0F=0[rSF2S+(F+1)2(rS+ω)F+μS(F+1)2(rSS+(rS+ω)F+μS)F2]P(S,F,t)=S=0F=0[(F+1)2(rS+ω)F+μS(F+1)2((rS+ω)F+μS)F2]P(S,F,t)=S=0F=0[(rS+ω)F(2F+1)+μS(2F+1)]P(S,F,t)
(114) =2(rS+ω)F2¯+(rS+ω)F¯+2μS¯F¯+μS¯

which is Equation 25.

Equation 26 and 28

Using integration factor e2(rSμ)t and Equation 24, we have:

(115) [dS2¯dt2(rSμ)S2¯]e2(rSμ)t=(rS+μ)S¯oe(rSμ)te2(rSμ)t,
(116) d[S2¯e2(rSμ)t]dt=(rS+μ)S¯oe(μrS)t,
(117) S2¯e2(rSμ)t=rS+μμrSS¯o[e(μrS)t1]+S2¯o.

Since μrS, we have

(118) S2¯=rS+μμrSS¯o[e(μrS)t1]e2(rSμ)t+S2¯oe2(rSμ)tS¯oe2rSt[1erSt]+S2¯oe2rSt,

where S2¯o is the expected S2 at time 0. For Equation 28,

(119) σS2(t)=S2¯(t)[S¯(t)]2S¯oe2rSt[1erSt]+S2¯oe2rSt(S¯oerSt)2=S¯oe2rSt[1erSt]+e2rSt(S2¯o(S¯o)2)=S¯oe2rSt[1erSt]+e2rStσS2(0).

Equation 30 and 31

Since S F¯=S=0F=0SFP(S,F,t),

(120) dSF¯dt=S=0F=0[SFrS(S1)P(S1,F,t)+SF(rS+ω)(F1)P(S,F1,t)+SFμ(S+1)P(S+1,F1,t)SF(rSS+(rS+ω)F+μS)P(S,F,t)]=S=0F=0[(S+1)FrSS+S(F+1)(rS+ω)F+(S1)(F+1)μSSF(rSS+(rS+ω)F+μS)]P(S,F,t)=S=0F=0[2rSSF+ωSFμSF+μS2μS)]P(S,F,t)=(2rS+ωμ)SF¯+μ(S2¯S¯).

We can solve this, again using the integration factor technique above:

(121) d[SF¯e(2rS+ωμ)t]dt=μ(S2¯S¯)e(2rS+ωμ)t.

Thus, we have

(122) SF¯e(2rS+ωμ)tSF¯o=μ(S2¯S¯)e(2rS+ωμ)tdt=μ(rS+μμrSS¯o[e(μrS)t1]e2(rSμ)t+S2¯oe2(rSμ)tS¯oe(rSμ)t)e(2rS+ωμ)tdt=μ(rS+μμrSS¯o[e(μrS)te2(rSμ)t]e(2rS+ωμ)t+S2¯oe2(rSμ)te(2rS+ωμ)tS¯oe(rSμ)t(2rS+ωμ)t)dt=μ(rS+μμrSS¯oe(rS+ω)trS+μμrSS¯oe(μ+ω)t+S2¯oe(μ+ω)tS¯oe(rS+ω)t)dt=μ{2rSμrSS¯oe(rS+s)t1(rS+ω)+(S2¯orS+μμrSS¯o)e(μ+ω)t1(μ+ω)},

which results in

(123) SF¯(t)=SF¯oe(2rS+ωμ)t+μ{2rSμrSS¯oe(rS+ω)t1(rS+ω)+(S2¯orS+μμrSS¯o)e(μ+ω)t1(μ+ω)}e(2rS+ωμ)t=SF¯oe(2rS+ωμ)t+2rSμμrSS¯oe(rSμ)te(2rS+ωμ)t(rS+ω)+μ(S2¯orS+μμrSS¯o)e2(rSμ)te(2rS+ωμ)t(μ+ω)SF¯oe(2rS+ω)t2μS¯orS+ω(e(2rS+ω)terSt)+μω(S2¯o+S¯o)(e(2rS+ω)te2rSt).

Equation 32

From Equation 25, we have

(124) (dF2¯dt2(rS+ω)F2¯)e2(rS+ω)t=d(F2¯e2(rS+ω)t)dt=((rS+ω)F¯+2μSF¯+μS¯)e2(rS+ω)t.

The right-hand side becomes

(125) (rS+ω)(F¯oe(rS+ω)t+μS¯oω+μ(e(rS+ω)te(rSμ)t))e2(rS+ω)t+2μSF¯oe(ω+μ)t+μS¯oe(rS+2ω+μ)t+o(μ2)=(rS+ω)(F¯oe(rS+ω)t+μS¯oω+μ(e(rS+ω)te(rS+2ω+μ)t))+2μSF¯oe(ω+μ)t+μS¯oe(rS+2ω+μ)t+o(μ2)=(rS+ω)(F¯oω+N0μω+μe(rS+ω)tμS¯oω+μe(rS+2ω+μ)t)+2μSF¯oe(ω+μ)t+μS¯oe(rS+2ω+μ)t+o(μ2).

Note that we have checked that the second and third terms of SF¯ can be ignored after we compare the full calculation with this simpler version. Integrate both sides:

(126) F2¯e2(rS+ω)tF2¯o(rS+ω)(F¯oω+N0μ(ω+μ)(rS+ω)(1e(rS+ω)t)+μS¯o(ω+μ)(rS+2ω+μ)(e(rS+2ω+μ)t1))+2μSF¯oω+μ(1e(ω+μ)t)+μS¯orS+2ω+μ(1e(rS+2ω+μ)t)+o(μ2)

Then, we have

(127) F2¯(t)F2¯oe2(rS+ω)t+(rS+ω)(F¯oω+N0μ(ω+μ)(rS+ω)(e2(rS+ω)te(rS+ω)t)+μS¯o(ω+μ)(rS+2ω+μ)(e(rSμ)te2(rS+ω)t))+2μSF¯oω+μ(e2(rS+ω)te(2rS+ωμ)t)+μS¯orS+2ω+μ(e2(rS+ω)te(rSμ)t)+o(μ2)=F2¯oe2(rS+ω)t+(F¯oω+N0μω(e2(rS+ω)te(rS+ω)t)+(1ω1rS+2ω)μS¯o(erSte2(rS+ω)t))+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+μS¯orS+2ω(e2(rS+ω)terSt)+o(μ2)=F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+F¯oS¯o(e2(rS+ω)te(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2)=F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+μF¯oω(e2(rS+ω)te(rS+ω)t)+o(μ2)=F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯o(1+μω)e(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2)=F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2).

Equation 33

(128) σF2(t)=F2¯(t)[F¯(t)]2F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2)[F¯oe(rS+ω)t+μS¯oω(e(rS+ω)terSt))]2=F2¯oe2(rS+ω)t+2μSF¯oωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[(e(rS+ω)t)+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2)F¯o2e2(rS+ω)t2F¯oe(rS+ω)tμS¯oω(e(rS+ω)terSt))[μS¯oω(e(rS+ω)terSt))]2=σF,02e2(rS+ω)t+2μ(SF¯oS¯oF¯o)ωe(rS+ω)t(e(rS+ω)terSt)+F¯oe(rS+ω)t(e(rS+ω)t1)+μS¯oω[e(rS+ω)t+2ωrS+2ωe2(rS+ω)t+rSrS+2ωerSt]+o(μ2).

Equations 40-42

To derive this equation, we use the fact that 1/(1+x)1x for small x. We will omit (t) for simplicity. Also, note that we are considering relatively large populations so that the standard deviation is much smaller than the mean.

(129) fF¯+σFGFS¯+σSGS+F¯+σFGF=F¯(1+σFF¯GF)(S¯+F¯)(1+σSGS+σFGFS¯+F¯)F¯S¯+F¯(1+σFF¯GF)(1σSGS+σFGFS¯+F¯)f¯(1+σFF¯GFσSGS+σFGFS¯+F¯)=f¯(1+(σFF¯σFS¯+F¯)GFσSS¯+F¯GS).

Recall that if AN(μA,σA), BN(μB,σB), then ABN(μAμB,σA2+σB2). Thus, f is distributed as a Gaussian with the mean of

(130) f¯(t)=F¯(t)S¯(t)+F¯(t)F¯oe(rS+ω)t+μS¯oω(e(rS+ω)terSt)S¯oerSt+F¯oe(rS+ω)t+μS¯oω(e(rS+ω)terSt)=F¯oeωt+μS¯oω(eωt1)S¯o+F¯oeωt+μS¯oω(eωt1)=f¯oeωt+μω(1f¯o)(eωt1)(1f¯o)+f¯oeωt+μω(1f¯o)(eωt1).

Note that the initial value of mean f¯(0) is equal to the mean of the binomial distribution Equation 37, fk. The variance is

(131) σf2(t)=f¯2((S¯(S¯+F¯)F¯)2σF2(t)+(1S¯+F¯)2σS2(t))=f¯2((1f¯(t))2N¯(t)2(f¯(t))2σF(t)+1N¯(t)2σS2(t))=(1f¯(t))2σF2(t)+f¯(t)2σS2(t)N¯(t)2.

Data availability

Data and source code of stochastic simulations are available in https://github.com/schwarzg/artificial_selection_collective_composition (copy archived at Lee, 2025).

References

  1. Book
    1. Penn A
    (2003) Modelling artificial ecosystem selection: a preliminary investigation
    In: Banzhaf W, Ziegler J, Christaller T, Dittrich P, Kim JT, editors. Advances in Artificial Life. Berlin, Heidelberg: Springer. pp. 659–666.
    https://doi.org/10.1007/978-3-540-39432-7_71
  2. Conference
    1. Penn A
    2. Harvey I
    (2004) The role of non-genetic change in the heritability, variation, and response to selection of artificially selected ecosystems
    Artificial Life IX: Proceedings of the Ninth International Conference on the Simulation and Synthesis of Artificial Life.
    https://doi.org/10.7551/mitpress/1429.003.0059
    1. Phllip JR
    (1960) The Function Inverfc?
    Australian Journal of Physics 13:PH600013.
    https://doi.org/10.1071/PH600013
    1. Rainey PB
    (2023) Major evolutionary transitions in individuality between humans and AI
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 378:20210408.
    https://doi.org/10.1098/rstb.2021.0408

Article and author information

Author details

  1. Juhee Lee

    1. Department of Physics, Inha University, Incheon, Republic of Korea
    2. Asia Pacific Center for Theoretical Physics, Pohang, Republic of Korea
    Present address
    Integrated Science Lab, Department of Physics, Umeå university, Umeå, Sweden
    Contribution
    Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3318-6377
  2. Wenying Shou

    Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
    Contribution
    Conceptualization, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    wenying.shou@gmail.com
    Competing interests
    Reviewing editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5693-381X
  3. Hye Jin Park

    1. Department of Physics, Inha University, Incheon, Republic of Korea
    2. Asia Pacific Center for Theoretical Physics, Pohang, Republic of Korea
    Contribution
    Conceptualization, Supervision, Validation, Writing – original draft, Writing – review and editing
    For correspondence
    hyejin.park@inha.ac.kr
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3552-6275

Funding

National Research Foundation of Korea (RS-2023-00214071)

  • Juhee Lee
  • Hye Jin Park

National Research Foundation of Korea (RS-2024-00460958)

  • Juhee Lee
  • Hye Jin Park

Asia Pacific Center for Theoretical Physics (JRG Program)

  • Juhee Lee
  • Hye Jin Park

Academy of Medical Sciences (Professorship)

  • Wenying Shou

Royal Society (Wolfson Fellowship)

  • Wenying Shou

Inha University (Research grant)

  • Hye Jin Park

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

J Lee and HJ Park were supported by the National Research Foundation of Korea grant funded by the Korean government (MSIT), Grant No. RS-2023–00214071 and RS-2024–00460958 and by an appointment to the JRG Program at the APCTP through the Science and Technology Promotion Fund and the Lottery Fund of the Korean Government. W Shou was supported by the Academy of Medical Sciences Professorship and a Royal Society Wolfson Fellowship. This was also supported by the Korean Local Governments–Gyeongsangbuk-do Province and Pohang City and INHA UNIVERSITY Research Grant. We thank Su-Chan Park, Li Xie, Alex Yuan, and Botond Major for constructive comments and discussions.

Version history

  1. Preprint posted:
  2. Sent for peer review:
  3. Reviewed Preprint version 1:
  4. Reviewed Preprint version 2:
  5. Version of Record published:

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.97461. This DOI represents all versions, and will always resolve to the latest one.

Copyright

© 2024, Lee et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 713
    views
  • 27
    downloads
  • 0
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Juhee Lee
  2. Wenying Shou
  3. Hye Jin Park
(2025)
The success of artificial selection for collective composition hinges on initial and target values
eLife 13:RP97461.
https://doi.org/10.7554/eLife.97461.3

Share this article

https://doi.org/10.7554/eLife.97461