Schematic for artificial selection on collectives.

Each selection cycle begins with g Newborn collectives (black open circles), each containing N0 cells of the slow-growing type (S, red dots) and the fast-growing type (F, blue dots). During maturation (over time τ), S and F cells divide at rates r and r + ω (ω > 0), respectively, and S mutates to F at rate μ. In the selection stage, the Adult collective whose F frequency f is closest to the target composition is chosen to produce g Newborns for the next cycle. Newborns, each with N0 cells, are sampled from the chosen Adult (yellow star). The selection cycle is repeated until the F frequency reaches a steady state, which may or may not be the target composition. To denote a quantity x of the i-th collective in cycle k at time t (0 ≤ t ≤ τ), we use notation where x ∈ {S, F, s, f }. Note that t = 0 refers to Newborns and t = τ to Adults.
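The cycle described above can be sketched as a short stochastic simulation. This is a minimal illustration, not the authors' code: maturation uses tau-leaping with a fixed step `dt`, S→F mutation is assumed to be a conversion event at per-cell rate μ, and Newborns are drawn from the chosen Adult by binomial sampling (an approximation to sampling without replacement). The helper names `mature` and `selection_cycle` are hypothetical; the parameter values follow the next figure.

```python
import numpy as np

def mature(S, F, r, omega, mu, tau, rng, dt=0.01):
    # Tau-leaping approximation of maturation, vectorized over collectives.
    # Assumption: S->F mutation is a conversion event at per-cell rate mu.
    S = np.asarray(S, dtype=np.int64).copy()
    F = np.asarray(F, dtype=np.int64).copy()
    for _ in range(int(round(tau / dt))):
        mutations = rng.binomial(S, mu * dt)
        S = S + rng.poisson(r * S * dt) - mutations
        F = F + rng.poisson((r + omega) * F * dt) + mutations
    return S, F

def selection_cycle(S0, F0, target, r, omega, mu, tau, N0, rng):
    S, F = mature(S0, F0, r, omega, mu, tau, rng)
    f_adult = F / (S + F)
    best = int(np.argmin(np.abs(f_adult - target)))  # Adult closest to the target
    # Newborns drawn from the chosen Adult (binomial approximation)
    F_new = rng.binomial(N0, f_adult[best], size=len(S0))
    return N0 - F_new, F_new, f_adult[best]

rng = np.random.default_rng(0)
g, N0 = 10, 1000
r, omega, mu, tau = 0.5, 0.03, 1e-4, 4.8
target = 0.8
F = np.full(g, 300)
S = N0 - F
history = []  # F frequency of the selected Adult in each cycle
for k in range(20):
    S, F, f_selected = selection_cycle(S, F, target, r, omega, mu, tau, N0, rng)
    history.append(float(f_selected))
```

Vectorizing over the g collectives keeps each cycle to a few hundred NumPy calls; an exact Gillespie simulation would be slower but free of time-step error.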

Initial and target compositions determine the success of artificial selection on collectives.

(a-c) Mutant frequency of the selected Adult collective (f ) over cycles. The target frequency is marked as a dotted line, while f H (black solid line) and f L (black dashed line) delineate the boundary values that define the region of successful selection. a A high target F frequency ( ; magenta dotted line) can be achieved from any initial frequency (black dots). b An intermediate target frequency ( ; green dotted line) is never achievable, as all initial conditions converge to near f H . c A low target frequency is achievable, but only from initial frequencies below f L. For initial frequencies at f L, stochastic outcomes (grey curves) are observed: some replicates reached the target frequency, while others reached f H. For parameters, we used wild-type growth rate r = 0.5, F growth advantage ω = 0.03, mutation rate μ = 0.0001, maturation time τ ≈ 4.8, and N0 = 1000. The number of collectives is g = 10. Each black line is averaged over 300 independent realizations. d Two accessible regions (gold): either high target frequencies (region 2), or low target frequencies starting from low initial f (region 1), can be achieved. We theoretically predict (by numerically integrating Eq. 1) the boundaries of the success regions, f H (black solid line) and f L (black dashed line), which agree with simulation results (gold regions). e Example trajectories from initial compositions (black dots) to the target compositions (dashed lines). The gold areas indicate the region of initial frequencies from which the target frequency can be achieved. f The tension between intra-collective selection and inter-collective selection creates a “waterfall” phenomenon. See the main text for details.

Intra-collective selection and inter-collective selection jointly set the boundaries for selection success.

a The change in F frequency over one cycle. When is sufficiently low or high, inter-collective selection can lower the F frequency to below . The points where Median are denoted as f L and f H, corresponding to the boundaries in Fig. 2. b The distributions of frequency differences obtained from 1000 numerical simulations. The cyan, purple, and black box plots indicate the changes in F frequency after intra-collective selection during maturation, after inter-collective selection, and over one selection cycle, respectively. Each box spans the 25th to 75th percentiles of the distribution, and the median is indicated by a line across the box. The upper and lower whiskers indicate the maximum and minimum values of the distribution. ***: P < 0.001 in an unpaired t-test.
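The intra-collective (cyan) shift can be estimated with a deterministic mean-field model of maturation. This sketch assumes dS/dt = (r − μ)S and dF/dt = (r + ω)F + μS (mutation treated as conversion); it is an illustration and may not coincide exactly with Eq. 1. For ω > 0 the shift is always positive, which is why inter-collective selection has to pull the frequency back down. The function name `mature_deterministic` is hypothetical.

```python
import math

def mature_deterministic(S0, F0, r, omega, mu, t):
    """Closed-form mean-field S and F counts after time t (assumed model)."""
    S = S0 * math.exp((r - mu) * t)
    F = (F0 * math.exp((r + omega) * t)
         + mu * S0 * (math.exp((r + omega) * t) - math.exp((r - mu) * t)) / (omega + mu))
    return S, F

# Newborn with F frequency 0.3 and N0 = 1000, matured for tau = 4.8
S_t, F_t = mature_deterministic(700, 300, r=0.5, omega=0.03, mu=1e-4, t=4.8)
f_adult = F_t / (S_t + F_t)
delta_f_intra = f_adult - 0.3  # positive: maturation favors F
```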

Expanding the success region for artificial collective selection.

a Reducing the Newborn population size N0 expands the region of success. In the gold area, the probability that becomes smaller than in a cycle is more than 50%. We used g = 10 and τ ≈ 4.8. Figures 2-3 correspond to N0 = 1000. b Increasing the total number of collectives g also expands the region of success, although only slightly. We used a fixed Newborn size N0 = 1000. The maturation time τ = log(100)/r is set long enough that an Adult can generate at least 100 Newborns.
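The choice τ = log(100)/r follows from exponential growth: a population growing at rate r expands by a factor e^{rτ}, so setting this factor to 100 lets an Adult supply 100 Newborns of size N0. The sketch below assumes log is the natural logarithm and uses the S growth rate r as a lower bound (F cells grow faster). The helper name is hypothetical.

```python
import math

def maturation_time(r, newborns_per_adult):
    """Time for a population growing at rate r to expand by the given factor."""
    return math.log(newborns_per_adult) / r

tau = maturation_time(0.5, 100)  # about 9.21 for r = 0.5
growth = math.exp(0.5 * tau)     # fold expansion of the slow-growing type
```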

In higher dimensions, the success of artificial selection requires the entire evolutionary trajectory to remain within the accessible region.

a During collective maturation, a slow-growing type (S) (growth rate r; dark red) can mutate to a fast-growing type (F) (growth rate r + ω; blue), which can further mutate into a faster-growing type (FF) (growth rate r + 2ω; purple). Here, both mutational steps occur at rate μ, and ω > 0. b Evolutionary trajectories from various initial compositions (open circles) to various targets. Intra-collective evolution favors FF over F (vertical blue arrow) and F over S (horizontal blue arrow). The accessible regions are marked in gold (see Sec. 1 in Supplementary Information). We obtain final compositions starting from several initial compositions while aiming for different target compositions in i, ii, and iii. The evolutionary trajectories are shown as dots with a color gradient from the initial to the final time. (i) A target composition with a high FF frequency is always achievable. (ii) A target composition with an intermediate FF frequency is never achievable. (iii) A target composition with a low FF frequency is achievable only when starting from an initial composition such that the entire trajectory stays within the accessible region. The figures are drawn using the mpltern package (35).
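The intra-collective drift in the three-type model can be illustrated with a mean-field linear ODE integrated by a hand-written fourth-order Runge-Kutta step. The sketch assumes each mutation is a conversion at per-cell rate μ (S→F and F→FF), which is our modeling choice for illustration; the function names are hypothetical. It shows the FF frequency rising and the S frequency falling during one maturation, the flow that selection must counteract.

```python
import numpy as np

def rates_matrix(r, omega, mu):
    # Order (S, F, FF); assumption: mutations are conversions at rate mu
    return np.array([
        [r - mu, 0.0, 0.0],
        [mu, r + omega - mu, 0.0],
        [0.0, mu, r + 2 * omega],
    ])

def mature_three_types(x0, r, omega, mu, tau, steps=1000):
    """RK4 integration of dx/dt = A x; returns (S, F, FF) frequencies."""
    A = rates_matrix(r, omega, mu)
    x = np.asarray(x0, dtype=float)
    h = tau / steps
    for _ in range(steps):
        k1 = A @ x
        k2 = A @ (x + 0.5 * h * k1)
        k3 = A @ (x + 0.5 * h * k2)
        k4 = A @ (x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x / x.sum()

comp0 = np.array([0.5, 0.3, 0.2])
comp_tau = mature_three_types(comp0 * 1000, r=0.5, omega=0.03, mu=1e-4, tau=4.8)
```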

Comparison between the calculated Gaussian distribution (“Gauss”, with the mean and variances computed from Eqs. [7,8,15,19]) and simulations using tau-leaping (“tau”) and sampling (“samp”) methods. Each simulation is repeated 500 times. The initial numbers of cells are S0 = 200 and F0 = 800. The parameters are r = 0.5, ω = 0.03, μ = 0.0001, and τ = 4.8.
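A tau-leaping run of the kind compared here can be sketched as below, with the same initial condition and parameters. It checks the sample means against an assumed mean-field solution (dS/dt = (r − μ)S, dF/dt = (r + ω)F + μS) and collects the empirical mean and variance of f that a Gaussian approximation would be fitted to. The step size `dt` and the function names are assumptions; the "samp" method is not reproduced.

```python
import math
import numpy as np

def tau_leap(S0, F0, r, omega, mu, tau, runs, rng, dt=0.01):
    # Tau-leaping maturation for many independent runs at once
    S = np.full(runs, S0, dtype=np.int64)
    F = np.full(runs, F0, dtype=np.int64)
    for _ in range(int(round(tau / dt))):
        mutations = rng.binomial(S, mu * dt)  # assumption: S->F conversion
        S = S + rng.poisson(r * S * dt) - mutations
        F = F + rng.poisson((r + omega) * F * dt) + mutations
    return S, F

def mean_field(S0, F0, r, omega, mu, t):
    S = S0 * math.exp((r - mu) * t)
    F = (F0 * math.exp((r + omega) * t)
         + mu * S0 * (math.exp((r + omega) * t) - math.exp((r - mu) * t)) / (omega + mu))
    return S, F

r, omega, mu, tau = 0.5, 0.03, 1e-4, 4.8
rng = np.random.default_rng(4)
S_sim, F_sim = tau_leap(200, 800, r, omega, mu, tau, runs=500, rng=rng)
S_mf, F_mf = mean_field(200, 800, r, omega, mu, tau)
f_sim = F_sim / (S_sim + F_sim)
f_mean, f_var = f_sim.mean(), f_sim.var()  # moments for a Gaussian comparison
```

With dt = 0.01 the tau-leaping mean is biased low by well under 1%, since each step multiplies the expected count by (1 + r dt) rather than e^{r dt}.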

Comparison between consecutive sampling and independent binomial sampling. A parent collective is divided into 10 collectives. The histogram labeled ‘MHG’ is the probability mass function of F in the fifth collective sampled via the multivariate hypergeometric distribution. Independent binomial sampling is labeled ‘BN’. The initial numbers of cells are S = 8000 and F = 2000 for the left panel, and S = 20 and F = 5 for the right panel. 10000 samples are drawn for each distribution.
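The left-panel comparison can be reproduced in outline with NumPy. Consecutive sampling removes each collective's cells from the parent before drawing the next one, which `Generator.hypergeometric` handles per draw; independent sampling uses `Generator.binomial` with the parent's F frequency. By exchangeability the fifth collective has the same marginal distribution as a single hypergeometric draw, with variance n p (1 − p)(N − n)/(N − 1) ≈ 144 here versus 160 for the binomial. The seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
S_parent, F_parent = 8000, 2000
n_collectives = 10
n = (S_parent + F_parent) // n_collectives  # 1000 cells per collective
draws = 10000

# Consecutive sampling without replacement: collectives 1..5 drawn in order
F_rem = np.full(draws, F_parent)
S_rem = np.full(draws, S_parent)
for _ in range(5):
    F_i = rng.hypergeometric(F_rem, S_rem, n)
    F_rem = F_rem - F_i
    S_rem = S_rem - (n - F_i)
F_fifth_mhg = F_i

# Independent binomial sampling at the parent's F frequency
F_fifth_bn = rng.binomial(n, F_parent / (S_parent + F_parent), size=draws)
```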

Color map of the absolute error between the averaged frequency ⟨f *⟩ of the selected collectives at the end of simulations (k = 1000) and the target frequency . The dashed lines are drawn from the arguments in the main text. For parameters, we used r = 0.5, ω = 0.03, μ = 0.0001, N0 = 1000, g = 10 and τ ≈ 4.8.

a Trajectories of F frequency for 10 collectives (g = 10) over time. The collective whose frequency is closest to the target value is selected in every cycle (black lines). The gray lines denote the other collectives. For parameters, we used S growth rate r = 0.5, F growth advantage ω = 0.03, mutation rate μ = 0.0001, maturation time τ ≈ 4.8, and N0 = 1000. b Comparison between frequency trajectories with (black) and without (blue) selection clearly shows the effect of artificial selection. The black line indicates F frequency of the selected collective at each cycle in a. The blue line indicates the average trajectory without selection (the average of g = 10 individual lineages without collective-level selection at the end of each cycle).
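A rough mean-field counterpart of the blue no-selection trajectory iterates the deterministic maturation map cycle after cycle. This assumes binomial Newborn sampling leaves the expected F frequency unchanged and ignores the nonlinearity of averaging over stochastic outcomes, so it only approximates the averaged lineages. Without selection, the F frequency climbs monotonically toward 1. The function name `matured_frequency` is hypothetical.

```python
import math

def matured_frequency(f, r=0.5, omega=0.03, mu=1e-4, tau=4.8):
    """Mean-field F frequency after one maturation (assumed model)."""
    S0, F0 = 1.0 - f, f
    S = S0 * math.exp((r - mu) * tau)
    F = (F0 * math.exp((r + omega) * tau)
         + mu * S0 * (math.exp((r + omega) * tau) - math.exp((r - mu) * tau)) / (omega + mu))
    return F / (S + F)

f = 0.3
no_selection = [f]
for _ in range(100):
    f = matured_frequency(f)  # Newborn sampling assumed not to shift the mean
    no_selection.append(f)
```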

The probability density functions of the selected collective’s frequency, offset by f *. The blue distribution is obtained from 300 stochastic simulations. The orange distribution represents Eq. [32] computed by numerical integration. The median values of the distributions are shown in Fig. 3a of the main text.

Scaling relation of F frequency variance (Eq. [27]) with Newborn collective size N0. The initial F frequency is 0.3. The parameters are r = 0.5, ω = 0.03, μ = 0.0001, and τ ≈ 4.8.
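Part of this scaling can be seen from Newborn sampling alone: a binomial draw of N0 cells at frequency f0 gives a frequency variance f0(1 − f0)/N0, so the variance times N0 stays near 0.21 for f0 = 0.3. The sketch below checks only this sampling contribution; Eq. [27] may include additional terms from maturation noise that are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(2)
f0 = 0.3
sizes = [100, 1000, 10000]
scaled = []
for N0 in sizes:
    freq = rng.binomial(N0, f0, size=20000) / N0
    scaled.append(freq.var() * N0)  # expected close to f0 * (1 - f0) for each N0
```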

Color map of the absolute error between the averaged frequency ⟨f *⟩ of the selected collectives at the end of simulations (k = 1000) and the target frequency . For parameters, we used r = 0.5, ω = 0.03, μ = 0, N0 = 1000, g = 10 and τ ≈ 4.8.

Comparison of selecting the Top-tier 5 with selecting the Top 1. We breed 100 collectives and choose the 5 collectives whose F frequencies are closest to the target value.
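Selecting the top-tier k collectives reduces to sorting by distance from the target. This sketch uses `np.argsort` with a stable sort so ties keep their original order; the function name and the example frequencies are hypothetical.

```python
import numpy as np

def select_closest(freqs, target, k):
    """Indices of the k collectives whose F frequency is closest to target."""
    distance = np.abs(np.asarray(freqs) - target)
    return np.argsort(distance, kind="stable")[:k]

rng = np.random.default_rng(3)
adult_freqs = rng.uniform(0.2, 0.6, size=100)  # hypothetical Adult F frequencies
top5 = select_closest(adult_freqs, target=0.4, k=5)
top1 = select_closest(adult_freqs, target=0.4, k=1)
```

Top 1 is always the first element of Top-tier 5, so the two strategies differ only in how many parents contribute Newborns.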

Artificial selection also works for deleterious mutation. a Conditional probability density functions of for various values. The left-hand side distribution is obtained from simulations and the right-hand side distribution is numerically obtained by evaluating Eq. [43]. Small triangles inside indicate the median values of the distributions. b The median value of the distributions at a given . The points where the shifted median becomes zero, Median are denoted as f L and f U, respectively. c The relative error between the target frequency and the ensemble-averaged selected frequency is measured after 1000 cycles starting from the initial frequency . Either lower target frequencies, or higher target frequencies starting from high initial frequencies, can be achieved. The black dashed lines indicate the predicted boundary values f U and f L in a.

a The schematic procedure to determine accessible regions by collective selection. From the given parent composition (blue dot), the probability distribution of offspring Adults (Eq. [68]) is computed (orange area). From the Adult compositions, the probability distribution of the selected collective (Eq. [79]) is computed (green area). If the signs of the changes in both F frequency and FF frequency after selection (from blue dot to green dot) are opposite to those of maturation (from blue dot to orange dot), the given composition is accessible. Otherwise, the composition is not accessible and will drift over cycles. b The accessible regions are shaded gray. The vector field is the flow of compositions during maturation. The length and color of the arrows indicate the speed of composition changes. The figures are drawn using the mpltern package (7).
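The sign criterion in panel a can be written as a small predicate. In this sketch each argument is a single (F frequency, FF frequency) point standing in for the parent (blue), a representative matured Adult (orange), and a representative selected collective (green); reducing the computed distributions to single points is an assumption made for illustration, and `is_accessible` is a hypothetical name.

```python
import numpy as np

def is_accessible(parent, matured, selected):
    """True if, for both F and FF frequencies, the change from parent to
    selected has the opposite sign to the change from parent to matured."""
    parent, matured, selected = map(np.asarray, (parent, matured, selected))
    d_mature = matured - parent
    d_select = selected - parent
    return bool(np.all(d_mature * d_select < 0))

ok = is_accessible((0.3, 0.2), (0.33, 0.22), (0.29, 0.19))   # both components reversed
bad = is_accessible((0.3, 0.2), (0.33, 0.22), (0.31, 0.19))  # F not reversed
```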