The success of artificial selection for collective composition hinges on initial and target values

  1. Juhee Lee
  2. Wenying Shou (corresponding author)
  3. Hye Jin Park (corresponding author)
  1. Department of Physics, Inha University, Republic of Korea
  2. Asia Pacific Center for Theoretical Physics, Republic of Korea
  3. Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, United Kingdom
17 figures, 1 table and 1 additional file

Figures

Schematic for artificial selection on collectives.

Each selection cycle begins with a total of g Newborn collectives, each with N0 total cells of slow-growing S population (light gray dots) and fast-growing F population (dark gray dots). During maturation (over time τ), S and F cells divide at rates rS and rS+ω (ω>0), respectively, and S mutates to F at rate μ. During inter-collective selection, the Adult collective with F frequency f closest to the target composition f^ is chosen to reproduce g Newborns for the next cycle. Newborns are sampled from the chosen Adult (yellow star) with N0 cells per Newborn. The selection cycle is then repeated until the F frequency reaches a steady state, which may or may not be the target composition. To denote a variable x of the i-th collective in cycle k at time t (0≤t≤τ), we use the notation xk,t(i), where x∈{S,F,s,f}. Note that time t=0 is for Newborns and t=τ is for Adults.
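The cycle described in this caption can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' simulation code: maturation is replaced by the mean-field solution of dS/dt=(rS−μ)S and dF/dt=(rS+ω)F+μS (so growth noise is ignored), Newborns are drawn by independent binomial sampling, and the function names `mature`, `sample_newborn`, and `run_cycle` are hypothetical.

```python
import math
import random


def mature(S0, F0, r_s=0.5, omega=0.03, mu=1e-4, tau=4.8):
    """Mean-field maturation over time tau (assumption: growth noise ignored)."""
    grow_s = math.exp((r_s - mu) * tau)      # S grows and loses cells to mutation
    grow_f = math.exp((r_s + omega) * tau)   # F grows with advantage omega
    S = S0 * grow_s
    # Closed-form solution of dF/dt = (r_s + omega) F + mu S
    F = F0 * grow_f + mu * S0 * (grow_f - grow_s) / (omega + mu)
    return S, F


def sample_newborn(f_adult, n0, rng):
    """Draw n0 cells, each an F cell with probability f_adult (binomial sampling)."""
    F0 = sum(rng.random() < f_adult for _ in range(n0))
    return n0 - F0, F0


def run_cycle(f_parent, f_target, g=10, n0=1000, rng=random):
    """One cycle: g Newborns -> g Adults -> pick the Adult closest to the target."""
    adults = []
    for _ in range(g):
        S0, F0 = sample_newborn(f_parent, n0, rng)
        S, F = mature(S0, F0)
        adults.append(F / (S + F))
    return min(adults, key=lambda f: abs(f - f_target))


if __name__ == "__main__":
    rng = random.Random(0)
    f = 0.05
    for _ in range(5):
        f = run_cycle(f, f_target=0.1, rng=rng)
    print(f)
```

Repeating `run_cycle` with the returned frequency as the next parent frequency gives a trajectory of the selected Adult's F frequency. Because growth noise is left out here, the variation among Adults comes only from Newborn sampling.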

Initial and target compositions determine the success of artificial selection on collectives.

(a–c) F frequency of the selected Adult collective (f) over cycles at different target f^ values (long dashed lines). Target values f^ between fL and fH (orange dotted and solid line segments) are inaccessible: selection will fail there. (a) A high target F frequency (e.g. f^=0.9>fH; magenta) can be achieved from any initial frequency (black dots). (b) An intermediate target frequency (e.g. fL<f^=0.5<fH; green) is never achievable, as all initial conditions converge to fH. (c) A low target frequency (e.g. f^=0.1<fL; dark blue) is achievable, but only from initial frequencies below fL. For initial frequencies at fL, stochastic outcomes (gray curves) are observed: while some replicates reached the target frequency, others reached fH. For parameters, we used S growth rate rS=0.5, F growth advantage ω=0.03, mutation rate μ=0.0001, maturation time τ≈4.8, and N0=1000. The number of collectives is g=10. Each black line is averaged over 300 independent realizations. (d) Inter-collective selection opposes intra-collective selection. We plot probability density distributions of F frequency f during two consecutive cycles when selection is successful. Data correspond to cycles 31 and 32 from the second lowest initial point in c. Δf is the selection progress within a cycle (see Box 1). Black triangle: median. (e) Two accessible regions (gold). Either high f^ (f^>fH; region 2) or low f^ starting from low initial f (f^<fL and f¯1,0<fL; region 1) can be achieved. We theoretically predict (by numerically integrating Equation 1) fH (orange solid line) and fL (orange dotted line), which agree with simulation results (gold regions). (f) Example trajectories from initial compositions (black dots) to the target compositions (dashed lines). The gold areas indicate the region of initial frequencies where the target frequency can be achieved. (g) The tension between intra-collective selection and inter-collective selection creates a ‘waterfall’ phenomenon. See the main text for details.

Intra-collective selection and inter-collective selection jointly set the boundaries for selection success.

(a) The change in F frequency over one cycle. When fk is sufficiently low or high, inter-collective selection can lower the F frequency to below fk (Δf<0). The points where Δf=0 (in the orange line) are denoted as fL and fH, corresponding to the boundaries in Figure 2. (b) The distributions of frequency differences obtained by 1000 numerical simulations. The cyan, purple, and black box plots respectively indicate the changes in F frequency after intra-collective selection (the mean frequency among the 100 Adults minus the mean frequency among the 100 Newborns during maturation), after inter-collective selection (the frequency of the 1 selected Adult minus the mean frequency among the 100 Adults), and over one selection cycle (the frequency of the selected Adult of one cycle minus that of the previous cycle). The box ranges from 25% to 75% of the distribution, and the median is indicated by a line across the box. The upper and lower whiskers indicate maximum and minimum values of the distribution. ***p<0.001 in an unpaired t-test.
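The three frequency differences defined in panel (b) can be written out explicitly. The following is a small sketch, assuming the Newborn and Adult F frequencies of one cycle are already available as lists; the function name `decompose_cycle` and its arguments are hypothetical and not taken from the article.

```python
from statistics import mean


def decompose_cycle(newborn_f, adult_f, target, prev_selected):
    """Split one cycle's change in F frequency into the three parts of panel (b)."""
    mean_newborn = mean(newborn_f)
    mean_adult = mean(adult_f)
    # Inter-collective selection keeps the Adult closest to the target
    selected = min(adult_f, key=lambda f: abs(f - target))
    return {
        "intra": mean_adult - mean_newborn,   # change during maturation
        "inter": selected - mean_adult,       # change due to choosing one Adult
        "total": selected - prev_selected,    # change over the whole cycle
    }


if __name__ == "__main__":
    print(decompose_cycle([0.50, 0.50], [0.52, 0.56], target=0.5, prev_selected=0.50))
```

When the target lies below the current frequency, a positive "intra" value and a negative "inter" value illustrate the opposition between the two levels of selection described in the caption.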

Expanding the region of success for artificial collective selection.

(a) Reducing the Newborn population size N0 expands the region of success. In the gold area, the probability that fk+1 becomes smaller than fk in a cycle is more than 50%. We used g=10 and τ≈4.8. Figures 2 and 3 correspond to N0=1000 in this graph. The black dotted line indicates the critical Newborn size below which all target frequencies can be achieved. (b) Increasing the total number of collectives g also expands the region of success, although only slightly. We used a fixed Newborn size N0=1000. The maturation time τ=log(100)/rS≈9.2 is set to be long enough so that an Adult can generate at least 100 Newborns. (c) Increasing the maturation time shrinks the region of success. We used a fixed Newborn size N0=1000 and number of collectives g=10.
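The maturation time quoted in (b) can be checked with one line of arithmetic. The sketch below assumes that "long enough" means the slow-growing S cells alone grow at least 100-fold, that is exp(rS·τ)≥100; the helper name `min_maturation_time` is hypothetical.

```python
import math


def min_maturation_time(fold, r_s):
    """Shortest tau for which exp(r_s * tau) reaches the requested fold-growth."""
    return math.log(fold) / r_s


if __name__ == "__main__":
    # Fold-growth of 100 with r_S = 0.5, as in panel (b)
    print(round(min_maturation_time(100, 0.5), 2))
```

Since F cells grow faster than S cells, a collective containing F cells exceeds this fold-growth, so the bound is conservative under the stated assumption.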

In higher dimensions, the success of artificial selection requires the entire evolutionary trajectory remaining in the accessible region.

(a) During collective maturation, a slow-growing population (S) (with growth rate rS; light gray) can mutate to a fast-growing population (F) (with growth rate rS+ω; medium gray), which can mutate further into a faster-growing population (FF) (with growth rate rS+2ω; dark gray). Here, the rates of both mutational steps are μ, and ω>0. (b) Evolutionary trajectories from various initial compositions (open circles) to various targets (filled triangles). Intra-collective evolution favors FF over F (vertical blue arrow) over S (horizontal blue arrow). The accessible regions are marked gold (see Appendix 1). We obtain final compositions starting from several initial compositions while aiming for different target compositions in i, ii, and iii. The evolutionary trajectories are shown in dots with color gradients from initial time (light grey) to final time (dark grey). (i) A target composition with a high FF frequency is always achievable. (ii) A target composition with intermediate FF frequency is never achievable. (iii) A target composition with low FF frequency is achievable only if starting from an appropriate initial composition such that the entire trajectory never meanders away from the accessible region. The figures are drawn using the mpltern package (Ikeda et al., 2019). (c) The accessible region in the three-population problem is interpreted as an extension of the two-population problem. First, the accessible region between FF and S+F is given, and then the S+F region is stretched into S and F.

Appendix 1—figure 1
Comparison between the calculated Gaussian distribution (‘Gauss,’ with the mean and variances computed from Equations 18, 19, 28, and 33) and simulations using tau-leaping (‘tau’).

The simulations were run 3000 times. The initial numbers of cells are (S0,F0)=(990,10), (500,500), and (10,990) for the three columns, respectively. The parameters r=0.5, ω=0.03, μ=0.0001, and τ=4.8 are used.

Appendix 1—figure 2
Congruence between consecutive sampling (MHG for multivariate hypergeometric distribution) and independent binomial (BN) sampling.

The initial numbers of cells are S=8000 and F=2000 for the left panel, and S=20 and F=5 for the right panel. 10,000 samples are drawn for each distribution. Here, a parent collective is divided into 10 collectives.
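The comparison in this figure can be imitated with the standard library alone. The sketch below is an illustration under assumptions, not the authors' code: consecutive sampling is implemented by shuffling all cells of the parent and cutting them into equal-sized Newborns (sampling without replacement), binomial sampling draws each cell independently, and the function names and the number of trials are hypothetical.

```python
import random
from statistics import mean, pvariance


def consecutive_split(S, F, n_groups, rng):
    """Shuffle the parent's cells and cut them into n_groups equal Newborns."""
    cells = [1] * F + [0] * S          # 1 marks an F cell, 0 an S cell
    rng.shuffle(cells)
    size = len(cells) // n_groups
    return [sum(cells[i * size:(i + 1) * size]) for i in range(n_groups)]


def binomial_split(S, F, n_groups, rng):
    """Draw each Newborn independently with the parent's F frequency."""
    p = F / (S + F)
    size = (S + F) // n_groups
    return [sum(rng.random() < p for _ in range(size)) for _ in range(n_groups)]


def compare(S=8000, F=2000, n_groups=10, trials=100, seed=0):
    """Return (mean, variance) of F counts per Newborn for both schemes."""
    rng = random.Random(seed)
    mhg, bn = [], []
    for _ in range(trials):
        mhg.extend(consecutive_split(S, F, n_groups, rng))
        bn.extend(binomial_split(S, F, n_groups, rng))
    return (mean(mhg), pvariance(mhg)), (mean(bn), pvariance(bn))


if __name__ == "__main__":
    print(compare(trials=20))
```

With the left-panel numbers, both schemes give about 200 F cells per Newborn on average. The consecutive scheme conserves the parent's F cells exactly, whereas the binomial scheme does not.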

Appendix 1—figure 3
Trajectories of F frequency for 10 collectives (g=10) over time.

(a) The collective whose frequency is closest to the target value is selected in every cycle (black lines). The gray lines denote the other collectives. For parameters, we used S growth rate rS=0.5, F growth advantage ω=0.03, mutation rate μ=0.0001, maturation time τ≈4.8, and N0=1000. (b) Comparison between frequency trajectories with selection (the one chosen Adult producing all offspring; black) and without selection (each Adult producing one offspring; blue) clearly shows the effect of artificial selection. The black line indicates the F frequency of the selected collective fk at each cycle in (a). The blue line indicates the average trajectory without selection fk+1 (the average of g=10 individual lineages without inter-collective selection at the end of each cycle).

Appendix 1—figure 4
Color map of the absolute error d=|f−f^| between the frequency f of the averaged selected collectives at the end of simulations (k=1000) and the target frequency f^.

The solid and dashed lines are drawn by the arguments in the main text. For parameters, we used rS=0.5, ω=0.03, μ=0.0001, N0=1000, g=10, and τ≈4.8. The result is the average of 300 independent simulations. Compared to Figure 2e, this figure has a higher resolution.

Appendix 2—figure 1
The probability density functions of the change in the selected Adult’s F frequency, fk+1−fk.

For simulations (blue), at each fk, we performed 1000 stochastic simulations. The orange distribution represents Equation 47 computed by numerical integration. The median values of the distributions are shown in Figure 3a in the main text.

Appendix 2—figure 2
Effect of experimental parameters on the distribution of Adult's F frequency.

(a) Mean (Equation 41) and variance (Equation 42) of f values of Adult collectives with respect to the Newborn frequency f0. (b) Scaling relation of F frequency variance (Equation 49) with Newborn collective size N0. The initial F frequency is 0.5. The parameters are rS=0.5, ω=0.03, μ=0.0001, and τ≈4.8. (c) Relation of F frequency variance (Equation 49) with maturation time τ. Other parameters are the same as in b.

Appendix 2—figure 3
Median (orange) and mean (violet) have similar distributions.

We performed 1000 simulations to get probability density. (a) g=10, (b) g=100, and (c) g=1000. Initial F frequency is fk=0.5. The parameters are rS=0.5, ω=0.03, μ=0.0001 and τ=ln[1000]/rS.

Appendix 4—figure 1
Simulation with zero mutation rate.

Color map of the absolute error d=|f−f^| between the frequency f of the averaged selected collectives at the end of simulations (k=1000) and the target frequency f^. For parameters, we used rS=0.5, ω=0.03, μ=0, N0=1000, g=10, and τ≈4.8.

Appendix 5—figure 1
Change of success region in varying selective advantage ω.

rs, ω=0.03, μ=0.0001, N0=1000, g=10, and τ≈4.8.
Appendix 6—figure 1
Artificial selection also works for deleterious mutation.

(a) Conditional probability density functions of fk+1−fk for various fk values. The left-hand side distribution is obtained from simulations and the right-hand side distribution is numerically obtained by evaluating Equation 63. Small triangles inside indicate the median values of the distributions. (b) The median value of distributions at a given fk. The points where the shifted median becomes zero, Median[Ψ(fk+1−fk|fk)]=0, are denoted as fL and fU, respectively. (c) The relative error between the target frequency f^ and the ensemble averaged selected frequency fk is measured after 1000 cycles starting from the initial frequency f¯1,0. Either the lower target frequencies or the higher target frequencies starting from the high initial frequencies can be achieved. The black dashed lines indicate the predicted boundary values fU and fL in a.

Appendix 7—figure 1
Selecting top 5% outperforms selecting top 1.

We bred 100 collectives and chose either the top-1 collective (solid line) or the top-5 collectives (dashed line) with f closest to the target value f^ (black dotted line).
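The two selection rules compared here differ only in how many Adults are kept. A minimal sketch of the choice step is given below; the function name `select_top` is hypothetical, and how offspring are then divided among the chosen parents is not stated in this caption and is left out.

```python
def select_top(adult_freqs, target, k=1):
    """Return the k Adult F frequencies closest to the target, nearest first."""
    return sorted(adult_freqs, key=lambda f: abs(f - target))[:k]


if __name__ == "__main__":
    adults = [0.10, 0.52, 0.47, 0.90]
    print(select_top(adults, target=0.5, k=1))
    print(select_top(adults, target=0.5, k=2))
```

With 100 Adults, `k=1` corresponds to the top-1 rule and `k=5` to the top-5 rule in the figure.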

Appendix 8—figure 1
Accessible regions in the three-population system.

(a) The flow of composition change in fast-growing (F) and faster-growing (FF) frequencies at each composition (f,h). The top corner corresponds to collectives in which FF cells are fixed, the bottom-right corner to collectives with only F cells, and the bottom-left corner to collectives with only S cells. Arrow length indicates the speed of change. (b) The accessible regions are marked in gold. A composition is accessible if the changes in both F frequency and FF frequency after inter-collective selection have signs opposite to those during maturation. Otherwise, the composition is not accessible and will change over cycles. Dashed lines mark the boundary of the accessible region obtained by projecting the collective onto a two-population problem (FF vs. S+F). The figures are drawn using the mpltern package (Ikeda et al., 2019).

Tables

Table 1
Nomenclature.
Variables    Representing
S            Number of slower-growing (S) cells
F            Number of faster-growing (F) cells
N            Total cell numbers in a collective, N=S+F
s            Frequency of S cells, s=S/(S+F)
f            Frequency of F cells, f=F/(S+F)=1−s
f            F frequency of the selected collective in a cycle

Parameters   Representing
rS           Growth rate of S
ω>0          Growth rate advantage of F over S
μ            Mutation rate from S to F
g            Total number of collectives
τ            Maturation time
N0           Total number of cells in Newborn, or Newborn size
s^, f^       Target frequency in s or f
fL, fH       Low and High thresholds of inaccessible f^
Rτ           Fold-growth of S cells over time τ, Rτ=exp(rSτ)
Wτ           Fold ratio change of F cells over S cells over time τ, Wτ=exp(ωτ)
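The two derived quantities in the table, Rτ and Wτ, can be evaluated directly for the parameter values used in the figures (rS=0.5, ω=0.03, τ≈4.8). The sketch below also includes a mutation-free, noise-free frequency map that follows from the definition of Wτ (the F/S ratio is multiplied by Wτ during maturation). That map is an added illustration under those assumptions, not a formula quoted from the article, and all function names are hypothetical.

```python
import math


def fold_growth(r_s, tau):
    """R_tau = exp(r_S * tau): fold-growth of S cells over one maturation."""
    return math.exp(r_s * tau)


def fold_ratio_change(omega, tau):
    """W_tau = exp(omega * tau): fold change of the F/S ratio over one maturation."""
    return math.exp(omega * tau)


def adult_frequency_no_mutation(f0, omega, tau):
    """Adult F frequency if F/S is simply multiplied by W_tau (assumes mu = 0, no noise)."""
    w = fold_ratio_change(omega, tau)
    return f0 * w / (1.0 - f0 + f0 * w)


if __name__ == "__main__":
    print(fold_growth(0.5, 4.8))
    print(fold_ratio_change(0.03, 4.8))
    print(adult_frequency_no_mutation(0.5, 0.03, 4.8))
```

Under these simplifications, a Newborn at f=0.5 matures into an Adult with a somewhat higher F frequency, which is the intra-collective drift that inter-collective selection has to counteract.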

  1. Juhee Lee
  2. Wenying Shou
  3. Hye Jin Park
(2025)
The success of artificial selection for collective composition hinges on initial and target values
eLife 13:RP97461.
https://doi.org/10.7554/eLife.97461.3