Cellular organization in labevolved and extant multicellular species obeys a maximum entropy law
Abstract
The prevalence of multicellular organisms is due in part to their ability to form complex structures. How cells pack in these structures is a fundamental biophysical issue, underlying their functional properties. However, much remains unknown about how cell packing geometries arise, and how they are affected by random noise during growth  especially absent developmental programs. Here, we quantify the statistics of cellular neighborhoods of two different multicellular eukaryotes: labevolved ‘snowflake’ yeast and the green alga Volvox carteri. We find that despite large differences in cellular organization, the free space associated with individual cells in both organisms closely fits a modified gamma distribution, consistent with maximum entropy predictions originally developed for granular materials. This ‘entropic’ cellular packing ensures a degree of predictability despite noise, facilitating parentoffspring fidelity even in the absence of developmental regulation. Together with simulations of diverse growth morphologies, these results suggest that gammadistributed cell neighborhood sizes are a general feature of multicellularity, arising from conserved statistics of cellular packing.
Editor's evaluation
This work uncovers a simple but farreaching statistical principle that describes the geometry of cell packing in snowflake yeast and green algae. It draws on ideas from granular physics to offer new insight into universal rules of multicellular geometry that are otherwise easily obscured by the cellscale idiosyncrasies of the different biological systems.
https://doi.org/10.7554/eLife.72707.sa0Introduction
The evolution of multicellularity was transformative for life on Earth, occurring in at least 25 separate lineages (Grosberg and Strathmann, 2007). The success of multicellular organisms is due in part to their ability to assemble cells into complex, functional arrangements. Selfassembly, however, is fundamentally subject to random noise (Zeravcic and Brenner, 2014; SzavitsNossan et al., 2014; Damavandi and Lubensky, 2019) that affects the final emergent structure (Michel and Yunker, 2019). The physiology of multicellular organisms can depend sensitively on the geometry of cellular packing (Bi et al., 2015b; Drescher et al., 2016; Jacobeen et al., 2018b; Brunet et al., 2019; Schmideder et al., 2021), and such noise may therefore have direct consequences on organismal fitness. Understanding the evolution of multicellularity, and the subsequent evolution of multicellular complexity (Bell and Mooers, 1997), requires understanding the impact of random noise on multicellular selfassembly. How do organisms accurately assemble functional multicellular components in the presence of noise?
Recent work has shown that extant multicellular organisms can either suppress (Hong et al., 2016) or leverage (Haas et al., 2018) variability in the process of reliably generating structures, and their tissues can change function based on cellular packing geometry (Bi et al., 2015a). This occurs through a coordinated developmental process involving genetic (Davidson, 2001), chemical (Sampathkumar, 2020), mechanical (Deneke and Di Talia, 2018), and bioelectric (Levin, 2004) feedbacks between interacting cells. However, even with coordinated developmental processes, noise during selfassembly results in deviations from perfectly regular structures. Further, as these developmental processes have not yet evolved in nascent multicellular organisms, it is unclear how unregulated assembly can reliably result in reproducible packing geometries and multicellular structures.
Multicellular organisms also exhibit diverse growth morphologies; for example, cells can remain attached through incomplete cytokinesis (Bonner, 1998; Grosberg and Strathmann, 2007; Knoll, 2011), they can adhere through aggregative bonds (Claessen et al., 2014), and they can assemble multicellular groups through successive cell division within a confining membrane (Angert, 2005; Herron et al., 2019). These growth morphologies can have distinct intercellular connection topologies (Yanni et al., 2020), changing how randomness is manifested. For instance, groups that grow with persistent motherdaughter bonds maintain the same intercellular connections, ‘freezing’ in place any structural randomness that arises during reproduction. In contrast, cells in aggregates can rearrange, so their final structure emerges from a combination of reproduction and intercellular interactions and noise (Delarue et al., 2016; Hartmann et al., 2019). Further, the dimensionality of multicellular groups can vary, from filaments (Herrero et al., 2016) and quasitwodimensional sheets (Besson and Dumais, 2011; Brunet et al., 2019) to groups that grow equally in three dimensions (Ratcliff et al., 2012; Tang et al., 2020; Butterfield, 2016). While the impact of noise on systems in thermal equilibrium is well known to depend sensitively on spatial dimensionality (Mermin and Wagner, 1966; Hohenberg, 1967; Vivek et al., 2017), no such information is yet at hand for biological development, which is intrinsically out of equilibrium. The growth morphology, connection topology, and dimensionality therefore altogether determine a multicellular architecture. Randomness resulting from many sources, such as stochastic cell division, variability in cell growth, intercellular interactions, and more, subsequently occurs as perturbations to this idealized form. It would appear that noise manifests in a unique, contextdependent manner in each of these different multicellular systems.
Here, we provide experimental evidence that, rather than being contextdependent, fluctuations in cell packing geometry instead follow a universal distribution, independent of the presence or absence of developmental regulation. We quantify the distributions of cellular space in two different types of organisms: experimentally evolved multicellular yeast (Ratcliff et al., 2012) and wildtype multicellular green algae (Goldstein, 2015). In both cases, maximum entropy considerations (Aste and Di Matteo, 2008) (see inset box) accurately predict the cell packing distribution. Building on these observations, we use computational models of diverse prescribed growth rules, mimicking extant biological morphologies, to show that cells are ubiquitously packed according to the maximum entropy principle. Detailed analysis of the case of green algae shows that correlations, that is, the lack of structural randomness, produce deviations from maximum entropy predictions, but that even a relatively small amount of randomness is sufficient to generate cellular packings that largely follow maximum entropy predictions. Next, we explore the evolutionary consequences of cell packing. We use the cell packing distribution to predict the distribution of snowflake yeast group sizes, an emergent multicellular trait that arises from cell crowding (Jacobeen et al., 2018b). Then, we use a theoretical analysis to show that the effects of fluctuations in intercellular space on the motility of green algae are small. These findings together suggest that, rather than impeding innovation, fluctuations in cell packing are highly repeatable, and may play a fundamental role in the origin and subsequent evolution of multicellular organisms.
Results
Maximum entropy
Within statistical physics, the maximum entropy principle relates randomness in lowlevel units (e.g. cells) to the properties of the assembly (e.g. a multicellular group). It works by enumerating all lowlevel configurations that conform to a set of constraints. Any particular grouplevel property can be generated by many different lowlevel configurations, but some grouplevel properties may correspond to more lowlevel configurations than others. Those that are generated by many configurations are more likely to be observed than those that correspond to relatively few configurations; in this way, the maximum entropy principle allows one to calculate the probability of observing different group properties, given a set of constraints. Multicellular groups obey a simple but universal constraint: each group has some total volume, $V$. This volume can be divided into $N$ pieces, where $N$ is the total number of cells. Each piece is associated with a particular cell, and the $N$ pieces must sum to the total volume of the group, $V={\sum}_{i}{v}_{i}$, for $i=1,2,\mathrm{\dots},N$. Using this constraint, and assuming no correlations, one can predict the most likely distribution of volumes for the $N$ pieces. This approach has been successfully used to predict the distribution of free volumes within granular materials and foams (Aste and Di Matteo, 2008; Katgert and van Hecke, 2010). Here, we use it to predict the distribution of cellular free volumes in the absence of spatial correlations in cell positions.
Consider the ensemble of all possible cellular configurations in a simple group. As first derived by Aste and Di Matteo, 2008 and Aste et al., 2007 for granular materials, the maximum entropy probability distribution $p(v)$ of cell neighborhood volumes within $V$ is the modified gamma distribution
where $\overline{v}$ is the mean cell neighborhood volume, v_{c} is the minimum cell neighborhood volume, $\mathrm{\Gamma}(k)$ is the gamma function, and $k\equiv {(\overline{v}{v}_{c})}^{2}/{\sigma}_{v}^{2}$ is a shape parameter that is defined by v_{c}, $\overline{v}$, and the variance of the cell neighborhood volumes, ${\sigma}_{v}^{2}$. This distribution is expected if cell neighborhood volumes are determined independently of each other (while still conforming to the total volume constraint). In other words, volumes must be set randomly; correlations between the size of separate volumes will lead to deviations from maximum entropy predictions. If this condition holds, then maximum entropy volume distribution predictions should be valid, regardless of other geometric or structural details. For example, maximum entropy statistics hold in granular materials, despite the fact that they must obey strict force and torque balance conditions (Aste and Di Matteo, 2008; Snoeijer et al., 2004; Bi et al., 2015a). Further, the same approach applies to groups with a constraint on total area or length; this does not change the result, and $V$ can be replaced by $A$ or $L$ without other modifications. However, we would not expect this prediction to necessarily hold for confluent cell layers (i.e. packing fraction $\varphi \approx 1$), since in general cell size is a highly regulated process; when cells tile space, the cell neighborhood volume distribution will strongly correlate with the the cell size distribution.
In practice, we divide the total group volume or area into $N$ pieces via a Voronoi tessellation. The size of the space associated with cell $i$ includes the cell itself and the portion of intercellular space closer to its center than to the center of any other cell. As cells must have nonzero size, we therefore set v_{c} to be the volume of a single cell without any intercellular space (or a_{c}, the area of a single cell).
Experimental tests of multicellular maximum entropy predictions
To test whether different kinds of multicellular groups pack their cells according to the maximum entropy principle, we investigated cell packing in two different multicellular organisms. First, we used experimentally evolved ‘snowflake’ yeast (Ratcliff et al., 2012), a model system of undifferentiated multicellularity. Second, we used the green microalga Volvox carteri, a member of the volvocine algae that first evolved multicellularity in the Triassic (Starr, 1969; Herron et al., 2009).
Snowflake yeast
Snowflake yeast grow via incomplete cytokinesis, generating branched structures in which motherdaughter cells remain attached by permanently bonded cell walls (Figure 1A). New buds appear on ellipsoidal cells at a polar angle $\u27e8\theta \u27e9=42\mathrm{\%}\pm 23\mathrm{\%}$ and azimuthal angle $\varphi $ that is randomly distributed ($\u27e8\varphi \u27e9=180\mathrm{\%}\pm 104\mathrm{\%}$, Figure 1—figure supplement 1). Therefore, even though snowflake yeast structure is limited to configurations in which daughter cells remain attached at their base to their mother, these new cells bud in random orientations. Each snowflake yeast cluster has a different specific configuration of cells; the ensemble of snowflake clusters will include many different configurations. We expect that this structural randomness produces predictable distributions of cellular neighborhood volumes. Conversely, if there are strong correlations in the locations of daughter cells, then we will observe deviations from maximum entropy predictions regarding the cell neighborhood volumes.
To determine the distribution of cell neighborhood volumes, we first must measure the position of every cell in a cluster. It is difficult to image individual cells within snowflake yeast clusters due to excessive light scattering. Instead, we used a serial block face scanning electron microscope equipped with a microtome to scan and shave thin ($50\mathrm{nm}$) layers off a resin block with embedded yeast clusters with stained cytoplasms. This process allowed us to determine the 3D structure of $N=20$ snowflake yeast clusters and locate cell centers with nanometer precision.
We define the group volume as the smallest convex hull that surrounds all cells in the cluster and computed the 3D Voronoi tessellation of cell centers within that (Figure 1A). The distribution of cellular Voronoi volumes closely matched the predicted kgamma distribution (Figure 1C, $k=2.88$). This agreement is quantified via ‘PP plots’ of the empirical cumulative distribution function (CDF) plotted against the predicted kgamma CDF. We find a rootmeansquare residual ${r}_{RMS}=\sqrt{\u27e8{(F(v){F}_{i})}^{2}\u27e9}=0.02$, where ${F}_{i}$ is the empirical CDF and $F(v)$ is the predicted kgamma CDF.
The influence of the convex hull on these results was investigated by using an alternative procedure in which the Voronoi volumes were binned into shells centered at the cluster’s center of mass (Figure 1C and E). We binned cells into shells with shell edges of $[0,6.2]$, $[6.2,9.7]$, and $[9.7,20.4]$ away from the center of mass. We found that the distribution of Voronoi volumes within each shell matched the predicted kgamma distribution, with ${r}_{RMS}=\{0.037,0.020,0.014\}$, $k=\{3.45,3.08,4.63\}$ in the shells shown in Figure 1C(iiiv).
Volvocine algae
To test if cell neighborhood volumes in extant multicellular organisms are consistent with maximum entropy cell packing predictions, we examined cell packing within the green microalgae Volvox carteri. Development in V. carteri, which evolved over millions of years, is highly regulated, occurring through a stereotyped morphological progression (Kirk, 2005). V. carteri embryos arise as a spherical cellular monolayer from palintomic cell divisions with incomplete cytokinesis, which leaves the cells attached via cytoplasmic bridges. These bridges disappear when ECM is secreted by the cells, filling the entire sphere, and eventually moving the cells apart. The approximately 1000 somatic cells remain embedded on the surface of a translucent sphere of extracellular matrix (Figure 1B). While sixfold coordination is the most frequent local arrangement of somatic cells, the fact that the cells are embedded in a surface with spherical topology requires there to be ‘defects’ with differing coordination number (e.g. 5,7), and these are found interspersed around the spheroid. Thus, despite their developmental regulation, somatic cells exhibit a degree of disorder with respect to coordination number. From a physics perspective, the local hexatic order in the somatic cell arrangement is low (see Materials and methods).
To determine the distribution of Volvox cell neighborhood sizes, we imaged somatic cells using their chlorophyll autofluorescence in a light sheet microscope. Since the somatic cells are arranged around a surface embedded in 3D space, we constructed a 2D Voronoi tessellation of somatic cells on the surface. Each organism imaged had a different size, and therefore had a different mean Voronoi area $\u27e8A\u27e9$. To compare distributions across organisms, we removed the systematic area differences by recording the solid angle ${\mathrm{\Omega}}_{i}=4\pi {A}_{i}/S$ subtended by each somatic cell, where $S={\sum}_{i}{A}_{i}$ is the total surface area of the organism. We found that the kgamma distribution largely matched the distribution of solid angles (Figure 1D, $k=2.40$, ${r}_{RMS}=0.04$). However, there are systematic deviations between the data and maximum entropy predictions (Figure 1F).
We next investigated if maximum entropy predictions are more accurate within subregions with similar mean solid angles; specifically we examine regions whose mean is $\u27e8\mathrm{\Omega}\u27e9=0.0185\pm 0.0003$, obtained across six organisms. The distribution of Voronoi solid angles within these subregions closely follows the kgamma distribution (Figure 1F, $k=10.66$, ${r}_{RMS}=0.01$). This observation suggests that while there are systematically correlated subregions of cells, within these subregions cells are largely arranged randomly. Thus, the organization of Volvox carteri somatic cells is consistent with maximum entropy predictions.
Simulations of different growth morphologies
We next used simulations to investigate the impact on cell packing of four different growth morphologies: growth via incomplete cell division (cf. snowflake yeast), cells distributed on a spherical surface (cf. Volvox), aggregation, and palintomy. The goal of these studies was to determine if morphological details and constraints impact entropic packing using simplified models that capture the essential features of the growth and behavior of these varied organisms. These simulations captured a couple basic properties of multicellularity: organisms may grow in both two and three dimensions, and they may assemble with two different classes of bonds: bonds that are reformable, and bonds that are not reformable. Nonreformable, or ‘fixed’ bonds (like those formed from incomplete cytokinesis) are common in plants, algaes, and fungi; on the other hand, reformable bonds (such as those formed via sticky proteins) are common in animal tissues and cellular aggregates, like biofilms and slime molds. Finally, palintomy, or growth confined inside a maternal membrane, is also common from algae to animals.
First, we performed geometric simulations of multicellular groups that grow via incomplete cell division; these simulations were inspired by previous simulations of snowflake yeast (Jacobeen et al., 2018a; Jacobeen et al., 2018b). Daughter cells bud from mother cells with experimentally determined polar angle and random azimuthal angle, and remain attached to mother cells with rigid bonds. We ran 9, 100 simulations starting from a single cell, each of which underwent seven generations of division, and calculated the Voronoi tessellation of the final structure from each simulation. The distribution of Voronoi volumes closely matched the kgamma distribution across four orders of magnitude (Figure 2A, $k=2.26$, ${r}_{RMS}=0.007$).
Inspired by Volvox, we simulated cells distributed across the twodimensional surface of a sphere through a random Poisson point process. We completed 10 simulations, each with 1000 cells, and computed the distribution of solid angles subtended by Voronoi cells. As shown in Figure 2B, the distribution of Voronoi solid angles is consistent with maximum entropy predictions ($k=9.29$, ${r}_{RMS}=0.009$).
Next, we simulated organisms that stick together via reformable cellcell adhesions, a mechanism of group formation that is common in biofilms and extant aggregative multicellular life (Claessen et al., 2014) (i.e. Dictylostelium and Myxococcus; Figure 2C). In these simulations, multicellular aggregates were grown from a single cell. Seven generations of cell division occurred, in which new cells appear on the surface of existing cells at random positions, and steric interactions force cells to separate after division and occupy space. Aggregative bonds were modeled through harmonic interactions of the cell centers. The observed Voronoi volume distributions were consistent with maximum entropy predictions ($k=7.84$ and ${r}_{RMS}=0.007$).
Finally, we modeled cells undergoing palintomic division within a maternal cell wall, as is common in green algae (Lürling and Van Donk, 1997; Boraas et al., 1998; Ratcliff et al., 2013; Fisher et al., 2016; Herron et al., 2019), and is reminiscent of both baeocyte production in Stanieria bacteria (Angert, 2005) and neoproterozoic fossils of early multicellularity (Xiao et al., 1998; Figure 2D). The details of these simulations remained similar to the simulations of aggregative multicellularity, with the important difference being that instead of harmonic interactions between cell centers enforcing groups to stay together, cells interacted with a spherical maternal wall acting as a corral. The Voronoi volume distributions for these simulations were also consistent with maximum entropy predictions ($k=15.16$ and ${r}_{RMS}=0.013$).
Taken together, the results of these simulations suggest that a broad distribution of cell neighborhood sizes is a general feature of multicellular growth morphologies. In particular, when cell locations are random under these rules, cell neighborhood size distributions closely follow the kgamma distribution.
The role of spatial correlations
While we have shown that the distribution of cell neighborhood volumes closely follows the kgamma distribution in two very different organisms, we have also seen that in some cases maximum entropy predictions are more accurate in subsections of an organism than across its entirety. For instance, in Volvox we observed that ${r}_{RMS}$ is much smaller within subregions with similar mean solid angles than across the whole organism. This observation suggests that correlations exist in the arrangement of V. carteri somatic cells. Such a correlation is not wholly unexpected; V. carteri have a wellknown anteriorposterior polarity which they use to orient and swim toward light sources (Drescher et al., 2010b). This polarity may affect the somatic cell packing distribution in different subregions of the organism, leading to deviations from maximum entropy predictions.
The spatial correlations in the cellular areas in Volvox were studied first by plotting a 3D heatmap of Voronoi solid angle sizes (Figure 3A). It is apparent that extended spatial regions have welldefined and nonrandom mean Voronoi solid angles. We quantified this feature by calculating the spatial correlation function $C(Q)$ of the solid angle
where ${Y}_{Q}=J{(Q)}^{1}{\sum}_{j}({\mathrm{\Omega}}_{j}\u27e8\mathrm{\Omega}\u27e9)$ is the average deviation of the solid angle of a given polygon’s neighbors at a neighbor distance $Q$ from the mean. Here, the number of neighbors is $J(Q)$, a function of $Q$, which enumerates the network distance from the polygon of interest (i.e. $Q=1$ calls the nearest neighbors, of which there are $J(1)$, $Q=2$ calls the next nearest neighbors, of which there are $J(2)$, and so on). The standard deviation of the solid angle across the population is ${\sigma}_{\mathrm{\Omega}}$, and ${\sigma}_{{Y}_{Q}}$ is the standard deviation of ${Y}_{Q}$ across the population. We find that Volvox Voronoi solid angles are positively correlated at distances as large as $Q=10$ (Figure 3A). This analysis suggests that there are systematic differences in Volvox group structure in different spatial regions. We therefore should expect to observe deviations from the kgamma distribution, which was derived under the assumption that there are no correlations in the division of space among cells.
A natural question is whether maximum entropy predictions are more accurate within correlated subregions of an organism. We measured the Voronoi distribution in subregions with similar mean solid angles across six organisms and, for each subregion, a central node and its neighbors up to Q_{0} were identified. We varied Q_{0} from 3 (corresponding to, on average, $38\pm 2$ cells in the subregion) to ${Q}_{0}=20$ ($928\pm 10$ cells on average in the subregion) to measure the Voronoi solid angle distributions in subregions of different sizes. Spatial correlations were weaker within smaller domains (Figure 3A), and deviations from maximum entropy predictions smaller as well (Figure 3B) with the minimum ${r}_{RMS}$ at ${Q}_{0}=6$ (Figure 3C, average of $133\pm 5$ cells). These observations suggest that while there are systematic correlations between subregions, cell neighborhood sizes are largely randomly distributed within subregions.
The crucial role of randomness
How much randomness is necessary for the kgamma distribution to predict cell neighborhood size distributions? Our analysis of the solid angle distribution of Volvox cells demonstrates that maximum entropy principle predictions are relatively accurate (${r}_{RMS}=0.04$) even in the presence of some spatial correlations. However, it is unclear how much randomness is necessary for the kgamma distribution to be predictive. Developmental processes are known to incur random effects (Hong et al., 2016; Haas et al., 2018), but, at the same time, fluctuations must be sufficiently suppressed for complex structures to assemble. This dilemma is augmented because there are few methods for measuring the degree of randomness in developmental processes. Therefore, it is difficult to disentangle exactly which reproducible qualities and traits stem from regulated processes, and which ones stem from random processes. In this section, we address this gap by assessing the stability of maximum entropy packing distributions to correlative perturbations (Figure 4). We do so by simulating three different sources of correlations: (i) size polydispersity, (ii) precisely defined growth patterns, and (iii) coordinated cellular apoptosis, each of which is a common multicellular developmental process. In each scenario, we varied the relative strength of correlations and noise, and monitored how closely the cell neighborhood size distributions agreed with the kgamma distribution via PP plots and the ${r}_{RMS}$. We do not attempt to exactly recapitulate naturallyoccurring levels of randomness strength; rather, we investigate the limits of how well maximum entropy predictions can measure the impact of correlations.
The impact of heritable size polydispersity was investigated by simulating aggregative groups consisting of large and small cells. All simulations were seeded with one small cell and one large cell. We then varied the probability $\xi $ that a new cell is the same size as its mother from 0.5 to 1.0. When $\xi =1$, cells always produce offspring with the same radius; for $\xi =0.5$ it is equally likely that a small cell produces a small or large daughter (and vice versa for large cells). Therefore, groups with $\xi =1$ have correlated regions of cell size, but the degree of correlations decreases with decreasing $\xi $. While groups with $\xi =1$ deviate significantly from the kgamma distribution (${r}_{RMS}=0.09$), we observed that even a small amount of randomness results in excellent agreement between simulated groups and the kgamma distribution (in order from $\xi =1$ to $\xi =0.5$, ${r}_{RMS}=\{0.09,0.04,0.03,0.03,0.02\}$).
Next we investigated groups with varying amounts of noise on top of defined growth patterns. In these simulations, new cells bud in precise positions; the first daughter at the position ($\theta =0\pm \eta $, $\varphi =0\pm \eta $ in spherical coordinates), the second at $\theta =90\pm \eta $, $\varphi =0\pm \eta $, the third at $\theta =90\pm \eta $, $\varphi =180\pm \eta $, etc., where the noise is uniformly distributed with zero mean and width $\eta $. For $\eta =0$ (no noise), the distribution of Voronoi volumes was discontinuous, since cells could only access a finite number of local configurations. As expected, as $\eta $ increases ($\eta =\{0,5,30,60,90\}$), ${r}_{RMS}$ decreases (${r}_{RMS}=\{0.07,0.05,0.02,0.02,0.02\}$).
Finally, we investigated groups with localized and random cell death. In these simulations, 50 cells were confined to the surface of a sphere of unit radius following the protocol described above. One cell is randomly selected to die. Centered at this cell, a spherical region of radius $R$ is defined, and then 10 cells in this region were randomly selected to die (and disappear, thereby not contributing to the Voronoi tessellation). For small $R$, cell death is highly localized, and thus spatially correlated. As $R$ increases, cell death events become less localized, and therefore more random. We find that highly correlated cell death resulted in large deviations from maximum entropy predictions. Conversely, as $R$ increases dead cells become less localized, the observed distribution becomes more accurately described by the kgamma distribution; as $R$ increases from $R=\{0.75,1,1.25,1.5,1.75,2\}$, we find ${r}_{RMS}=\{0.10,0.07,0.03,0.02,0.01\}$.
In summary, absent randomness, spatial correlations lead to large deviations from the kgamma distribution. Yet, with even a small amount of randomness, the kgamma distribution holds significant predictive power. These simulations suggest that maximum entropy predictions are likely to be robust against even moderate correlations.
Parentoffspring fidelity via maximum entropy packing
So far, we have shown that randomness in cellular packing leads to highly predictable packing statistics. Here we show that maximum entropy statistics can directly impact the emergence of a highly heritable multicellular trait, organism size.
Prior work has shown that the size of snowflake yeast at fragmentation is remarkably heritable – higher, in fact, than the traits of most clonally reproducing animals (Ratcliff et al., 2015). The size to which snowflake yeast grow depends strongly on the aspect ratio of its constituent cells; more elongated cells allow the growth of larger clusters before strain from cellular packing causes group fragmentation (Jacobeen et al., 2018a; Jacobeen et al., 2018b). Recently, experiments with engineered yeast showed that this emergent multicellular trait, group size, was in fact more heritable than the underlying cellular trait upon which it was based (cellular aspect ratio), despite the fact that the mutations engineered in this system only affected cellular aspect ratio directly (ZamaniDahaj et al., 2021). Simulations of multicellular chemotaxis observed a similar effect (Colizzi et al., 2020). While at first glance this may seem surprising, we show below that the high heritability of snowflake yeast group size arises from the direct dependence of size on the robust maximum entropy distribution of volume within groups.
Before addressing how fracture impacts the distribution of cluster sizes by impacting the number of cells within a group, we first must address fluctuations in size among clusters with the same number of cells. Given a number of cells $N$ in the cluster, variation in cell packing fraction results in variation of the total volume. The arguments given above for predicting the distribution of individual cell volumes also applies to the distribution of total volume (Aste and Di Matteo, 2008); the distribution of total volume for clusters with the same number of cells should follow the kgamma distribution. To generate enough clusters with identical $N$ to test this prediction, we used simulations. We generated 3000 snowflake yeast clusters, each with 100 cells, and measured their total volumes. The distribution of volumes is consistent (${r}_{RMS}=0.0043$, $k=23.0$) with the kgamma distribution as shown in Figure 5A. Further, these fluctuations in size are small compared to the differences in size gained via reproduction of cells or lost via fracture.
To predict the group size distribution, we consider the probability of fragmentation via a weakestlink model of fracture. As the location of new cells is random (see Figure 1—figure supplement 1), each new cell has a chance of causing intercellular bond fracture. It was previously observed that bonds only break if cells are highly confined, that is they have smaller Voronoi volumes; otherwise flexible cellular branches simply bend (Jacobeen et al., 2018b). We model fracture as occurring when a cell’s Voronoi volume is below a critical value denoted by ${v}^{*}$ (Figure 5B) such that its motion is completely restricted. We measure ${v}^{*}$ from simulations that determine the maximum local packing density for groups with same cell size and shape distributions as seen in experiments (see Materials and methods). The probability that a particular cell is confined to a Voronoi volume $v\le {v}^{*}$ is the integral
As each cell in a cluster of $N$ cells independently has probability ${p}^{*}$ of having $v\le {v}^{*}$ (and thus causing fracture), the probability of a cluster with $N$ cells not fragmenting is
As we do not model the fate of products of fragmentation (i.e. the size of the separate pieces postfracture), we expect the weakest link model to be more accurate for larger clusters than it is for smaller clusters.
We measured group size for approximately $10,000$ snowflake clusters, all descendants of a single isolate, using a particle multisizer, and found strong agreement between the experimentally observed cluster size distribution and the weakestlink prediction (the coefficient of determination is ${r}^{2}=0.97$ for $\mathrm{log}(Counts)$ vs $N$) (Figure 5C). Hence, the predictable statistics of entropic cell packing guides the distribution of group size among offspring of a single isolate.
For context, we compared the distribution of group size in snowflake yeast to that of flocculating yeast, which forms multicellular groups via aggregation. The multicellular size of flocculating yeast depends on the rate of collisions with other cells and groups of cells. The growth rate of aggregates is thus typically proportional to their size, as larger aggregates are more likely to contact more cells (Pentz et al., 2020). In fact, the maximum size of a flocculating yeast aggregate is bounded by the duration of aggregation, an extrinsic parameter, while the minimum size can be a single cell (Stratford, 1992). Using data from Pentz et al., 2020, we compared the group size distributions of snowflake yeast and flocculating yeast grown in the same environmental conditions. We find that flocculating yeast groups exhibit a much larger coefficient of variation in size compared to snowflake yeast groups (Pentz et al., 2020; Figure 5D). These results demonstrate that randomly assembled groups can exhibit more reproducible group traits than groups assembled with correlations.
Multicellular motility is robust to cellular area heterogeneity
One of the issues arising from the existence of the broad distribution of somatic cell areas in Volvox is the extent to which colony motility is affected by that heterogeneity. Each of the somatic cells at the surface of a Volvox colony has two flagella that beat at $\sim 30$ Hz, in planes that are primarily oriented in the anteriorposterior (AP) direction but with a slight lateral tilt that makes each colony spin around its AP axis. A longstanding focus in biological fluid mechanics of multicellular flagellates has been to understand the connection between the beating of the carpet of flagella that cover their surface and their selfpropulsion. Measurements of the flow fields around micropipetteheld (Short et al., 2006) and freelyswimming colonies (Drescher et al., 2010a) have shown that despite the discreteness of the flagella, the flow is remarkably smooth, albeit often displaying metachronal waves (Brumley et al., 2015), longwavelength phase modulations of the beating pattern. However, the effect of heterogeneity in cell area on motility is as yet unknown; naively, we might expect that heterogeneities in flagella packing may lower the motility of the organism in comparison to a highly regular arrangement. Here, we show in fact that Volvox motility is largely unaffected by fluctuations in cell area, indicating that area heterogeneity may not be detrimental to the organism’s swimming fitness. The absence of such a barrier may represent a scenario where trait optimization can be achieved without first evolving sophisticated developmental genes.
A heuristic explanation for the smoothness of the flows can be developed by noting first that the flow arising from each flagellum, beating close to the noslip surface of the colony, will fall off only as an inverse power of distance $r$ from the flagellum. Thus, the superposition of the flows from many flagella will be sensitive to contributions from distant neighbors (Boselli et al., 2021) and will tend to wash out local variations in flagellar actuation. This argument can be made quantitative using two different models for the motility of such flagellates. The first is the ‘squirmer’ model (Lighthill, 1952), in which the flagellate is characterized by a tangential ‘slip’ velocity $u(\theta )$ on the surface, which can be thought of as corresponding to the mean motion of the flagella tips. Here, $\theta \in [0,\pi ]$ is the polar angle with respect to the AP axis. In this approach the details of the fluid velocity profile below the tips are not resolved, and in particular the noslip condition at the surface of the ECM is ignored. In the second approach (Ishikawa et al., 2020), which builds on earlier work (Short et al., 2006) that specified a force density at the colony surface instead of a slip velocity, there is a specified force density applied at some small distance above the noslip colony surface, and the flow field below that locus is resolved. This approach, termed the ‘shear stress, noslip’ model, captures the very large viscous dissipation that occurs in the region between the ECM and the locus of forcing. Within either of these two approaches above the effects of area inhomogeneities can be investigated by coarsegraining the flagella dynamics; either the local slip velocity $u(\theta )$ or the local tangential force density $f(\theta )$ has noise.
In the squirmer model, the swimming speed $U$ is (Lauga, 2020)
where
${P}_{n}$ is the Legendre polynomial, and the prime indicates differentiation with respect to its argument. If we represent the effects of area inhomogeneities as noise in the slip velocity, then it is most natural to use ${V}_{n}$ as the basis functions for the tangential slip velocity, expressed as
where ${V}_{n}(0)={V}_{n}(\pi )=0$, guaranteeing that the slip velocity vanishes at the anterior and posterior poles (Short et al., 2006). Accurate experimental measurements of the azimuthal velocity field of Volvox (Drescher et al., 2010b) show that it is wellcaptured by that lowest mode, leading to a modest anteriorposterior asymmetry. From the orthogonality relation for the ${V}_{n}$,
we see immediately that the contributions from all modes $n>1$ vanish identically, and thus the swimming speed is given identically by the amplitude of the lowest mode ${V}_{1}(\theta )=\mathrm{sin}\theta $,
Thus, within the squirmer model, motility is essentially insensitive to area inhomogeneities. This result does not preclude effects of those higher modes, only that such effects will be on quantities other than the swimming speed, such as the nutrient uptake rate (Magar et al., 2003).
In the shearstress, noslip model, the velocity field in the region between the colony radius $R$ and the radius $R(1+\u03f5)$ at which the shear stress is applied is solved separately from that for $r>R(1+\u03f5)$ and the two flow fields are matched at $R(1+\u03f5)$ through boundary conditions of continuity in velocity and normal stress and the specified discontinuity in shear stress. Analogously to the expansion of the slip velocity in the squirmer model (7), noise in that discontinuity can be expressed by assuming that the coarsegrained shear force applied by the flagella has spatial variations, and can be expanded in the form
The swimming speed again depends only on the lowestorder mode in this expansion,
and we again have insensitivity of $U$ to inhomogeneities in the area per somatic cell.
Discussion
In this paper, we demonstrated that universal cellular packing geometries are an inevitable consequence of noisy multicellular assembly. We measured the distribution of Voronoi polytope sizes in both nascent and extant multicellular organisms, and showed that they are consistent with the kgamma distribution, which arises via maximum entropy considerations. Using simulations, we demonstrated that kgamma distributions arise in many different growth morphologies, and do so requiring only a relatively small amount of structural randomness. Further, we showed that the distribution of cell neighborhood sizes can be used to distinguish the effects of randomness from the effects of developmental patterning. Finally, we demonstrated that consistent packing statistics can lead to highly reproducible, and thus heritable, multicellular traits, such as group size in snowflake yeast. Altogether, these results indicate that entropic cell packing is a general organizing feature of multicellularity, applying to multicellular organisms with varying growth morphologies, connection topologies, and dimensionalities.
One of the strengths of the packingbased maximum entropy framework employed here is its simplicity. We have demonstrated that the distribution of cell neighborhood sizes can be predicted with high accuracy, in many different multicellular morphologies, from only the first two moments of the distribution. Deviations from maximum entropy predictions therefore encode important information about additional correlations that can arise via a variety of sources, such as developmental regulation or interactions with the environment. These additional correlations could be explored via, for example, higher order structures to the maximum entropy model (Schneidman et al., 2003). Relatedly, it would be interesting to extend our analysis to include topological information, which has recently been applied to theoretical and experimental systems (Skinner et al., 2021; Jeckel et al., 2021). Developing methods to incorporate these higherorder effects may prove especially useful when assessing the importance of random fluctuations in organisms with structures more complex than those studied here.
The effect of random noise has been an important area of research in developmental biology (Tsimring, 2014; Lander, 2011). During development, cellular growth, reproduction, differentiation, and patterning combine to form a multicellular organism. Random noise introduced at any stage in this process can result in phenotypic variability, which may affect an organism’s fitness (Waddington, 1957). But while some multicellular traits exhibit high variability, others are tightly conserved, leading to a wide body of research addressing the origin of mechanisms underlying robustness and stability, and the nature of feedback mechanisms that must be present to manage the large number of stochastic fluctuations in gene expression and growth (Gregor et al., 2007; Haas et al., 2018; Hong et al., 2016; Sampathkumar, 2020; Deneke and Di Talia, 2018). In this context, our results demonstrate that random noise can itself lead to highly reproducible multicellular traits such as the cell packing distribution.
Our observation that heritable properties can arise from random processes is reminiscent of the reproducible structures and phenomena generated by random noise in a wide range of physical (Shinbrot and Muzzio, 2001; Manoharan, 2015) and biological systems (Tsimring, 2014; Lander, 2011). While it may be surprising that the distribution of free space in snowflake yeast and Volvox follow the same kgamma distributions despite the many differences between these organisms, this universality actually extends beyond multicellular organisms to nonliving materials, such as those seen in granular materials and foams (Katgert and van Hecke, 2010; Varadan and Solomon, 2002; Aste and Di Matteo, 2008). This broad universality likely arises due to the simple requirements for application of the maximum entropy principle to packing; specifically, there must be a total volume, individual volumes cannot overlap, and volumes must be determined independently (subject to the total volume constraint). It is thus important to note that entropic packing is not necessarily adaptive; it can readily emerge as a consequence of random cellular reproduction or interactions. While entropic packing statistics may produce advantages in some cases, they could be neutral or detrimental in others.
An example of one possible advantage granted by entropic packing is the parentoffspring fidelity that arises from its ensemble statistics. Since both parents and their offspring are assembled through similar noisy processes, they achieve similar cell packing distributions. This statistical similarity therefore details at least one heritable multicellular trait that does not rely on genetically regulated multicellular development. Other multicellular traits that build on the cell packing distribution are similarly affected by this emergent process and could become heritable as well. Such parentoffspring heredity could play a crucial role in the evolutionary transition to multicellularity, providing a mechanism for nascent multicellular organisms to participate in the evolutionary process without first having to possess genetically regulated development. Over time, developmental innovation may arise via multicellular adaptation, modifying or replacing entropic cell packing as a mechanism of multicellular heredity. Consistent with this hypothesis, maximum entropy retains considerable predictive power in extant multicellular organisms such as Volvox, animal embryos (Alsous et al., 2018), and epithelial tissue monolayers (Atia et al., 2018), each of which have canalized development. There may be other examples of organisms with highly regulated development which pack cells according to maximum entropy predictions, and future work could address cell packing in, for example, animal embryos, brain tissue, and more. Finally, as fragmentation is a common mode of multicellular reproduction (Larson et al., 2020; Prakash et al., 2021; Angert, 2005; Keim et al., 2004; Koyama et al., 1977), fracture driven by maximum entropy packing statistics may be relevant to organisms other than snowflake yeast.
The broad distributions in cellular volumes we have found in two very different types of organisms, with two very different modes of reproduction and growth, suggest that noise in developmental geometry may be an inevitable consequence of almost any microscopic mechanism. In this sense, they may be just as unavoidable in biological contexts as thermal fluctuations are in systems that obey the rules of equilibrium statistical physics. As an example, we recall the ‘flicker phenomenon’ of erythrocytes, in which the red blood cell membrane exhibits stochastic motions around its equilibrium biconcave discoid shape. Thought for many years to be a consequence of specific biochemical processes associated with living systems, flickering was eventually shown by quantitative video microscopy (Brochard and Lennon, 1975) to be consistent with equilibrium thermal fluctuations of elastic biomembranes immersed in water. This was later confirmed by similar studies of shape fluctuations exhibited by large lipid vesicles (Schneider et al., 1984). The generalization of these considerations to homeostatic tissues with cell division, rearrangements and apoptosis has also been considered (Risler et al., 2015; Kalziqi et al., 2018). While such membrane systems may differ greatly in the specific values of their elastic modulus (and, indeed, of their microscopic membrane constituents), the viscosity of the surrounding fluid, and their physical size, the spacetime correlation function of fluctuations about the equilibrium shape adopts a universal form in appropriately rescaled length and frequency variables.
These results on equilibrium fluctuations provide a conceptual precedent for the results reported here. A central issue that then arises from our results is how to connect any given stochastic biochemical growth process defined at the microscopic level to the more macroscopic probability distribution function observed for cellular volumes. Mathematically, this is the same question that arises in the theory of random walks, wherein a Langevin equation defined at the microscopic level leads, through suitable averaging, to a FokkerPlanck equation for the probability distribution function of displacements. Can the same procedure be implemented for growth laws?
Materials and methods
Yeast genotypes and growth morphology
Snowflake yeast genotypes
Request a detailed protocolMulticellular yeast groups were constructed from initially unicellular Saccharomyces cerevisiae. Petite yeast groups (P) were used in all experiments except those noted below. Snowflake yeast were engineered by replacing a functional copy of ace2 with a nonfunctional version as described in Ratcliff et al., 2015 (these modified genotypes will be referred to as either snowflakes or Ace2KO). Under daily selection for large size through settling in liquid media, groups can arise via a single mutation in the ace2 gene (Ratcliff et al., 2012; Ratcliff et al., 2015). When the ace2 gene is not expressed, the final stage of cell division is not completed, and motherdaughter cells remain attached at the chitinous bud site. Since all cells are attached directly to their mothers, snowflake groups form a fractallike branched tree collective. To measure bud scar size, we used a unicellular strain of Y55 yeast; these measurements were only used to pick parameters for snowflake yeast simulations.
Yeast growth morphology
Request a detailed protocolS. cerevisiae cells reproduce by budding, a type of asexual reproduction where a new cell extrudes from the surface of the parent cell. During budding, mother and daughter cells remain attached via a rigid chitinous bond; in unicellular yeast, chitinase will degrade this bond as the last step in cell division, releasing the daughter cell and leaving behind a ‘bud scar’ on the mother surface and a ‘birth scar’ defining the proximal hemisphere on the daughter’s cell surface. In all experiments, we use yeast expressing bipolar budding patterns (Chant and Pringle, 1995). The bipolar budding pattern is characterized by bud sites that typically do not form along the equator of the cell. Usually, the first daughter buds near the distal pole. Subsequent budding sites are typically positioned along a budding ring defined by a polar angle $\theta $ ( Figure 1—figure supplement 1). Some buds will ‘backbud’ toward the mother cell (i.e. on the proximal end of the cell), but most buds are placed on the distal side. By contrast, the azimuthal positions of all buds appears to be randomly distributed.
Growth conditions
Request a detailed protocolAll experiments were performed on yeast grown for approximately 24 hr in 10 mL of yeast peptone dextrose (YPD, 10 g/L yeast extract, 20 g/L peptone, and 20 g/L dextrose) liquid medium at 30 C, and shaken at 250 rpm in a Symphony Incubating Orbital Shaker model 3,500I. All cultures were therefore in the stationary phase of growth at the time of experiments.
Scanning electron microscopy to measure group structure
Since yeast cells have thick cell walls that limit the effectiveness of optical microscopy, we used a Zeiss Sigma VP 3View scanning electron microscope (SEM) equipped with a Gatan 3View SBF microtome installed inside a Gemini SEM column to obtain high resolution images of the internal structure of snowflake yeast groups and locate the positions of all cells. All SEM images were obtained in collaboration with the University of Illinois’s Materials Research Laboratory at the Grainger College of Engineering. Snowflake yeast clusters were grown overnight in YPD media, then fixed, stained with osmium tetroxide, and embedded in resin in an eppendorf tube. A cube of resin $200\mathrm{\mu}\mathrm{m}$ x $200\mathrm{\mu}\mathrm{m}$ x $200\mathrm{\mu}\mathrm{m}$ (with an isotropic distribution of yeast clusters) was cut out of the resin block for imaging. The top surface of the cube was scanned by the SEM to acquire an image with resolution $50\mathrm{nm}$ per pixel (4000 × 4000 pixels). Then, a microtome shaved a 50nmthick layer from the top of the specimen, and the new top surface was scanned. This process was repeated until 4000 images were obtained so that the data cube had equal resolution in $x,y,z$ dimensions.
Custom image analysis scripts were written for the SEM datasets. First, a local adaptive threshold was used to binarize the image. A distance transform was used to identify the center of each cell slice in a particular 2d image. A watershed algorithm was then seeded with the cell slice centers, followed by a particle tracking algorithm to label cells across image slices. After labeling, the boundary for each cell was found, resulting in a point cloud of the exterior of each cell. Each cell was then fitted with an ellipsoid with nine fit parameters: $({x}_{0},{y}_{0},{z}_{0})$ cell center, $(a,b,c)$ cell radii, and $(\theta ,\phi ,\psi )$ for cell orientation. The net rotation matrix $R$ was then found, where each column of $R$ corresponds to the direction vector of one principal axis of the ellipsoid. We consider the radii of the principal axes $(a,b,c)$ to be part of a diagonal scaling matrix $S$ which sets the ellipsoid size. Since the SEM images only capture the cell cytoplasm, each principal axis was increased in size by an additional $100\mathrm{nm}$ to account for the cell wall during visualization. Last, although there is no possible 3d 3 × 3 translation matrix, a 4 × 4 translation matrix $T$ can capture the position of the cell center $({x}_{0},{y}_{0},{z}_{0})$. Adding one additional column and row to the matrices $R$ and $S$ with the diagonal element being 1 and all other elements being 0 then means that a unit sphere centered at the origin can be mapped to any specific cell by a surface matrix $M=TRS$, and furthermore any point on the cell’s surface can be mapped back to the unit sphere by the inverse of $M$. Then, the surface matrices are the only information that must be stored. From this dataset, 20 clusters of $105\pm 51$ cells in each cluster were identified along with their intercellular motherdaughter chitin bonds. We chose to stop at $N=20$ clusters due to the expensive nature of handidentifying motherdaughter bonds, and since $N=20$ clusters gave over 2000 measurements of single cells in the clusters. In following experiments with different organisms and in following simulations of different growth morphologies, we always made at least as many measurements as for these experiments of snowflake yeast.
Petite yeast cell size and shape
Request a detailed protocolWe measured cellular volumes from SEM images by ellipsoid fits. The average cellular volume of petite yeast was ${v}_{c}=17.44\mathrm{\mu}{\mathrm{m}}^{3}\pm 7.33\mathrm{\mu}{\mathrm{m}}^{3}$. This measurement was used in our Voronoi distribution derivations. We measured the mean cellular aspect ratio to be $\alpha \equiv a/b=1.28\pm 0.20$.
Bud scar size
Request a detailed protocolWe next measured the typical size of bud scars on the surface of Y55 yeast cells. Single cells were stained with calcafluor to highlight the chitinous bud scars (Figure 1—figure supplement 1). Confocal $z$stacks were obtained on a Nikon A1R confocal microscope equipped with a 40× oil immersion objective. These images were visualized using the image processing software FIJI, and the 3d volume viewer plugin. To track the location and size of bud scars, a custom MatLab script was written to map the strongest calcafluor signals, since calcafluor makes bud scars brighter than other portions of the cell wall. Brightness isosurfaces then isolated the bud scars from the cell wall. Next, the isosurface points were rotated to the $xy$ plane by finding its principal components in a principal component analysis. The rotated surface points were then fit with an ellipse, returning the major and minor axes. The average of the major and minor axes returned an average interior bud scar diameter of $1.2\mathrm{\mu}\mathrm{m}$. This value was later used in simulations of yeast groups.
Bud scar locations
Request a detailed protocolWe measured bud scar positional distributions for petite yeast Ace2KO. Since the SEM does not image chitinous bud scars, we approximated bud scar positions as the closest point on a mother cell’s surface to the corresponding daughter cell’s proximal pole. We recorded 1990 bud scar positions in polar coordinates, as defined in Figure 1—figure supplement 1. There is a clearly defined polar angle for the budding ring, while the azimuthal angle is uniformly distributed. The mean and standard deviations of the two angular coordinates were $\theta =42\mathrm{\%}\pm 23\mathrm{\%}$, and $\phi =180\mathrm{\%}\pm 104\mathrm{\%}$.
Imaging Volvox
Cultivation and selective plane illumination microscopy
Request a detailed protocolThe V. carteri f. nagariensis strain HK10 (UTEX 1885) was obtained from the Culture Collection of Algae at the University of Texas at Austin and cultured as previously described (Brumley et al., 2014). To visualise somatic cells, V. carteri spheroids were embedded in 1% lowmeltingpoint agarose, suspended in liquid medium and imaged using a custombuilt Selective Plane Illumination Microscope (Haas et al., 2018). Each somatic cell is mostly filled with a single chloroplast. Chlorophyll autofluorescence was excited at $\lambda =561\mathrm{nm}$ and detected at $\lambda =570\mathrm{nm}$. To increase the accuracy with which we identify somatic cell positions, $z$stacks of six spheroids were acquired from three different angles (0, 120, 240 degrees) and fused as described in the following paragraph. We used data from $N=6$ different Volvox organisms, which gave us $\sim 6000$ measurements of singlecell positions. This gave us more data than what we measured for snowflake yeast clusters.
Registration of cell positions
Request a detailed protocolPositions of cells were registered based on fluorescence intensity using custom Matlab scripts. This was achieved by carrying out a 2D convolution of each frame of the $z$stack with a basic kernel modeling the appearance of a cell – this was set to be an asymmetric double sigmoidal function. Cell segmentation was corrected manually. $Z$stacks taken from different angles were roughly aligned using Fiji and the Matlab function fminsearch to minimise distances between the reproductive cells. This alignment was used as starting point for alignment of the somatic cells again using fminsearch. The positions of somatic cells were merged and averaged.
Voronoi tessellation
We used a Voronoi tessellation algorithm to measure the distribution of cell neighborhood sizes in groups. We computed both 3D and 2D Voronoi tessellations.
3D voronoi tessellations
Request a detailed protocolFirst, we computed 3D Voronoi tessellations within a defined boundary. These tessellations were performed for experimental snowflake yeast data from the SEM and simulations of 3D groups using the opensource Voronoi code Voro++ (Rycroft, 2009), wrapped in a custom MatLab script. Voro ++ takes as input the Cartesian coordinates of the cell centers and the boundary of the shape within which to compute the tessellation. Without a boundary, all of the Voronoi cells located on the periphery would extend to infinity. We started the tessellation process by setting the input boundary to be a sphere; the Voronoi algorithm tessellated space within the spherical boundary. Then, pieces of the sphere were pared away until a Voronoi tessellation within the group’s convex hull was obtained, as described in the next paragraph.
The boundary sphere was centered on the cluster’s center of mass. Its radius was the distance to the farthest cell center plus an additional $5\mathrm{\mu}\mathrm{m}$. Upon tessellation within the sphere, each Voronoi polyhedron is defined by Cartesian vertices $\mathbf{r}}_{\mathbf{j}$. We group these preliminary vertices by the cells to which they correspond, so that $Q}_{i}=\{{\mathbf{r}}_{\mathbf{1}},{\mathbf{r}}_{\mathbf{2}},...,{\mathbf{r}}_{\mathbf{m}}\$ is a list of the $m$ vertices corresponding to cell $i\in [1,N]$, $N$ being the total number of cells in the organism. We next computed the cluster’s convex hull, which is the smallest convex polyhedron that contains all cell centers. We then extended the vertices of the convex hull by $3\mathrm{\mu}\mathrm{m}$ outwards from the cluster center of mass so the boundary contained the entirety of each cell. This new boundary polyhedron, whose vertices are labeled $B$, defines the cluster boundary. We then found the intersection polyhedron, ${Z}_{i}={Q}_{i}\cap B$ by taking the union of the dual of their vertices. This process thereby trims all Voronoi polyhedra to lie exclusively within the cluster’s convex hull. The polyhedra ${Z}_{i}$ were the final Voronoi polyhedra used for the remaining data analysis.
Voronoi tessellation on a sphere
Request a detailed protocolVoronoi tessellation on nonspherical surfaces
Request a detailed protocolWe also computed 2D Voronoi tessellations on surfaces embedded in 3D space using customwritten MatLab functions. This approach was used for Volvox experimental data. Performing this computation with Volvox experiments presented a challenge as Volvox are roughly spherical, but with varying local curvature. It was therefore necessary to compute a Voronoi tessellation on an arbitrary surface.
The first step toward generating the proper Voronoi tessellation was computing the Delaunay triangulation of the cells on the surface (the Voronoi tessellation is the dual of the Delaunay triangulation). First, we found the Cartesian coordinates of each somatic cell (as described above), and normalized these coordinates so that all cell centers laid on the unit sphere. Then, a Delaunay triangulation of the normalized points was calculated. Edges of the triangulation that cut through the unit sphere were eliminated, and edges that laid along the sphere surface were kept. This Delaunay triangulation therefore mapped out the connectivity of the somatic cells. We then projected that triangulation onto the lumpy surface. The Voronoi polygon vertices are the circumcenters of each Delaunay triangle. Further, any edge shared between two Delaunay triangles denotes an edge shared between the Voronoi vertices associated with those two triangles. We found all edges connecting the Voronoi vertices. Next, connected edges were flattened so that each Voronoi cell was a 2D polygon. This step eliminates the curvature associated with the surface of the organism. However, we found that the distribution of Voronoi areas was unaffected by taking either the planar approximation or by approximating the area by taking the local curvature into account – the average difference between Voronoi areas when approximating the surface as a plane ${A}_{p}$ vs. approximating the surface as a spherical cap ${A}_{s}$ was found to be $\u27e8\frac{{A}_{p}{A}_{s}}{{A}_{p}}\u27e9=0.001$, measured for one organism. Therefore, we used the flattened Voronoi polygons as the final tessellation shapes.
Data analysis of voronoi measurements
In all cases, the output of the Voronoi algorithm is a list of Voronoi polytope sizes: in 3D, the measurements were the final Voronoi polyhedron volumes, while in 2D the measurements were polygon areas. Histograms of these sizes were generated to compare with the kgamma distribution. As we observe cells in direct contact with each other, the minimum size of a Voronoi volume or area was defined by single cell measurements. For petite yeast cells, the mean cell size was calculated from the ellipsoid fits described above to be ${v}_{c}={17.44\mathrm{\mu}\mathrm{m}}^{3}\pm 7.33\mathrm{\mu}{\mathrm{m}}^{3}$. In simulations, the minimum volume was set by the defined cell radius; in bidisperse simulations, the minimum size was set by the volume of the smallest cells.
We then calculated the expected maximum entropy distribution using only the mean and variance of the observed Voronoi volumes, $\overline{v}$ and ${\sigma}^{2}$, as inputs. Together with the minimum volume v_{c}, these measurements define $k={(\overline{v}{v}_{c})}^{2}/{\sigma}^{2}$, a dimensionless shape parameter (Aste and Di Matteo, 2008). The maximum entropy distribution was therefore not fit to the data using, for example, a least squares method, but inferred from the first two moments of the distribution.
Volvox
Request a detailed protocolAlong the surface of the Volvox organisms, there are gaps between some of the somatic cells due to the gonidia that lie beneath, but near the surface of the organism. These gonidia effectively occupy space on the surface, making it inaccessible to somatic cells. We excluded all Voronoi cells that intersected these gonidia gaps. We identified gaps in the soma cells by flagging Delaunay triangles with exceptionally high aspect ratios. Any Voronoi polygons that intersect the flagged Delaunay triangles were then flagged and later excluded from the dataset. The polygons were generally spatially clustered, indicating that the gonidial gaps were being correctly isolated. Roughly 90 polygons were excluded from each organism.
In Volvox organisms, each cell is surrounded by extracellular matrix, so cells do not contact each other. Furthermore, each of the six organisms studied varied in diameter (standard deviation in diameter was $28.2\mathrm{\mu}\mathrm{m}$), yet all contained roughly the same number of somatic cells, leading to systematic differences in average surface area per cell across the organisms. Quantitatively, the coefficient of variation of the diameter of the groups was $C{V}_{D}=0.05$, while the coefficient of variation in the number of cells in each group was roughly 10 times smaller, $C{V}_{N}=0.006$. To counter the systematic size differences between organisms, we converted the Voronoi polygon areas into solid angles by dividing by the total surface area of each organism, ${\mathrm{\Omega}}_{i}={A}_{i}/S$; we then grouped all six organisms together into one histogram. We allowed the minimum solid angle, used in the kgamma equation, to be a fit parameter in a least squares minimization procedure. There was one outlier cell with solid angle $\mathrm{\Omega}=0.0048$ steradians; the next two smallest cells had solid angles 0.0068 and 0.0069 steradians. We removed the outlier; the least squares minimization procedure then fit a minimum solid angle ${\mathrm{\Omega}}_{c}=0.0070$ steradians. We used this value for all further calculations. Just as in the 3D case, the mean and variance of the solid angle were measured to set the expected maximum entropy distribution.
Goodnessoffit analysis
To test how well the kgamma distribution performed as a predictive distribution, we systematically compared its performance to three other distributions: the normal (Gaussian) distribution, the lognormal distribution, and the beta prime distribution. We used two of our datasets for this comparison: experimental values of Voronoi volumes for snowflake yeast, and simulations of snowflake yeast (which provided us with more datapoints for comparison). First, we used the measured mean $\overline{V}$ and standard deviation $s$ of the datasets to estimate each distribution’s parameters. In Table 1 are the two parameters of each distribution and how they are calculated from the measured mean and variance of the dataset.
The three distributions that we used to compare to the kgamma distribution were chosen based upon their properties. For instance, all three comparison distributions have two parameters; also, all three chosen distributions have either semiinfinite or infinite domain. We chose the Normal distribution because it is the centrallylimiting distribution for a process with summative random errors. We chose the Lognormal distribution because it is the centrallylimiting distribution for a process with multiplicative random errors. Last, we chose the beta prime distribution arbitrarily; the goal was to make comparisons to a distribution which we have no reason to believe would accurately describe this dataset.
To systematically compare the distributions, we sampled $N$ datapoints, with replacement, from the original datasets. We varied $N$ to test the effect of finite datasets. Each time we subsampled the dataset, we calculated the mean and variance of the new data sample, then used those measured values to estimate the parameters of the four distributions. We then recorded the rootmeansquare residual for all four cases. We plotted the rootmeansquare residual for each distribution as a function of the data sample size, $N$, in Figure 2. We observed that the kgamma distribution outperformed the other distributions at every sample size.
Next, we used the four distributions to predict the skewness of the simulation data. We empirically determined the first two moments of the dataset: we take these to be the first two moments for each of our distributions. Then, we used the first two moments to predict the skewness, which is related to the third moment of the distribution. We then compared this value to experimentallymeasured skewness, finding that the kgamma distribution estimated the experimentally observed skewness with only a 7% error; the other three distributions estimated the skewness with percent errors of $100\mathrm{\%},96\mathrm{\%},\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{a}\mathrm{n}\mathrm{d}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}174\mathrm{\%}$ for the normal, lognormal, and betaprime distributions, respectively (Figure 2).
As an additional test, we used leastsquares fitting to test the performance of the kgamma distribution compared to the other three distributions for simulation data. In this case, we let the leastsquares fitting algorithm find the best parameters to fit each distribution; then, we compared the leastsquaresachieved mean and standard deviation of the fit distribution to the measured mean and standard deviation of the population. We found that the kgamma fit reproduced the closest values to the measured mean and standard deviation (Figure 2).
Cluster size distribution measurements
Request a detailed protocolCluster sizes were measured using a Beckman Coulter Multisizer 4e particle analyzer in the Cellular Analysis and Cytometry Core of the Shared User Management System located at the Georgia Institute of Technology. Petite Ace2KO clusters were taken from steady state concentration in YPD and then submerged in electrolytic fluid and passed through a $100\mathrm{\mu}\mathrm{m}$ aperture tube. The volume measured on the multisizer corresponds to the volume of electrolyte displaced by the cluster. The number of cells in each cluster was then estimated by $N=V/{v}_{c}$, where $V$ is the volume of organism measured by the Coulter Counter, and v_{c} is the average cell volume from SEM measurements, ${v}_{c}={17.44\mathrm{\mu}\mathrm{m}}^{3}$.
Cumulative distribution function statistics
Request a detailed protocolTo quantify goodnessoffit for predicted maximum entropy distributions, we compared the predicated cumulative distribution function (CDF), $F(x)$, to the empirical CDF, ${F}_{i}$, using PP plots. Exactly predicted points will lie on the line $y=x$ in these plots. We measured the rootmeansquare residual from the line
Measurements of ${\mathrm{\Psi}}_{6}$ in V. carteri
From the light sheet images of Volvox, we obtained the Cartesian coordinates of each somatic cell. From Delaunay triangulation, we then obtained a list of every cell’s closest neighbors. Each cell and its $NN$ nearest neighbors did not generally lie in a plane due to local curvature of the Volvox surface. We therefore calculated inplane and outofplane components using principal component analysis. The inplane components were then used to write the positions of each nearest neighbor in polar coordinates. The formula for calculating ${\mathrm{\Psi}}_{6}$ is
where ${\theta}_{j}$ defines the polar angle coordinates around the cell of interest and $\u27e8\mathrm{\dots}\u27e9$ denotes averaging over all cells. We calculated ${\mathrm{\Psi}}_{6}$ separately for each of six different organisms; we report ${\mathrm{\Psi}}_{6}=0.03\pm 0.01$.
Correlation of voronoi areas
Request a detailed protocolIn Volvox organisms, we calculated the spatial correlation of polygon areas. First, we extracted the list of cell neighbors from the Delaunay triangulation of the organism surface. Nearest neighbors were designated as living a network distance of 1 away from a cell of interest; next nearest neighbors live a network distance of 2 away from the cell of interest, etc. The number of neighbors a network distance of $Q$ away is then $J(Q)$, which is empirically determined. The network correlation function is then
where ${Y}_{Q}=J{(Q)}^{1}{\sum}_{j}({\mathrm{\Omega}}_{j}\u27e8\mathrm{\Omega}\u27e9)$ is the average deviation of the solid angle of a given polygon’s neighbors from the mean. The standard deviation of the solid angle across the population is ${\sigma}_{\mathrm{\Omega}}$, and ${\sigma}_{{Y}_{Q}}$ is the standard deviation of ${Y}_{Q}$ across the population.
Simulation methods
Simulations of snowflake yeast groups
Request a detailed protocolSimulations of snowflake yeast groups were adapted from previously published work by Jacobeen et al., 2018a; Jacobeen et al., 2018b that found simulations of snowflake yeast growth morphology accurately replicated experimentally measured cellular packing fractions and average group sizes. In the present work, cells were modeled as prolate ellipsoids of revolution with a semimajor axis $a=2.88\mathrm{\mu}\mathrm{m}$ and semiminor axis $b=2.29\mathrm{\mu}\mathrm{m}$, characterized by the aspect ratio $\alpha \equiv a/b=1.26$. Each generation, every cell attempted to reproduce; however, if new cells closely overlapped with existing cells (i.e. their bud scars are closer than $1.2\mathrm{\mu}\mathrm{m}$), they were eliminated. Setting the number of generations (for example, 7) sets the maximum possible number of cells in the group at the end of the simulation (${2}^{7}=128$), and roughly sets the expected number of cells in the group ($\sim 100$). In our simulations, cells were 80% likely to bud first from the distal pole (i.e. $\theta =0\pm 10$ degrees). Subsequent cells budded at a polar angle $\theta $, and with an azimuthal angle randomly chosen from a uniform distribution $\phi \in [0,2\pi ]$; in other words, after the first bud, cells generally appeared along a ‘budding ring’. There was a 20% chance that the first bud would appear along this budding ring instead of exactly at the pole. After 3 bud scars, there was a 50% chance that new cells bud on the proximal side ($\pi \theta $) instead of the distal side. The orientation of the new cell is determined by the surface normal to the mother cell at the position of the bud site; the major axis of the new cell lies along the surface normal.
To compare exhaustively the distribution of Voronoi volumes between simulations and the kgamma distribution, we simulated $9,100$ clusters. In each simulation, clusters were allowed to grow for 7 generations of cell division, corresponding to an average of $94.2\pm 10.9$ cells per cluster. The budding ring was defined by the polar angle $\theta =45\mathrm{\%}$, a close approximation to the experimentally measured mean polar angle. These simulations did not include intercellular forces. The cell centers were recorded and then Voronoi tessellations were made within each cluster’s convex hull.
Simulations of V. carteri
Request a detailed protocolWe simulated 10 Volvoxlike groups with $N=1000$ cells confined to the surface of a sphere. Cells were placed on the surface of a sphere of unit radius by randomly selecting polar and azimuthal coordinates in a Poisson point process. The process proceeds as follows: each new cell was randomly placed, and its distance from all other cells was calculated. If the new cell is within a threshold distance $d$ from any existing cell, it was removed and a new cell was placed elsewhere on the spherical surface. This process was iterated until all 1000 cells were placed. We chose a minimum separation distance of $d=0.088$, which allowed reasonably rapid convergence. We then calculated the Voronoi tessellation and the correlation function as described above. The $N=10$ groups, each with 1000 cells, returned 10000 single cell measurements.
Simulations of two additional growth morphologies
Request a detailed protocolWe next sought to model two additional classes of growth morphologies: sticky aggregates and cells contained within a maternal membrane. In both simulations, cells were modeled as spheres with unit radius.
Aggregative groups
Request a detailed protocolFirst, we considered a multicellular model of sticky aggregates, mimicking group formation in, for example, flocculating yeast and bacterial aggregates. In our simulations, groups were grown from a single cell. New spherical daughters appeared at a polar angle $\theta $ and azimuthal angle $\phi $. Within each step, there was stochasticity in the budding location: cells would appear at $\theta ={\theta}_{0}\pm 15\mathrm{\%}$. The azimuthal angle was always drawn from a uniform distribution on the interval $[0\mathrm{\%},360\mathrm{\%}]$. We simulated $N=50$ aggregative groups, each with 64 cells, for a total of gt_{3000} measurements of singlecell volume.
Cells interacted with both steric and attractive interactions in overdamped dynamics. Steric interactions were modeled through a harmonic potential when two cells overlapped, with a cutoff once cells were no longer overlapping. That is, for two cells $i$ and $j$ (radii ${R}_{i}$ and ${R}_{j}$) separated by the vector $\mathbf{r}}_{ij}={\mathbf{r}}_{j}{\mathbf{r}}_{i$, the steric force acting on cell $i$ from cell $j$ is
Attractive interactions (i.e. sticky, aggregative bonds) were also modeled through a harmonic potential, but these interactions had both a lower bound and upper bound cutoff.
where $a$ sets the location of the attractive well minimum. We used $a=0.9$, so that the attractive interactions allow a small amount of cell overlap.
Size polydispersity
Request a detailed protocolIn simulations in which we introduced size polydispersity, cells were allowed to reproduce into two separate sizes, ${R}_{1}=1$ and ${R}_{2}=2$. The probability of budding cells of the same size as the mother cell is denoted $\xi $. When $\xi =1$, the mother cell always produces cells of the same size, while when $\xi =0.5$, there is a 50% chance that the mother cell produces a cell of size R_{1} or R_{2}, independent of the radius of the mother. Simulations were seeded with a pair of contacting cells, one each of the two radii. The simulation then proceeded with subsequent rounds of cell division and mechanical relaxation. We varied the polydispersity parameter $\xi $ in even increments from $\xi =0.5$ to $\xi =1.0$ for a total of 6 different simulations. In each simulation, we simulated $N=50$ organisms, each with 64 cells, for a total of 3200 cells per simulation, and almost 20,000 cells total.
Groups confined within a membrane
Request a detailed protocolIn another common mode of group formation, cells divide repeatedly within a confining membrane. This type of group formation has been observed in experimentallyevolved multicellular algae derived from unicellular Chlamydomonas reinhardtii (Herron et al., 2019), and is reminiscent of both baeocyte production in Stanieria bacteria (Angert, 2005), and neoproterozoic embryo fossils (Xiao et al., 1998). In a simulation model, we adopted the essential components of this class of growth: groups grow from a single spherical cell, cells divide stochastically, and cells interact sterically with both a maternal cell wall and each other. Typically, palintomic cell division occurs rapidly, meaning that the packing fraction remains the same within the maternal cell wall. We simulated this by increasing the radius of the cell membrane after each cell division, but before allowing any mechanical relaxations.
Steric forces between a cell and the maternal cell wall were modeled as being proportional to the nonoverlapping volume of the cell and the maternal cell wall. In other words, if a cell is not contacting the membrane, there is no force acting on it. However, if the cell is contacting the membrane, the force is proportional to how much of the cell volume lies outside the membrane. Each cell was assigned volume ${v}_{c}=4/3*pi$. The overlapping volume of the cell and the membrane is labeled v_{i}. The force the cell experiences from the membrane is then
where $\hat{\mathbf{r}}}_{i$ is a unit vector pointing to the center of the maternal membrane. Additionally, steric interactions between cells were calculated as described above for aggregative groups.
We simulated $N=200$ groups growing in a confining membrane, each with 64 cells in the group, for a total of 12800 measurements of single cells.
Groups confined to a spherical surface
Request a detailed protocolSome groups form by arranging cells around a central core of extracellular matrix (ECM). To simulate such groups, we modeled a sphere of ECM with cells arranged randomly along the surface. Cell positions were chosen by selecting a position in spherical coordinates from uniform polar $\theta \in [0,\pi ]$ and uniform azimuthal $\phi \in [0,2\pi ]$ distributions. The only rule implemented in cell placement is that no two cells can be located closer than two cell radii from one another. If a new cell is chosen to be located too close to any existing cells, it is eliminated and a new position is chosen. We iterated this process until $N$ cells were placed on the ECM surface.
First, we chose to place $N=50$ cells on the surface. Therefore, the maximum cell radius allowing all 50 cells to be placed is 0.283 units (where the total sphere has unit radius). We chose the cell radius to be 0.1980 units, which allowed for reasonably rapid random placement of all 50 cells (other choices of cell radii demonstrate qualitatively similar results).
Simulated cellular apoptosis
Request a detailed protocolIn simulations with apoptosis events, cell death occurred after group generation. Briefly, groups were generated by iterated generations of cell division starting from a single cell. After this process, one cell was chosen at random to die. Then, all cells within a localization radius $R$ were flagged. Of the flagged cells, 9 more were chosen at random to die. Therefore, small localization radii correspond to highly localized death events, where 10 juxtaposed cells may die together. As the localization radius increases, there are more flagged cells, and therefore more randomness in cell death. All other cells were unaffected by the cell death process. We simulated six different cell death radii. Each cell death radius simulation included 100 organisms simulated, each with 43 cells after the cell death process. Therefore, each cell death radius $R$ includes 4300 single cell measurements, for a total of 25,800 measurements.
Treelike groups with precisely defined cell placement/location
Request a detailed protocolWe also investigated groups with precisely defined growth patterns. The spherical cells were held together with fixed, chitinlike bonds. The first cell was placed at the origin. It then proceeded to bud 3 daughter cells, each of which also budded subsequent cells. The exact budding pattern is described below.
Daughter cells were placed as follows. In spherical coordinates on the surface of the mother cell, the first daughter cell was placed at $(\theta =0\pm \eta $, $\varphi =0\pm \eta )$, the second at $(\theta =90\pm \eta $, $\varphi =90\pm \eta )$, and the third at $(\theta =90\pm \eta $, $\varphi =270\pm \eta )$, where $\eta $ is the strength of random noise added.
The first daughter cell’s coordinate system was rotated $90\mathrm{\%}\pm \eta$ around the $z$axis from the mother cell; in other words, for the first daughter cell, $x\stackrel{}{\to}{x}^{\prime}$, $y\stackrel{}{\to}{y}^{\prime}$, and $z\stackrel{}{\to}{z}^{\prime}$, where $\mathbf{x}}^{\mathrm{\prime}}={\mathbf{R}}_{z}(\pi /2+\eta )\mathbf{x$, $\mathbf{y}}^{\mathrm{\prime}}={\mathbf{R}}_{z}(\pi /2+\eta )\mathbf{y$, and $z}^{\mathrm{\prime}}={\mathbf{R}}_{z}(\pi /2+\eta )\mathbf{z$, and $\mathbf{R}}_{z$ is the rotation matrix around the $z$axis. This daughter cell then proceeded to bud daughters in the exact same pattern as its mother; however, because its local coordinates were rotated, the budding positions were also rotated 90% with respect to the mother cell’s buds. This process was iterated for five generations of cell division. When $\eta =0$, this corresponds to only three cells overlapping three other cells. The three overlapping cells were then removed.
After each round of cell division, cells were allowed to relax mechanically in overdamped dynamics according to steric repulsive interactions and sticky, rigid bond interactions to their mother cell. The steric interactions were the same as described above. Fixed bond interactions were modeled as follows. When new cells appear, they incur a bud scar on the mother cell’s surface and a birth scar on the daughter cell’s surface. The positions $\mathbf{r}}_{bu$ and $\mathbf{r}}_{bi$ of the bud scar and the birth scar were recorded and tracked. The vector pointing from the bud scar (on the mother’s surface) to the birth scar (on the daughter’s surface) is $\mathbf{r}={\mathbf{r}}_{bi}{\mathbf{r}}_{bu}$. Then, the force acting on a cell from it’s mother cell is
where $\kappa $ was the chitin bond strength. In addition, cells experienced forces from all of their daughter buds (given by the same relationship and the same chitin bond strength). The initially seeded cell did not experience forces from a mother cell.
For $\eta =0$ (i.e. no noise), the distribution of Voronoi volumes was visually discontinuous, since cells could only access a finite number of local configurations. As the noise strength increased, the maximum entropy predictions were gradually recovered. We simulated 11 different noise strenths, each with 150 organisms, each with 28 different cells simulated, for a total of 46,800 measurements.
Appendix 1
It may appear surprising that the distribution of cell volumes is not governed by the Central Limit Theorem (CLT), i.e. the volumes are not distributed normally. After all, Voronoi polytope volumes are generated from many randomly interacting pieces  should not these many different random fluctuations sum to a CLTlike scenario? A simple comparison between the modified gamma distribution, a normal distribution, and a lognormal distribution shows in fact that both the normal distribution and the lognormal distribution fail to capture essential characteristics of the volume packing, while the kgamma distribution does. For snowflake yeast, the reason for this disagreement is that as each new cell is added to a cluster, it changes the entire volume distribution, since the new cell occupies space which was previously unoccupied. It therefore changes the volumes of all its nearest neighbors; if they flex to accommodate the new cell, then those neighbors change the Voronoi volumes of their neighbors, and so on. Therefore, adding a new cell does not sample the same distribution as before  the distribution itself changes, rendering the limit inapplicable.
In the case of the Volvox, the somatic cells are originally connected together only by cytoplasmic bridges, forming a small sphere. As the ECM is generated the sphere “inflates”. This process, in which many random fluctuations in the amount of ECM excreted by each cell over time can integrate together, seems appropriate for CLTlike arguments. However, it is worth noting that the cells are generally locally oriented with a hexagonal symmetry. In order to maintain a nonwrinkled surface, more ECM must be secreted in some local regions, such as the corners of the hexagons, than in other places, such as at the hexagon edges. Since there is no local wrinkling observed, the secretion of ECM from the somatic cells cannot be a completely random process orientationally. In other words, the ECM excretion process is controlled, which implies that the CLT does not properly capture the sampling space. Instead, the cells inevitably occupy positions on the surface of the sphere that vary from organism to organism; the maximum entropy distribution of their Voronoi areas is then the kgamma distribution.
Data availability
Figures 1 & 3 source data: Experimental data files enumerating the cell centers positions for each organism sampled. The folders are subdivided into those on snowflake yeast (from SEM studies) and Volvox (from lightsheet studies). Each subdirectory contains an explanatory README file. Figures 2, 4 & 5 source data: Simulation data (enumerating the cell center positions) for the six classes of numerical studies; aggregation, apoptosis, polydispersity, snowflake yeast growth, treelike growth, and Volvox growth. Each subdirectory contains an explanatory README file.
References

Entropic effects in cell lineage tree packingsNature Physics 14:1016–1021.https://doi.org/10.1038/s4156701802020

Alternatives to binary fission in bacteriaNature Reviews. Microbiology 3:214–224.https://doi.org/10.1038/nrmicro1096

An invariant distribution in static granular mediaEurophysics Letters 79:24003.https://doi.org/10.1209/02955075/79/24003

Emergence of Gamma distributions in granular materials and packing modelsPhysical Review. E, Statistical, Nonlinear, and Soft Matter Physics 77:1–8.https://doi.org/10.1103/PhysRevE.77.021309

Geometric constraints during epithelial jammingNature Physics 14:613–620.https://doi.org/10.1038/s4156701800899

Size and complexity among multicellular organismsBiological Journal of the Linnean Society 60:345–363.https://doi.org/10.1111/j.10958312.1997.tb01500.x

The statistical physics of athermal materialsAnnual Review of Condensed Matter Physics 6:63–83.https://doi.org/10.1146/annurevconmatphys031214014336

A densityindependent rigidity transition in biological tissuesNature Physics 11:1074–1079.https://doi.org/10.1038/nphys3471

The origins of multicellularityIntegrative Biology 1:27–36.https://doi.org/10.1002/(SICI)15206602(1998)1:1<27::AIDINBI4>3.0.CO;26

Fluid Mechanics of Mosaic Ciliated TissuesPhysical Review Letters 127:198102.https://doi.org/10.1103/PhysRevLett.127.198102

Frequency spectrum of the flicker phenomenon in erythrocytesJournal de Physique 36:1035–1047.https://doi.org/10.1051/jphys:0197500360110103500

Metachronal waves in the flagellar beating of Volvox and their hydrodynamic originJournal of the Royal Society, Interface 12:20141358.https://doi.org/10.1098/rsif.2014.1358

Patterns of budsite selection in the yeast Saccharomyces cerevisiaeThe Journal of Cell Biology 129:751–765.https://doi.org/10.1083/jcb.129.3.751

Bacterial solutions to multicellularity: a tale of biofilms, filaments and fruiting bodiesNature Reviews. Microbiology 12:115–124.https://doi.org/10.1038/nrmicro3178

BookChapter 1  regulatory hardwiring: A brief overview of the genomic control apparatus and its causal role in development and evolutionIn: Davidson EH, editors. Genomic Regulatory Systems. San Diego: Academic Press. pp. 1–23.

Selfdriven jamming in growing microbial populationsNature Physics 12:762–766.https://doi.org/10.1038/nphys3741

Chemical waves in cell and developmental biologyThe Journal of Cell Biology 217:1193–1204.https://doi.org/10.1083/jcb.201701158

Direct measurement of the flow field around swimming microorganismsPhysical Review Letters 105:168101.https://doi.org/10.1103/PhysRevLett.105.168101

Multicellular group formation in response to predators in the alga Chlorella vulgarisJournal of Evolutionary Biology 29:551–559.https://doi.org/10.1111/jeb.12804

Green algae as model organisms for biological fluid dynamicsAnnual Review of Fluid Mechanics 47:343–375.https://doi.org/10.1146/annurevfluid010313141426

The evolution of multicellularity: A minor major transition?Annual Review of Ecology, Evolution, and Systematics 38:621–654.https://doi.org/10.1146/annurev.ecolsys.36.102403.114735

The multicellular nature of filamentous heterocystforming cyanobacteriaFEMS Microbiology Reviews 40:831–854.https://doi.org/10.1093/femsre/fuw029

Existence of longrange order in one and two dimensionsPhysical Review 158:383–386.https://doi.org/10.1103/PhysRev.158.383

Stability of dancing volvoxJournal of Fluid Mechanics 903:A11.https://doi.org/10.1017/jfm.2020.613

Immotile active matter: activity from death and reproductionPhysical Review Letters 120:018101.https://doi.org/10.1103/PhysRevLett.120.018101

Multicellular life cycle of magnetotactic prokaryotesFEMS Microbiology Letters 240:203–208.https://doi.org/10.1016/j.femsle.2004.09.035

A twelvestep program for evolving multicellularity and a division of laborBioEssays : News and Reviews in Molecular, Cellular and Developmental Biology 27:299–310.https://doi.org/10.1002/bies.20197

The multiple origins of complex multicellularityAnnual Review of Earth and Planetary Sciences 39:217–239.https://doi.org/10.1146/annurev.earth.031208.100209

Formation of regular packets of Staphylococcus aureus cellsJournal of Bacteriology 129:1518–1523.https://doi.org/10.1128/jb.129.3.15181523.1977

BookThe Fluid Dynamics of Cell MotilityCambridge University Press.https://doi.org/10.1017/9781316796047

The embryonic origins of leftright asymmetryCritical Reviews in Oral Biology and Medicine 15:197–206.https://doi.org/10.1177/154411130401500403

On the squirming motion of nearly spherical deformable bodies through liquids at very small reynolds numbersCommunications on Pure and Applied Mathematics 5:109–118.https://doi.org/10.1002/cpa.3160050201

Morphological changes in Scenedesmus induced by infochemicals released in situ from zooplankton grazersLimnology and Oceanography 42:783–788.https://doi.org/10.4319/lo.1997.42.4.0783

Nutrient uptake by a selfpropelled steady squirmerThe Quarterly Journal of Mechanics and Applied Mathematics 56:65–91.https://doi.org/10.1093/qjmam/56.1.65

Absence of ferromagnetism or antiferromagnetism in one or twodimensional isotropic heisenberg modelsPhysical Review Letters 17:1133–1136.https://doi.org/10.1103/PhysRevLett.17.1133

Origins of multicellular evolvability in snowflake yeastNature Communications 6:1–9.https://doi.org/10.1038/ncomms7102

Homeostatic fluctuations of a tissue surfacePhysical Review Letters 115:258104.https://doi.org/10.1103/PhysRevLett.115.258104

Universal law for diffusive mass transport through mycelial networksBiotechnology and Bioengineering 118:930–943.https://doi.org/10.1002/bit.27622

Thermal fluctuations of large quasispherical bimolecular phospholipid vesiclesBiophysical Journal 45:891–899.

Network information and connected correlationsPhysical Review Letters 91:238701.https://doi.org/10.1103/PhysRevLett.91.238701

Topological metric detects hidden order in disordered mediaPhysical Review Letters 126:048101.https://doi.org/10.1103/PhysRevLett.126.048101

Force network ensemble: a new approach to static granular matterPhysical Review Letters 92:054302.https://doi.org/10.1103/PhysRevLett.92.054302

Structure, reproduction and differentiation in volvox carteri f. nagariensis lyengar, strains hk9 and 10Archiv Für Protistenkunde 111:204–222.

Yeast flocculation: a new perspectiveAdvances in Microbial Physiology 33:2–71.

Inherent variability in the kinetics of autocatalytic protein selfassemblyPhysical Review Letters 113:1–5.https://doi.org/10.1103/PhysRevLett.113.098101

A onebillionyearold multicellular chlorophyteNature Ecology & Evolution 4:543–549.https://doi.org/10.1038/s4155902011229

Noise in biologyReports on Progress in Physics. Physical Society 77:026601.https://doi.org/10.1088/00344885/77/2/026601

Direct visualization of longrange heterogeneous structure in dense colloidal gelsLangmuir : The ACS Journal of Surfaces and Colloids 19:509–512.https://doi.org/10.1021/la026303j
Decision letter

Irene GiardinaReviewing Editor; Università Sapienza, Italy

Aleksandra M WalczakSenior Editor; École Normale Supérieure, France
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Decision letter after peer review:
Thank you for submitting your article "Cellular organization in labevolved and extant multicellular species obeys a maximum entropy law" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
Both reviewers praise the quality of the paper and the relevance of the results. There are not, in their opinion, critical aspects in the manuscript to be further addressed. However, they suggest a number of revisions, which would improve the clarity and presentation of the work. In particular, both reviewers think that more discussion is needed of the maximum entropy model, e.g. whether additional information on higher order structures or morphology related correlations might lead to more effective statistical models. Referee 2 also advises for more contextualization of the results, and a wider discussion about their generality.
Below the full reports, the authors are kindly invited to take into account the referees' comments in a revised version.
Reviewer #1:
The manuscript by Day and colleagues investigates the geometry of cell packing in two multicellular eukaryotes (snowflake yeast and green algae). Using a combination of experiments and models drawn from statistical physics, they show that the distribution of cellular neighborhood volumes follows a simple universal form – a modified gamma function – that arises from a maximum entropy argument. Using simulations of different growth processes, they then show that these universal distributions are ubiquitousarising, for example, even in correlated systems as long as there is a minimal level of noise. Finally, they show how these principles contribute to emergent evolutionary features (specifically group size distributions) in snowflake yeast, and use simple theoretical models to argue that fluctuations, while inherently stochastic, give rise to robust structures that do not depend sensitively on the microscopic and biological features of the system.
This paper is a beautiful example of how simple biophysical models can provide fundamental and unifying insight into complex biological systems. It is well written, addresses an important and timely topic, and raises intriguing questions about the balance between "regulated" biology and simple statistical physics as selective forces for evolution.
I have several comments for the authors to consider, at their discretion. Overall, I really enjoyed this paper and learned a great deal from it.
– The manuscript offers an interesting guiding principle that describes two considerably different biological systems. As the authors show in simulations, the principle is expected to hold over a broad range of conditions, but of course not universally (though even small levels of stochasticity broaden the range of applicability). I think the paper could be improved by expanding on the discussion of these limitations. In particular, it is not clear to me exactly how surprising it is to see "good" fits to a 2parameter distribution of this sort (or more generally, what level of "good" we should expect of the fits in finite data sets like these). The authors address this issue in part by showing fits to other distributions, which is nice. But I wonder if it would be helpful to also include (or at least discuss) more systematic model selection. To be clear, I find the analysis quite convincing as is. But I am trying to get my head around the limitations, and in particular, to get a feel for how likely one is to see similar "goodness of fit" results using other distributions with a relatively small number of parameters.
– Related to the previous point: one approach might be to construct a type of "null" model from the data, perhaps by systematically shuffling the data in some way and then bootstrapping to evaluate the likelihood of achieving fits of similar quality.
– Have the authors considered trying to systematically quantify the impact of including higherorder structures in the max ent model? For example, one could perhaps use multiinformation metrics (https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.91.238701) to evaluate the extent to which higherorder features of the data are relevant / necessary; the idea is essentially to construct maximum entropy models with various levels of data complexity "built in" and then evaluate (perhaps with an infotheoretic style metric) the extent to which that complexity improved the model. Perhaps something similar has been done for granular materials to capture higherorder statistics of packing? I ask this primarily out of curiosity, not as a serious criticism of the current approach. A discussion of this point might add to the paper.
Reviewer #2:
Day, Thomas C. et al. investigate the geometrical statistic of cell packing in multicellular organisms. Using a maximum entropy prediction originally developed in the study of granular materials, the authors show that the statistics of cell packing imposes a robust physical, entropic constraint on the geometrical arrangement of cells. Strikingly, the authors show that both snowflake yeast evolved under lab conditions and wildtype Volvox, which develop according to very different processes and have disparate overall morphology, both exhibit cell packing statistics consistent with the maximum entropy predictions. They then use simulations to show that entropic cellular packing can arise from various modes of multicellular development due to randomness in cell positions and that substantial deviations from entropic packing arise only in the case of low developmental noise (randomness) and strong correlations in cell positions. Finally, the authors use theory and measurements from experiments with yeast to show how maximum entropy statistics dictate size heritability in simple multicellular systems. Together, their results support the perhaps counterintuitive result that developmental randomness can actually underpin developmental reproducibility, in this case reproducibility in the geometry of cell packing in terms of the free space associated with individual cells within a multicellular structure. This work contributes the identification and new consideration of a fundamental physical constraint of particular relevance to the evolutionary origins of multicellularity and to multicellular morphogenesis in general.
The conclusions of this paper are well supported by the rigorous analysis of data and simulations.
Work on the evolution of multicellularity has traditionally focused on molecular and genetic mechanisms, but because multicellular morphogenesis is an inherently physical process, biophysical studies provide an important complementary perspective. A particular strength of this paper is that insights are derived from theory that requires few, but specific, conditions be met in order to be satisfied, and therefore stands to apply generally to diverse multicellular systems, irrespective of many differences between them. The combination of empirical results from disparate multicellular systems in conjunction with simulations encompassing an expanded set of multicellular morphologies and growth processes compellingly support the generality of the insights. Beyond simply speculating about the implications of entropic packing on the function of multicellular systems, the authors demonstrate impact or lack thereof on aspects of form and function in multicellular yeast and Volvox. Importantly, simulations allowed the authors to investigate in detail the robustness of theoretical predictions in terms of deviations from theory arising from developmental processes. In addition to providing new insight, this work lays the foundation for the exciting possibility of inferring aspects of developmental dynamics and regulation simply by observing the statistics of cell packing in an organism, which could be of great use in comparative evodevo studies where developmental processes are difficult or impossible to observe.
While the work is very strong overall, there are a few caveats to consider, primarily concerning the simulations. Multicellularity takes many forms by many different processes among eukaryotes. While simulations do cover a range of different morphologies and developmental processes found in nature, factors not explicitly addressed such as constraint or patterning by secreted extracellular matrix, differences in cell shape, cell migration, and others can lead to different kinds of multicellular form. The extent to which potential correlations imposed by diverse morphologies might lead to deviations from theoretical maximum entropy predictions, and how robust those deviations might be to noise is not entirely clear. Additionally, the randomness strength in simulations from high to low, while reasonable, does not appear to be grounded in empirical characterization of randomness strength in developmental processes across biological systems. Ultimately, although they leave some uncertainty as to the generality of the results, these limitations do not contradict or significantly diminish the key claims of the paper.
Comments for the authors:
1) The simulation results are compelling but left me with some questions. To what extent do the morphologies and processes investigated by simulations address the diverse forms of multicellularity encountered across eukaryotes? To what extent does the overall shape of the multicellular structure affect the cell packing distribution (e.g. multilobed structure as in Zoothamnium niveum, dichotomous branching as in Dinobryon, something with an undulating boundary)? Are there any examples of simple multicellular eukaryotes that might exhibit very strongly correlated cell positions? What is known about randomness strength or precision in developmental processes in biological systems, and if anything is known, how does this compare to values in simulations? Providing a bit more contextualization or motivation for specific choices in simulations could help address these questions and would support the generality of conclusions drawn from the simulations. Although I am convinced that the results hold for a broad range of multicellular architectures and do not think that the possible existence of a few edge cases contradicts the main conclusions of the work, it is not entirely clear to me that the effects of growth morphology, connection topology, and dimensionality have been accounted for.
2) The sections titled "Multicellular motility is robust to cellular area heterogeneity" starting on p. 11 is slightly perplexing. It is certainly interested, and I see that it addresses a question that may arise from analysis of Volvox cell packing, but in its current form, I do not believe it contributes substantially to the key points of the paper. The introduction section seemed to imply that the results would demonstrate that fluctuations in cell packing may play a role in the evolution of multicellular systems, but as I understood them, the results suggest that fluctuations do not affect motility, at least implying that there should be little to no effect on any aspect of fitness related to motility. It is possible that there could be other aspects of organismal fitness related to cell packing, so while these results are consistent with cell packing fluctuations not necessarily impeding or constraining the evolution of multicellularity, they do not strongly support that conclusion. Perhaps contextualizing the results a bit more in terms of key points of the paper while reporting them and referring to them in the Discussion section might help the reader better appreciate their significance within the context of the paper overall.
3) I might suggest removing or otherwise modifying the phrase "highlyevolved" (p.14) as its meaning is unclear, has connotations of evolutionary teleology, and clashes with the fact that all extant organisms share an evolutionary history of equal length. Maybe something such as "organisms with highlyregulated development" may be more appropriate.
4) Is anything known about the source of correlated subregions of cells in Volvox? Do the authors have any ideas about this? Either way, it would be interesting to know and may warrant at least a small comment in the text.
5) In the author list, SSH is missing an asterisk to denote corresponding authorship.
6) An "e" is missing in "surface" in the caption for Figure 2B.
7) I believe the dotted red line in Figure 4B should be a solid line to match those in panels A and C.
https://doi.org/10.7554/eLife.72707.sa1Author response
Reviewer #1:
[…] I have several comments for the authors to consider, at their discretion. Overall, I really enjoyed this paper and learned a great deal from it.
– The manuscript offers an interesting guiding principle that describes two considerably different biological systems. As the authors show in simulations, the principle is expected to hold over a broad range of conditions, but of course not universally (though even small levels of stochasticity broaden the range of applicability). I think the paper could be improved by expanding on the discussion of these limitations. In particular, it is not clear to me exactly how surprising it is to see "good" fits to a 2parameter distribution of this sort (or more generally, what level of "good" we should expect of the fits in finite data sets like these). The authors address this issue in part by showing fits to other distributions, which is nice. But I wonder if it would be helpful to also include (or at least discuss) more systematic model selection. To be clear, I find the analysis quite convincing as is. But I am trying to get my head around the limitations, and in particular, to get a feel for how likely one is to see similar "goodness of fit" results using other distributions with a relatively small number of parameters.
We completely agree that assessing goodness of fit is crucial, despite the fact that doing so is difficult for complex, nonlinear, nonmonotonic functions. Thanks to these comments, we have clarified our discussion of the topic and added a new analysis (described in detail in response to the next question). Thank you for this suggestion; we believe addressing it has strengthened the manuscript.
We agree that it is, in general, unclear how well one should expect 2parameter distributions to perform and what exactly distinguishes a good 2parameter distribution from a bad 2parameter distribution for our datasets. To rectify this, we consider four different 2parameter distributions that use the empirically measured mean and standard deviation (as well as the measured minimum cell size); they are the kgamma, normal, lognormal, and betaprime distributions (in the original text, we compared the data only to the normal and lognormal distributions). We do not calculate leastsquares fits for these distributions, but simply use the empirically measured values of the mean and variance to extract the two relevant parameters of each distribution. For both experiments and simulations, the kgamma distribution gives the best match to our data (Figure 2 Supplement 1). Further, as our simulations are less datalimited, we can confirm that the kgamma distribution is significantly more accurate than the other distributions at predicting the frequency of large Voronoi volumes.
These four distributions were purposefully chosen. The rationale for the kgamma distribution with regards to maximum entropy packing is detailed in the text. The normal, lognormal and betaprime choices are discussed below.
We chose the normal distribution as it is the maximum entropy distribution given only the mean and variance of a population; furthermore, it sits as the limiting distribution according to the central limit theorem. Therefore, should random fluctuations add together completely independently, we might expect to observe a (truncated) normal distribution of the volumes. The absence of agreement between the data and this distribution implies that there is an additional feature that must be considered; namely, that cell volumes are not completely independent of one another but rather must sum to match a total volume (this is, of course, a requirement for the kgamma maximum entropy distribution).
Similarly, the lognormal distribution was chosen for comparison due to its MaxEnt and CentralLimitlike properties. In essence, the lognormal distribution is the central limit theorem result for the logarithm of a variable. Further, volumes must be positive numbers, and the lognormal distribution only exists in the positive domain. Lognormal distributions are observed in many natural systems such as ecology, physiology, geology and more, and have recently been shown to describe organism swimming speeds quite well [6]. They define a good null model for any complex process with many interacting elements, particularly when there is multiplicative noise. By observing that the kgamma distribution outperforms the lognormal distribution, we recognize that the volumes are not multiplicatively independent of one another.
Last, we chose to compare the performance of the kgamma distribution to an arbitrary distribution with two parameters: we chose the betaprime distribution, which has two parameters (α and β) that can be computed from the mean and variance of the population. It also has a semiinfinite domain. As we might expect, the arbitrarily chosen betaprime distribution underperforms compared to the kgamma distribution.
Therefore, showing these four distributions together allows us to compare four “minimal” models, and to demonstrate that the kgamma distribution is the most accurate match for our experiments and simulations.
– Related to the previous point: one approach might be to construct a type of "null" model from the data, perhaps by systematically shuffling the data in some way and then bootstrapping to evaluate the likelihood of achieving fits of similar quality.
Thank you for this suggestion. We agree that this is an excellent approach to justify systematically the chosen distributions. For four 2parameter distributions (kgamma, normal, lognormal, and betaprime), we applied a bootstrapping/resampling approach to understand how much error one might expect to see in these fits given the finite data sets. We applied this approach to experimental and simulated snowflake yeast Voronoi volumes. We took variable number subsamples of these datasets and measured how these four 2parameter distributions perform. For each sample size, we computed the dataset’s mean and standard deviation, then used these parameters to calculate the necessary parameters for each of the distributions (i.e., we again did not do any “fitting” in a leastsquares sense). By calculating the rootmeansquare residual of the cumulative distribution functions, we found that the kgamma distribution always outperforms the others (see Figure 2 Supp 1c). Further, the kgamma distribution error continually decreases with increasing sample size (albeit slowly for large sample sizes), while the error plateaus at a larger value for all other distributions.
Next, we used the four distributions to predict the skewness of the simulation data. We empirically determined the first two moments of the dataset: we take these to be the first two moments for each of our distributions. Then, we used the first two moments to predict the skewness, which is related to the third moment of the distribution. We then compared this value to experimentallymeasured skewness, finding that the kgamma distribution estimated the experimentallyobserved skewness with an error of only 7 percent, while the other three distributions (normal, lognormal, and betaprime) estimated the skewness with percent errors of −100%, 96%, and 174%, respectively.
As an alternative approach, we test the ability of our selected distributions to perform in a leastsquares fit. In other words, we fit the kgamma, lognormal, Gaussian, and betaprime distributions to the observed dataset. Then, we compare the values of the fitted parameters (namely, the mean and variance) to the observed values. Distributions that more closely fit the observed data are expected to produce closer estimates of the observed moments. Indeed, upon leastsquares fitting the simulated dataset, we observe that the kgamma distribution fit to the data most closely matches both the observed mean and the observed standard deviation, indicating that it is not only the distribution that best describes the tail of the distribution (as visualized in Supplementary Figure 2) but also the best fit for the distribution when considering the first two moments.
Figure additions: We revised and appended two supplemental figures for this response and the previous one. We included one figure on the probability distribution and cumulative distribution functions for all datapoints (Figure 2—figure supplement 1a,b), and then an additional panel on the bootstrapping analysis (Figure 2—figure supplement 1c). Then, we added a figure detailing the comparison of the estimated skewness from each distribution to the true measured values, and then of the measured mean and variance to the leastsquares fit mean and variance for all four distributions (Figure 2—figure supplement 2).
Text additions: We added a subsection to the Methods section titled “Goodnessoffit analysis” in which we present these results. In this section, we describe our reasons for choosing the three comparison distributions, indicate the parameters involved in each, and finally compare the performance of each.
– Have the authors considered trying to systematically quantify the impact of including higherorder structures in the max ent model? For example, one could perhaps use multiinformation metrics (https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.91.238701) to evaluate the extent to which higherorder features of the data are relevant / necessary; the idea is essentially to construct maximum entropy models with various levels of data complexity "built in" and then evaluate (perhaps with an infotheoretic style metric) the extent to which that complexity improved the model. Perhaps something similar has been done for granular materials to capture higherorder statistics of packing? I ask this primarily out of curiosity, not as a serious criticism of the current approach. A discussion of this point might add to the paper.
Thank you for these thoughtprovoking resources and ideas. We agree that this would be a very interesting next step to take. In particular, in Figures 1, 3, and 4 we probe cases where our predictions and observations differ. We think that future work could build upon these observations, though the framework to do so for maximum entropy packing may not yet exist. Nevertheless, it would be very interesting to use higherorder correction terms to understand exactly what kinds of new considerations are relevant for organisms that achieve various levels of morphological complexity.
We now directly address this topic in the discussion, and discuss both the suggested higher order structure paper as well as recently published work that uses topological information about the contact network of bacterial cells to uncover universal motif distributions. By combining topological and further additions to geometric models, future research may build upon the platform constructed here.
Text Additions: Inspired by this comment, we added a new paragraph (the second) to the Discussion section.
Reviewer #2:
[…] While the work is very strong overall, there are a few caveats to consider, primarily concerning the simulations. Multicellularity takes many forms by many different processes among eukaryotes. While simulations do cover a range of different morphologies and developmental processes found in nature, factors not explicitly addressed such as constraint or patterning by secreted extracellular matrix, differences in cell shape, cell migration, and others can lead to different kinds of multicellular form. The extent to which potential correlations imposed by diverse morphologies might lead to deviations from theoretical maximum entropy predictions, and how robust those deviations might be to noise is not entirely clear. Additionally, the randomness strength in simulations from high to low, while reasonable, does not appear to be grounded in empirical characterization of randomness strength in developmental processes across biological systems. Ultimately, although they leave some uncertainty as to the generality of the results, these limitations do not contradict or significantly diminish the key claims of the paper.
Comments for the authors:
1) The simulation results are compelling but left me with some questions. To what extent do the morphologies and processes investigated by simulations address the diverse forms of multicellularity encountered across eukaryotes? To what extent does the overall shape of the multicellular structure affect the cell packing distribution (e.g. multilobed structure as in Zoothamnium niveum, dichotomous branching as in Dinobryon, something with an undulating boundary)? Are there any examples of simple multicellular eukaryotes that might exhibit very strongly correlated cell positions? What is known about randomness strength or precision in developmental processes in biological systems, and if anything is known, how does this compare to values in simulations? Providing a bit more contextualization or motivation for specific choices in simulations could help address these questions and would support the generality of conclusions drawn from the simulations. Although I am convinced that the results hold for a broad range of multicellular architectures and do not think that the possible existence of a few edge cases contradicts the main conclusions of the work, it is not entirely clear to me that the effects of growth morphology, connection topology, and dimensionality have been accounted for.
As this is a detailed comment with many independent questions, we interleave them and our responses below.
“To what extent do the morphologies and processes investigated by simulations address the diverse forms of multicellularity encountered across eukaryotes?”
Broadly, multicellular growth morphologies can be sorted into two general classes: intercellular bonds may be reformable, or they can be permanent (i.e., unreformable). Our manuscript addresses both classes.
“Permanent” intercellular bonds are not reformable if broken; mother and daughter cells remain physically attached after cell division is complete. This process occurs in many clades of multicellularity, including plants, green algae, brown algae, red algae, fungi, bacteria, and in some stages of animal development. Incomplete cell separation is one of the oldest forms of multicellular assembly and one of the most successful, dominating the planet’s biomass [1]. Thus, we have primarily focused on this class of growth morphologies, which includes experiments on snowflake yeast and Volvox carteri, which differ in terms of dimensionality. In two simulations, we aspire to construct these experimentally observed growth morphologies. In the remaining five simulations, we seek to address a range of other multicellular morphologies. In particular, we simulate groups with reformable, sticky intercellular bonds and groups of cells that are confined by a corralling membrane. “Sticky” aggregates are broadly prevalent in nature (e.g., flocculating yeast, bacterial aggregates, or slime molds), and growth morphologies featuring maternal membranes are common as well (e.g., animals, algae).
Of course, multicellularity is rich in form and function, and we cannot capture the growth morphologies of all multicellular organisms. However, sampling a range of growth morphologies with “reformable” and “nonreformable” bonds provides at least an initial sampling of the diversity of growth morphologies that exist. We specifically do not simulate growth morphologies of confluent tissues, whether these be assembled through permanent bonds or sticky cells. As confluent tissues exhibit packing fractions of nearly 1.0, the distribution of Voronoi volumes or areas is highly impacted by the cell size distribution, which is regulated at the cellular level.
Text additions: We added the following sentences to clarify the motivation for our simulations:
“These simulations captured a couple basic properties of multicellularity: organisms may grow in both two and three dimensions, and they may assemble with two different classes of bonds: bonds that are reformable, and bonds are not reformable. […] Finally, palintomy, or growth confined inside a maternal membrane, is also common from algae to animals.”
“To what extent does the overall shape of the multicellular structure affect the cell packing distribution (e.g. multilobed structure as in Zoothamnium niveum, dichotomous branching as in Dinobryon, something with an undulating boundary)? Are there any examples of simple multicellular eukaryotes that might exhibit very strongly correlated cell positions?”
This is a very interesting question. What is the role played by the boundary between ‘inside’ and ‘outside’ an organism? Geometric constraints, by definition, limit how cells can be arranged, and sufficiently strong geometric constraints, i.e., constraints that precisely determine cellular positions, could dampen or eliminate fluctuations in cell packing volumes. So, how do boundary conditions impact the cell packing distribution?
First, in the case of groups with unreformable bonds, such as snowflake yeast, Zoothamnium, and Dinobryon, cells are “frozen” into place (subject to minor displacements due to, for example, mechanical deformations). However, so long as the exact location is subject to random fluctuations, each organism will have a similar but different cell spatial structure. The maximum entropy principle thus suggests that the ensemble of all organisms precisely follows the most likely cell packing distribution, even though each individual organism deviates slightly from this distribution.
Second, we would like to clarify how we define the boundary of the organism. This is a nontrivial point: where should the boundary of a multicellular organism be drawn? In some cases, for example cells contained within an epithelial sheet (such as Volvox), the answer is fairly clear. However, even with snowflake yeast, an algorithm must be applied to mathematically define inside versus outside. We used multiple approaches to test that our results did not depend on the method chosen. In most analyses, we used a convex hull to determine the boundary of the organism. In others, we examined “shells” within snowflake yeast clusters; we calculated the Voronoi tessellation of the entire dataset, but then separately analyzed the distribution of Voronoi volumes for cells within a range of distances from the cluster center of mass (e.g., those within 15 microns). For distances smaller than the cluster radius, the shape of Voronoi volumes is unaffected by the exact choice of boundary. Crucially, we find that both approaches exhibit maximum entropy distributions of cell packing volumes, suggesting that the convex hull method and the rough, fractallike boundary of snowflake yeast does not significantly influence the accuracy of maximum entropy cell packing predictions.
Having clarified those points, we now more directly address the referee’s question: could a particular boundary condition make cell positions more correlated, damping fluctuations and producing deviations from maximum entropy predictions? While we cannot completely address this question, at a minimum it is known that packing statistics in thermal systems are different near and far from boundaries. This phenomenon is well demonstrated in soft matter physics, impacting everything from colloidal particles to polymers [3, 5, 7, 8, 9]. Thus, we expect that boundaries can and do impact packing statistics in multicellular organisms. In particular, it is likely that the maternal membrane we simulate causes similar effects. To test this hypothesis, we returned to our simulations of palintomy within a confining membrane to examine how the boundary impacted cell packing statistics. We bin cells into shells of finite width centered at the center of mass of the organism. As expected, we find that the distribution of Voronoi volumes depends on the distance from the boundary; cells located near the boundary are packed more densely than cells deeper in the interior. However, we find that packing within the ‘core’ and within the ‘shell’ still follow separate maximum entropy predictions (see Figure 2—figure supplement 3).
There are, in fact, a number of simple multicellular organisms with highly correlated cell positions. Volvox carteri is just such an example. More broadly, linear filaments (e.g., cyanobacteria) are highly correlated; the topological constraints of the bond network in linear filaments present strong nonstochastic effects, and their cell packing distribution will generally not follow the kgamma distribution. Last, organisms that form with confluent cell layers will likely be highly correlated as the distribution of Voronoi volumes or areas will largely arise from the distribution of cell size, a trait that is highly regulated at the cellular level.
Text additions: To make this subject more clear to the reader, we have revised the first paragraph of the subsection “Snowflake Yeast” in the section “Experimental tests of multicellular maximum entropy predictions”. We have changed this paragraph to read as follows:
“Snowflake yeast grow via incomplete cytokinesis, generating branched structures in which motherdaughter cells remain attached by permanently bonded cell walls (Figure 1A). […] Conversely, if there are strong correlations in the locations of daughter cells, then we will observe deviations from maximum entropy predictions regarding the cell neighborhood volumes.”
We also added a sentence about confluent cell layers: “However, we would not expect this prediction to hold for confluent cell layers (i.e. packing fraction φ ≈ 1), since in general cell size is a highly regulated process; when cells tile space, the cell size distribution is highly correlated with the cell neighborhood volume distribution.”
Figure additions: We have added Figure 2—figure supplement 3, in which we show that in simulations of cells confined by a maternal membrane, the membrane poses boundary effects on the packing of the group, as shown in panel A. In panel A, we show the volume of the Voronoi volumes as a function of the distance from the center of mass (in normalized units). Then, we partition the membrane into spherical shells and show that within each shell, the distribution of Voronoi volumes is welldescribed by the kgamma distribution (panels BE).
“What is known about randomness strength or precision in developmental processes in biological systems, and if anything is known, how does this compare to values in simulations?”
A number of studies have shown that fluctuations can be crucial to highlyfunctioning developmental processes, for instance in sheet folding sepal growth [2, 4]. Such studies tell us three important things: (i) randomness can be just as integral a component of development as directed growth; (ii) it may be difficult to disentangle exactly which reproducible qualities and traits stem from a regulated developmental process, and which stem from random processes; and (iii) these studies imply that random processes may be more prevalent in development than we currently know, as there are not many tools by which we can actually measure the effect of randomness. With this in mind, our manuscript provides a mechanism for testing the effect of randomness on cell packing distributions. Our simulations provide test cases for what we might expect to observe in the presence or absence of randomness. We show how deviations from maximum entropy packing predictions occur, and that they indicate the presence of correlations. With this information, future work can then use this tool to quantify the randomness strength of various developmental processes.
Text additions: To highlight better this point, we have modified a paragraph of the text. The first paragraph of the section “The crucial role of randomness” now reads:
“How much randomness is necessary for the kgamma distribution to predict cell neighborhood size distributions? […] We do not attempt to recapitulate exactly any naturallyoccurring levels of randomness strength; rather, we are looking to investigate the limits about how well maximum entropy predictions can measure a deviation from pure randomness.”
2) The sections titled "Multicellular motility is robust to cellular area heterogeneity" starting on p. 11 is slightly perplexing. It is certainly interested, and I see that it addresses a question that may arise from analysis of Volvox cell packing, but in its current form, I do not believe it contributes substantially to the key points of the paper. The introduction section seemed to imply that the results would demonstrate that fluctuations in cell packing may play a role in the evolution of multicellular systems, but as I understood them, the results suggest that fluctuations do not affect motility, at least implying that there should be little to no effect on any aspect of fitness related to motility. It is possible that there could be other aspects of organismal fitness related to cell packing, so while these results are consistent with cell packing fluctuations not necessarily impeding or constraining the evolution of multicellularity, they do not strongly support that conclusion. Perhaps contextualizing the results a bit more in terms of key points of the paper while reporting them and referring to them in the Discussion section might help the reader better appreciate their significance within the context of the paper overall.
We agree that this section could be made clearer to the reader. Broadly, random perturbations could be detrimental, beneficial, or neutral to an organism. Naively, we may expect that perturbations push an organism further from an ‘ideal’ morphology and thus are largely detrimental. With snowflake yeast, we show that randomness in cell positions can lead to highly repeatable group level traits, which allows emergent multicellular traits to become remarkably heritable (ZamaniDahaj et al., 2021: doi.org/10.1101/2021.07.19.452990). In this way, entropic cell packing imbues nascent multicellular organisms like snowflake yeast with an evolutionary benefit, allowing natural selection to act upon multicellular traits that emerge from the mechanics of cellular packing. Our analysis of Volvox swimming speed shows that randomness can also be neutral. One may have expected that swimming speed is optimized via precise arrangement of flagella; instead, our results show that, to leading order, swimming speed is unaffected by cell packing fluctuations. Thus, while randomness does not produce a benefit to Volvox swimming speed, the fact that optimal swimming does not require an ordered array of flagella represents the absence of a barrier. Swimming speed, a complex grouplevel trait, can be optimized without first evolving a developmental mechanism that produces precisely arranged cells.
Text additions: We make the following additions to the text to help the reader contextualize these results:
“However, the effect of heterogeneity in cell area on motility is as yet unknown; naively, we might expect that heterogeneities in flagella packing may lower the motility of the organism in comparison to a highly regular arrangement. […] The absence of such a barrier may represent a scenario where trait optimization can be achieved without first evolving sophisticated developmental genes.”
3) I might suggest removing or otherwise modifying the phrase "highlyevolved" (p.14) as its meaning is unclear, has connotations of evolutionary teleology, and clashes with the fact that all extant organisms share an evolutionary history of equal length. Maybe something such as "organisms with highlyregulated development" may be more appropriate.
We agree and have made the suggested change.
4) Is anything known about the source of correlated subregions of cells in Volvox? Do the authors have any ideas about this? Either way, it would be interesting to know and may warrant at least a small comment in the text.
We agree that this question should be addressed in the manuscript. Volvox exhibit a known anterior/posterior polarity. The organism will orient its anterior pole towards a light source and swim in that direction. It is possible that this polarity also affects the somatic cell area distribution.
Typically, an anterior/posterior polarity is measured by observing a moving organism: however, in our snapshot data, the organism is stationary. In other cases, the distribution of the germ cells within the interior of the organism can be used, at some stages of the Volvox lifespan, to correlate with the anterior/posterior positions. However, we have considered here only the somatic cells, and although we have returned to the original images and labeled the germ cells, we have found that, at best, any labeling of anterior/posterior is messy and unreliable due to uncertainty both in the position of the germ cells and in extrapolating those positions to define a definitive pole. Therefore, we do not report these measurements here, but instead discuss the idea that anterior/posterior polarity may affect the correlated subregions of the somatic cell areas.
Text additions: We have revised the first paragraph of the subsection “The role of spatial correlations” to read:
“While we have shown that the distribution of cell neighborhood volumes closely follows the kgamma distribution in two very different organisms, we have also seen that in some cases maximum entropy predictions are more accurate in subsections of an organism than across its entirety. […] This polarity may affect the somatic cell packing distribution in different subregions of the organism, leading to deviations from maximum entropy predictions.”
5) In the author list, SSH is missing an asterisk to denote corresponding authorship.
Corrected.
6) An "e" is missing in "surface" in the caption for Figure 2B.
Corrected.
7) I believe the dotted red line in Figure 4B should be a solid line to match those in panels A and C.
We appreciate your close scrutiny of our figures. Here, the dotted red line signifies that there is only a small, discrete number of observed cell volumes when there is no noise added to the system. We modified the caption to emphasize this point and ensure it is clear to the reader.
Text additions: We added the following sentences to the figure caption:
“Note: the discrete points in (B) arise because, absent randomness in the cell locations, the exact same cellular structure is achieved by every simulation. […] Upon adding randomness, the cellular structure is altered between successive simulations, and the distribution again achieves continuity.”
References:
1. Yinon M. BarOn, Rob Phillips, and Ron Milo. “The biomass distribution on Earth”. In: Proceedings of the National Academy of Sciences 115.25 (2018), pp. 6506–6511. issn: 10916490. doi: 10.1073/pnas.1711842115.
2. Pierre A. Haas et al. “The noisy basis of morphogenesis: Mechanisms and mechanics of cell sheet folding inferred from developmental variability”. In: PLoS Biology 16.7 (2018), pp. 1–37. issn: 15457885. doi: http://dx.doi.org/10.1371/journal.pbio.2005536.
3. K. Harth, A. Mauney, and R. Stannarius. “Frustrated packing of spheres in a flat container under symmetrybreaking bias”. In: Physical Review E – Statistical, Nonlinear, and Soft Matter Physics 91.3 (2015), pp. 1–5. issn: 15502376. doi: 10.1103/PhysRevE.91.030201.
4. Lilan Hong et al. “Variable Cell Growth Yields Reproducible OrganDevelopment through Spatiotemporal Averaging”. In: Developmental Cell 38.1 (2016), pp. 15–32. issn: 18781551. doi: http://dx.doi.org/ 10.1016/j.devcel.2016.06.016.
5. S´ara L´evay et al. “Frustrated packing in a granular system under geometrical confinement”. In: Soft Matter 14.3 (2018), pp. 396–404. issn: 17446848. doi: 10.1039/c7sm01900a.
6. Maciej Lisicki et al. “Swimming eukaryotic microorganisms exhibit a universal speed distribution”. In: eLife 8 (2019), pp. 1–20. issn: 2050084X. doi: 10.7554/eLife.44907.
7. Joerg Reimann et al. “Pebble bed packing in prismatic containers”. In: Fusion Engineering and Design 88.910 (2013), pp. 2343–2347. issn: 09203796. doi: 10.1016/j.fusengdes.2013.05.100. url: http://dx.doi.org/10.1016/j.fusengdes.2013.05.100.
8. Mohammad Mahdi Roozbahani, Bujang B.K. Huat, and Afshin Asadi. “Effect of rectangular container’s sides on porosity for equalsized sphere packing”. In: Powder Technology 224 (2012), pp. 46–50. issn: 00325910. doi: 10.1016/j.powtec.2012.02.018. url: http://dx.doi.org/10.1016/j.powtec.2012. 02.018.
9. Yu G. Stoyan and G. N. Yaskov. “Packing identical spheres into a cylinder”. In: International Transactions in Operational Research 17.1 (2010), pp. 51–70. issn: 14753995. doi: 10.1111/j.14753995.2009.00733.x.
https://doi.org/10.7554/eLife.72707.sa2Article and author information
Author details
Funding
National Institutes of Health (1R35GM138030)
 Will Ratcliff
Wellcome Trust (207510/Z/17/Z)
 Stephanie S Hohn
 Raymond E Goldstein
Engineering and Physical Sciences Research Council (EP/M017982/1)
 Raymond E Goldstein
National Institutes of Health (1R35GM13835401)
 Peter J Yunker
Engineering and Physical Sciences Research Council (Vacation Bursary)
 Hannah R Sleath
John Templeton Foundation (A009723003)
 Raymond Goldstein
 Stephanie S Höhn
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Core Facilities at the Carl R Woese Institute for Genomic Biology. WCR was supported by NIH grant 1R35GM138030. This work was funded in whole, or in part, by the Wellcome Trust (Grant 207510/Z/17/Z; REG & SSH). For the purpose of open access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. This work was also supported in part by Established Career Fellowship EP/M017982/1 from the Engineering and Physical Sciences Research Council (REG).
Senior Editor
 Aleksandra M Walczak, École Normale Supérieure, France
Reviewing Editor
 Irene Giardina, Università Sapienza, Italy
Publication history
 Received: August 2, 2021
 Accepted: January 4, 2022
 Version of Record published: February 21, 2022 (version 1)
Copyright
© 2022, Day et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,890
 Page views

 159
 Downloads

 2
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Physics of Living Systems
Foraging mammals exhibit a familiar yet poorly characterized phenomenon, ‘alternation’, a pause to sniff in the air preceded by the animal rearing on its hind legs or raising its head. Rodents spontaneously alternate in the presence of airflow, suggesting that alternation serves an important role during plumetracking. To test this hypothesis, we combine fully resolved simulations of turbulent odor transport and Bellman optimization methods for decisionmaking under partial observability. We show that an agent trained to minimize search time in a realistic odor plume exhibits extensive alternation together with the characteristic castandsurge behavior observed in insects. Alternation is linked with casting and occurs more frequently far downwind of the source, where the likelihood of detecting airborne cues is higher relative to ground cues. Casting and alternation emerge as complementary tools for effective exploration with sparse cues. A model based on marginal value theory captures the interplay between casting, surging, and alternation.

 Physics of Living Systems
Computational model reveals why pausing to sniff the air helps animals track a scent when they are far away from the source.