Phylogenetic divergence of cell biological features

Michael Lynch (corresponding author)
Arizona State University, Arizona

Abstract

Most cellular features have a range of states, but understanding the mechanisms responsible for interspecific divergence is a challenge for evolutionary cell biology. Models are developed for the distribution of mean phenotypes likely to evolve under the joint forces of mutation and genetic drift in the face of constant selection pressures. Mean phenotypes will deviate from optimal states to a degree depending on the effective population size, potentially leading to substantial divergence in the absence of diversifying selection. The steady-state distribution for the mean can even be bimodal, with one domain being largely driven by selection and the other by mutation pressure, leading to the illusion of phenotypic shifts being induced by movement among alternative adaptive domains. These results raise questions as to whether lineage-specific selective pressures are necessary to account for interspecific divergence, providing a possible platform for the establishment of null models for the evolution of cell-biological traits.

https://doi.org/10.7554/eLife.34820.001

eLife digest

When most people think about evolution, they commonly think of natural selection: the evolutionary force that helps populations to develop toward an optimum state for their environment. The observable traits and features of a cell or organism are known as its phenotype. Under natural selection, genes that produce phenotypes that help a cell or organism to thrive and reproduce are more likely to be passed on to future generations. This means that over several generations the population becomes – on average – better adapted to its environment.

Other ‘non-adaptive’ evolutionary forces also influence phenotype. For example, damage to DNA can introduce mutations into the genes that a cell or organism passes on to their offspring. Some mutations are more likely to produce working variants of a gene than others; this is known as a mutation bias. In addition, even in the absence of natural selection, the proportion of particular gene variants in a population changes over the generations because genes are randomly transmitted and not all individuals reproduce. This is known as genetic drift. Together, mutation bias and genetic drift could prevent a population’s average phenotype from reaching an optimal state.

Lynch has now developed mathematical models that describe how certain biological features of cells – such as the structure of the proteins they produce – are likely to evolve due to mutation bias and genetic drift. These models show that these evolutionary processes can cause the features of the cells in a population to diversify, which often leads to a suboptimal average phenotype. Lynch calculated that two alternative phenotypes could even emerge in isolated populations in cases where there is only one optimum phenotype. For example, a mutation bias could drive some cells in one population to evolve one phenotype, while natural selection drives another population towards the other phenotype.

Overall, the model emphasizes that natural selection is not the only force that drives diversity in cells. Future research into cell biology needs to take a broad view of the joint roles played by natural selection, mutation bias and genetic drift.

https://doi.org/10.7554/eLife.34820.002

Introduction

As with nearly all biological traits, most cellular features vary among individuals within populations in a nearly continuous fashion, owing to genetic differences among individuals and the myriad of stochastic factors experienced by all organisms (ranging from intrinsic cellular noise to external environmental forces; Lynch and Walsh, 1998). This is true, for example, for catalytic rates, rates of gene expression and intracellular transport, numbers and sizes of organelles, etc. Ultimately, some fraction of within-species genetic variation is transformed into among-species divergence as alternative alleles arise by mutation and in some cases proceed to fixation (Wright, 1969; Walsh and Lynch, 2018). The magnitude of such divergence is dictated by three major evolutionary factors: the pattern of selection (the phenotypic fitness function), which imposes a directional and/or stabilizing force on the mean phenotype; the rate of origin and distribution of mutational effects, which define the raw materials upon which natural selection operates; and the power of random genetic drift, which imposes noise on the selective process.

Although considerable effort has been devoted to understanding the divergence of mean phenotypes among lineages (Walsh and Lynch, 2018), most of this work is focused on the evolution of morphological phenotypes in response to external pressures, which can vary greatly depending on the ecological setting. In contrast, owing to homeostatic effects, the internal environment of cells remains largely constant over long time scales and broad geographic locations, raising the possibility of establishing general evolutionary principles that transcend the imposition of transient ecological changes. (The same might be true for the internal organs of multicellular species).

The goal here is to derive general expressions for the divergence of mean phenotypes among species under scenarios that are likely to hold for a wide variety of cellular traits. The specific focus will be on the magnitude of divergence expected among lineages in the face of identical evolutionary forces, as this helps clarify the degree to which phenotypic diversification can proceed in the absence of lineage-specific selection pressures. Such a perspective is essential to establishing the degree to which adaptive explanations need to be sought to explain patterns of variation among populations.

The general approach will draw from well-established constructs employed in the field of quantitative genetics (the study of continuously distributed traits with a multifactorial genetic basis; Lynch and Walsh, 1998; Walsh and Lynch, 2018). The traditional focus of this field has been on complex traits in multicellular species, but these same methods can be profitably applied to intracellular morphological and molecular features, such as those involved in the cytoskeleton, gene expression, binding energy, and metabolic rates (Nourmohammad et al., 2013; Farhadifar et al., 2015; Phillips and Bowerman, 2015). Indeed, although most work in phenotypic evolution proceeds as though cellular details are irrelevant, the models employed may be equally if not more relevant to cell-biological traits, owing to their potentially less temporally variable fitness effects.

Theory

The distribution of mean phenotypes

All genetically encoded traits are subject to the recurrent forces of mutation and random genetic drift, and potentially to selection. Selection favors some genotypes over others, while mutation modifies existing genotypes independent of the selective process, and random genetic drift causes stochastic variation in gene transmission across generations. Owing to this latter factor, even if the forces of selection and mutation remain constant, the population mean phenotype of a trait will wander within a certain range over evolutionary time, with the frequency of occurrence of alternative mean phenotypes depending on patterns and strengths of selective and mutational effects (Figure 1).

Figure 1. An idealized overview of the model for the evolution of the distribution of mean phenotypes, given here for a trait under stabilizing selection.

The upper panel denotes a hypothetical phenotype distribution at a single point in time. The population consists of multiple genotypes, each having an expected genotypic value (red) but a range of phenotypes (black distributions) resulting from variance in residual deviations (environmental effects and nonadditive genetic factors). The phenotype distribution for the entire population (red) is the sum of these genotype-specific curves, and has a mean denoted by the blue line. The exact location of this overall distribution can wander over time, owing to the joint forces of selection, mutation, and random genetic drift. The lower panel gives the overall distribution of population means over a long evolutionary time span, with 11 locations at specific points of time being denoted by the short vertical lines. Persistent mutational bias towards smaller phenotypes prevents the overall distribution of means from coinciding with the fitness-function optimum, and random genetic drift causes a dispersion of means around the overall average value.

https://doi.org/10.7554/eLife.34820.003

The focus of this study, the stationary distribution of mean phenotypes, can be viewed as a summary distribution of: (1) phenotypic means across a large number of replicate populations exposed to identical conditions for a very long period; or (2) a historical survey of mean phenotypes in a single population over a long time period, again under constant environmental and population-genetic conditions. Among many other applications, such an approach has long been exploited in attempts to understand the steady-state distribution of allele frequencies expected under a constant regime of selection, mutation, and random genetic drift (e.g. Wright, 1969). From an empirical perspective, this steady-state view of evolution implicitly assumes that enough time has elapsed between observed taxa that the dynamics of the evolutionary process are of negligible significance (which would not be the case for closely related species).

The approach taken here relies on the Kolmogorov forward equation for a diffusion process (Appendix 1, Walsh and Lynch, 2018), the assumption being that the trait of interest is continuously distributed, with $z$ denoting the phenotypic value of an individual. The population mean, $\bar{z}$, moves in arbitrarily small increments each generation via the deterministic forces of selection and mutation and the stochastic process of drift. Under most reasonable biological conditions, independent of the starting conditions, a stationary distribution of mean phenotypes (among hypothetical replicate populations) is eventually converged upon, at which point there is an exact balance between opposing forces. The probability that a population’s mean phenotype will reside at any particular point is defined by this distribution, which has the general form

(1a) $\Phi(\bar{z}) = C \exp\left( 2 \int^{\bar{z}} [M(x)/V(x)] \, dx \right),$

where $M(x)$ defines the rate of directional change (resulting from selection and/or mutation) for a population with mean phenotype $x$, and $V(x)$ is the variance in change (resulting from drift). $C$ is the normalization constant (containing only terms that are independent of $\bar{z}$) that ensures that the entire probability density integrates to 1.0.

For a quantitative trait, the directional term can be subdivided into independent selection and mutation components, $M_s(x)$ and $M_m(x)$, both of which will be discussed in detail below. Under the assumption of negligible genotype × environment interaction and epistasis, the variance of the change in means, which results from the sampling of heritable genotypic values of individuals, is equal to the underlying additive genetic variance for the trait, $\sigma_A^2$, divided by the effective population size, $N_e$, in the case of haploidy (assumed here; the divisor is $2N_e$ in the case of diploidy). The effective population size is typically far below the number of reproductive individuals in the population, and is defined by various demographic features and interference imposed by chromosomal linkage, with values ranging from roughly $10^5$ for multicellular eukaryotes to $10^9$ for bacteria (Charlesworth, 2009; Lynch et al., 2016; Walsh and Lynch, 2018).

Individual phenotypes are composed of the sum of a heritable additive genetic component ($A$) and a nonheritable residual deviation ($e$, which includes environmental and nonadditive genetic effects), such that $z = A + e$, with the within-population phenotypic variance being partitioned as $\sigma_z^2 = \sigma_A^2 + \sigma_e^2$. For cellular features, a large fraction of $\sigma_e^2$ may be a consequence of stochastic gene expression, imprecise placement of cell-division septa, etc. Assuming that both $\sigma_A^2$ and $N_e$ remain constant, which is the model adhered to here, Equation (1a) can be rewritten as

(1b) $\Phi(\bar{z}) = C \exp\left( \frac{2N_e}{\sigma_A^2} \int^{\bar{z}} [M_s(x) + M_m(x)] \, dx \right),$

showing that the stationary distribution of mean phenotypes (conditional on a particular level of genetic variance, a point that will be returned to below) is proportional to the product of the distributions expected under selection alone and under mutation alone. With extremely weak selection, $M_s(x)$ would be essentially a flat function, with the overall distribution reflecting the biases due to mutation alone. Conversely, with a flat mutation function, an unlikely scenario, the distribution will follow that expected under selection alone.

The process of selection

The influence of selection on the mean phenotype (the response to selection) is embodied in the breeder’s equation,

(2) $M_s(\bar{z}) = \bar{z}(t+1) - \bar{z}(t) = h^2 [\bar{z}_s(t) - \bar{z}(t)],$

a general statement about the connection between directional selection within generations and the transmission of such change across generations (Walsh and Lynch, 2018). Here, $\bar{z}(t)$ and $\bar{z}_s(t)$ denote the mean phenotypes before and after selection in generation $t$, the difference being the selection differential. The heritability of the trait, $h^2 = \sigma_A^2/\sigma_z^2$, which equals the proportion of the total phenotypic variance, $\sigma_z^2$, associated with additive genetic variation, $\sigma_A^2$, constitutes the fraction of the within-generation change in the mean transmitted to the next generation.

Critical to everything that follows, the selection differential can be described in terms of the within-population phenotype distribution, p(z,t), and the function relating individual fitness to phenotype, W(z). The mean fitness in generation t is

(3) $\bar{W} = \int p(z,t) \, W(z) \, dz.$

The mean phenotype after selection (but before inheritance) is then obtained by weighting the pre-selection phenotypes by their relative fitnesses,

(4) $\bar{z}_s(t) = \frac{1}{\bar{W}} \int z \, p(z,t) \, W(z) \, dz.$

We will make use of the fact that most quantitative traits have an approximately normal phenotype distribution on some scale of measurement, which follows from the central limit theorem (Lynch and Walsh, 1998). The distribution of individual measures is therefore described completely by the phenotypic mean and variance,

(5) $p(z,t) = \frac{1}{\sqrt{2\pi\sigma_z^2}} \exp\left( -\frac{[z - \bar{z}(t)]^2}{2\sigma_z^2} \right).$

Substituting Equation (5) into (3) and differentiating, the change in mean fitness with respect to mean phenotype is

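(6) $\frac{\partial \bar{W}}{\partial \bar{z}(t)} = \int \frac{\partial p(z,t)}{\partial \bar{z}(t)} W(z) \, dz = \frac{1}{\sigma_z^2} \int [z - \bar{z}(t)] \, p(z,t) \, W(z) \, dz$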

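(Lande, 1976). From Equation (4), the first term to the right of the integral is equal to $\bar{z}_s(t)\bar{W}$, and the second term is $\bar{z}(t)\bar{W}$, so that $\partial\bar{W}/\partial\bar{z}(t) = \bar{W}[\bar{z}_s(t) - \bar{z}(t)]/\sigma_z^2$. This provides a direct link to Equation (2), which upon rearrangement becomes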

(7) $M_s(\bar{z}) = \frac{\sigma_A^2}{\bar{W}} \frac{\partial \bar{W}}{\partial \bar{z}(t)}.$

This expression states that, provided the phenotype distribution is normal, the change in mean phenotype caused by selection is equal to the product of the genetic variance for the trait and the gradient in the logarithm of mean fitness with respect to mean phenotype. Evolution by natural selection comes to a standstill when there is no genetic variance for the trait or the phenotypic mean resides at a point where the slope of the function of mean fitness with respect to mean phenotype is zero. To endow this expression with practical utility, specific expressions for the fitness function, W(z), will be considered below.
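The equivalence between the breeder's equation and the fitness-gradient form can be checked numerically. The following is a minimal sketch, assuming a normal phenotype distribution and using a logistic fitness function and parameter values that are purely hypothetical (they are not taken from the text). It computes the response to selection once from Equations (2) and (4) and once from Equation (7), with the gradient approximated by a central difference.

```python
import math

def normal_pdf(z, mean, var):
    """Normal phenotype density, Equation (5)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, n=4000):
    """Composite trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def mean_fitness(mean, var_z, fitness):
    """Equation (3), integrated over +/- 10 phenotypic standard deviations."""
    half = 10.0 * math.sqrt(var_z)
    return integrate(lambda z: normal_pdf(z, mean, var_z) * fitness(z),
                     mean - half, mean + half)

def response_breeder(mean, var_a, var_z, fitness):
    """Equations (2) and (4): heritability times the selection differential."""
    half = 10.0 * math.sqrt(var_z)
    w_bar = mean_fitness(mean, var_z, fitness)
    z_selected = integrate(lambda z: z * normal_pdf(z, mean, var_z) * fitness(z),
                           mean - half, mean + half) / w_bar
    return (var_a / var_z) * (z_selected - mean)

def response_gradient(mean, var_a, var_z, fitness, eps=1e-4):
    """Equation (7): genetic variance times the gradient of log mean fitness."""
    slope = (mean_fitness(mean + eps, var_z, fitness)
             - mean_fitness(mean - eps, var_z, fitness)) / (2.0 * eps)
    return var_a * slope / mean_fitness(mean, var_z, fitness)

# Hypothetical fitness function and parameter values, for illustration only.
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

ms_breeder = response_breeder(0.5, 0.3, 1.0, logistic)
ms_gradient = response_gradient(0.5, 0.3, 1.0, logistic)
print(ms_breeder, ms_gradient)
```

The two values should agree up to the error of the finite-difference gradient, which illustrates that Equation (7) requires only the mean-fitness surface and the additive genetic variance.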

The process of mutation

Most attempts to consider the long-term evolutionary features of quantitative traits have assumed one of two mutation models: (1) a distribution of mutational effects always having a mean equal to zero and a constant variance, independent of the starting genotype (Kimura, 1965; Lande, 1975; Lynch and Hill, 1986); or (2) a rate of appearance of each type of mutant allele being independent of the ancestral type (Cockerham, 1984; Turelli, 1984). Under the first scenario, mutation has no directional effect on the mean phenotype, and there are no bounds on the possible mutational effects or the physical limits to which the trait can evolve. Under the second scenario, there is a physical limit to phenotypic divergence, and because the directional effect of mutations depends on the current location, more extreme alleles generate mutations with effects biased back toward the center of the distribution.

Neither of these mutational schemes captures the features of a wide variety of cell biological traits, which often have finite numbers of possible states and state-dependent spectra of mutational effects. A few examples will suffice to make this point. Protein-protein interactions (e.g. the interfaces between dimeric molecules) typically depend on no more than a few dozen amino-acid sites. The same is true for intramolecular interactions such as the constellation of backbone residues that assemble during protein folding. In both cases, the underlying residues operate in an approximately binary manner, for example, hydrophobic vs. hydrophilic, or hydrogen-bonding vs. non-hydrogen bonding. Likewise, the catalytic sites of enzymes often consist of a small-to-moderate number of residues that either facilitate or inhibit catalytic rates, and the sizes of intracellular organelles and cytoskeletal components are constrained by cell size. Many other examples could be cited, including those involved in RNA-RNA and DNA-protein interactions.

The approximate structure of a mutation function with a bounded range can be arrived at by considering a trait determined by n binary factors (or sites), each with state b having effect 0, and state B having effect m. For a trait with an additive genetic basis, the mean phenotype in a haploid population can then be represented as

(8) $\bar{z} = z_0 + n m \bar{q},$

where $z_0$ is an arbitrary baseline value for the trait, and $\bar{q}$ is the mean frequency of B-type alleles averaged over all $n$ factors in the population (Lynch and Walsh, 1998).

Letting u be the mutation rate from B to b alleles, and v be the reciprocal rate, the per-generation change in the mean phenotype resulting from mutation is

(9) $M_m(\bar{z}) = n m [v(1 - \bar{q}) - u\bar{q}].$

With $\hat{q} = v/(u+v)$ being the equilibrium frequency of B alleles under mutation pressure alone, and $\theta_m = z_0 + n m \hat{q}$ being the expected mean phenotype under neutrality, Equation (9) further reduces to

(10) $M_m(\bar{z}) = -(u+v)(\bar{z} - \theta_m).$

This expression is quite general in that $(\bar{z} - \theta_m)$ is simply the distance of the mean phenotype from that expected under mutation equilibrium, and $(u+v)$ is a measure of the mutational restoring force per locus. The essential feature of Equation (10) is that mutation acts to reduce the distance between the mean phenotype and $\theta_m$ to a degree that depends on the magnitude of this deviation. Charlesworth (2013) implemented a similar mutation model in an investigation of genomic features.
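The reduction of the site-level expression to the restoring-force form can be confirmed with a few lines of arithmetic. This is a minimal sketch; the function names and the numerical values (20 sites, effect 0.5, and the two mutation rates) are hypothetical choices made only for illustration.

```python
def mutation_change_from_sites(q_mean, n, m, u, v):
    """Equation (9): change in the mean from mutation, given the mean frequency of B alleles."""
    return n * m * (v * (1.0 - q_mean) - u * q_mean)

def mutation_change_from_mean(z_mean, z0, n, m, u, v):
    """Equation (10): the same change written as a restoring force toward theta_m."""
    theta_m = z0 + n * m * v / (u + v)
    return -(u + v) * (z_mean - theta_m)

# Hypothetical values: mutation is biased toward b alleles because u > v.
n, m, u, v, z0 = 20, 0.5, 3e-9, 1e-9, 1.0
q_mean = 0.7
z_mean = z0 + n * m * q_mean  # Equation (8)

by_sites = mutation_change_from_sites(q_mean, n, m, u, v)
by_mean = mutation_change_from_mean(z_mean, z0, n, m, u, v)
print(by_sites, by_mean)
```

Both routes give the same negative change here, because the current mean lies above the neutral expectation and mutation pulls it back toward that value.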

The stationary distribution of mean phenotypes

Application of Equations (7) and (10) to (1b) yields a useful simplification of the stationary distribution that will be adhered to below,

(11) $\Phi(\bar{z}) = C [\bar{W}(\bar{z})]^{2N_e} \exp\left( -\frac{(\bar{z} - \theta_m)^2}{2\sigma_N^2} \right),$

with $\sigma_N^2 = \sigma_A^2/[2N_e(u+v)]$. As will be discussed below, under neutrality, the genetic variance $\sigma_A^2$ often scales directly with $N_e$, and population size would have no influence on the distribution in this limiting case, as $\sigma_N^2$ would be independent of $N_e$. More generally, $\sigma_A^2$ is also a function of the intensity of selection, but the bulk of the steady-state distribution will be represented by mean phenotypes that are in the range of effective neutrality with respect to each other, so the scaling relationship of $\sigma_A^2$ under neutrality is expected to be a reasonable first-order approximation.

Equation (11) shows that, provided the genetic variance remains roughly constant, the stationary distribution is equal to the product of the expectation under neutrality (where mutation and drift are the only operable evolutionary forces) and the mean fitness function raised to the power $2N_e$; that is, the stationary distribution is equivalent to a transformation of the neutral expectation by a function of the fitness landscape. Thus, to obtain the overall distribution in the following applications, we require an expression for mean population fitness in terms of the trait mean.
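As a check on the neutral component of this result, the general diffusion solution of Equation (1a) can be evaluated on a grid and compared with the Gaussian expected under mutation and drift alone. The sketch below is only illustrative: the function name, grid, and parameter values are assumptions (chosen so that the neutral variance equals 1), and selection is switched off so that only the mutational restoring force of Equation (10) acts.

```python
import math

def stationary_density(grid, directional, drift_variance):
    """Evaluate Equation (1a) on an evenly spaced grid and normalize it numerically."""
    step = grid[1] - grid[0]
    ratio = [directional(x) / drift_variance(x) for x in grid]
    log_phi = [0.0]
    for i in range(1, len(grid)):
        # Trapezoid rule for 2 * integral of M(x)/V(x) dx, accumulated from the left edge.
        log_phi.append(log_phi[-1] + (ratio[i - 1] + ratio[i]) * step)
    peak = max(log_phi)  # shift before exponentiating to avoid overflow
    unnormalized = [math.exp(value - peak) for value in log_phi]
    total = sum(unnormalized) * step  # plays the role of 1/C
    return [value / total for value in unnormalized]

# Hypothetical parameter values; rate stands for u + v.
ne, var_a, rate, theta_m = 1.0e4, 0.2, 1.0e-5, 2.0
var_n = var_a / (2.0 * ne * rate)  # neutral variance of the mean, equal to 1 here

grid = [theta_m - 8.0 + 0.01 * i for i in range(1601)]
step = grid[1] - grid[0]
density = stationary_density(
    grid,
    directional=lambda x: -rate * (x - theta_m),  # Equation (10), no selection
    drift_variance=lambda x: var_a / ne,          # haploid drift variance
)
mean = sum(x * p for x, p in zip(grid, density)) * step
variance = sum((x - mean) ** 2 * p for x, p in zip(grid, density)) * step
print(mean, variance, var_n)
```

The numerical mean and variance should recover the neutral expectation and the neutral variance. Supplying a nonzero selection term to `directional` would give the full stationary distribution for any fitness function.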

In what follows, insight into the approximate magnitude of $\sigma_N^2$ will be useful. This can be achieved by noting that $2N_e(u+v)$ will have values of the order of magnitude of $4N_e\mu$, where $\mu$ is the mutation rate per nucleotide site. This composite parameter is equivalent to the amount of standing heterozygosity at neutral nucleotide sites in natural populations under mutation-drift equilibrium, and generally ranges from 0.001 to 0.1, with the lower and higher ends of the range being typical in vertebrates and microbes, respectively (Lynch, 2007). Thus, because heritabilities ($\sigma_A^2/\sigma_z^2$) of traits are typically on the order of 0.1 to 0.5 (Lynch and Walsh, 1998), $\sigma_N^2$ is expected to be in the range of 1× to 100× the average within-population phenotypic variance for the trait.

Selection for an intermediate optimum

A commonly assumed form of selection, probably relevant to many cellular features, is the Gaussian (bell-shaped) fitness function with an intermediate optimum phenotype, $\theta_s$, and a width, $\omega$, determining the strength of selection around the optimum,

(12) $W(z) = \exp\left( -\frac{(z - \theta_s)^2}{2\omega^2} \right).$

Application of this expression to Equations (3) and (4) leads to the expression for mean population fitness, which when applied to Equation (7) yields the expression for $M_s(\bar{z})$ necessary for obtaining the stationary distribution (Table 1). The latter expression shows that the change in the mean phenotype resulting from selection is directly proportional to the deviation of the current mean phenotype from the optimum and inversely proportional to the sum of the squared width of the fitness function and the total phenotypic variance (Lande, 1976). As will be seen repeatedly below, phenotypic variance (an inevitable consequence of external environmental and internal cellular effects) generally reduces the efficiency of selection by diminishing the correspondence between genotype and phenotype. If the mean phenotype were to evolve to the optimum, $\bar{z} = \theta_s$, which is highly unlikely with biased mutation pressure, selection would be purely stabilizing in nature, operating only to reduce the variation around the mean.

Table 1
Formulae for mean population fitness, $\bar{W}(\bar{z})$, and the rate of change of the mean phenotype resulting from selection, $M_s(\bar{z})$, obtained from Equations (3) and (7), respectively.
https://doi.org/10.7554/eLife.34820.004
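| Model | $\bar{W}(\bar{z})$ | $M_s(\bar{z})$ |
| --- | --- | --- |
| Gaussian | $\sqrt{\frac{\omega^2}{\omega^2 + \sigma_z^2}} \exp\left( -\frac{(\bar{z} - \theta_s)^2}{2(\omega^2 + \sigma_z^2)} \right)$ | $-\frac{\sigma_A^2 (\bar{z} - \theta_s)}{\omega^2 + \sigma_z^2}$ |
| Hyperbolic | $1 - \alpha \exp\{-\beta[\bar{z} - (\beta\sigma_z^2/2)]\}$ | $\frac{\sigma_A^2 \alpha\beta}{\exp\{\beta[\bar{z} - (\beta\sigma_z^2/2)]\} - \alpha}$ |
| Sigmoid | $\frac{1}{1 + \exp[-(\beta/\gamma)(\bar{z} - z^*)]}$ | $\frac{\sigma_A^2 (\beta/\gamma)}{1 + \exp[(\beta/\gamma)(\bar{z} - z^*)]}$ |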
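The closed-form mean-fitness expressions for the Gaussian and hyperbolic models can be compared against direct numerical integration of the individual fitness function over a normal phenotype distribution. The sketch below assumes the forms $\bar{W} = \sqrt{\omega^2/(\omega^2+\sigma_z^2)}\exp[-(\bar{z}-\theta_s)^2/(2(\omega^2+\sigma_z^2))]$ and $\bar{W} = 1 - \alpha\exp\{-\beta[\bar{z} - (\beta\sigma_z^2/2)]\}$; the function names and all parameter values are hypothetical.

```python
import math

def expected_fitness(mean, var_z, fitness, n=4000):
    """Equation (3) by the trapezoid rule over +/- 10 phenotypic standard deviations."""
    sd = math.sqrt(var_z)
    lo, h = mean - 10.0 * sd, 20.0 * sd / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        density = math.exp(-(z - mean) ** 2 / (2.0 * var_z)) / math.sqrt(2.0 * math.pi * var_z)
        total += weight * density * fitness(z)
    return total * h

def gaussian_mean_fitness(mean, theta_s, omega2, var_z):
    """Closed form for the Gaussian fitness function (omega2 is the squared width)."""
    scale = omega2 + var_z
    return math.sqrt(omega2 / scale) * math.exp(-(mean - theta_s) ** 2 / (2.0 * scale))

def hyperbolic_mean_fitness(mean, alpha, beta, var_z):
    """Closed form for the hyperbolic fitness function."""
    return 1.0 - alpha * math.exp(-beta * (mean - beta * var_z / 2.0))

# Hypothetical parameter values, for illustration only.
mean, var_z = 1.0, 1.0
theta_s, omega2 = 0.0, 4.0
alpha, beta = 0.5, 0.3

gauss_numeric = expected_fitness(mean, var_z, lambda z: math.exp(-(z - theta_s) ** 2 / (2.0 * omega2)))
gauss_closed = gaussian_mean_fitness(mean, theta_s, omega2, var_z)
hyper_numeric = expected_fitness(mean, var_z, lambda z: 1.0 - alpha * math.exp(-beta * z))
hyper_closed = hyperbolic_mean_fitness(mean, alpha, beta, var_z)
print(gauss_numeric, gauss_closed, hyper_numeric, hyper_closed)
```

In both cases the phenotypic variance enters the mean-fitness surface directly, which is how it weakens the correspondence between genotype and fitness.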

With both the selection and mutation terms in Equation (11) being Gaussian functions, the product is also Gaussian (Lande, 1976), in this case leading to a stationary distribution of mean phenotypes

(13a) $\Phi(\bar{z}) = \frac{1}{\sqrt{2\pi\sigma^2(\bar{z})}} \exp\left( -\frac{[\bar{z} - \mu(\bar{z})]^2}{2\sigma^2(\bar{z})} \right),$

with overall mean

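(13b) $\mu(\bar{z}) = \frac{(\theta_s/\sigma_S^2) + (\theta_m/\sigma_N^2)}{(1/\sigma_S^2) + (1/\sigma_N^2)} = \frac{\kappa\theta_s + \theta_m}{\kappa + 1},$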

and variance

(13c) $\sigma^2(\bar{z}) = \frac{1}{(1/\sigma_S^2) + (1/\sigma_N^2)} = \frac{\sigma_N^2}{\kappa + 1},$

where $\kappa = \sigma_N^2/\sigma_S^2$, with $\sigma_S^2 = (\omega^2 + \sigma_z^2)/(2N_e)$ and $\sigma_N^2$ (as defined above) being the variances of the contributions associated with selection and mutation.

Equation (13b) states that the grand mean is equal to a weighted average of the expectations under mutation and selection alone (each component being weighted by the inverse of the variance of the function). Equation (13c) states that the variance of means is equal to half the harmonic mean of the variances associated with selection and mutation alone. As $\sigma_S^2 \to \infty$, which implies a flatter fitness function and hence an approach toward neutrality, the mean and variance converge on the expectations for a purely mutationally driven process, $\theta_m$ and $\sigma_N^2$. As $\sigma_N^2 \to \infty$, which implies a weakening influence of mutation on the overall distribution, the mean and variance converge on the expectations for a purely selection-driven process, $\theta_s$ and $\sigma_S^2$.

As can be seen from Equations (13b, c), a key determinant of the form of the stationary distribution of means is the composite parameter $\kappa = \sigma_A^2/[2(u+v)(\omega^2 + \sigma_z^2)]$, which the following observations suggest is generally $\gg 1$. First, the width of the fitness function $\omega$ can be expected to be generally greater than the phenotypic standard deviation $\sigma_z$, else the selective load on the trait would be enormous, and this is indeed generally observed (Walsh and Lynch, 2018). Given the range of heritability estimates noted above, this implies that the ratio $\sigma_A^2/(\omega^2 + \sigma_z^2)$ is unlikely to be greater than 0.1 under strong selection, and can become one to two orders of magnitude smaller than 0.1 under weak selection. Second, mutation rates at the single nucleotide level are typically in the range of $10^{-11}$ to $10^{-8}$, with the former being approached in microbes and the latter in large multicellular species (Lynch et al., 2016). Thus, keeping in mind that individual targets of mutation may comprise more than single nucleotide sites, $1/[2(u+v)]$ is still likely to be in the range of $10^7$ to $10^{10}$. Together, these results suggest a likely range for $\kappa$ of $10^4$ to $10^9$, which simplifies Equations (13b, c) to

(14a) $\mu(\bar{z}) \simeq \theta_s + (\theta_m/\kappa)$
(14b) $\sigma^2(\bar{z}) \simeq \sigma_N^2/\kappa = \sigma_S^2.$

With these parameter values in mind, Figure 2 shows that the form of the stationary distribution varies dramatically with the value of $\sigma_N^2/\kappa = \sigma_S^2$, becoming extremely narrow and extremely flat at opposite ends of the spectrum for this key composite parameter. The degree to which $\theta_m$ deviates from $\theta_s$ for cellular features is unknown, but there is no reason to expect them to be equal. If they differ greatly, $\mu(\bar{z})$ can substantially deviate from the optimum to a degree that depends on the weighting factor $\kappa$ (Figure 2).
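The quality of the large-$\kappa$ approximation is easy to gauge numerically. The following sketch evaluates the exact mean and variance of Equations (13b, c) next to the approximations of Equations (14a, b), with $\kappa$ computed directly as the ratio of the mutational to the selective variance. The optimum is set to zero and the ratio to $10^4$; the value used for the neutral expectation is a hypothetical choice, as are the function names.

```python
def gaussian_stationary(theta_s, theta_m, var_s, var_n):
    """Exact mean and variance of the stationary distribution, Equations (13b, c)."""
    kappa = var_n / var_s
    mean = (kappa * theta_s + theta_m) / (kappa + 1.0)
    variance = var_n / (kappa + 1.0)
    return mean, variance, kappa

def large_kappa_approximation(theta_s, theta_m, var_s, var_n):
    """Approximate mean and variance for kappa >> 1, Equations (14a, b)."""
    kappa = var_n / var_s
    return theta_s + theta_m / kappa, var_n / kappa

# var_s and var_n are the selective and mutational variances of the mean;
# theta_m = 5 (in phenotypic standard deviations) is a hypothetical value.
theta_s, theta_m = 0.0, 5.0
var_s, var_n = 1.0e-8, 1.0e-4

exact_mean, exact_var, kappa = gaussian_stationary(theta_s, theta_m, var_s, var_n)
approx_mean, approx_var = large_kappa_approximation(theta_s, theta_m, var_s, var_n)
print(kappa, exact_mean, approx_mean, exact_var, approx_var)
```

With these values the exact and approximate results differ by roughly one part in $10^4$, and the expected mean sits a small fraction of a phenotypic standard deviation away from the optimum in the direction of the mutational bias.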

Figure 2. Stationary distributions of mean phenotypes, with optimum phenotype $\theta_s = 0$ and $\kappa = 10^4$.

Results are given for three different values of $\theta_m$ for the condition in which $\sigma_N^2/\kappa = 10^{-8}$ (colored curves), and four different values of $\sigma_S^2$ for the case in which the mutational mean coincides with the optimum (black curves). The parametric values used in these plots assume a scale on which the phenotypic standard deviation is 1.0, so a mean phenotype of ±0.001 is equivalent to a shift of 0.1% of a phenotypic standard deviation from the optimum.

https://doi.org/10.7554/eLife.34820.005

Hyperbolic fitness function

Many cellular features are likely to be primarily under continuous selection for an extreme optimum, but with diminishing strength of selection as the optimum is approached. For example, many enzymes are likely to be selected for as high a catalytic rate as possible, protein structures for as high folding rates and stability as possible, binding interfaces with as high affinities as possible, etc. One way of representing this type of selection involves the hyperbolic function,

(15) $W(z) = 1 - \alpha \exp(-\beta z),$

where the constants $0 \le \alpha \le 1$ and $\beta \ge 0$, respectively, define the amplitude and rapidity of the fitness response to increasing $z$. Fitness is equal to $1 - \alpha$ when $z = 0$, and asymptotically approaches one as $z \to \infty$.

Expressions for the mean population fitness and the change in the mean resulting from selection, obtained by the procedures noted above, are provided in Table 1, and substitution of the former into Equation (11) yields the stationary distribution of mean phenotypes. Because of the asymmetry of this fitness function, the resultant distribution is no longer perfectly Gaussian, but setting $\partial\Phi(\bar{z})/\partial\bar{z} = 0$ yields an expression for the single mode of the distribution, $\hat{z}$,

(16a) $(\hat{z} - \theta_m)\{\exp[\beta\hat{z}] - \phi\} = 2N_e\beta\sigma_N^2\phi,$

with $\phi = \alpha\exp(\beta^2\sigma_z^2/2)$. Despite the monotonic increase in fitness with $z$, the distribution of mean phenotypes is prevented from progressive increase by the counteraction of mutation and the diffusive action of drift. Because selection is always in the positive direction, the expected mode always exceeds the neutral expectation $\theta_m$, to a degree that increases with the effective population size. Equation (16a) is readily solved numerically, but provided $\beta\hat{z} < 1$, in the limit of large $N_e$,

(16b) $\hat{z} \simeq \theta_m + \sqrt{2N_e\phi\sigma_N^2}.$

Although the hyperbolic fitness function generates a slightly asymmetric distribution of means (with tail to the right), the bulk of the distribution is approximately normal, and an excellent approximation to the variance can be obtained from the curvature of the stationary distribution around the mode (using the negative of the inverse of the second derivative of the logarithm of the stationary distribution),

(17) $\sigma^2(\bar{z}) \simeq \frac{1}{[2N_e\beta^2\phi\exp(\beta\hat{z})/(\exp(\beta\hat{z}) - \phi)^2] + (1/\sigma_N^2)}.$

As in the case of the Gaussian fitness function, Equation (13c), the two terms in the denominator are respectively the inverses of the variances expected under the limits of strong selection and neutrality.

An example of the influence of population size on the stationary distribution is given in Figure 3, where there is a strong mutational bias away from the optimum. The distributions progressively move to the right with an increase in $N_e$, with the mean phenotype increasing five-fold over a three order-of-magnitude range of $N_e$. As can be seen from Equation (16b), equal changes in either $N_e$ or the neutral variance $\sigma_N^2$ have identical effects on the mean, although effects on the variance are opposite in direction.
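Because Equation (16a) has no closed-form solution, the mode must be found numerically. The sketch below solves it by bisection and then evaluates the curvature-based variance of Equation (17). The function names are hypothetical, the solver assumes $\alpha > 0$, and the parameter values only loosely follow those quoted for Figure 3; in particular, placing the neutral expectation at $-10$ is an illustrative assumption.

```python
import math

def hyperbolic_mode(ne, alpha, beta, var_z, var_n, theta_m, tol=1e-10):
    """Solve Equation (16a) for the mode by bisection (assumes alpha > 0)."""
    phi = alpha * math.exp(beta ** 2 * var_z / 2.0)
    target = 2.0 * ne * beta * var_n * phi

    def residual(z):
        return (z - theta_m) * (math.exp(beta * z) - phi) - target

    # Both factors on the left side must be positive at the mode, so start the
    # bracket where one of them is zero; the residual is negative there.
    lo = max(theta_m, math.log(phi) / beta)
    width = 1.0
    while residual(lo + width) < 0.0:
        width *= 2.0
    hi = lo + width
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hyperbolic_variance(mode, ne, alpha, beta, var_z, var_n):
    """Equation (17): variance of means from the curvature at the mode."""
    phi = alpha * math.exp(beta ** 2 * var_z / 2.0)
    growth = math.exp(beta * mode)
    return 1.0 / (2.0 * ne * beta ** 2 * phi * growth / (growth - phi) ** 2 + 1.0 / var_n)

# Illustrative parameter values (theta_m = -10 is an assumption of this sketch).
alpha, beta, var_z, var_n, theta_m = 1.0, 0.03, 1.0, 1.0, -10.0
sizes = [1.0e3, 1.0e4, 1.0e5, 1.0e6]
modes = [hyperbolic_mode(ne, alpha, beta, var_z, var_n, theta_m) for ne in sizes]
variances = [hyperbolic_variance(z, ne, alpha, beta, var_z, var_n) for z, ne in zip(modes, sizes)]
for ne, z, s2 in zip(sizes, modes, variances):
    print(ne, z, s2)
```

Since the effective population size and the neutral variance enter Equation (16a) only through their product, doubling either one moves the mode by the same amount, which can be confirmed by calling `hyperbolic_mode` with either argument doubled.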

Figure 3. Stationary distributions of mean phenotypes with a hyperbolic fitness function, Equation (15) with $\alpha = 1$ and $\beta = 0.03$ and denoted by the dashed line.

Black distributions denote results for various effective population sizes ($N_e = 10^3$ to $10^6$), with phenotypic variance $\sigma_z^2 = 1$, mutational variance $\sigma_N^2 = 1$, and the mean phenotype under neutrality (distribution given by the dotted line) $\theta_m = 10$. Blue and red distributions denote the effects of altering $\theta_m$ and $\sigma_N^2$.

https://doi.org/10.7554/eLife.34820.006

Sigmoid fitness function

Finally, we consider a variant of the fitness function just noted. With the previous fitness function, Equation (15), the selection gradient progressively declines with increasing phenotypic value over the full range of z, with increasing z resulting in an asymptotic approach to maximum fitness. With a sigmoid fitness function, sometimes called a mesa function (Gerland and Hwa, 2002; Berg et al., 2004), there is an inflection point such that the fitness landscape becomes progressively flatter at both higher and lower values. This means that adjacent variants become increasingly similar in fitness (i.e. more neutral with respect to each other) at both extremes of the phenotype distribution.

The sigmoid fitness function for individual phenotypes can be described as

(18a) $W(z) = \frac{1}{1 + \exp[-\beta(z - z^*)]},$

where $z^*$ denotes the inflection point, at which $W(z^*) = 0.5$. This function is closely approximated by

(18b) $W(z) = \frac{1}{2}\left[ 1 + \mathrm{erf}\left( \frac{\sqrt{\pi}\,\beta(z - z^*)}{4} \right) \right],$

where erf is the error function (a rescaling of the cumulative standard normal distribution), which facilitates integration with Equation (4). The resultant expression for mean population fitness is also sigmoid, but with phenotypic variance reducing the strength of selection from $\beta$ to $\beta/\gamma$, where $\gamma = \sqrt{1 + (\beta^2\sigma_z^2\pi/8)}$ (Table 1).
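The accuracy of the error-function approximation, and of the resulting closed form for mean fitness, can be examined directly. The sketch below takes the approximation to be $\frac{1}{2}[1 + \mathrm{erf}(\sqrt{\pi}x/4)]$ with $x = \beta(z - z^*)$ and the scaling factor to be $\gamma = \sqrt{1 + \beta^2\sigma_z^2\pi/8}$. It compares the first against the logistic curve and the second against numerical integration of the exact logistic function. The function names and parameter values are hypothetical.

```python
import math

def logistic(x):
    """Sigmoid fitness as a function of x = beta * (z - z_star)."""
    return 1.0 / (1.0 + math.exp(-x))

def erf_sigmoid(x):
    """Error-function approximation to the logistic curve."""
    return 0.5 * (1.0 + math.erf(math.sqrt(math.pi) * x / 4.0))

def gamma_factor(beta, var_z):
    """Factor by which phenotypic variance weakens selection."""
    return math.sqrt(1.0 + beta ** 2 * var_z * math.pi / 8.0)

def mean_fitness_closed(mean, z_star, beta, var_z):
    """Closed-form sigmoid mean fitness with slope beta / gamma."""
    slope = beta / gamma_factor(beta, var_z)
    return 1.0 / (1.0 + math.exp(-slope * (mean - z_star)))

def mean_fitness_numeric(mean, z_star, beta, var_z, n=4000):
    """Mean fitness with the exact logistic function, by the trapezoid rule."""
    sd = math.sqrt(var_z)
    lo, h = mean - 10.0 * sd, 20.0 * sd / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        density = math.exp(-(z - mean) ** 2 / (2.0 * var_z)) / math.sqrt(2.0 * math.pi * var_z)
        total += weight * density * logistic(beta * (z - z_star))
    return total * h

# Largest gap between the two curves on a grid from -10 to 10.
max_gap = max(abs(logistic(-10.0 + 0.01 * i) - erf_sigmoid(-10.0 + 0.01 * i)) for i in range(2001))

# Hypothetical parameter values for the mean-fitness comparison.
closed = mean_fitness_closed(1.0, 0.0, 2.0, 1.0)
numeric = mean_fitness_numeric(1.0, 0.0, 2.0, 1.0)
print(max_gap, closed, numeric)
```

The two curves share the same value and slope at the inflection point, so the discrepancy is confined to the shoulders of the sigmoid and remains below about two percent of maximum fitness.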

As in the case of the hyperbolic fitness function, the mesa function does not yield a perfectly Gaussian distribution of mean phenotypes, but an expression for the mode ($\hat{z}$) can be acquired using the methods noted above,

(19a) $(\hat{z} - \theta_m)\{\exp[(\beta/\gamma)(\hat{z} - z^*)] + 1\} = 2N_e(\beta/\gamma)\sigma_N^2,$

which again has a single solution, indicating a unimodal stationary distribution. If $\beta\hat{z}/\gamma < 1$, in the limit of large $N_e$,

(19b) $\hat{z} \simeq \theta_m + \sqrt{2N_e\sigma_N^2\exp(\beta z^*/\gamma)},$

which has a form similar to the expression noted with the hyperbolic fitness function. From the form of these equations, it can again be seen that there are several equivalent effects of the underlying parameters. For example, a doubling of $N_e$ has the same effect as a doubling of $\sigma_N^2$ on the mode, and a doubling of $\beta$ the same effect as a reduction in $\gamma$ by 50%. Although more complicated, the expected variance in means under the sigmoid model is similar in form to that noted above for the Gaussian and hyperbolic fitness functions,

(20) σ2(z¯)1{ 2Ne(β/γ)2exp[ (β/γ)(z¯z) ]/(1+exp[ (β/γ)(z¯z) ])2 }+(1/σN2)
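Because Equation (19a) is implicit in the mode, it has to be solved numerically. The sketch below is one possible approach (bisection), not the author's procedure: `b` stands for $\beta/\gamma$, the function names are hypothetical, and all parameter values are arbitrary illustrations. The variance function evaluates Equation (20) at a supplied mean.

```python
import math

def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mode_of_mean(theta_m, z_star, b, ne, sigma_n2, iterations=200):
    """Solve Equation (19a) for the mode by bisection; b stands for beta/gamma."""
    def g(z):
        # Equation (19a) rearranged as
        # (z - theta_m)/sigma_N^2 - 2*Ne*b / (exp[b(z - z*)] + 1) = 0
        return (z - theta_m) / sigma_n2 - 2.0 * ne * b * logistic(-b * (z - z_star))
    # g is negative at theta_m and positive at the upper bound, so the root is bracketed
    lo, hi = theta_m, theta_m + 2.0 * ne * b * sigma_n2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_of_mean(z_bar, z_star, b, ne, sigma_n2):
    """Approximate variance of the mean phenotype, Equation (20)."""
    p = logistic(b * (z_bar - z_star))  # p * (1 - p) equals exp(x) / (1 + exp(x))**2
    return 1.0 / (2.0 * ne * b * b * p * (1.0 - p) + 1.0 / sigma_n2)

# Illustrative values only (assumptions, not estimates from the article)
theta_m, z_star, b, sigma_n2 = 0.0, 5.0, 1.0, 1.0
for ne in (10, 100, 1000):
    z_hat = mode_of_mean(theta_m, z_star, b, ne, sigma_n2)
    print(ne, z_hat, variance_of_mean(z_hat, z_star, b, ne, sigma_n2))
```

Since $N_e$ and $\sigma_N^2$ enter Equation (19a) only through their product, `mode_of_mean` returns the same value when either one is doubled, which is the equivalence noted in the text.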

Discussion

The preceding models are meant to provide heuristic guidance into the evolutionary mechanisms responsible for the dispersion of mean phenotypes of a diversity of subcellular and molecular features. Although such traits may sometimes be under selection for an intermediate optimum, selection may often operate in a continuous directional fashion. In either case, there are two reasons why mean phenotypes are unlikely to commonly achieve states that endow a population with maximum fitness. First, if mutation bias conflicts with the directional effects of selection, the optimum phenotype will not coincide with the mean phenotype. Second, even in the absence of mutation bias and regardless of the form of the fitness function, a drift barrier exists beyond which the gradient of the selection function is not steep enough to overcome the vagaries of genetic drift, thereby preventing further adaptive progress. Within the confines of the drift barrier, the mean phenotype will wander to a degree that depends on the strength of local patterns of mutation and selection.

These points have implications for the degree to which the ‘adaptive paradigm’ should be embraced as an explanatory framework for diversification at the cellular level. For example, with mutation bias encouraging the mean phenotype to deviate from the optimum, the result will be a population under persistent directional selection despite the existence of an attainable (but not sustainable) phenotype with maximum fitness. Even without mutation pressure and in the face of intrinsic directional selection, for example, a hyperbolic or mesa fitness function, the most common mean phenotype will not be equivalent to the optimum phenotype, and the drift barrier will ensure variation in mean phenotypes among populations exposed to identical selection pressures.

An attempt has been made to couch the stationary forms of mean-phenotype distributions in terms of underlying parameters that are at least in principle observable empirically. Consider, for example, the model for stabilizing selection for a specific optimum. From Equation (14a), the expected deviation of the mean phenotype from the optimum resulting from mutation bias is $\theta_m/\kappa$, which expands to $\theta_m[2(u+v)(\omega^2+\sigma_z^2)/\sigma_A^2]$, a somewhat complex function that may not be immediately transparent. However, a wide variety of models suggest that $\sigma_A^2$ scales directly with $N_e\mu$ provided selection is weak (Bürger et al., 1989; Zeng and Cockerham, 1993; Charlesworth, 2013), and because $u$ and $v$ (the forward and reverse mutation rates) are both proportional to $\mu$ (the total mutation rate per site), this implies that the average deviation of the mean from the optimum scales as $\theta_m(\omega^2+\sigma_z^2)/N_e$, or approximately as $\theta_m\omega^2/N_e$ assuming weak selection. Thus, the deviations of phenotypic means from the selective optimum are expected to be inversely proportional to $N_e$, a point also made by Charlesworth (2013) in a somewhat different analysis. Note, however, that this is only the expected pattern, as the mean phenotype is still expected to drift above and below the expectation to a degree depending on the effective strength of selection. As noted in Equation (14b), and previously pointed out by Lande (1975) and Lande (1976), the magnitude of this drift variance is also inversely proportional to $N_e$, which implies that the standard deviation with respect to the expected mean scales as $1/\sqrt{N_e}$.
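The scaling argument in this paragraph can be written out in a few lines. In the sketch below, $c_A$, $c_u$, and $c_v$ are placeholder proportionality constants introduced only for illustration; they are not quantities defined in the article.

```latex
% Assumed proportionalities (c_A, c_u, c_v are hypothetical constants)
\sigma_A^2 = c_A N_e \mu, \qquad u = c_u \mu, \qquad v = c_v \mu

% Substituting into the expected deviation from the optimum
\frac{\theta_m}{\kappa}
  = \theta_m \, \frac{2(u+v)(\omega^2+\sigma_z^2)}{\sigma_A^2}
  = \theta_m \, \frac{2(c_u+c_v)(\omega^2+\sigma_z^2)}{c_A N_e}
  \approx \frac{2(c_u+c_v)}{c_A} \, \frac{\theta_m \omega^2}{N_e}
  \quad \text{when } \omega^2 \gg \sigma_z^2

% A drift variance inversely proportional to N_e gives the standard deviation
\sigma^2(\bar{z}) \propto \frac{1}{N_e}
  \quad \Longrightarrow \quad
  \sigma(\bar{z}) \propto \frac{1}{\sqrt{N_e}}
```

The mutation rate $\mu$ cancels, so under these assumptions a 100-fold increase in $N_e$ reduces the expected deviation 100-fold but the standard deviation around it only 10-fold.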

Of course, $\theta_m$ (the mean phenotype expected under neutrality) may differ among lineages and the within-population genetic variance $\sigma_A^2$ is sensitive to the strength of selection, in which case detecting such relationships may be challenging. In addition, the linear scaling of $\sigma_A^2$ with $N_e$ is unlikely to continue indefinitely, unless $N_e$ in natural populations rarely attains levels where all constituent loci are saturated with segregating mutations. The salient issue is that the preceding expressions provide qualitative insight into the behavior of mean phenotypes in alternative population-genetic environments, while also revealing the types of measurements that need to be made if we are to understand such behavior. For example, we know essentially nothing about the key mutational ($\theta_m$) and selection ($\omega^2$) parameters for cell biological features and how these might vary among species. This is not a trivial issue, as the influence of both parameters in determining the most likely locations of mean phenotypes is just as central as the role played by $N_e$.

Applying the same logic to results for plateaued fitness functions leads to the prediction that the expected mode of mean phenotypes will scale fairly strongly with the effective population size, in the limit approaching proportionality to $\sqrt{N_e}$, that is, a 10-fold increase in the mean phenotype with a 100-fold increase in $N_e$. As shown in Figure 3, a simple change in the mutational variance $\sigma_M^2$ (with no associated change in mutational bias) can also cause a substantial shift in the position of the mean phenotype. These sorts of observations raise the significant possibility that species with substantially different population-genetic environments may commonly exhibit measurable differences in trait means despite experiencing identical forms of directional selection, again raising challenging issues for those who wish to interpret phenotypic differences as reflections of different underlying processes of selection.

Although the data are not extensive, several lines of evidence support the idea that the mean phenotypes of cellular attributes are indeed modulated by the power of random genetic drift. The most compelling example derives from observations on the mutation rate (per nucleotide site per generation), which scales approximately inversely with the 1000-fold range of variation in $N_e$ across the Tree of Life (Lynch et al., 2016). Such a scaling is qualitatively consistent with the drift-barrier hypothesis for mutation-rate evolution (Lynch, 2010; Lynch, 2011), which postulates that because most mutations are deleterious, selection will typically operate to improve replication fidelity, with refinements in molecular performance eventually being thwarted by random genetic drift – as the mutation rate is progressively lowered, there is less room for improvement and hence a narrower range of selectively advantageous replication-fidelity variants accessible by selection.

Enzyme efficiency provides a second broad category of traits with evolutionary behavior seemingly in accordance with the theory outlined above. For example, Bar-Even et al. (2011) have found that enzymes involved in secondary metabolism are on average 30× less efficient than those involved in central metabolism, suggesting that selection operates less effectively on enzymes further removed from core energetic determinants. More directly relevant to the points made above, Bar-Even et al. (2011) also found that prokaryotic enzymes have slightly better kinetics than those from eukaryotes, as expected for species with higher effective population sizes and consistent with the prediction that improvement of enzyme efficiencies will stall once the gradient of the fitness surface is on the order of $1/N_e$ (Hartl et al., 1985). The fact that bacteria utilize transcription-factor binding-site motifs with stronger affinity to their cognate transcription factors than is the case in eukaryotes is also plausibly related to a higher efficiency of selection in the former (Lynch and Hagner, 2015).

Finally, proteins typically evolve to the ‘margin of stability,’ such that only one or two mutations are usually enough to destabilize the folding process (Taverna and Goldstein, 2002; Tokuriki and Tawfik, 2009). Protein stability is deemed to be positively associated with fitness because destabilized proteins are prone to loss of function, aggregation, and/or direct toxicity. Strikingly, however, it is relatively easy to obtain more stable proteins by mutagenesis (Matsuura et al., 1999; Bershtein et al., 2013; Sullivan et al., 2012), with the contributing residues typically interacting in an additive fashion (Wells, 1990; Serrano et al., 1993; Zhang et al., 1995). Moreover, although it is commonly argued that marginal stability is required for proper protein function, with excess stability somehow reducing protein performance, this has not held up to close scrutiny. Many examples exist in which increased stability has been achieved in laboratory modifications of proteins with few if any consequences for enzyme efficiency (e.g. Giver et al., 1998; Zhang et al., 1995; Taverna and Goldstein, 2002; Borgo and Havranek, 2012; Moon et al., 2014).

These observations suggest that despite persistent selection for high folding stability, the plateau-like nature of the fitness landscape results in diminishing fitness advantages of increasing stability. A hyperbolic relationship between fitness and the binding energy involving protein stability follows from biophysical principles (Govindarajan and Goldstein, 1997; Taverna and Goldstein, 2002; Bloom et al., 2005; Zeldovich and Shakhnovich, 2008; Wylie and Shakhnovich, 2011; Serohijos and Shakhnovich, 2014), and under this model, proteins are expected to be pushed by natural selection to more stable configurations until reaching the point where any further fitness improvement is small enough to be offset by the vagaries of random genetic drift and/or mutation pressure towards less stable states. Notably, proteins of equivalent length fold at least ten times more rapidly in bacteria than in eukaryotes (Galzitskaya et al., 2011). Moreover, an in vitro evaluation of the folding stability of the dihydrofolate reductase enzyme from 36 species of mesophilic bacteria illustrates the existence of a substantial range of variation among species, with the standard deviation being roughly 10% of the mean (Bershtein et al., 2015). In principle, such a distribution may reflect the dispersion in mean phenotypes associated with drift around the drift barrier.

Although the mutation function employed here likely comes closer to approximating the situation for cellular features than do previous functions relied on in quantitative genetics, in reality we do not know the exact form of this function for any cellular feature. Thus, the mathematical theory developed here is best viewed as a guide to approaching the problem at hand rather than as an indelible platform for quantitative analysis. Despite such uncertainties, however, the central feature of the theory presented above is that, regardless of the form of the underlying mutation and selection functions, the stationary distribution of mean phenotypes can generally be viewed as the product of the pattern expected under neutrality alone and the associated function for mean population fitness taken to the $2N_e$ power, as described by Equations (1a,b) and (11). Similar behavior was previously pointed out for the stationary distribution of allele frequencies (Wright, 1969). Thus, once the key underlying functions have been elucidated, the precise details of the theory can be readily modified with alternative mathematical functions.
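The product form described here lends itself to direct numerical evaluation on a grid of mean phenotypes, whatever functions are supplied. The sketch below is an illustration under stated assumptions, not the author's code: the neutral distribution is taken to be Gaussian with mean $\theta_m$ and variance $\sigma_N^2$, mean fitness is the sigmoid with slope `b` (standing for $\beta/\gamma$), and all names and parameter values are hypothetical.

```python
import math

def stationary_density(grid, log_neutral, log_mean_fitness, ne):
    """Density proportional to neutral(z) * mean_fitness(z)**(2*Ne) on an evenly spaced grid."""
    log_phi = [log_neutral(z) + 2.0 * ne * log_mean_fitness(z) for z in grid]
    peak = max(log_phi)  # subtract the peak before exponentiating to avoid underflow
    unnorm = [math.exp(v - peak) for v in log_phi]
    dz = grid[1] - grid[0]
    total = sum(unnorm) * dz
    return [u / total for u in unnorm]

# Illustrative values only (assumptions, not estimates from the article)
theta_m, sigma_n2, b, z_star = 0.0, 1.0, 1.0, 5.0

def log_neutral(z):
    """Log of a Gaussian neutral distribution, up to an additive constant."""
    return -0.5 * (z - theta_m) ** 2 / sigma_n2

def log_mean_fitness(z):
    """Log of the sigmoid mean fitness 1 / (1 + exp[-b(z - z*)]), computed stably."""
    x = b * (z - z_star)
    if x >= 0.0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

grid = [-10.0 + 0.01 * i for i in range(4001)]  # mean phenotypes from -10 to 30
for ne in (0, 10):
    density = stationary_density(grid, log_neutral, log_mean_fitness, ne)
    mode = grid[density.index(max(density))]
    print("Ne =", ne, "mode of the stationary distribution:", mode)
```

Swapping in a different mutation or selection function only requires replacing `log_neutral` or `log_mean_fitness`; the rest of the calculation is unchanged, which is the modularity the paragraph emphasizes.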

Finally, a key issue that is not formally evaluated here, but is arguably relevant to a number of cellular features, concerns the matter of peak shifts across the stationary distribution. Questions regarding this matter are typically inspired by Wright’s (1932) metaphor of an adaptive topography, with multiple fitness peaks and valleys of various depths over the phenotypic landscape. However, unless the distribution of mutational effects is completely flat, the relevant topography is not simply defined by the fitness landscape but by the joint action of both selection and mutation. Although the stationary distribution was unimodal in all of the cases examined above, plausible cases exist in which the stationary distribution exhibits two peaks, one largely driven by selection and the other by mutation pressure. For this to occur, the gradient of mutation pressure in one direction has to be of a form such that its product with the selection gradient has an internal minimum (Figure 4). In principle, this can happen when at the intersection of intermediate phenotypes the two functions are sufficiently upwardly concave that their product reaches a local minimum.

Figure 4. An example of a bimodal stationary distribution of mean phenotypes, with mutation pressure to the left and selection pressure to the right, with the forms of these two functions being such that their product is a minimum at the valley in the landscape.

Most of the time, a population will reside in one domain or the other, wandering over a range of phenotype space to the left or the right of the valley, but occasionally a transition will be made across the valley impelled by a stochastic series of drift and mutational events. Populations crossing the valley to the right are pulled by selection pressure, whereas those crossing to the left are pulled by the multiplicity of mutational opportunities.

https://doi.org/10.7554/eLife.34820.007

Under such a scenario, the population mean phenotype is expected to reside in two alternative semi-stable domains for extended periods of time, with the rates of transitions between domains depending on the relative heights of the two peaks, the depth of the distributional valley, and the curvatures of the stationary distribution at the inflection points (Lande, 1985; Barton and Rouhani, 1987). Over long evolutionary time periods, such a system will exhibit detailed balance – the net fluxes will be equal in both directions, with the ratio of the occupancy of the two alternative domains being inversely related to the ratio of the transition rates between them, that is, with the less frequent domain having a higher conditional rate of transition to the more frequent domain.
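The balance between the two domains can be illustrated with a two-state caricature in which each domain is collapsed to a single state. This is a simplification introduced here for illustration, not a model from the article, and the transition rates are hypothetical values.

```python
def two_domain_occupancy(rate_left_to_right, rate_right_to_left):
    """Long-run fraction of time spent in each domain of a two-state process."""
    total = rate_left_to_right + rate_right_to_left
    p_left = rate_right_to_left / total   # a high rate of entry from the right raises occupancy
    p_right = rate_left_to_right / total
    return p_left, p_right

# Hypothetical per-generation transition rates between the two domains
k_lr, k_rl = 1e-6, 4e-6
p_left, p_right = two_domain_occupancy(k_lr, k_rl)

# Fluxes in the two directions are equal at steady state
flux_lr = p_left * k_lr
flux_rl = p_right * k_rl
print("occupancy (left, right):", p_left, p_right)
print("flux left-to-right:", flux_lr, "flux right-to-left:", flux_rl)
```

With these rates the right-hand domain is occupied less often and is also the one with the higher rate of exit, so the occupancy ratio is the inverse of the ratio of transition rates.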

Although the frequency of stationary distributions with multimodal forms is unknown, they have been predicted to arise in some situations involving transcription (Lynch and Hagner, 2015; Tuğrul et al., 2015). Should they exist, the picture from comparative analyses would be one of qualitative changes in mean phenotypes in adjacent lineages. Tempting as it might be to invoke shifting ecological pressures to explain such patterns, they would be occurring in the absence of any underlying changes in selection, being a simple consequence of the multiplicity of mutational opportunities in one direction balanced by selective pressures in the other. Such ideas may be helpful in attempts to decipher the substantial and seemingly disorganized diversity of certain cellular features such as open vs. closed mitosis (Sazer et al., 2014), the structure of the centrosome (Carvalho-Santos et al., 2011), and the variable multimeric states of proteins (Dayhoff et al., 2010; Lynch, 2013; Ahnert et al., 2015) across the Tree of Life.

References

  1. Hartl DL, Dykhuizen DE, Dean AM (1985) Limits of adaptation: the evolution of selective neutrality. Genetics 111:655–674.
  2. Lynch M, Walsh JB (1998) Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Assocs., Inc.
  3. Lynch M (2007) The Origins of Genomic Architecture. Sunderland, MA: Sinauer Assocs., Inc.
  4. Walsh JB, Lynch M (2018) Evolution and Selection of Quantitative Traits. Oxford: Oxford University Press.
  5. Wright S (1932) The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Internat. Cong. Genetics pp. 355–366.
  6. Wright S (1969) Evolution and the Genetics of Populations, Vol. 2: The Theory of Gene Frequencies. Chicago, IL: Univ. Chicago Press.
  7. Zeng ZB, Cockerham CC (1993) Mutation models and quantitative genetic variation. Genetics 133:729–736.

Article and author information

Author details

  1. Michael Lynch

    Center for Mechanisms of Evolution, Biodesign Institute, Arizona State University, Tempe, Arizona
    Contribution
    Formal analysis, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    mlynch11@asu.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1653-0642

Funding

Army Research Office (W911NF-09-1-0444)

  • Michael Lynch

Army Research Office (W911NF-14-1-0411)

  • Michael Lynch

National Institutes of Health (R01-GM036827)

  • Michael Lynch

National Institutes of Health (R35-GM122566)

  • Michael Lynch

National Science Foundation (PHY11-25915)

  • Michael Lynch

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

I thank M Bauer, J Felsenstein, P Higgs, P Johri, M Lässig, M Manhart, A Moses, and D Needleman for helpful comments. This research was supported in part by the National Science Foundation under Grant No. PHY11-25915 to the Kavli Institute of Theoretical Physics. Support was also provided by the Multidisciplinary University Research Initiative awards W911NF-09-1-0411 and W911NF-09-1-0444 from the US Army Research Office, National Institutes of Health awards R01-GM036827 and R35-GM122566-01, and National Science Foundation award MCB-1518060.

Version history

  1. Received: January 4, 2018
  2. Accepted: May 10, 2018
  3. Version of Record published: June 21, 2018 (version 1)

Copyright

© 2018, Lynch et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Michael Lynch (2018) Phylogenetic divergence of cell biological features. eLife 7:e34820. https://doi.org/10.7554/eLife.34820
