A summary of the methods and main results.

We inferred transition rates between leaf shapes in two different ways: (a) by fitting a continuous time Markov chain (CTMC) model of trait evolution to phylogenetic trees and (b) by simulating mutation in a computational model of leaf development. (a) (Left) A sample of images from the Naturalis Biodiversity Center [31][3235] herbarium are manually classified as unlobed (u), lobed (l), dissected (d) or compound (c) (Methods section IV F). This dataset, as well as an analogous dataset by Geeta et al. [36], are used to label the tips of 5 different phylogenetic trees representing angiosperm evolution [3638]. (Centre) the Zuntini et al. [38] species tree is shown in full. Rates are inferred by fitting a CTMC model to the labelled trees using a Markov chain Monte Carlo (MCMC) approach in BayesTraits V4.1.2 [39]. (Right) Posterior net rates for the Zuntini et al. [38] species tree are shown as arrows. Black (grey) arrows depict rates for which 0 falls outside (within) a 95% credible interval. (b) (Left) An initial sample of 48 leaves representing a wide morphospace is generated using the Runions et al. [14] model (Methods sections IV A and IV B). Leaves in this sample are mutated in parallel through a random walk algorithm, with no selection besides rejecting parameter combinations that give physically implausible leaves (table I), generating a sequence of leaves that evolve away from the initial leaf (Methods section IV C). (Middle) Every leaf is classified as unlobed (u), lobed (l), dissected (d) or compound (c) using an automated morphometric approach (Methods section IV D), giving a sequence of character states for each walk. To this trajectory, a CTMC model is fitted using MCMC (Methods section IV E). (Right) Arrows depict posterior net rates for the MUT2 scheme (alg. 1).

An experimentally motivated illustration showing that simple leaves are easier to generate than complex leaves.

(a) Leaf silhouettes of wild-type A. thaliana and three mutants, which result in lobed (+STM, +RCO) and compound shape (+STM /+RCO) (reproduced from Kierzkowski et al. [40] with permission). Ectopic STM and RCO expression individually create a lobed shape, but together induce compound leaflets. (b) A reproduction of the Kierzkowski et al. [40] result using the Runions et al. [14] computational model of leaf development. +STM is approximated by increasing the duration of growth and +RCO by increasing the growth repression strength of one morphogen on the margin of the leaf. Morphogen presence is represented by the purple/blue lines outside the margin. (c) A morphospace showing the resulting shapes from different levels of the two parameters we use to approximate STM and RCO. The range shown for morphogen repression strength is the range in which biologically plausible shapes can be produced. The range for duration of growth is the range in which leaves can be generated in a reasonable timeframe. In the parameter files of the Runions et al. [14] model, the parameters we refer to as morphogen repression strength and duration of growth are given by FAIRING4 and FINALFRAME respectively.

The proportions of each leaf shape over time as predicted by in-silico mutagenesis and phylogeny.

(Simulation) The solid blue line represents the mean proportion of random walk chains in each state at each time step ( ± 1.96× standard error) in our simulation (MUT2 alg. 1). To this data, we fit a CTMC via maximum likelihood estimation (MLE) (dashed line) and MCMC with errors inferred from MCMC posteriors via Monte-Carlo error propagation. Since step size is arbitrary, for comparison with the phylogenetic model, we adjust the x-limits for the simulation such that the CTMC predictions for both the simulation and phylogeny reach halfway from the initial distribution to the stationary distribution by approximately the same distance along the axis. (Phylogeny) We predict how the proportions of each leaf shape evolve from the same initial proportions used in the simulations, using the CTMC inferred from each phylogeny (green - Zuntini et al. [38] genus (fig. S3), red - Janssens et al. genus (fig. S5)). Dashed lines represent the prediction using MLE transition rate values. Errors for all CTMC predictions were inferred from the transition rate MCMC posteriors using Monte-Carlo error propagation (N = 500), and represent a 95% credible interval for each time step. We show these extrapolations for a time period equal to the approximate age of the angiosperm clade (200 Myr) [38].

Simpler leaves occupy a greater volume than complex leaves in model parameter space.

(a) A 2D histogram of model parameter space given by the 2 principle axes from a principle components anlaysis (PCA) of a sample of 4000 leaves from each shape category from one of the simulations (MUT2). Mean bin occupancy is calculated via a bootstrapping approach, by sub-sampling with replacement N = 1000 leaves from each shape category for 10000 iterations. Bins are shaded if the mean occupancy across these iterations is ≥0.5. This reveals that unlobed leaves occupy a larger region of model parameter space than more complex shapes and that this larger space also contains the majority of more complex leaves. Moreover, more complex leaves are concentrated within a much smaller area than unlobed leaves. (b) The no. PC1-PC2 2D histogram bins occupied by each leaf shape in 10000 bootstrap samples (N = 1000) of a balanced sample of MUT2 (4000 per shape category). Bins with at least 1 leaf present are counted. This reveals that unlobed occupies by far the largest area in this 2D projection of parameter space. We find the same is true for other PCs (fig. S16).

A description of the leaf model from Runions et al. [14], highlighting the three key components of leaf development and their interactions.

(a) Model components illustrated in leaf form (inspired by Runions et al. [14] fig. 3). Margin in black, veins in blue, lamina in green, convergence points in white with black outlines. (b) The three components of the model with arrows indicating interactions that affect growth. (1) Existing morphogens and convergence points influence where new ones will form. (2) Convergence points and morphogens influence local growth rate. (3) Convergence points induce new veins. (4) Veins determine local growth direction. (5) Existing veins influence the growth of new veins. (c) Zoomed view of the margin, showing how growth of the vasculature and patterning on the margin interact to produce protrusions and sinuses. Veins (in blue) extend, pushing the margin out, however a morphogen (in orange) inhibits webbing between the veins, causing a sinus to form. This increases the distance between the two convergence points, leading to the insertion of another convergence point between the two that excludes the morphogen from its vicinity. Existing vasculature connects to the convergence point and specifies a new axis of growth. (d) Different growth stages of a model leaf. The dark blue morphogen inhibits webbing and convergence point formation, the light blue morphogen delineates the petiole. The value of t denotes the iteration of the simulation.

A summary of the morphometric classifier.

(a) Example lobed leaf. (b) Silhouette with local maxima in orange, minima in blue and reference point in green. (c) Graph of distances of each contour point to the reference point. Local maxima shown in orange, minima in blue.

Random walk constraints.

The only type of selection we apply during the walk is stabilising selection for physically plausible leaves. See fig. S2 for examples of rejected leaves. Weight difference (fig. S2a) ensures that the midvein and petiole are roughly in line with the centre of mass of the leaf. Leaf width (fig. S2b) ensures that leaves which stop growing after the first few stages of the model simulation are rejected. Veins outside lamina (fig. S2c) prevents leaves forming whose veins grow outside the lamina. Veins width (fig. S2d) and Veins area (fig. S2e) ensure that leaves that generate without adequate vasculature are rejected. Overlapping margin (fig. S2f) rejects leaves with ingrown margins. Error or timeout – some parameter combinations result in the simulation exiting unexpectedly or taking a very long time to complete, posing an additional constraint.

The number of taxa in each shape category for each labelled tree used in our inference.

Columns with multiple shape categories (e.g. ld) represent taxa which could fit into any of the listed categories.

Further details of transition rates from phylogenies and simulations.

(a) Net normalsied transition rates between the 4 leaf shape categories, inferred from both simulation and phylogeny. (Top row) Arrow thickness and direction represents the mean net normalised transition rate between each pair of leaf shape categories. Black arrows represent transitions where the 0 does not fall within an equal-tailed 95% credible interval for the net normalised posterior, and grey where it does. (Bottom row) Posterior distributions for the net normalised rate of all transitions. Solid bars indicate the mean. The net rate is calculated by subtracting the transition given by the dashed arrow from the transition given by the solid arrow in the x-axis labels. (b) Normalised transition rates between different leaf shapes, inferred from both in-silico mutagenesis (left of dashed line) and phylogeny (right of dashed line). Each subplot represents a different transition, from the shape shown on the vertical figure axis to the shape shown on the horizontal figure axis. These rates were normalised by dividing by the mean transition rate estimate within each dataset (eq. (6)).

Comparing in-silico mutagenesis and phylogeny for the probability of each shape transition over time, separated out by starting shape.

The column header gives the initial shape and the line colour gives the final shape. (Top row) The dashed line represents the proportion of walks from each unique initial leaf (fig. S1), averaged over all initial leaves in each state. The error bands represents ± 1.96× standard error for this average. The solid lines represents the CTMC model fit (MLE) to the output of the random walks. The vertical, grey, dashed line represents the estimated simulation step that is equivalent to t = 200 Myr for the phylogeny CTMC. This is calculated as described in fig. 3, and is also the simulation x-limit in fig. 3. We show all 320 simulation steps here for a more detailed summary of the simulation trajectories. (Bottom row) The predicted transition probabilities given by the CTMC model fit to the labelled Zuntini et al. [38] genus tree (fig. S3). The solid line represents the MLE fit and the errors are inferred from MCMC posteriors via Monte-Carlo error propagation (N = 500) as per fig. 3. Note the good qualitative agreement between the dynamics between the phylogenies and the simulations, especially for the (blue) data related to transitions to/from unlobed leaves.

The nearest neighbour graph of parameter PCA space for each shape category highlights how the unlobed space is the most connected, followed by dissected, lobed and then compound.

Complex leaves subdivide into many more connected components than unlobed. The number of nearest neighbours used is 6. Colours highlight different connected components, however, since the number of unique colours is only 10, this under represents the true number of connected components for lobed, dissected and compound. Thus, the colours here are intended only as a visual guide.