Emergence and propagation of epistasis in metabolic networks

Abstract

Epistasis is often used to probe functional relationships between genes, and it plays an important role in evolution. However, we lack theory to understand how functional relationships at the molecular level translate into epistasis at the level of whole-organism phenotypes, such as fitness. Here, I derive two rules for how epistasis between mutations with small effects propagates from lower- to higher-level phenotypes in a hierarchical metabolic network with first-order kinetics and how such epistasis depends on topology. Most importantly, weak epistasis at a lower level may be distorted as it propagates to higher levels. Computational analyses show that epistasis in more realistic models likely follows similar, albeit more complex, patterns. These results suggest that pairwise inter-gene epistasis should be common, and it should generically depend on the genetic background and environment. Furthermore, the epistasis coefficients measured for high-level phenotypes may not be sufficient to fully infer the underlying functional relationships.

Introduction

Life emerges from an orchestrated performance of complex regulatory and metabolic networks within cells. The blueprint for these networks is encoded in the genome. Mutations alter the genome. Some of them, once decoded by the cell, perturb cellular networks and thereby change the phenotypes important for life. Understanding how mutations affect the function of cellular networks is key to solving many practical and fundamental problems, such as finding mechanistic causes of genetic disorders (Hu et al., 2011; Fang et al., 2019), deciphering the architecture of complex traits (Zuk et al., 2012; Mackay, 2014; Wei et al., 2014), building artificial cells (Hutchison et al., 2016), and explaining past and predicting future evolution (Blount et al., 2008; Wiser et al., 2013; de Visser and Krug, 2014; Harms and Thornton, 2014; Kryazhimskiy et al., 2014; Sailer and Harms, 2017a; Sohail et al., 2017). Conversely, mutations can help us learn how cellular networks are organized (Phillips, 2008; van Opijnen and Camilli, 2013).

To infer the wiring diagram of a cellular network that produces a certain phenotype, one approach in genetics is to measure the pairwise and higher-order genetic interactions (or ‘epistasis’) between mutations that perturb it (Phillips, 2008). Much effort has been devoted in the past 20 years to a systematic collection of such genetic interaction data for several model organisms and cell lines (Kelley and Ideker, 2005; Lehner et al., 2006; Jasnos and Korona, 2007; Collins et al., 2007; St Onge et al., 2007; Typas et al., 2008; Roguev et al., 2008; Costanzo et al., 2010; Szappanos et al., 2011; Huang et al., 2012; Roguev et al., 2013; Bassik et al., 2013; Babu et al., 2014; Costanzo et al., 2016; van Leeuwen et al., 2016; Skwark et al., 2017; Du et al., 2017; Heigwer et al., 2018; Horlbeck et al., 2018; Norman et al., 2019; Liu et al., 2019; Kuzmin et al., 2018; New and Lehner, 2019; Celaj et al., 2020). This approach is particularly powerful when the phenotypic effect of one mutation changes qualitatively depending on the presence or absence of a second mutation in another gene, for example when a mutation has no effect on the phenotype in the wildtype background, but abolishes the phenotype when introduced together with another mutation, such as synthetic lethality (Tong et al., 2001). Such qualitative genetic interactions can often be directly interpreted in terms of a functional relationship between gene products (Tong et al., 2001; Davierwala et al., 2005; Phillips, 2008).

Most pairs of mutations do not exhibit qualitative genetic interactions. Instead, the phenotypic effect of a mutation may change measurably but not qualitatively depending on the presence or absence of other mutations in the genome (Babu et al., 2014; Costanzo et al., 2016). The genetic interactions can in this case be quantified with one of several metrics that are termed ‘epistasis coefficients’ (Wagner et al., 1998; Hansen and Wagner, 2001; Mani et al., 2008; Wagner, 2015; Poelwijk et al., 2016). Although some rules have been proposed for interpreting epistasis coefficients, in particular their sign (Dixon et al., 2009; Lehner, 2011; Baryshnikova et al., 2013), the validity and robustness of these rules are unknown because there is no theory for how functional relationships translate into measurable epistasis coefficients in any system (Lehner, 2011; Domingo et al., 2019). To avoid this major difficulty, most large-scale empirical studies focus on correlations between epistasis coefficients rather than on their actual values (but see Velenich and Gore, 2013, for a notable exception). Genes with highly correlated epistasis profiles are then interpreted as being functionally related (Segrè et al., 2005; Bellay et al., 2011; Babu et al., 2014; Costanzo et al., 2016; Horlbeck et al., 2018). Although this approach successfully groups genes into protein complexes and larger functional modules (Michaut et al., 2011; Bellay et al., 2011), it does not reveal the functional relationships themselves. As a result, many, if not most, genetic interactions between genes and modules still await their biological interpretation (Costanzo et al., 2016; Fang et al., 2019).

While geneticists measure epistasis to learn the architecture of biological networks, evolutionary biologists face the reverse problem: they need to know how the genetic architecture constrains epistasis at the level of fitness. Epistasis determines the structure of fitness landscapes on which populations evolve (Fragata et al., 2019). Understanding it would bear on many important evolutionary questions, such as why so many organisms reproduce sexually (Kondrashov, 2018), how novel phenotypes evolve (Blount et al., 2008; Bridgham et al., 2009; Natarajan et al., 2013; Harms and Thornton, 2014), how predictable evolution is (Weinreich et al., 2006; Tenaillon et al., 2012; Wiser et al., 2013; Kryazhimskiy et al., 2014), etc. So far, evolutionary biologists have relied primarily on abstract models of fitness landscapes (see Orr, 2005, for a review), rather than those firmly grounded in organismal biochemistry and physiology (e.g. Dykhuizen et al., 1987; Das et al., 2020). For example, Fisher’s geometric model—one of the most widely used fitness landscape models—is explicitly devoid of the physiological and biochemical details (Fisher, 1930; Tenaillon, 2014; Martin, 2014).

A theory of epistasis must address two challenges. First, it must specify how the architecture of a biological network constrains epistasis. Such knowledge is important not only for evolutionary questions, but also for the inference problem in genetics. Consider a biological network module that produces a phenotype of interest but whose internal structure is unknown. By genetically perturbing all genes within the module and measuring the phenotype in all single, double and possibly some higher-order mutants, we can obtain the matrix of epistasis coefficients. In principle, we can then fit a network topology and parameters to these data. However, without knowing what information about the network is contained in the matrix in the first place, we cannot be sure whether the inferred topology and parameters are close to their true values or represent one of many possible solutions consistent with the data.

The second challenge is that epistasis may arise at a different level of biological organization than where it is measured by the experimentalist or by natural selection. For example, geneticists are often interested in understanding the structures of specific regulatory or metabolic network modules (Collins et al., 2007; Costanzo et al., 2010). However, measuring the performance of a module directly is often experimentally difficult or impossible. Then epistasis is measured for an experimentally accessible ‘high-level’ phenotype, such as fitness, which depends on the performance of the focal ‘lower-level’ module, but also on other unrelated modules. However, if we do not know how epistasis that originally emerged in one module maps onto epistasis that is measured, it is unclear what we can infer about the module’s internal structure.

Evolutionary biologists encounter a related problem when they wish to learn the evolutionary history of a protein or a larger cellular module. To do so, they would in principle need to know how different mutations in this module affected fitness of the whole organism in its past environment. But such information is rarely available. Instead, it is sometimes possible to reconstruct past mutations and measure their biochemical effects in the lab (Lunzer et al., 2005; Bridgham et al., 2009; Natarajan et al., 2013; Sarkisyan et al., 2016). When interesting patterns of epistasis are identified at the biochemical level, it is usually assumed that the same patterns manifested themselves at the level of fitness and drove the module’s evolution. However, this is not obvious. If interactions with other modules distort epistasis as it propagates from the biochemical level to the level of fitness (Snitkin and Segrè, 2011), our ability to infer past evolutionary history from in vitro biochemical measurements could be diminished. Therefore, the second challenge that a theory of epistasis must address is how epistasis propagates from lower-level phenotypes to higher-level phenotypes.

There is a large body of theoretical and computational literature on epistasis. As early as 1934, Sewall Wright realized that epistasis naturally emerges in molecular networks (Wright, 1934). This was later explicitly demonstrated in many mathematical and computational models (e.g. Kacser and Burns, 1981; Keightley, 1989; Szathmáry, 1993; Gibson, 1996; Keightley, 1996; Wagner et al., 1998; Omholt et al., 2000; Peccoud et al., 2004; Gjuvsland et al., 2007; Gertz et al., 2010; Fiévet et al., 2010; Pumir and Shraiman, 2011). Metabolic control analysis became one of the most successful and general frameworks for understanding epistasis between metabolic genes (Kacser and Burns, 1973). Dean et al., 1986, Dykhuizen et al., 1987, Dean, 1989, Lunzer et al., 2005, MacLean, 2010 used it to interpret the empirically measured fitness effects of mutations and their interactions in terms of the metabolic relationships between the products of mutated genes. Kacser and Burns, 1981, Hartl et al., 1985, Keightley, 1989, Clark, 1991, Keightley, 1996, Bagheri-Chaichian et al., 2003, Bagheri and Wagner, 2004, Fiévet et al., 2010 explored the implications of epistasis in metabolism for genetic variation in populations, their response to selection, long-term evolutionary dynamics and outcomes, such as the evolution of dominance. However, most studies analyzed only the linear metabolic pathway (but see Keightley, 1989) and assumed that fitness equals flux through the pathway (but see Szathmáry, 1993), thereby bypassing the problem of epistasis propagation.

There have been few attempts to theoretically relate the molecular architecture of an organism to the types of epistasis that would arise for its high-level phenotypes, such as fitness. Segrè et al., 2005 and He et al., 2010 used flux balance analysis (FBA, Orth et al., 2010) to compute genome-wide distributions of epistasis coefficients in metabolic models of Escherichia coli and Saccharomyces cerevisiae and arrived at starkly discordant conclusions. Recently, Alzoubi et al., 2019 showed that FBA is generally poor in predicting experimentally measured genetic interactions, suggesting that it might be difficult to understand the emergence and propagation of epistasis by relying exclusively on genome-scale computational models. Sanjuán and Nebot, 2008 and Macía et al., 2012 modeled various abstract metabolic and regulatory networks and found a possible link between epistasis and network complexity. The work by Chiu et al., 2012 is a more systematic attempt to develop a general theory of epistasis. They established a fundamental connection between epistasis and the curvature of the function that maps lower-level phenotypes onto a higher-level phenotype. However, further progress has so far been hindered by uncertainty in what types of functions map phenotypes onto one another in real biological systems. Previous studies made various idiosyncratic choices with respect to this mapping, leaving us without clear guidance as to the conditions or systems where they are expected to hold.

To overcome this problem, here I consider a whole class of hierarchical metabolic networks and obtain the family of all functions that determine how the effective activity of a larger metabolic module can depend on the activities of smaller constituent modules. There are several advantages to this approach. First, it leads to an intuitive understanding of how the structure of the network influences epistasis emergence and propagation. Second, my approach is based on basic biochemical principles, so it should be relevant for many phenotypes. For example, epistasis is often measured at the level of growth rate (Jasnos and Korona, 2007; St Onge et al., 2007; Babu et al., 2014; Costanzo et al., 2016), and metabolism fuels growth. Moreover, metabolic genes occupy a large fraction of most genomes (Orth et al., 2011) and the general organization of metabolism is conserved throughout life (Csete and Doyle, 2004). Thus, by understanding genetic interactions between metabolic genes, we will gain an understanding of a large fraction of all genetic interactions.

In my model, I consider a hierarchical network with first-order kinetics but arbitrary topology, and ask two questions related to the two challenges mentioned above. (1) How does an epistasis coefficient that arose at some level of the metabolic hierarchy propagate to higher levels of the hierarchy? (2) How does the network topology constrain the value of an epistasis coefficient between two mutations that affect different enzymes in this network? I obtain answers to these questions analytically in the limiting case when the effects of mutations are vanishingly small. I then computationally probe the validity of the conclusions outside of the domain where they are expected to hold.

My model is not intended to generate predictions of epistasis for any specific organism. Instead, its main purpose is to provide a baseline expectation for how epistasis that emerges at lower-level phenotypes manifests itself at higher-level whole-organism phenotypes, such as fitness, and what kind of information may be gained from measurements of such higher-level epistasis. One possible outcome of this analysis is that there may be fundamental limitations to what an epistasis measurement at one level of biological organization can tell us about epistasis at another level. On the other hand, if it turns out that there is a general correspondence between epistasis coefficients at different levels in this simple model, then it may be worth developing more sophisticated and general models on which inference from data can be based.

Model

Hierarchical metabolic network

Consider a set of metabolites $A = \{1, 2, \dots, n\}$ with concentrations $S_{1}, \dots, S_{n}$ which can be interconverted by reversible first-order biochemical reactions. The rate of the reaction converting metabolite $i$ into metabolite $j$ is $x_{i j} (S_{i} - S_{j} / K_{i j})$ where $K_{i j}$ is the equilibrium constant. The rate constants $x_{i j}$ , which satisfy the Haldane relationships $x_{j i} = x_{i j} / K_{i j}$ (Cornish-Bowden, 2013), form the matrix $\vec{x} = {∥ x_{i j} ∥}_{i, j = 1}^{n}$ . The metabolite set $A$ and the rate matrix $\vec{x}$ define a biochemical network $𝒩 = (A, \vec{x})$ .
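To see what the Haldane relationships buy us, it helps to write out the rate of the reverse reaction. The short check below assumes, as is standard for equilibrium constants but not stated explicitly above, that $K_{j i} = 1 / K_{i j}$.

```latex
x_{ji}\left(S_{j} - \frac{S_{i}}{K_{ji}}\right)
  = \frac{x_{ij}}{K_{ij}}\left(S_{j} - K_{ij}\,S_{i}\right)
  = -\,x_{ij}\left(S_{i} - \frac{S_{j}}{K_{ij}}\right)
```

Under this assumption, the net rate from $j$ to $i$ is exactly minus the net rate from $i$ to $j$, so each reversible reaction is fully described by one rate constant $x_{i j}$ together with $K_{i j}$.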

The first-order kinetics assumption makes the model analytically tractable, as discussed below; biologically, it is equivalent to assuming that all enzymes are far from saturation. The rate constants $x_{i j}$ depend on the concentrations and the specific activities of enzymes and therefore can be altered by mutations. $K_{i j}$ characterize the fundamental chemical nature of metabolites $i$ and $j$ and cannot be altered by mutations (Savageau, 1976).

The whole-cell metabolic network is large, and it is often useful to divide it into subnetworks that carry out certain functions important for the cell. I define subnetworks mathematically as follows. I say that two metabolites $i$ and $j$ are adjacent (in the graph-theoretic sense) if there exists an enzyme that catalyzes a biochemical reaction between them, that is, if $x_{i j} > 0$ . Now consider a subset of metabolites $A_{μ} \subset A$ . For this subset, let $A_{μ}^{IO}$ be the set of all metabolites that do not belong to $A_{μ}$ but are adjacent to at least one metabolite from $A_{μ}$ . Let ${\vec{x}}_{μ}$ be the submatrix of $\vec{x}$ which corresponds to all reactions where both the product and the substrate belong to either $A_{μ}$ or $A_{μ}^{IO}$ . The metabolite subset $A_{μ}$ and the rate matrix ${\vec{x}}_{μ}$ form a subnetwork $μ = (A_{μ}, {\vec{x}}_{μ})$ of network $𝒩$ . I refer to $A_{μ}$ and $A_{μ}^{IO}$ as the sets of internal and ‘input/output’ (‘I/O’ for short) metabolites for subnetwork μ, respectively. Thus, all internal metabolites and all reactions that involve only internal and I/O metabolites are part of the subnetwork. Note that the I/O metabolites do not themselves belong to the subnetwork, but reactions between them, if they exist, are part of the subnetwork. Metabolites that are neither internal nor I/O for μ are referred to as external to subnetwork μ. These definitions are illustrated in Figure 1A.
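The definitions of internal, I/O and external metabolites can be made concrete with a minimal sketch. The toy network below is hypothetical (it is not the network drawn in Figure 1A), and the function names `io_metabolites` and `subnetwork_reactions` are illustrative choices rather than part of the model.

```python
# Toy network (hypothetical). Each reversible reaction is stored once as
# (i, j) -> x_ij; pairs that are absent have x_ij = 0 (no enzyme).
x = {
    (1, 3): 2.0, (3, 2): 3.0, (3, 4): 1.0, (4, 2): 1.5,
    (1, 2): 0.5,  # reaction directly between the two I/O metabolites
    (2, 5): 1.0,  # leads to metabolite 5, which is external to the subnetwork
}

def io_metabolites(x, internal):
    """Metabolites outside `internal` that are adjacent to an internal one."""
    io = set()
    for (i, j), rate in x.items():
        if rate <= 0:
            continue  # metabolites are adjacent only if x_ij > 0
        if i in internal and j not in internal:
            io.add(j)
        elif j in internal and i not in internal:
            io.add(i)
    return io

def subnetwork_reactions(x, internal):
    """Reactions whose substrate and product are both internal or I/O."""
    members = set(internal) | io_metabolites(x, internal)
    return {pair: rate for pair, rate in x.items()
            if rate > 0 and pair[0] in members and pair[1] in members}

internal = {3, 4}
print(io_metabolites(x, internal))
print(sorted(subnetwork_reactions(x, internal)))
```

In this toy example the reaction between metabolites 1 and 2 is part of the subnetwork even though both are I/O metabolites, whereas the reaction between 2 and 5 is not, because metabolite 5 is external.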

Figure 1

Illustration of a hierarchical metabolic network and its coarse-graining.

(A) White rectangle represents the whole metabolic network $𝒩$ . Example subnetworks μ and ν are represented by the dark and light gray rectangles. Only metabolites and reactions that belong to these subnetworks are shown; other metabolites and reactions in $𝒩$ are not shown. Metabolites 1 and 5 may be adjacent to other metabolites in $𝒩$ ; this fact is represented by short black lines that do not terminate in metabolites. Subnetworks μ and ν are both modules because there exists a simple path connecting their I/O metabolites that lies within μ and ν and contains all their internal metabolites (dashed blue line). (B) Network $𝒩$ can be coarse-grained by replacing module μ at steady state with an effective reaction between its I/O metabolites 1 and 2, whose rate constant is $y_{μ}$ . (C) Network $𝒩$ can be coarse-grained by replacing module ν at steady state with an effective reaction between its I/O metabolites 1 and 5, whose rate constant is $y_{ν}$ .

The main objects in this work are biochemical modules, which are a special type of subnetwork. To define modules, I introduce some auxiliary concepts. I say that two metabolites $i$ and $j$ are connected if there exists a series of enzymes that interconvert $i$ and $j$ , possibly through a series of intermediates. Mathematically, $i$ and $j$ are connected if there exists a simple (i.e. non-self-intersecting) path between them. If all metabolites in this path are internal to the subnetwork μ (possibly excluding the terminal metabolites $i$ and $j$ themselves) then $i$ and $j$ are connected within the subnetwork μ, and such a path is said to lie within μ. By this definition, metabolites $i$ and $j$ can be connected within μ only if they are either internal or I/O metabolites for μ.

Definition 1

A subnetwork μ is called a module if (a) it has two I/O metabolites, and (b) for every internal metabolite $i \in A_{μ}$ , there exists a simple path between the I/O metabolites that lies within μ and contains $i$ .

This definition is illustrated in Figure 1A. The assumption that modules only have two I/O metabolites is not essential. However, mathematical calculations become unwieldy when the number of I/O metabolites increases. Moreover, modules with just two I/O metabolites already capture the two most salient features of metabolism: its directionality, and its complex branched topology (Csete and Doyle, 2004). Such modules are a natural generalization of the linear metabolic pathway which has been extensively studied in the previous literature (Kacser and Burns, 1973; Szathmáry, 1993; Bagheri-Chaichian et al., 2003; MacLean, 2010).
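Definition 1 can be checked by brute force on small networks. The sketch below enumerates all simple paths between the two I/O metabolites that pass only through internal metabolites and tests whether every internal metabolite lies on at least one of them. The toy network and the function name `is_module` are hypothetical, and the enumeration takes exponential time, so this is meant only for small illustrative examples.

```python
# Toy network (hypothetical): reversible reactions stored once as (i, j) -> x_ij.
x = {
    (1, 3): 2.0, (3, 2): 3.0, (3, 4): 1.0, (4, 2): 1.5,
    (1, 2): 0.5, (2, 5): 1.0,
}

def is_module(x, internal):
    """Brute-force check of conditions (a) and (b) of Definition 1."""
    internal = set(internal)
    adj = {}
    for (i, j), rate in x.items():
        if rate > 0:
            adj.setdefault(i, set()).add(j)
            adj.setdefault(j, set()).add(i)
    io = {n for m in internal for n in adj.get(m, ()) if n not in internal}
    if len(io) != 2:
        return False  # condition (a): exactly two I/O metabolites
    start, end = sorted(io)
    covered = set()  # internal metabolites found on some valid simple path

    def walk(node, visited):
        for nxt in adj.get(node, ()):
            if nxt == end:
                covered.update(visited)  # a complete path within the subnetwork
            elif nxt in internal and nxt not in visited:
                walk(nxt, visited | {nxt})

    walk(start, frozenset())
    return covered == internal  # condition (b)

# Adding a dead-end metabolite 6 attached only to metabolite 3 breaks (b).
x_dead_end = dict(x)
x_dead_end[(3, 6)] = 1.0

print(is_module(x, {3, 4}))
print(is_module(x_dead_end, {3, 4, 6}))
print(is_module(x, {3}))
```

In this example, the subset {3, 4} forms a module, the subset with the dead-end metabolite does not (no simple path between the I/O metabolites contains metabolite 6), and the subset {3} fails because it has three I/O metabolites.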

Modules have two important properties. First, for any given concentrations of the two I/O metabolites, all internal metabolites in the module can achieve a unique steady state which depends only on concentrations of these I/O metabolites but not on the concentrations of any other metabolites in the network (see Proposition 1 in Materials and methods). Now consider a module μ whose I/O metabolites are (without loss of generality) labeled 1 and 2 (Figure 1A). The second property is that, at steady state, the flux through this module is $J_{μ} = y_{μ} (S_{1} - S_{2} / K_{12})$ , where

$$y_{μ} = F ({\vec{x}}_{μ}) \tag{1}$$

is the effective reaction rate constant of module μ (Figure 1B). Importantly, $y_{μ}$ depends only on the rate matrix ${\vec{x}}_{μ}$ , but not on any other rate constants (see Corollary 2 in Materials and methods), and it can be recursively computed for any module, as described in Materials and methods. In other words, metabolic network $𝒩$ can be coarse-grained by replacing module μ at steady state with a single first-order biochemical reaction with rate $y_{μ}$ . Importantly, such coarse-graining does not alter the dynamics of any metabolites outside of module μ (see Proposition 1 in Materials and methods). This statement is the biochemical analog of the star-mesh transformation (and its generalization, Kron reduction, Rao et al., 2014) well known in the theory of electric circuits (Versfeld, 1970). The biological interpretation of these properties is that a module is somewhat isolated from the rest of the metabolic network. And vice versa, the larger network (i.e. the cell) ‘cares’ only about the total rate at which the I/O metabolites are interconverted by the module but ‘does not care’ about the details of how this conversion is enzymatically implemented. In this sense, the effective rate $y_{μ}$ quantifies the function of module μ (a macroscopic parameter) while the rates ${\vec{x}}_{μ}$ describe the specific biochemical implementation of the module (microscopic parameters).
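The coarse-graining property can be illustrated numerically with a minimal sketch of the star-mesh elimination mentioned above. For simplicity, the sketch assumes that all equilibrium constants equal one ($K_{i j} = 1$), so that the network is formally a resistor network with conductances $x_{i j}$; the function `effective_rate`, the helper `r` and the toy rate constants are hypothetical and are not the recursive procedure described in Materials and methods.

```python
def effective_rate(rates, io_pair):
    """Effective rate constant between the two I/O metabolites.

    rates   : dict mapping frozenset({i, j}) -> x_ij > 0, assuming all K_ij = 1
    io_pair : tuple with the labels of the two I/O metabolites
    Every metabolite other than the I/O pair is treated as internal.
    """
    c = dict(rates)
    internal = set().union(*c) - set(io_pair)
    for k in internal:
        # Collect and remove all reactions that involve metabolite k.
        nbrs = {}
        for pair in [p for p in c if k in p]:
            (other,) = pair - {k}
            nbrs[other] = c.pop(pair)
        total = sum(nbrs.values())
        # Star-mesh step: link every pair of former neighbours of k.
        names = sorted(nbrs)
        for n, a in enumerate(names):
            for b in names[n + 1:]:
                key = frozenset((a, b))
                c[key] = c.get(key, 0.0) + nbrs[a] * nbrs[b] / total
    return c.get(frozenset(io_pair), 0.0)

def r(i, j):
    return frozenset((i, j))

# Two reactions in series between I/O metabolites 1 and 2: y = 2*3/(2+3).
y_series = effective_rate({r(1, 3): 2.0, r(3, 2): 3.0}, (1, 2))

# Higher-level module nu (I/O 1 and 2) containing module mu (I/O 3 and 4).
mu = {r(3, 4): 1.0, r(3, 5): 4.0, r(5, 4): 4.0}
nu_full = {r(1, 3): 2.0, r(4, 2): 3.0, **mu}
y_mu = effective_rate(mu, (3, 4))
y_nu_full = effective_rate(nu_full, (1, 2))

# Replace mu by a single effective reaction with rate constant y_mu.
nu_coarse = {r(1, 3): 2.0, r(3, 4): y_mu, r(4, 2): 3.0}
y_nu_coarse = effective_rate(nu_coarse, (1, 2))

print(y_series, y_mu, y_nu_full, y_nu_coarse)
```

Under the stated assumption, eliminating the internal metabolites of μ first and then those of ν gives the same effective rate constant as eliminating all of them at once, which is the sense in which the larger network depends on module μ only through $y_{μ}$.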

The effective rate constant $y_{μ}$ of module μ depends on the entire rate matrix ${\vec{x}}_{μ}$ . In general, a single mutation may perturb several rate constants within a module, so that the entire shape of the function $F$ may be important. Here, I focus on a special case when each mutation perturbs one reaction (real or effective) within a module, while all others remain constant. To examine epistasis between mutations, I will also consider two different mutations that perturb two separate reactions within a module. In these special cases, we do not need to know the entire function $F$ . We only need to know how the module’s effective rate constant $y_{μ}$ depends on the one or two rate constants of the perturbed reactions. When $y_{μ}$ is considered as a function of the rate constant ξ of one reaction, I write

$$y_{μ} = f_{1} (ξ), \tag{2}$$

and when $y_{μ}$ is considered as a function of the rate constants ξ and η of two reactions, I write

$$y_{μ} = f_{2} (ξ, η) . \tag{3}$$

The rate constants of all other reactions within module μ play the role of parameters in functions $f_{1}$ and $f_{2}$ .

Consider now a network $𝒩$ that has a hierarchical structure, such that there is a series of nested modules $μ \subset ν \subset \dots$ , in the sense that $A_{μ} \subset A_{ν} \subset \dots$ (Figure 1A). Since any module at steady state can be replaced with an effective first-order biochemical reaction, there exists a hierarchy of quantitative metabolic phenotypes $y_{μ}, y_{ν}, \dots$ (Figure 1B,C). These phenotypes are of course functionally related to each other. Specifically, because ν is a ‘higher-level’ module (in the sense that it contains a ‘lower-level’ module μ), the matrix ${\vec{x}}_{ν}$ can be decomposed into two submatrices ${\vec{x}}_{μ}$ and ${\vec{x}}_{ν ∖ μ}$ where the latter is the matrix of rate constants of reactions that belong to module ν but not to module μ. Since replacing the lower-level module μ with an effective reaction with rate constant $y_{μ}$ does not alter the dynamics of metabolites outside of μ, $y_{ν}$ must depend on all elements of ${\vec{x}}_{μ}$ only through $y_{μ}$ , that is,

$$y_{ν} = f_{1} (y_{μ}), \tag{4}$$

where rates ${\vec{x}}_{ν ∖ μ}$ act as parameters of function $f_{1}$ . Thus, in the hierarchy of metabolic phenotypes $y_{μ}, y_{ν}, \dots$ , a phenotype at each subsequent level depends on the phenotype at the preceding level according to Equation 4, and the lowest level phenotype $y_{μ}$ depends on the actual rate constants according to Equation 1. This hierarchy of functionally nested phenotypes is conceptually similar to the hierarchical ‘ontotype’ representation of genomic data proposed recently by Yu et al., 2016.

Quantification of epistasis

Consider a mutation A that perturbs only one rate constant $x_{i j}$ , such that the wildtype value $x_{i j}^{0}$ changes to $x_{i j}^{A}$ . This mutation can be quantified at the microscopic level by its relative effect $δ^{A} x_{i j} = x_{i j}^{A} / x_{i j}^{0} - 1$ . If the reaction between metabolites $i$ and $j$ belongs to nested modules $μ, ν, \dots$ , then mutation A may impact the functions of these modules, which can be quantified by the relative effects $δ^{A} y_{μ}$ , $δ^{A} y_{ν}$ , etc. at each level of the hierarchy.

Consider now another mutation B that only perturbs the rate constant $x_{k ℓ}$ of another reaction. Since mutations A and B perturb distinct enzymes, they by definition do not genetically interact at the microscopic level. However, if both perturbed reactions belong to the metabolic module μ (and, as a consequence, to all higher-level modules which contain μ), they may interact at the level of the function of this module, in the sense that the effect of mutation B on the effective rate $y_{μ}$ may depend on whether mutation A is present or not. Such epistasis between mutations A and B can be quantified at the level μ of the metabolic hierarchy by any of several epistasis coefficients (Wagner et al., 1998; Hansen and Wagner, 2001; Mani et al., 2008). I will quantify it with the epistasis coefficient

$$ε^{A B} y_{μ} = \frac{δ^{A B} y_{μ} - δ^{A} y_{μ} - δ^{B} y_{μ}}{2 δ^{A} y_{μ} δ^{B} y_{μ}}, \tag{5}$$

where $δ^{A B} y_{μ}$ denotes the effect of the combination of mutations A and B on phenotype $y_{μ}$ relative to the wildtype. Since I only consider two mutations A and B, I will write $ε y_{μ}$ instead of $ε^{A B} y_{μ}$ to simplify notation. Note that other epistasis coefficients can always be computed from $ε y_{μ}$ , $δ^{A} y_{μ}$ and $δ^{B} y_{μ}$ , if necessary. Expressions for epistasis coefficients at other levels of the metabolic hierarchy are analogous.
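As a minimal sketch, the epistasis coefficient defined above can be computed from four phenotype measurements: the wildtype, the two single mutants and the double mutant. The function names and the numerical values are hypothetical and serve only to illustrate the definition.

```python
def relative_effect(y_mutant, y_wildtype):
    """Relative effect of a mutation on a phenotype, delta = y_mut / y_wt - 1."""
    return y_mutant / y_wildtype - 1.0

def epistasis(y_wt, y_a, y_b, y_ab):
    """Epistasis coefficient between mutations A and B for phenotype y."""
    d_a = relative_effect(y_a, y_wt)
    d_b = relative_effect(y_b, y_wt)
    d_ab = relative_effect(y_ab, y_wt)
    return (d_ab - d_a - d_b) / (2.0 * d_a * d_b)

# Relative effects that add up exactly give a coefficient of zero.
eps_additive = epistasis(1.0, 0.9, 0.8, 0.7)

# Effects that multiply, y_ab / y_wt = (y_a / y_wt) * (y_b / y_wt), give 1/2,
# because then delta_ab = delta_a + delta_b + delta_a * delta_b.
eps_multiplicative = epistasis(1.0, 0.9, 0.8, 0.9 * 0.8)

print(eps_additive, eps_multiplicative)
```

With this normalization, the coefficient is zero when the relative effects of the two mutations are additive and equals 1/2 when they are multiplicative, which follows directly from the definition.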

Results

The central goal of this paper is to understand the patterns of epistasis between mutations that affect reaction rates in the hierarchical metabolic network described above. Specifically, I am interested in two questions. (1) Given that two mutations A and B have an epistasis coefficient $ε y_{μ}$ at a lower level μ of the metabolic hierarchy, what can we say about their epistasis coefficient $ε y_{ν}$ at a higher level ν of the hierarchy? In other words, how does epistasis propagate through the metabolic hierarchy? (2) If mutation A only perturbs the activity $x_{i j}$ of one enzyme and mutation B only perturbs the activity $x_{k ℓ}$ of another enzyme that belongs to the same module μ, then what values of $ε y_{μ}$ can we expect to observe based on the topological relationship between the two perturbed reactions within module μ? In other words, what kinds of epistasis emerge in a metabolic network?

Propagation of epistasis through the hierarchy of metabolic phenotypes

Assuming that the effects of both individual mutations and their combined effect at the lower level μ are small, it follows from Equation 4 and Equation 5 that

$$ε y_{ν} = \frac{ε y_{μ}}{C} + \frac{H}{2 C^{2}} + o (1), \tag{6}$$

where $C = f_{1}^{'} y_{μ} / y_{ν}$ and $H = f_{1}^{''} y_{μ}^{2} / y_{ν}$ are the first- and second-order control coefficients of the lower-level module μ with respect to the flux through the higher-level module ν and $o (1)$ denotes all terms that vanish as the effects of mutations tend to zero (see Materials and methods for details). Note that Equation 6 is a special case of a more general Equation 49 which describes the case when mutations affect multiple enzymes. Equation 6 defines a linear map $ϕ$ with slope $1 / C$ and a fixed point $\bar{ε} = - H {(2 C (1 - C))}^{- 1}$ , which both depend on the topology of the higher-level module ν and the rate constants ${\vec{x}}_{ν ∖ μ}$ .
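The steps leading from the definition of the epistasis coefficient to the linear map and its fixed point can be sketched as follows. This is a condensed outline, assuming that $C \neq 0$ and that $ε y_{μ}$ stays bounded as the effects of mutations tend to zero; the full derivation is in Materials and methods.

```latex
% Second-order expansion of y_\nu = f_1(y_\mu) around the wildtype,
% valid for each single mutant and for the double mutant:
\delta y_{\nu} = C\,\delta y_{\mu} + \tfrac{1}{2} H\,(\delta y_{\mu})^{2} + \dots

% Definition of the epistasis coefficient at level \mu, solved for the double-mutant effect:
\delta^{AB} y_{\mu} = \delta^{A} y_{\mu} + \delta^{B} y_{\mu}
  + 2\,\varepsilon y_{\mu}\,\delta^{A} y_{\mu}\,\delta^{B} y_{\mu}

% Numerator of the epistasis coefficient at level \nu, to leading order:
\delta^{AB} y_{\nu} - \delta^{A} y_{\nu} - \delta^{B} y_{\nu}
  = \left(2 C\,\varepsilon y_{\mu} + H\right)\delta^{A} y_{\mu}\,\delta^{B} y_{\mu} + \dots

% Denominator of the epistasis coefficient at level \nu, to leading order:
2\,\delta^{A} y_{\nu}\,\delta^{B} y_{\nu}
  = 2 C^{2}\,\delta^{A} y_{\mu}\,\delta^{B} y_{\mu} + \dots

% Their ratio gives the linear map; setting input equal to output gives the fixed point:
\varepsilon y_{\nu} = \frac{\varepsilon y_{\mu}}{C} + \frac{H}{2 C^{2}} + o(1),
\qquad
\bar{\varepsilon} = \frac{\bar{\varepsilon}}{C} + \frac{H}{2 C^{2}}
\;\Longrightarrow\;
\bar{\varepsilon} = -\frac{H}{2 C (1 - C)}
```

The term $H$ in the numerator arises because $(δ^{A B} y_{μ})^{2} - (δ^{A} y_{μ})^{2} - (δ^{B} y_{μ})^{2}$ equals $2 δ^{A} y_{μ} δ^{B} y_{μ}$ to leading order, so curvature of $f_{1}$ contributes to higher-level epistasis even when $ε y_{μ} = 0$.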

To gain some intuition for how the map $ϕ$ governs the propagation of epistasis from a lower level μ to a higher level ν, suppose that module ν is a linear metabolic pathway. In this case, it is intuitively clear that function $f_{1}$ is monotonically increasing (i.e. the higher $y_{μ}$ , the more flux can pass through the linear pathway ν) and concave (i.e. as $y_{μ}$ grows, other reactions in ν become increasingly limiting, such that further gains in $y_{μ}$ yield smaller gains in $y_{ν}$ ). Indeed, it is easy to show that $C = {(1 + α y_{μ})}^{- 1} > 0$ and $H = - 2 α y_{μ} {(1 + α y_{μ})}^{- 2} < 0$ , where α is a positive constant that depends on other reactions in the pathway (see Materials and methods for details). It then immediately follows that any zero or negative epistasis $ε y_{μ}$ that already arose at the lower level would propagate to negative epistasis $ε y_{ν}$ at the level of the linear pathway ν. Moreover, since $C < 1$ , the slope $1 / C$ of the map in Equation 6 exceeds one, so its fixed point is unstable. Therefore, if epistasis $ε y_{μ}$ was already positive and sufficiently large at the lower level, it would induce even larger positive epistasis $ε y_{ν}$ at the level of the linear pathway ν. In fact, when module ν is a linear pathway, $\bar{ε} = 1$ , so that $ε y_{ν} > 1$ whenever $ε y_{μ} > 1$ .
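The linear-pathway case can be checked numerically with a small sketch. It assumes the functional form $f_{1} (y_{μ}) = y_{μ} / (1 + α y_{μ})$, which is one form consistent with the expressions for $C$ and $H$ given above (any constant prefactor would give the same $C$ and $H$); the parameter values and function names are hypothetical.

```python
def f1(y_mu, alpha):
    """Assumed effective rate of a linear pathway as a function of y_mu."""
    return y_mu / (1.0 + alpha * y_mu)

def control_coefficients(y_mu, alpha):
    """First- and second-order control coefficients C and H for f1."""
    C = 1.0 / (1.0 + alpha * y_mu)
    H = -2.0 * alpha * y_mu / (1.0 + alpha * y_mu) ** 2
    return C, H

def predicted_eps_nu(eps_mu, C, H):
    """Small-effect prediction for higher-level epistasis (o(1) term dropped)."""
    return eps_mu / C + H / (2.0 * C * C)

def measured_eps_nu(eps_mu, d_a, d_b, y_mu, alpha):
    """Higher-level epistasis computed directly from finite perturbations."""
    # Double-mutant effect at level mu implied by the lower-level epistasis.
    d_ab = d_a + d_b + 2.0 * eps_mu * d_a * d_b
    y_wt = f1(y_mu, alpha)

    def rel(d):
        return f1(y_mu * (1.0 + d), alpha) / y_wt - 1.0

    return (rel(d_ab) - rel(d_a) - rel(d_b)) / (2.0 * rel(d_a) * rel(d_b))

y_mu, alpha, d = 1.0, 1.0, 1e-3
C, H = control_coefficients(y_mu, alpha)
fixed_point = -H / (2.0 * C * (1.0 - C))

for eps_mu in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(eps_mu,
          predicted_eps_nu(eps_mu, C, H),
          measured_eps_nu(eps_mu, d, d, y_mu, alpha))
```

For these parameter values, the directly computed higher-level epistasis agrees with the small-effect prediction up to corrections of the order of the mutational effects, and the fixed point evaluates to one, as stated above for a linear pathway.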

The first result of this paper is the following theorem, which shows that the same rules of propagation of epistasis hold not only for a linear pathway but for any module (Figure 2).

Figure 2

Propagation of epistasis.

Properties of Equation 6 that maps lower-level epistasis $ε y_{μ}$ onto higher-level epistasis $ε y_{ν}$ . Slope $1 / C$ and fixed point $\bar{ε}$ depend on the topology and the rate constants of the higher-level module ν, but they are bounded, as shown. Thus, the fixed point $\bar{ε}$ of this map lies between 0 and 1 and is always unstable (open circle).

Theorem 1

For any module ν,

$$0 \leq C \leq 1 \tag{7}$$

and

$$0 \leq \bar{ε} \leq 1 . \tag{8}$$

The proof of Theorem 1 is given in Materials and methods. Its main idea is the following. The functional form of $f_{1}$ in Equation 4 depends on the topology of module ν. Since the number of topologies of ν is infinite, we might a priori expect that there is also an infinite number of functional forms of $f_{1}$ . However, this is not the case. In fact, all higher-level modules that contain a lower-level module fall into three topological classes defined by the location of the lower-level module with respect to the I/O metabolites of the higher-level module (see Proposition 2 and Figure 7 in Materials and methods). To each topological class corresponds a parametric family of the function $f_{1}$ , so that there are only three such families. For each family, the values of $C$ and $H$ can be explicitly calculated, yielding the bounds in Equation 7 and Equation 8.

Equation 6 together with Equation 7 and Equation 8 show that the linear map $ϕ$ from epistasis at a lower level to epistasis at the higher level has an unstable fixed point between 0 and 1 (Figure 2). This implies that negative epistasis at a lower level of the metabolic hierarchy necessarily induces negative epistasis of equal or larger magnitude at the next level of the hierarchy, that is, $ε y_{ν} \leq ε y_{μ} < 0$. Therefore, once negative epistasis emerges somewhere along the hierarchy, it will induce negative epistasis at all higher levels of the hierarchy, irrespectively of the topology or the kinetic parameters of the network.

Similarly, if epistasis at the lower level of the metabolic hierarchy is positive and strong, $ε y_{μ} > 1$ , it will induce even stronger positive epistasis at the next level of the hierarchy, that is, $ε y_{ν} \geq ε y_{μ} > 1$ . Therefore, once strong positive epistasis emerges somewhere in the metabolic hierarchy, it will induce strong positive epistasis of larger magnitude at all higher levels of the hierarchy, irrespectively of the topology or the kinetic parameters of the network. If positive epistasis at a lower level of the hierarchy is weak, $0 < ε y_{μ} < 1$ , it could induce either negative, weak positive or strong positive epistasis at the higher level of the hierarchy, depending on the precise value of $ε y_{μ}$ , the topology of the higher-level module ν and the microscopic rate constants ${\vec{x}}_{ν ∖ μ}$ .

In summary, there are three regimes of how epistasis propagates through a hierarchical metabolic network. Negative and strong positive epistasis propagate robustly irrespectively of the topology and kinetic parameters of the metabolic network, whereas the propagation of weakly positive epistasis depends on these details. The strongest qualitative prediction that follows from Theorem 1 is that negative epistasis for a lower-level phenotype cannot turn into positive epistasis for a higher-level phenotype, but the converse is possible.
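The three regimes can be illustrated numerically. The following is a minimal sketch, not the analysis used in the paper: it takes only the properties stated above (a linear map with slope $1 / C$ and fixed point $\bar{ε}$, both bounded as in Theorem 1) and samples hypothetical values of $C$ and $\bar{ε}$ at random. The function names, the sampling ranges, and the three representative input values are illustrative assumptions.

```python
import random

def propagate(eps_mu, C, eps_bar):
    """Linear map with slope 1/C and fixed point eps_bar (cf. Figure 2)."""
    return eps_bar + (eps_mu - eps_bar) / C

def regime(eps):
    """Label an epistasis coefficient by the regime it falls into."""
    if eps < 0:
        return "negative"
    if eps > 1:
        return "strong positive"
    return "weak positive"

random.seed(0)  # arbitrary seed, for reproducibility only
outcomes = {"negative": set(), "weak positive": set(), "strong positive": set()}
for _ in range(10_000):
    C = random.uniform(0.05, 1.0)       # hypothetical sampling range within (0, 1]
    eps_bar = random.uniform(0.0, 1.0)  # fixed point bounded as in Theorem 1
    for eps_mu in (-0.5, 0.5, 1.5):     # one representative value per regime
        outcomes[regime(eps_mu)].add(regime(propagate(eps_mu, C, eps_bar)))

for start, reached in outcomes.items():
    print(f"{start:>15} -> {sorted(reached)}")
```

In this sketch, the negative and strong positive inputs never leave their regimes, whereas the weak positive input reaches all three, depending on the sampled $C$ and $\bar{ε}$.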

Emergence of epistasis between mutations affecting different enzymes

Which of the three regimes described above can emerge in metabolic networks and under what circumstances? In other words, if two mutations affect the same module, are there any constraints on epistasis that might arise at the level of the effective rate constant of this module? To address this question, I consider two mutations A and B that affect the rate constants of different single reactions within a given module.

Consider a relatively simple module ν shown in Figure 1A and two mutations A and B that affect the reactions, as shown in Figure 3A. I will now show that the epistasis coefficient $ε y_{ν}$ can take values in all three domains described above, depending on the biochemical details of this module. Using the recursive procedure for evaluating $y_{μ}$ described in Materials and methods, it is straightforward to obtain an analytical expression for $y_{ν}$ as a function of the rate matrix ${\vec{x}}_{ν}$ , from which $ε y_{μ}$ can also be obtained (see Materials and methods for details). To demonstrate that $ε y_{μ}$ can take values below 0, between 0 and 1, and above 1, it is convenient to keep all of the rate constants fixed except for the rate constant $z \equiv x_{34}$ of a reaction that is not affected by mutations A or B, as shown in Figure 3A. Figure 3B then shows how the epistasis coefficient $ε y_{μ}$ varies as a function of $z$ for one particular choice of all other rate constants. When $z$ is small, $ε y_{μ} < 0$ . As $z$ increases, it becomes weakly positive ( $0 < ε y_{μ} < 1$ ) and eventually strongly positive ( $ε y_{μ} > 1$ ). Thus, in my model, there are no fundamental constraints on the types of epistasis that can emerge between mutations.

Figure 3

Emergence of epistasis and its dependence on the topological relationship between the reactions affected by mutations.

(A) An example of a simple module ν (same as in Figure 1A) where negative, weak positive and strong positive epistasis can emerge between two mutations A and B. (B) Epistasis between mutations A and B at the level of module ν depicted in (A) as a function of the rate constant $z$ of a third reaction. The values of other parameters of the network are given in Materials and methods. (C) An example of a simple module where reactions affected by mutations are strictly parallel. In such cases, epistasis for the effective rate constant $y_{ν}$ is non-positive. Dashed blue lines highlight paths that connect the I/O metabolites and each contain only one of the affected reactions. (D) An example of a simple module where reactions affected by mutations are strictly serial. In such cases, epistasis for the effective rate constant $y_{ν}$ is equal to or greater than 1 (i.e. strongly positive). Dashed blue line highlights a path that connects the I/O metabolites and contains both affected reactions.

This simple example also reveals that not only the value but also the sign of epistasis generically depend on the rates of other reactions in the network, such that other mutations or physiological changes in enzyme expression levels can modulate epistasis sign and strength. In other words, ‘higher-order’ and ‘environmental’ epistasis are generic features of metabolic networks.

Upon closer examination, the toy example in Figure 3 also suggests that the sign of $ε y_{ν}$ may depend predictably on the topological relationship between the affected reactions. When $z = 0$, the two reactions affected by mutations are parallel, and epistasis is negative. When $z$ is very large, most of the flux between the I/O metabolites passes through the reaction with rate constant $z$ such that the two reactions affected by mutations become effectively serial, and epistasis is strongly positive. Other toy models show consistent results: epistasis between mutations affecting different reactions in a linear pathway is always positive and epistasis between mutations affecting parallel reactions is negative (see Materials and methods for details). These observations suggest an interesting conjecture. Do mutations affecting parallel reactions always exhibit negative epistasis and do mutations affecting serial reactions always exhibit positive epistasis? In fact, such a relationship between the sign of epistasis and topology has been previously suggested in the literature (e.g. Dixon et al., 2009; Lehner, 2011).

To formalize and mathematically prove this hypothesis, I first define two reactions as parallel within a given module if there exist at least two distinct simple (i.e. non-self-intersecting) paths that connect the I/O metabolites, such that each path lies within the module and contains only one of the two focal reactions. Analogously, two reactions are serial within a given module if there exists at least one simple path that connects the I/O metabolites, lies within the module and contains both focal reactions.

According to these definitions, two reactions can be simultaneously parallel and serial, as, for example, the reactions affected by mutations A and B in Figure 3A. I call such reaction pairs serial-parallel. I define two reactions to be strictly parallel if they are parallel but not serial (Figure 3C) and I define two reactions to be strictly serial if they are serial but not parallel (Figure 3D). Thus, each pair of reactions within a module can be classified as either strictly parallel, strictly serial or serial-parallel.
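These definitions translate directly into a path-enumeration procedure. The sketch below is an illustrative implementation under stated assumptions: reactions are treated as reversible (undirected) edges, the module is a hypothetical four-metabolite toy network rather than the one in Figure 3, and the function names `simple_paths` and `classify` are invented for this example. Exhaustive enumeration is only practical for small modules.

```python
def simple_paths(edges, source, target):
    """Yield the set of reactions on each simple path from source to target.

    edges maps a reaction name to the pair of metabolites it connects;
    every reaction is treated as reversible (an assumption of this sketch).
    """
    adjacency = {}
    for name, (u, v) in edges.items():
        adjacency.setdefault(u, []).append((name, v))
        adjacency.setdefault(v, []).append((name, u))

    def walk(node, visited, used):
        if node == target:
            yield frozenset(used)
            return
        for name, nxt in adjacency.get(node, []):
            if nxt not in visited:  # simple path: never revisit a metabolite
                yield from walk(nxt, visited | {nxt}, used + [name])

    yield from walk(source, {source}, [])

def classify(edges, source, target, a, b):
    """Classify reactions a and b by the path-based definitions above."""
    paths = list(simple_paths(edges, source, target))
    serial = any(a in p and b in p for p in paths)
    parallel = (any(a in p and b not in p for p in paths)
                and any(b in p and a not in p for p in paths))
    if serial and parallel:
        return "serial-parallel"
    if serial:
        return "strictly serial"
    if parallel:
        return "strictly parallel"
    return "unclassified"  # e.g. a reaction that lies on no I/O path

# Hypothetical toy module: two branches (via a and via b) plus a bridge r5.
bridged = {"r1": ("in", "a"), "r2": ("in", "b"),
           "r3": ("a", "out"), "r4": ("b", "out"), "r5": ("a", "b")}
unbridged = {k: v for k, v in bridged.items() if k != "r5"}

for label, module in (("with bridge", bridged), ("without bridge", unbridged)):
    for pair in (("r1", "r2"), ("r1", "r3"), ("r1", "r4")):
        print(label, pair, classify(module, "in", "out", *pair))
```

In this toy module, removing the bridge reaction turns serial-parallel pairs into strictly serial or strictly parallel ones, which shows how the classification depends on the presence of a third reaction.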

The second result of this paper is the following theorem.

Theorem 2

Let ξ and η be the rate constants of two different reactions in module μ. Suppose that mutation A perturbs only one of these reactions by $δ^{A} ξ$ and mutation B perturbs only the other reaction by $δ^{B} η$ . In the limit $δ^{A} ξ \to 0$ and $δ^{B} η \to 0$ , the following statements are true. If the affected reactions are strictly parallel then $ε y_{μ} \leq 0$ . If the affected reactions are strictly serial, then $ε y_{μ} \geq 1$ .

The detailed proof of this theorem is given in Materials and methods. Its key ideas and the logic are the following. It follows from Equation 3 and Equation 5 that

$$ε y_{μ} = \frac{H_{ξ η}}{2 C_{ξ} C_{η}} + o (1), \tag{9}$$

where $C_{ξ} = \frac{\partial f_{2}}{\partial ξ} \frac{ξ}{y_{μ}}$ , $C_{η} = \frac{\partial f_{2}}{\partial η} \frac{η}{y_{μ}}$ , $H_{ξ η} = \frac{\partial^{2} f_{2}}{\partial ξ \partial η} \frac{ξ η}{y_{μ}}$ are the first- and second-order control coefficients of the affected reactions with respect to the flux through module μ and $o (1)$ denotes terms that vanish when $δ^{A} ξ$ and $δ^{B} η$ approach zero (see Materials and methods for details). Note that Equation 9 was previously derived by Chiu et al., 2012.

To compute the epistasis coefficient $ε y_{μ}$ for an arbitrary module μ, we need to know the first and second derivatives of function $f_{2}$ . Analogous to function $f_{1}$ , there is a finite number of parametric families to which $f_{2}$ can belong. Specifically, all modules fall into nine topological classes with respect to the locations of the affected reactions within the module (see Figure 8), and each of these topologies defines a parametric family of function $f_{2}$ (see Proposition 3 and its Corollary 3 in Materials and methods). Most of these topological classes are broad and contain modules where the affected reactions are strictly parallel, those where they are strictly serial as well as those where they are serial-parallel. And it is easy to show that not all members of each topological class have the same sign of $ε y_{μ}$ . However, modules from the same topological class where the affected reactions are strictly parallel or strictly serial fall into a finite number of topological sub-classes (see Figure 10 through Figure 14, Table 2 and Table 3). Overall, there are only 17 distinct topologies where the affected reactions are strictly parallel (Table 2), which define 17 parametric sub-families of function $f_{2}$ . For all members of these sub-families, Equation 9 yields $ε y_{μ} \leq 0$ (see Proposition 7 in Materials and methods). Similarly, there are only 11 distinct topologies where the affected reactions are strictly serial (Table 3), which define 11 parametric sub-families of function $f_{2}$ . For all members of these sub-families, Equation 9 yields $ε y_{μ} \geq 1$ (see Proposition 8 in Materials and methods).
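Equation 9 can be evaluated numerically for any candidate function $f_{2}$ by estimating the control coefficients with finite differences. The sketch below is a hedged illustration, not the procedure used in the proof: the two toy forms of $f_{2}$ (two steps in series combining as a harmonic sum, and two parallel steps followed by a shared step with a hypothetical rate constant `z`) are simplifying assumptions, as are the function names and the step size `h`.

```python
def control_coefficients(f, xi, eta, h=1e-4):
    """Estimate C_xi, C_eta and H_xi_eta of f by central finite differences."""
    y = f(xi, eta)
    dxi, deta = xi * h, eta * h
    f_xi = (f(xi + dxi, eta) - f(xi - dxi, eta)) / (2 * dxi)
    f_eta = (f(xi, eta + deta) - f(xi, eta - deta)) / (2 * deta)
    f_xi_eta = (f(xi + dxi, eta + deta) - f(xi + dxi, eta - deta)
                - f(xi - dxi, eta + deta) + f(xi - dxi, eta - deta)) / (4 * dxi * deta)
    return f_xi * xi / y, f_eta * eta / y, f_xi_eta * xi * eta / y

def epistasis_eq9(f, xi, eta):
    """Leading-order epistasis coefficient, H / (2 C_xi C_eta), as in Equation 9."""
    c_xi, c_eta, h_xi_eta = control_coefficients(f, xi, eta)
    return h_xi_eta / (2 * c_xi * c_eta)

def serial(xi, eta):
    """Toy effective rate constant for two steps in series (assumed form)."""
    return xi * eta / (xi + eta)

def parallel_then_shared(xi, eta, z=2.0):
    """Toy form: two parallel steps feeding a shared step with rate z (assumed)."""
    total = xi + eta
    return total * z / (total + z)

eps_serial = epistasis_eq9(serial, 1.0, 3.0)
eps_parallel = epistasis_eq9(parallel_then_shared, 1.0, 3.0)
print(eps_serial, eps_parallel)
```

For these assumed forms, the calculation can also be done by hand: the serial toy gives exactly 1, and the parallel toy gives $-(ξ + η)/z$, so both sit within the bounds stated in Theorem 2.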

The results of Theorem 1 and Theorem 2 together imply that the topological relationship at the microscopic level between two reactions affected by mutations constrains the values of their epistasis coefficient at all higher phenotypic levels. Specifically, if negative epistasis is detected at any phenotypic level, the affected reactions cannot be strictly serial. And conversely, if strong positive epistasis is detected at any phenotypic level, the affected reactions cannot be strictly parallel. In this model, weak positive epistasis in the absence of any additional information does not imply any specific topological relationship between the affected reactions.

Sensitivity of results with respect to the magnitude of mutational effects

Both Theorem 1 and Theorem 2 strictly hold only when the effects of both mutations are infinitesimal. Next, I investigate how these results might change when the mutational effects are finite.

Propagation of epistasis between mutations with finite effect sizes

As mentioned above and discussed in detail in Materials and methods, all higher-level modules that contain a lower-level module fall into three topological classes, which I label $ℳ^{b}$ , $ℳ^{io}$ and $ℳ^{i}$ , depending on the location of the lower-level module within the higher-level module (see Figure 7). The topological class specifies the parametric family of the function $f_{1}$ which maps the effective rate constant $y_{μ}$ onto the effective rate constant $y_{ν}$ (see Equation 4). For all modules from the topological class $ℳ^{b}$ , function $f_{1}$ is linear (see Equation 29), which implies that the results of Theorem 1 hold exactly even when the effects of mutations are finite. For modules from the topological classes $ℳ^{io}$ and $ℳ^{i}$ , function $f_{1}$ is hyperbolic (see Equation 30 and Equation 31), so that the results of Theorem 1 may not hold when the effects of mutations are finite. To test the validity of Theorem 1 in these cases, I calculated the non-linear function $ϕ$ that maps the epistasis coefficient $ε y_{μ}$ onto the epistasis coefficient $ε y_{ν}$ for 1000 randomly generated modules from each of the two topological classes and for mutations that increase or decrease the effective rate constant of the lower-level module $y_{μ}$ by up to 50% (see Materials and methods for details).

The validity of Theorem 1 depended on the sign of mutational effects. When at least one of the two mutations had a negative effect on $y_{μ}$ , map $ϕ$ had the same properties as described in Theorem 1, even for mutations with large effect, that is, it had a fixed point $\bar{ε}$ in the interval $[0, 1]$ and this fixed point was unstable. When the effects of both mutations on $y_{μ}$ were positive and small, these results also held in about 82% of sampled modules (see Figure 4A, Figure 4—figure supplement 1, Figure 4—figure supplement 2). In the remaining ∼18% of sampled modules, the fixed point $\bar{ε}$ shifted slightly above 1. As the magnitude of mutational effects increased, the fraction of sampled modules with $\bar{ε} > 1$ grew, reaching 42% when both mutations increased $y_{μ}$ by 50%. In most of these cases, $\bar{ε}$ remained below 2, and I found only one module with $\bar{ε} > 4$ (Figure 4A, Figure 4—figure supplement 1, Figure 4—figure supplement 2). Whenever the fixed point existed, it was unstable, with the exception of 12 modules for which $ϕ$ was very close to the identity map. For 289 modules (14.5%), the fixed point disappeared when both mutations increased $y_{μ}$ by 50%. In all these cases, $ε y_{ν} < ε y_{μ}$ , indicating that even large positive epistasis may decline as it propagates through the metabolic hierarchy when the effects of mutations are finite.
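The dependence on the sign of mutational effects can be reproduced in a single toy case. The sketch below is not the sampling procedure described in Materials and methods. It assumes one hyperbolic function $f_{1}(y_{μ}) = y_{μ} / (1 + α y_{μ})$ with hypothetical values $α = 1$ and $y_{μ} = 1$, and it assumes that the epistasis coefficient of Equation 5 has the form $(δ^{AB} y - δ^{A} y - δ^{B} y) / (2 δ^{A} y \, δ^{B} y)$, where δ denotes a relative effect; this form is consistent with Equation 9 but is not shown in this section.

```python
def f1(y_mu, alpha):
    """Assumed hyperbolic map from y_mu to y_nu (toy stand-in)."""
    return y_mu / (1.0 + alpha * y_mu)

def epistasis(d_a, d_b, d_ab):
    """Epistasis coefficient from relative effects (assumed form of Equation 5)."""
    return (d_ab - d_a - d_b) / (2.0 * d_a * d_b)

def propagate_finite(eps_mu, d_a, d_b, y_mu=1.0, alpha=1.0):
    """Map lower-level epistasis onto higher-level epistasis for finite effects."""
    # Relative effect of the double mutant on y_mu implied by eps_mu.
    d_ab = d_a + d_b + 2.0 * eps_mu * d_a * d_b
    base = f1(y_mu, alpha)

    def effect_on_y_nu(d):
        return f1(y_mu * (1.0 + d), alpha) / base - 1.0

    return epistasis(effect_on_y_nu(d_a), effect_on_y_nu(d_b), effect_on_y_nu(d_ab))

# Image of eps_mu = 1, the fixed point of the infinitesimal map for this f1.
for d in (1e-4, 0.5, -0.5):
    print(f"effect {d:+.4f}: eps_nu = {propagate_finite(1.0, d, d):.4f}")
```

In this toy parameterization, the image of $ε y_{μ} = 1$ stays close to 1 for very small effects, falls below 1 when both mutations increase $y_{μ}$ by 50%, and rises above 1 when both decrease it by 50%, which illustrates why the behavior of the fixed point depends on the sign of the mutational effects.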

Figure 4 with 3 supplements

Sensitivity of results of Theorem 1 and Theorem 2 with respect to the magnitude of mutational effects.

(A) Distribution of the position of the fixed point $\bar{ε}$ of the function $ϕ$ that maps lower-level epistasis $ε y_{μ}$ onto higher-level epistasis $ε y_{ν}$ in modules with random parameters and for mutations with positive effects on $y_{μ}$ (see text and Materials and methods for details). All cases are shown in Figure 4—figure supplement 1 and Figure 4—figure supplement 2. The effect size of both mutations is indicated on each panel. ‘No f.p.’ indicates that no fixed point exists. (B) Fraction of sampled modules (averaged across generating topologies) where mutations affect strictly serial reactions but the epistasis coefficient is less than 1, contrary to the statement of Theorem 2 (see text and Materials and methods for details). All cases stratified by generating topology are shown in Figure 4—figure supplement 3.

Emergence of epistasis between mutations with finite effect sizes

As mentioned above and discussed in detail in Materials and methods, modules where the reactions affected by mutations are strictly parallel fall into 17 topological classes (see Table 2) and modules where the reactions affected by mutations are strictly serial fall into 11 topological classes (see Table 3). The topological class specifies the parametric family of the function $f_{2}$ which maps the rate constants ξ and η of the affected reactions onto the effective rate constant $y_{μ}$ . To test how well Theorem 2 holds when the effects of mutations are finite, I calculated $ε y_{μ}$ for randomly generated modules from these topological classes and for mutations increasing or decreasing ξ and η by up to 50% (see Materials and methods for details).

The validity of Theorem 2 depended most strongly on the topological relationship between the reactions affected by mutations. Whenever the affected reactions were strictly parallel, the epistasis coefficient at the level of module μ was always less than or equal to zero, even when mutations perturbed the rate constants by as much as 50%, consistent with Theorem 2. This was also true for strictly serial reactions, as long as both mutations had positive effects. When the affected reactions were strictly serial and at least one of the mutations had a negative effect, the epistasis coefficient was always positive, but in some cases it was less than 1 (see Figure 4B, Figure 4—figure supplement 3), in disagreement with Theorem 2. This indicates that when the effects of mutations are not infinitesimal, even mutations that affect strictly serial reactions can potentially produce negative epistasis for higher-level phenotypes.
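A two-reaction toy case shows the same pattern. The sketch below is illustrative only and does not reproduce the random sampling over topological classes. It assumes that two steps in series combine as a harmonic sum and two steps in parallel combine additively, and it again assumes that Equation 5 has the form $(δ^{AB} y - δ^{A} y - δ^{B} y) / (2 δ^{A} y \, δ^{B} y)$ with δ denoting a relative effect. All function names and parameter values are hypothetical.

```python
def epistasis(y_wt, y_a, y_b, y_ab):
    """Epistasis coefficient from four phenotype values (assumed form of Equation 5)."""
    d_a, d_b, d_ab = y_a / y_wt - 1.0, y_b / y_wt - 1.0, y_ab / y_wt - 1.0
    return (d_ab - d_a - d_b) / (2.0 * d_a * d_b)

def finite_epistasis(f, xi, eta, d_xi, d_eta):
    """Epistasis when mutation A scales xi by 1 + d_xi and B scales eta by 1 + d_eta."""
    return epistasis(f(xi, eta),
                     f(xi * (1.0 + d_xi), eta),
                     f(xi, eta * (1.0 + d_eta)),
                     f(xi * (1.0 + d_xi), eta * (1.0 + d_eta)))

def serial(xi, eta):
    """Toy effective rate constant for two steps in series (assumed form)."""
    return xi * eta / (xi + eta)

def parallel(xi, eta):
    """Toy effective rate constant for two steps in parallel (assumed form)."""
    return xi + eta

for d_xi, d_eta in ((0.5, 0.5), (-0.5, -0.5), (0.5, -0.5)):
    print(f"effects ({d_xi:+.1f}, {d_eta:+.1f}): "
          f"serial = {finite_epistasis(serial, 1.0, 1.0, d_xi, d_eta):.3f}, "
          f"parallel = {finite_epistasis(parallel, 1.0, 1.0, d_xi, d_eta):.3f}")
```

With these assumed forms, the parallel toy gives zero epistasis for every effect size, while the serial toy exceeds 1 when both effects are positive and stays positive but drops below 1 when at least one effect is negative.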

Taken together, these results suggest that both Theorem 1 and Theorem 2 extend reasonably well, but not perfectly, to mutations with finite effect sizes. The domains of validity of both theorems appear to depend on the sign of mutational effects. The way in which the theorems break down as their assumptions are violated appears to be stereotypical: when the mutational effects increase, more types of mutations produce weak epistasis, and the bias toward negative epistasis increases during propagation from lower to higher levels of the metabolic hierarchy.

Beyond first-order kinetics: epistasis in a kinetic model of glycolysis

The results of previous sections revealed a relationship between network topology and the ensuing epistasis coefficients in an analytically tractable model. However, the assumptions of this model are most certainly violated in many realistic situations. It is therefore important to know whether the same or similar rules of epistasis emergence and propagation hold beyond the scope of this model. I address this question here by analyzing a computational kinetic model of glycolysis developed by Chassagnole et al., 2002. This model keeps track of the concentrations of 17 metabolites, reactions between which are catalyzed by 18 enzymes (Figure 5A and Figure 5—figure supplement 1; see Materials and methods for details). This model falls far outside of the analytical framework introduced in this paper: some reactions are second-order, reaction kinetics are non-linear, and in several cases the reaction rates are modulated by other metabolites (Chassagnole et al., 2002).

Figure 5 with 3 supplements

Epistasis in a kinetic model of *Escherichia coli* glycolysis.

(A) Simplified schematic of the model (see Figure 5—figure supplement 1 for details). Different shades of gray in the background highlight four modules as indicated (see text). Light blue circles represent metabolites. Reactions are shown as lines with dark gray boxes. The enzymes catalyzing reactions whose control coefficients with respect to the flux through the module are positive are named; other enzyme names are omitted for clarity (see Table 5 and Table 6 for abbreviations). Three reactions, catalyzed by PGI, PFK, PGDH, for which the epistasis coefficients are shown in panel B are highlighted in dark blue, red, and orange, respectively. (B) Epistasis coefficients for flux through each module between mutations perturbing the respective reactions, computed at steady state (see text and Materials and methods for details). Reactions catalyzed by PGI and PGDH are strictly parallel (path g6p-f6p-fdp-gap contains only PGI, path g6p-6pg-ribu5p-gap contains only PGDH and there is no simple path in UGPP between g6p and gap that contains both PGI and PGDH). Reactions catalyzed by PGI and PFK are serial-parallel (path g6p-f6p-fdp-gap contains both reactions, path g6p-f6p-gap contains only PGI, path g6p-6pg-ribu5p-f6p-fdp-gap contains only PFK). Reactions catalyzed by PFK and PGDH are also serial-parallel (path g6p-6pg-ribu5p-f6p-fdp-gap contains both reactions, path g6p-f6p-fdp-gap contains only PFK, path g6p-6pg-ribu5p-gap contains only PGDH).

Testing the predictions of the analytical theory in this computational model faces two complications. First, in a non-linear model, modules are no longer fully characterized by their effective rate constants, even at steady state. Instead, each module is described by the flux between its I/O metabolites which non-linearly depends on the concentrations of these metabolites. Consequently, the effects of mutations and epistasis coefficients also become functions of the I/O metabolite concentrations. An epistasis coefficient at the level of module ν can still be evaluated according to Equation 5, with $y_{ν}$ now representing the flux through module ν evaluated at a particular concentration of the I/O metabolites. For simplicity, I computationally find the steady state of the full glycolysis network and evaluate the epistasis coefficients only at this steady state, that is, for each module, I keep the concentrations of the I/O metabolites fixed at their steady-state values for the full network (see Materials and methods for details).

The second complication is that some control coefficients are so small that they fall below the threshold of numerical precision. Perturbing such reactions has no detectable effect on flux (Figure 5—figure supplement 2). In the analysis that follows, I ignore such reactions because the epistasis coefficient defined by Equation 5 can only be computed for mutations with non-zero effects on flux. In addition, the control coefficients of some reactions are negative, which implies that an increase in the rate of such reaction decreases the flux through the module (Figure 5—figure supplement 2). I also ignore such reactions because there is no analog for them in the analytical theory presented above. After excluding seven reactions for these reasons, I examine epistasis in 55 pairs of mutations that affect the remaining 11 reactions.

The glycolysis network shown in Figure 5A (see also Figure 5—figure supplement 1) can be naturally partitioned into four modules, which I name ‘LG’ (lower glycolysis), ‘UGPP’ (upper glycolysis and pentose phosphate), ‘GPP’ (glycolysis and pentose phosphate), and ‘FULL’. Modules LG and UGPP are non-overlapping and both of them are nested in module GPP, which in turn is nested in the FULL module. Thus, at least for some reaction pairs it is possible to calculate epistasis coefficients at three levels of metabolic hierarchy. There are three such pairs, and the results for them are shown in Figure 5B. Epistasis for the remaining pairs of reactions can be evaluated only at one or two levels of the hierarchy because these reactions belong to different modules at the lowest levels or because their individual effects are too small. The results for all reaction pairs are shown in Figure 5—figure supplement 3.

The strongest qualitative prediction of the analytical theory described above is that negative epistasis for a lower-level phenotype cannot turn into positive epistasis for a higher-level phenotype, while the converse is possible. Figure 5B and Figure 5—figure supplement 3 show that the data are consistent with this prediction. Another prediction is that epistasis between strictly parallel reactions should be negative. There is only one pair of reactions that are strictly parallel, those catalyzed by glucose-6-phosphate isomerase (PGI) and 6-phosphogluconate dehydrogenase (PGDH), and indeed the epistasis coefficients between mutations affecting these reactions are negative at all levels of the hierarchy (Figure 5B). Finally, the analytical theory predicts that mutations affecting strictly serial reactions should exhibit strong positive epistasis. There are 36 reaction pairs that are strictly serial. Epistasis is positive between mutations in 33 of them, and it is strongly positive in 17 of them (Figure 5—figure supplement 3). Three pairs of strictly serial reactions (those where one reaction is catalyzed by PK and the other is catalyzed by PGI, PGDH, or PFK) exhibit negative epistasis (Figure 5—figure supplement 3). These results suggest that, although one may not be able to naively extrapolate the rules of emergence and propagation of epistasis derived in the simple analytical model to more complex networks, some generalized versions of these rules may nevertheless hold more broadly.

Discussion

Genetic interactions are a powerful tool in genetics, and they play an important role in evolution. Yet, how epistasis emerges from the molecular architecture of the cell and how it propagates to higher-level phenotypes, such as fitness, remains largely unknown. Several recent studies made a statistical argument that the structure of the fitness landscape (and, as a consequence, the epistatic interactions between mutations at the level of fitness) may be largely independent of the underlying molecular architecture of the organism (Martin, 2014; Lyons et al., 2020; Reddy and Desai, 2020). If mutations are typically highly pleiotropic (i.e. affect many independent phenotypes relevant for fitness) or are engaged in a large number of idiosyncratic epistatic interactions with other mutations in the genome, the resulting fitness landscapes converge to certain limiting shapes, such as Fisher’s geometric model (Martin, 2014; Tenaillon, 2014). To what extent these arguments indeed apply in practice is unclear. But if they do, most genetic interactions detected at the fitness level may be uninformative about the architecture of the underlying biological networks.

In this paper, I took a ‘mechanistic’ approach, which is in a sense orthogonal to the statistical one. In my model of a hierarchical metabolic network, mutations are highly pleiotropic (a mutation in any enzyme affects all the fluxes in the module) and highly epistatic (a mutation in any enzyme interacts with mutations in any other enzyme). Yet, these pleiotropic and epistatic effects appear to be sufficiently structured that some information about the topology of the network is preserved through all levels of the hierarchy. Indeed, the emergence and propagation of epistasis follow two simple rules in my model. First, once epistasis emerges at some level of the hierarchy, its propagation through the higher levels of the hierarchy depends weakly on the details of the network. Specifically, negative epistasis at a lower level induces negative epistasis at all higher levels and strong positive epistasis induces strong positive epistasis at all higher levels, irrespectively of the topology or the kinetic parameters of the network. Second, what type of epistasis emerges in the first place depends on the topological relationship between the reactions affected by mutations. In particular, negative epistasis emerges between mutations that affect strictly parallel reactions and positive epistasis emerges between mutations that affect strictly serial reactions. Insofar as my model is relevant to nature, the key conclusion from it is that epistasis at high-level phenotypes carries some, albeit incomplete, information about the underlying topological relationship between the affected reactions.

These results have implications for the interpretation of empirically measured epistasis coefficients. It is often assumed that a positive epistasis coefficient between mutations that affect distinct genes signals that their gene products act in some sense serially, whereas a negative epistasis coefficient is a signal of genetic redundancy, that is, a parallel relationship between gene products (Dixon et al., 2009). My results suggest that this reasoning is generally correct, but that the relationship between epistasis and topology is more nuanced. In particular, the sign of the epistasis coefficient in my model constrains but does not uniquely specify the topological relationship, such that a negative epistasis coefficient implies that the affected reactions are not strictly serial (but may or may not be strictly parallel) and an epistasis coefficient exceeding unity excludes a strictly parallel relationship (but does not necessarily imply a strictly serial relationship). My model suggests that one should also be careful with inferences going in the other direction, that is, extrapolating the patterns of epistasis measured at the biochemical level to those at the level of fitness. For example, if one wishes to infer the past evolutionary trajectory of an enzyme and finds two amino acid changes that exhibit a positive interaction at the level of enzymatic activity, it does not automatically imply that these mutations will exhibit a positive interaction at the level of fitness.

The strongest results presented here rely on several assumptions. I proved Theorem 1 and Theorem 2 in the limit of vanishingly small mutational effects. Some results of the metabolic control analysis, notably the summation theorem, are sensitive to this assumption (Bagheri-Chaichian et al., 2003; Bagheri and Wagner, 2004). To test the sensitivity of my analytical results with respect to this assumption, I used numerical simulations of networks with randomly sampled kinetic parameters and found that the results hold reasonably well when the effects of mutations are not infinitesimal.

The most restrictive assumption in the present work is that of first-order kinetics. Networks with only first-order kinetics clearly fail to capture some biologically important phenomena, such as sign epistasis (Weinreich et al., 2005; Chou et al., 2014; Ewald et al., 2017; Kemble et al., 2020). I discuss possible ways to relax this assumption below. But at present, a major question remains whether the rules of emergence and propagation of epistasis described here hold for realistic biological networks and whether they can be directly used to interpret empirical epistasis coefficients. My analysis of a fairly realistic computational model of glycolysis cautions against overinterpreting empirical epistasis coefficients using the rules derived here. But it also suggests that more general rules of propagation and emergence of epistasis may be found for more realistic networks. Thus, the simple rules derived here should probably be thought of as null expectations.

Relaxing the first-order kinetics assumption is analytically challenging because it is critical for replacing a module with a single effective reaction without altering the dynamics of the rest of the network. Although such lossless replacement is almost certainly not possible in networks with more complex kinetics, advanced network coarse-graining techniques may offer a promising way forward (Rao et al., 2014). Flux balance analysis (FBA) is an alternative approach (Orth et al., 2010). FBA is appealing because it entirely removes the dependence of the model on reaction kinetics. However, this comes at a substantial cost. In FBA models, fitness and other high-level phenotypes become independent of the internal kinetic parameters, which is clearly unrealistic. Nevertheless, FBA is often very good at capturing the effects of mutations that change the topology of metabolic networks, such as reaction additions and deletions (reviewed in Gu et al., 2019). At the same time, there is no natural way within FBA to model mutations that perturb reaction kinetics (He et al., 2010; Alzoubi et al., 2019). In short, FBA and my approach are complementary (see Appendix 5 for a more detailed discussion).

Generic properties of epistasis in biological systems

Simple models help us identify generic phenomena—those that are shared by a large class of systems—which should inform our ‘null’ expectations in empirical studies. Deviations from such null in a given system under examination inform us about potentially interesting peculiarities of this system. The model presented here suggests several generic features of epistasis between genome-wide mutations.

Epistasis has two contributions

My analysis shows that the value of an epistasis coefficient measured for a higher level phenotype is a result of two contributions (Domingo et al., 2019), propagation and emergence, which correspond to two terms in Equation 6 (or the more general Equation 49). The first term, propagation, shows that if two mutations exhibit epistasis for a lower-level phenotype they also generally exhibit epistasis for a higher-level phenotype. The second contribution comes from the fact that lower-level phenotypes map onto higher-level phenotypes via non-linear functions. This is true even in a simple model with linear kinetics considered here. As a result, even if two mutations exhibit no epistasis at the lower-level phenotype, epistasis must emerge for the higher-level phenotype, as previously pointed out by multiple authors (e.g. Kacser and Burns, 1981; DePristo et al., 2005; Martin et al., 2007; Chiu et al., 2012; Otwinowski et al., 2018; Domingo et al., 2019; Husain and Murugan, 2020).

Epistasis depends on the genetic background and environment

My analysis shows that the value of an epistasis coefficient for a particular pair of mutations is in large part determined by the topological relationship between reactions affected by them. Since the topology of the metabolic network itself depends on the genotype (which genes are present in the genome) and on the environment (which enzymes are active or not), the topological relationship between two specific reactions might change if, for example, a third mutation knocks out another enzyme or if an enzyme is up- or down-regulated due to an environmental change (see Figure 3). Thus, we should generically expect epistasis between mutations to depend on the environment and on the presence or absence of other mutations in the genome. In other words, $G \times G \times G$ interactions (higher-order epistasis) and $G \times G \times E$ interactions (environmental epistasis) should be common (Snitkin and Segrè, 2011; Flynn et al., 2013; Lindsey et al., 2013; Taylor and Ehrenreich, 2015; Sailer and Harms, 2017a). This fact complicates the interpretation of inter-gene epistasis since mutations in the same pair of genes can exhibit qualitatively different genetic interactions in different strains, organisms and environments, as has been observed (St Onge et al., 2007; Musso et al., 2008; Tischler et al., 2008; Dowell et al., 2010; Heigwer et al., 2018; Li et al., 2019). However, the situation may not be hopeless because the topological relationship between two reactions cannot change arbitrarily after addition or removal of a single reaction. For example, if two reactions are strictly parallel, removing a third reaction does not alter their relationship (see Proposition 5). Thus, comparing matrices of epistasis coefficients measured in different environments or genetic backgrounds could inform us about how the organism rewires its metabolic network in response to these perturbations (St Onge et al., 2007; Musso et al., 2008; Heigwer et al., 2018; Li et al., 2019).
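The background dependence can be illustrated with a small hypothetical network. The sketch assumes the normalization $\varepsilon_y = (\delta^{AB} y - \delta^{A} y - \delta^{B} y) / (2\, \delta^{A} y\, \delta^{B} y)$ and a toy topology in which two reactions in series (rate constants `x_a` and `x_b`, affected by mutations A and B) are bypassed by a third, parallel reaction with rate constant `x_c`. Setting `x_c = 0` stands in for a knock-out of the bypass enzyme. The topology, parameter values and function names are assumptions made for illustration only. In this toy the knock-out changes the magnitude of the coefficient but not its sign; a change of sign would require a richer topology.

```python
def epistasis(y_wt, y_a, y_b, y_ab):
    """Epistasis coefficient under the normalization assumed in this sketch."""
    d_a = y_a / y_wt - 1.0
    d_b = y_b / y_wt - 1.0
    d_ab = y_ab / y_wt - 1.0
    return (d_ab - d_a - d_b) / (2.0 * d_a * d_b)


def flux_constant(x_a, x_b, x_c):
    """Serial pair (x_a, x_b) with a parallel bypass x_c (toy first-order model)."""
    return x_a * x_b / (x_a + x_b) + x_c


def epistasis_in_background(x_c, x_a=1.0, x_b=1.0, delta=0.01):
    """Epistasis between mutations A and B for a given state of the bypass."""
    return epistasis(
        flux_constant(x_a, x_b, x_c),
        flux_constant(x_a * (1 + delta), x_b, x_c),
        flux_constant(x_a, x_b * (1 + delta), x_c),
        flux_constant(x_a * (1 + delta), x_b * (1 + delta), x_c),
    )


eps_bypass_present = epistasis_in_background(x_c=0.5)
eps_bypass_knocked_out = epistasis_in_background(x_c=0.0)
print(f"bypass present:     {eps_bypass_present:.4f}")
print(f"bypass knocked out: {eps_bypass_knocked_out:.4f}")
```

With these parameter values, the same pair of mutations yields a coefficient roughly twice as large when the bypass is present as when it is knocked out, which is a simple instance of a $G \times G \times G$ interaction.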

Skew in the distribution of epistasis coefficients

Studies that measure epistasis for fitness-related phenotypes among genome-wide mutations usually find both positive and negative epistasis, but their relative prevalence varies. Some authors reported a skew toward positive interactions among deleterious mutations (Jasnos and Korona, 2007; He et al., 2010; Johnson et al., 2019), whereas others reported a skew toward negative interactions (Szappanos et al., 2011; Costanzo et al., 2016). Beneficial mutations appear to predominantly exhibit negative epistasis, also known as ‘diminishing returns’ epistasis (e.g. Martin et al., 2007; Khan et al., 2011; Chou et al., 2011; Kryazhimskiy et al., 2014; Schoustra et al., 2016). The reasons for these patterns are currently unclear. Several recent theoretical papers offer possible statistical explanations for them (Martin, 2014; Lyons et al., 2020; Reddy and Desai, 2020). On the other hand, mechanistic predictions for the distribution of epistasis coefficients are not yet available (but see Sanjuán and Nebot, 2008; Macía et al., 2012; Chiu et al., 2012). The present work does not directly address this problem either, but it provides some additional clues.

First, my model shows that the sign of epistasis reflects, at least to some extent, the topology of the network. Thus, the distribution of epistasis coefficients at high-level phenotypes in real organisms should ultimately depend on the preponderance of different topological relationships between the edges in biological networks. It then seems a priori unlikely that positive and negative interactions would be exactly balanced, and we should expect the distribution of epistasis coefficients to be skewed in one direction or the other.

The second observation is that in the metabolic model considered here a positive epistasis coefficient at one level of the hierarchy can turn into a negative one at a higher level, but the reverse is not possible. This bias toward negative epistasis at higher-level phenotypes appears to be even stronger in networks with saturating kinetics (Figure 5 and Figure 5—figure supplement 3).
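This asymmetry can be reproduced in a toy setting. The sketch below assumes the normalization $\varepsilon_y = (\delta^{AB} y - \delta^{A} y - \delta^{B} y) / (2\, \delta^{A} y\, \delta^{B} y)$, lets both mutations perturb the same lower-level rate constant $x$ with a prescribed lower-level coefficient $\varepsilon_x$, and takes the higher-level phenotype to be $y = x b / (x + b)$, the effective rate constant of two first-order reactions in series with unit equilibrium constants. For this toy, expanding to second order in the mutational effects gives $\varepsilon_y \approx 1 + (\varepsilon_x - 1)/C$ with $C = b/(x + b)$. This expression is derived here for the toy only and is not quoted from Equation 6. Parameter values and function names are hypothetical.

```python
def epistasis(y_wt, y_a, y_b, y_ab):
    """Epistasis coefficient under the normalization assumed in this sketch."""
    d_a = y_a / y_wt - 1.0
    d_b = y_b / y_wt - 1.0
    d_ab = y_ab / y_wt - 1.0
    return (d_ab - d_a - d_b) / (2.0 * d_a * d_b)


def higher_level(x, b):
    """Toy higher-level phenotype: serial composition of x with a fixed step b."""
    return x * b / (x + b)


def propagate(eps_x, x=3.0, b=1.0, delta=1e-3):
    """Higher-level epistasis for two mutations with lower-level epistasis eps_x.

    Each mutation alone scales x by (1 + delta); the double mutant's
    relative effect on x is 2*delta + 2*eps_x*delta**2, so the
    lower-level coefficient equals eps_x exactly.
    """
    x_single = x * (1 + delta)
    x_double = x * (1 + 2 * delta + 2 * eps_x * delta**2)
    return epistasis(
        higher_level(x, b),
        higher_level(x_single, b),
        higher_level(x_single, b),
        higher_level(x_double, b),
    )


for eps_x in (-0.5, 0.5, 1.5):
    print(f"eps_x = {eps_x:+.1f}  ->  eps_y = {propagate(eps_x):+.3f}")
```

With $C = 0.25$ as in these parameter values, a weakly positive lower-level coefficient (0.5) becomes negative at the higher level, a negative coefficient becomes more negative, and a coefficient above 1 grows further, so the sign changes only in one direction.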

The third observation is that epistasis among beneficial and deleterious mutations affecting metabolic genes should be identical at the level where they arise, provided that beneficial and deleterious mutations are identically distributed among metabolic reactions. Thus, a stronger skew toward negative epistasis among beneficial mutations at the level of fitness could arise in my model for two mutually non-exclusive reasons. One possibility is that beneficial mutations tend to affect certain special subsets of genes, those that predominantly give rise to negative epistasis. For example, beneficial mutations may for some reason predominantly arise in enzymes that catalyze strictly parallel reactions. Another possibility is that when epistasis between beneficial mutations propagates through the metabolic hierarchy it tends to exhibit a stronger negative bias compared to epistasis between deleterious mutations. Indeed, this phenomenon arises in my model among mutations with large effect (see Figure 4A, Figure 4—figure supplement 1 and Figure 4—figure supplement 2).

Epistasis is generic

Perhaps the most important, and also the most intuitive, conclusion of this work is that we should expect epistasis for high-level phenotypes, such as fitness, to be extremely common. Consider first a unicellular organism growing exponentially. Its fitness is fully determined by its growth rate, which can be thought of as the rate constant of an effective biochemical reaction that converts external nutrients into cells (see Appendix 6 for a simple mathematical model of this statement). In other words, growth rate is the most coarse-grained description of a metabolic network and, as such, it depends on the rate constants of all underlying biochemical reactions. Many previous studies have shown that within-protein epistasis is extremely common (e.g. Lunzer et al., 2005; DePristo et al., 2005; Sailer and Harms, 2017b; Husain and Murugan, 2020). The present work shows that, once epistasis arises at the level of protein activity, it will propagate all the way up the metabolic hierarchy and will manifest itself as epistasis for growth rate. It also suggests that growth rate is a generically non-linear function of the rate constants of the underlying biochemical reactions, such that all mutations that affect growth rate individually would also exhibit pairwise epistasis for growth rate with each other (Kacser and Burns, 1981; DePristo et al., 2005; Martin et al., 2007; Chiu et al., 2012; Otwinowski et al., 2018; Husain and Murugan, 2020).

In more complex organisms and/or in certain variable environments, it may be possible to decompose fitness into multiplicative or additive components. For example, a plant’s fitness may be equal to the product of the number of seeds it produces and their germination probability, as pointed out by Chou et al. (2011). Then, mutations that affect different components of fitness would exhibit no epistasis. However, such situations should be considered exceptional, as they require fitness to be decomposable and mutations to be non-pleiotropic.

If epistasis is in fact generic for high-level phenotypes, why do we not observe it more frequently? For example, a recent study that tested almost all pairs of gene knock-out mutations in yeast found genetic interactions for fitness for only about 4% of them (Costanzo et al., 2016). One possibility is that many pairs of mutations exhibit epistasis that is simply too small to detect with current methods. As the precision of fitness measurements improves, we would then expect the fraction of interacting gene pairs to grow. Another possibility is that systematic shifts in the distribution of estimated epistasis coefficients away from zero are taken by researchers as systematic errors rather than real phenomena, and are normalized out. Thus, some epistasis that would otherwise be detectable may be lost during data processing.

If epistasis is indeed as ubiquitous as the present analysis suggests, it would call into question how observations of inter-gene epistasis are interpreted. In particular, contrary to a common belief, a non-zero epistasis coefficient does not necessarily imply any specific functional relationship between the components affected by mutations beyond the fact that both components somehow contribute to the measured phenotype (Boyle et al., 2017). The focus of future research should then be not merely on documenting epistasis but on developing theory and methods for a robust inference of biological relationships from measured epistasis coefficients.

Class	$a$	$b$	Shared metabolite	Verbal description	$m_{ℳ}$	$A_{ℳ}$	Equation for f₂
$ℳ^{b, io, IO}$	$(1, 2)$	$(1, 3)$	1	Bypass and i/o reactions, shared I/O metabolite	2	$\{1, 2, 3\}$	Equation (34)
$ℳ^{b, i, \emptyset}$	$(1, 2)$	$(3, 4)$	–	Bypass and internal reactions, no shared metabolites	2	$\{1, 2, 3, 4\}$	Equation (35)
$ℳ^{io, io, I}$	$(1, 3)$	$(2, 3)$	3	i/o reactions, shared internal metabolite	1	$\{1, 2, 3\}$	Equation (36)
$ℳ^{io, io, IO}$	$(1, 3)$	$(1, 4)$	1	i/o reactions, shared I/O metabolite	2	$\{1, 2, 3, 4\}$	Equation (37)
$ℳ^{io, io, \emptyset}$	$(1, 3)$	$(2, 4)$	–	i/o reactions, no shared metabolites	2	$\{1, 2, 3, 4\}$	Equation (38)
$ℳ^{io, i, I}$	$(1, 3)$	$(3, 4)$	3	i/o and internal reactions, shared internal metabolite	2	$\{1, 2, 3, 4\}$	Equation (39)
$ℳ^{io, i, \emptyset}$	$(1, 3)$	$(4, 5)$	–	i/o and internal reactions, no shared metabolites	3	$\{1, 2, 3, 4, 5\}$	Equation (40)
$ℳ^{i, i, I}$	$(3, 4)$	$(3, 5)$	3	Internal reactions, shared internal metabolite	3	$\{1, 2, 3, 4, 5\}$	Equation (41)
$ℳ^{i, i, \emptyset}$	$(3, 4)$	$(5, 6)$	–	Internal reactions, no shared metabolites	4	$\{1, 2, 3, 4, 5, 6\}$	Equation (42)

	Marked reactions		Generating topology
Class	$a$	$b$	ID	Excluded reactions	Figure
$ℳ^{b, io, IO}$	$1 \leftrightarrow 2$	$1 \leftrightarrow 3$	$b, io, IO, F$	$\emptyset$	Figure 7
$ℳ^{b, i, \emptyset}$	$1 \leftrightarrow 2$	$3 \leftrightarrow 4$	$b, i, \emptyset, F$	$\emptyset$	Figure 7
$ℳ^{io, io, IO}$	$1 \leftrightarrow 3$	$1 \leftrightarrow 4$	$io, io, IO, F$	$\emptyset$	Figure 7
$ℳ^{io, io, \emptyset}$	$1 \leftrightarrow 3$	$2 \leftrightarrow 4$	$io, io, \emptyset, P$	$\{3 \leftrightarrow 4\}$	Figure 9
$ℳ^{io, i, I}$	$1 \leftrightarrow 3$	$3 \leftrightarrow 4$	$io, i, I, P$	$\{2 \leftrightarrow 4\}$	Figure 10
$ℳ^{io, i, \emptyset}$	$1 \leftrightarrow 3$	$4 \leftrightarrow 5$	$io, i, \emptyset, P_{1}$	$\{3 \leftrightarrow 4, 3 \leftrightarrow 5\}$	Figure 11
			$io, i, \emptyset, P_{2}$	$\{2 \leftrightarrow 5, 3 \leftrightarrow 5\}$
			$io, i, \emptyset, P_{3}$	$\{2 \leftrightarrow 4, 2 \leftrightarrow 5\}$
$ℳ^{i, i, I}$	$3 \leftrightarrow 4$	$3 \leftrightarrow 5$	$i, i, I, P_{1}$	$\{2 \leftrightarrow 4, 2 \leftrightarrow 5\}$	Figure 12
$ℳ^{i, i, I}$	$3 \leftrightarrow 4$	$3 \leftrightarrow 5$	$i, i, I, P_{2}$	$\{1 \leftrightarrow 5, 2 \leftrightarrow 5\}$	Figure 12
$ℳ^{i, i, \emptyset}$	$3 \leftrightarrow 4$	$5 \leftrightarrow 6$	$i, i, \emptyset, P_{1}$	$\{3 \leftrightarrow 5, 3 \leftrightarrow 6, 4 \leftrightarrow 5, 4 \leftrightarrow 6\}$	Figure 13
			$i, i, \emptyset, P_{2}$	$\{1 \leftrightarrow 5, 1 \leftrightarrow 6, 2 \leftrightarrow 5, 2 \leftrightarrow 6\}$
			$i, i, \emptyset, P_{3}$	$\{2 \leftrightarrow 4, 2 \leftrightarrow 6, 3 \leftrightarrow 6, 4 \leftrightarrow 5, 4 \leftrightarrow 6\}$
			$i, i, \emptyset, P_{4}$	$\{2 \leftrightarrow 4, 2 \leftrightarrow 5, 2 \leftrightarrow 6, 4 \leftrightarrow 5, 4 \leftrightarrow 6\}$
			$i, i, \emptyset, P_{5}$	$\{1 \leftrightarrow 6, 2 \leftrightarrow 4, 2 \leftrightarrow 5, 2 \leftrightarrow 6, 4 \leftrightarrow 6\}$
			$i, i, \emptyset, P_{6}$	$\{1 \leftrightarrow 4, 1 \leftrightarrow 6, 2 \leftrightarrow 4, 2 \leftrightarrow 6, 4 \leftrightarrow 6\}$
			$i, i, \emptyset, P_{7}$	$\{1 \leftrightarrow 4, 1 \leftrightarrow 5, 2 \leftrightarrow 3, 2 \leftrightarrow 6, 3 \leftrightarrow 5, 4 \leftrightarrow 6\}$

	Marked reactions		Generating topologies
Class	$a$	$b$	ID	Excluded reactions	Figure
$ℳ^{io, io, I}$	$1 \leftrightarrow 3$	$2 \leftrightarrow 3$	$io, io, I, F$	$\emptyset$	Figure 7
$ℳ^{io, io, \emptyset}$	$1 \leftrightarrow 3$	$2 \leftrightarrow 4$	$io, io, \emptyset, S$	$\{2 \leftrightarrow 3\}$	Figure 9
$ℳ^{io, i, I}$	$1 \leftrightarrow 3$	$3 \leftrightarrow 4$	$io, i, I, S_{1}$	$\{2 \leftrightarrow 3\}$	Figure 10
$ℳ^{io, i, I}$	$1 \leftrightarrow 3$	$3 \leftrightarrow 4$	$io, i, I, S_{2}$	$\{1 \leftrightarrow 4\}$	Figure 10
$ℳ^{io, i, \emptyset}$	$1 \leftrightarrow 3$	$4 \leftrightarrow 5$	$io, i, \emptyset, S_{1}$	$\{1 \leftrightarrow 4, 1 \leftrightarrow 5\}$	Figure 11
			$io, i, \emptyset, S_{2}$	$\{2 \leftrightarrow 3, 2 \leftrightarrow 4, 3 \leftrightarrow 5\}$
			$io, i, \emptyset, S_{3}$	$\{1 \leftrightarrow 5, 2 \leftrightarrow 3, 2 \leftrightarrow 5\}$
$ℳ^{i, i, I}$	$3 \leftrightarrow 4$	$3 \leftrightarrow 5$	$i, i, I, S_{1}$	$\{1 \leftrightarrow 3, 2 \leftrightarrow 3\}$	Figure 12
$ℳ^{i, i, I}$	$3 \leftrightarrow 4$	$3 \leftrightarrow 5$	$i, i, I, S_{2}$	$\{2 \leftrightarrow 3, 2 \leftrightarrow 5, 4 \leftrightarrow 5\}$	Figure 12
$ℳ^{i, i, \emptyset}$	$3 \leftrightarrow 4$	$5 \leftrightarrow 6$	$i, i, \emptyset, S_{1}$	$\{2 \leftrightarrow 3, 2 \leftrightarrow 5, 2 \leftrightarrow 6, 4 \leftrightarrow 5, 4 \leftrightarrow 6\}$	Figure 13
$ℳ^{i, i, \emptyset}$	$3 \leftrightarrow 4$	$5 \leftrightarrow 6$	$i, i, \emptyset, S_{2}$	$\{1 \leftrightarrow 3, 1 \leftrightarrow 6, 2 \leftrightarrow 3, 2 \leftrightarrow 6, 4 \leftrightarrow 6\}$	Figure 13

Model	Internal metabolites	Concentrations of I/O metabolites	Reactions	Output flux
UGPP	6pg, dhap, e4p, f6p, fdp, rib5p, ribu5p, sed7p, xyl5p	[g6p]=3.82 mM, [gap]=0.44 mM	ALDO, G6PDH, PFK, PGDH, PGI, Ru5P, R5PI, TA, TIS, TKa, TKb	$J_{ALDO} + J_{TIS} + J_{TKb} + J_{TKa} - J_{TA}$
LG	2pg, 3pg, pgp	[gap]=0.44 mM, [pep]=0.08 mM	ENO, GAPDH, PGK, PGM	$J_{ENO}$
GPP	all in UGPP and in LG, gap	[g6p]=3.82 mM, [pep]=0.08 mM	all in UGPP and in LG	$J_{ENO}$
FULL	all in GPP, g6p, pep	[Ext glu]=2 µM, [pyr]=10 µM	all in GPP, PTS, PK, PEPCxyl	$J_{PK} + J_{PTS}$

Illustration of a hierarchical metabolic network and its coarse-graining.

Propagation of epistasis.

Emergence of epistasis and its dependence on the topological relationship between the reactions affected by mutations.

Sensitivity of results of Theorem 1 and Theorem 2 with respect to the magnitude of mutational effects.

Epistasis in a kinetic model of Escherichia coli glycolysis.

Box 1.

Simple modules.

Classification of single-marked modules.

Classification of double-marked modules.

Classification of double-marked modules.

A counter example illustrating that the converse to claim 2 in Proposition 4 may not be true.

Strictly parallel generating topologies.

Strictly serial generating topologies.

Graphical representation of strictly serial and strictly parallel generating topologies in the class $ℳ^{io, io, \emptyset}$.

Graphical representation of strictly serial and strictly parallel generating topologies in class $ℳ^{io, i, I}$.

Graphical representation of strictly serial and strictly parallel generating topologies in class $ℳ^{io, i, \emptyset}$.

Graphical representation of strictly serial and strictly parallel generating topologies in class $ℳ^{i, i, I}$.

Graphical representation of strictly serial and strictly parallel generating topologies in class $ℳ^{i, i, \emptyset}$.

Definition of modules in the glycolysis network shown in Figure 5—figure supplement 1.

Names of metabolites used in the kinetic model of glycolysis.

Names of enzymes used in the kinetic model of glycolysis.

Illustration for the proof of Lemma 1.

Author details

Sergey Kryazhimskiy
