Abstract
Enhancers are discrete DNA elements that regulate the expression of eukaryotic genes. They are important not only for their regulatory function, but also as loci that are frequently associated with disease traits. Despite their significance, our conceptual understanding of how enhancers work remains limited. CRISPR-interference methods have recently provided the means to systematically screen for enhancers in cell culture, from which a formula for predicting whether an enhancer regulates a gene, the Activity-by-Contact (ABC) Score, has emerged and has been widely adopted. While useful as a binary classifier, it is less effective at predicting the quantitative effect of an enhancer on gene expression. It is also unclear how the algebraic form of the ABC Score arises from the underlying molecular mechanisms and what assumptions are needed for it to hold. Here, we use the graph-theoretic linear framework, previously introduced to analyze gene regulation, to formulate the default model, a mathematical model of how multiple enhancers independently regulate a gene. We show that the algebraic form of the ABC Score arises from this model. However, the default model assumptions also imply that enhancers act additively on steady-state gene expression. This is known to be false for certain genes and we show how modifying the assumptions can accommodate this discrepancy. Overall, our approach lays a rigorous, biophysical foundation for future studies of enhancer-gene regulation.
Introduction
Much of our current understanding of how genes are regulated arose from classical studies in bacteria of the lac operon and λ-phage [1]. However, the eukaryotic context differs from the bacterial in many significant ways. One key difference is that, in bacteria, regulatory DNA is found proximal to the gene, typically within 1kb upstream of the transcription start site (TSS), whereas eukaryotic regulatory sequences are found in discrete pieces that may be proximal to, or distal from, the TSS. The eukaryotic regulatory elements known as enhancers form a particularly important class. Enhancers were originally defined as DNA sequences which could drive the expression of genes in a location and orientation independent manner [2, 3]. Since these initial discoveries, many native enhancers have been identified which play critical roles in a variety of processes, such as embryonic development [4], physiology [5] and evolution [6]. Genetic variation in enhancers has also been shown to mediate risk for complex disease [7], Mendelian disease [8] and cancer [9]. Based on these and many other studies, we know that enhancers can be located over 1Mb from a target gene TSS, an individual enhancer may regulate multiple genes, some genes are regulated by multiple enhancers and the set of enhancers actively regulating a given gene may depend on cellular context. These properties have made it difficult to identify the rules governing enhancer-gene regulation.
Given their importance, much attention has been given to systematically identifying enhancer sequences and the genes they regulate. An important breakthrough has been the development of high-throughput CRISPR interference (CRISPRi) screens, which enable putative enhancer sequences to be perturbed in cell culture and the resulting effect on expression of a target gene to be measured [10–13]. These screens typically measure quantitative effects on gene expression as the proportional change in mean gene expression over a cell population. We call this quantity the fractional change and, given its importance in this paper, define it formally as follows: let ψ(g) denote the wild-type mean expression level of a gene g, in whatever units are used to measure it, and let ψ(g, e) denote the mean expression level of g when the enhancer e is perturbed. The fractional change of g with respect to e is then
$$ f(g, e) = \frac{\psi(g) - \psi(g, e)}{\psi(g)}. \tag{1} $$
The fractional change for thousands of putative enhancer-gene connections has been measured and computational methods have assessed whether the observed fractional change is statistically different from zero. Current efforts are now focused on two main questions. First, what can we learn about enhancer biology from these screens? Second, can the results from these screens be used to develop computational methods which can predict which enhancers regulate which genes in arbitrary cellular contexts?
The Activity-by-Contact (ABC) model has been proposed as a way to make progress on both of these questions [11]. The ABC model is based on the mechanistic notion that an enhancer’s effect on gene expression depends on the intrinsic strength of the enhancer (activity) and the frequency with which it comes into physical proximity to the gene promoter (contact). The ABC model gives rise to the ABC Score, a quantitative formula which is intended to predict the fractional change observed in an enhancer perturbation experiment. For a gene, g, with N putative enhancers, e1, · · ·, eN, the ABC Score for a specific enhancer eq, is given by,
$$ \mathrm{ABC}(g, e_q) = \frac{\alpha_q \gamma_q}{\sum_{i=1}^{N} \alpha_i \gamma_i}, \tag{2} $$
where αi represents the activity of ei and γi represents the contact frequency between ei and the promoter of g. In [11] a putative enhancer was defined as a chromatin-accessible DNA element of approximately 500 base pairs; αi was assigned using measures of chromatin state of the enhancer, such as DNase-Seq and H3K27ac ChIP-Seq; γi was assigned using the contact frequency between a putative enhancer and the gene promoter, as measured by Hi-C; and the sum in the denominator of Eqn.2 was taken over all putative enhancers within 5Mb of g.
The ABC Score is reasonably effective at predicting the results of CRISPRi screens. When considered as a binary classifier, the ABC Score has achieved a precision of 59% at 70% recall benchmarked against a database of nearly 4,000 putative enhancer-gene connections in the K562 cell line [11]. Similar performance has also been observed in other cell types [11, 14] and in subsequent benchmarking against other CRISPRi screens in K562 cells [15, Fig.S8a]. We emphasize that the ABC Score is computed directly from genomic data orthogonal to the CRISPRi experiment. As such, it has no free parameters and does not require fitting or training. The classification ability of the ABC Score and its modest input data requirements have resulted in its widespread use to interpret non-coding genetic variation [14, 16, 17], identify enhancers in disease related contexts [18, 19] and investigate the dosage effect of transcription factor concentration on gene expression [20].
Despite its practical utility as a binary classifier, the ability of the ABC Score to predict the fractional change is fundamentally limited [11, Fig.3c]. From Eqn.2, it is clear that the sum of the ABC Scores over all putative enhancers of a given gene is equal to 1,
$$ \sum_{i=1}^{N} \mathrm{ABC}(g, e_i) = 1. \tag{3} $$
We define the total fractional change of a gene to be the sum of the fractional changes of all enhancers for the gene, f(g, e1) + · · · + f(g, eN). If the ABC model were perfectly reflecting the fractional change, so that ABC(g, ei) = f (g, ei), it would predict that the total fractional change for all genes is equal to 1. However, experimentally, a range of total fractional changes has been observed from 0 to greater than 3 [10, 11, 14, 21–23]. This incompatibility is a consequence of the algebraic structure of the ABC Score formula and cannot be resolved in a straightforward way. For example, it cannot be resolved by using different types of epigenomic data to assign values to αi or γi.
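The normalisation that forces the ABC Scores of a gene to sum to 1 can be seen in a minimal sketch. The function name `abc_scores` and the activity and contact values are hypothetical and chosen only for illustration; they are not taken from [11].

```python
def abc_scores(activity, contact):
    """Return the ABC Score of each putative enhancer of one gene (Eqn.2)."""
    weights = [a * c for a, c in zip(activity, contact)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical activity and contact values for three putative enhancers.
activity = [4.0, 1.0, 2.5]
contact = [0.5, 0.2, 0.1]

scores = abc_scores(activity, contact)
total_score = sum(scores)  # equals 1 up to rounding, whatever the inputs
```

Because the denominator is the sum of the numerators, changing the data used for the activities or contacts rescales individual scores but cannot change their total.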
What, if anything, about enhancer biology can be concluded from the successes and limitations of the ABC Score? We believe that considering this question requires a formal description of the ABC model. The original description of the ABC model is informal, in the sense that the relationships between the mechanisms of activity and contact and the ABC Score formula were not determined by formal mathematical arguments. In consequence, the biological and biophysical assumptions that underlie formulas of this kind have not been clarified.
In the present paper, we present a strategy for the formal mathematical modeling of enhancer-gene regulation. We introduce the default model, a set of assumptions for how multiple enhancers independently regulate a gene. We show that a formula with the same algebraic structure as the ABC Score formula in Eqn.2 can be rigorously derived from a special case of the default model. This clarifies the assumptions that underlie the ABC Score formula. However, these assumptions also imply that the total fractional change of a gene is equal to one. We show how changing the assumptions of the default model can lead to total fractional changes which are less than or greater than one. More generally, the framework introduced here offers a rigorous foundation for future studies of enhancer-gene regulation.
Results
An activation-communication model of enhancer function
Our approach to modelling enhancer-gene regulation is based on the linear framework, a method of using graphs to analyse biomolecular systems [24–26] that has been previously introduced to study gene regulation [26]; see [27, 28] for up-to-date reviews. The graphs in question have vertices that are linked by labelled, directed edges. The vertices represent molecular states of DNA, the edges represent transitions between these molecular states and the labels represent the transition rates, which are positive numbers with dimensions of (time)⁻¹.
An example linear framework graph is shown in Fig.1a. This graph, which we have called H, represents a single enhancer which can be either activated (filled red circle) or not and in communication with its target gene (curved arrow) or not. It thereby captures the two main notions in the original ABC model, although we prefer to speak here of “communication” rather than of “contact” (see below). These two features of the enhancer are treated in the graph as being independent of each other: the rates for becoming activated or deactivated do not depend on the state of communication, and the rates for making or losing communication do not depend on the state of activation. Independence will be one of the central features of our treatment and will appear both in how an individual enhancer is treated, as in this example in Fig.1a, and in how a gene is regulated by multiple enhancers, as we will explain below.

Fig.1. The activation-communication coarse graining. a) An example linear framework graph, H, representing a coarse-grained view of an enhancer. Each vertex contains a schematic of the enhancer (circle) and its target gene (black rectangle with the transcription start site marked with an arrow). The enhancer may be activated (filled red circle) or communicating (curved arrow to the target gene), encoded in the notation (i, j) used to denote vertices. The edge labels show that activation and communication take place independently of each other. b) A more detailed picture of the molecular complexity that may underlie the coarse-grained graph in panel a, as described further in the text. c) The example graph H in panel a is the graph product of two simpler 2-vertex graphs, Ka, which represents activation, and Kc, which represents communication. The product structure of H is equivalent to the independence of activation and communication.
The graph H in Fig.1a represents a coarse-graining of the actual complexity of enhancer-gene regulation (Fig.1b). Activation is intended to capture processes local to the enhancer sequence such as transcription factor binding, chromatin reorganisation, nucleosome remodelling, recruitment of co-regulators or transcription of the enhancer sequence itself to generate enhancer RNA. Communication refers to the processes by which information is transferred from the enhancer to its target gene. Many communication mechanisms have been proposed including physical contact through DNA looping [29], diffusion of regulatory molecules and phase separation [31]. We thus use the word ‘communication’ instead of ‘contact’ to reflect that enhancer-gene regulation may not require physical contact. It is, of course, possible that the specific activation or communication mechanisms may differ between enhancers. The value of this coarse-graining lies in not making commitments about the underlying mechanism, at the price of ignoring the potential consequences of how activation and communication are implemented in molecular terms. This particular coarse-graining will facilitate our clarification of the ABC Score formula below.
Having provided an example of a linear framework graph and explained how it describes the biological context that we will be studying, we now go into the details of the linear framework. We will make use of the example in Fig.1a throughout this work.
Preliminaries on the linear framework
Notation and terminology
We will start by introducing some basic ideas about linear framework graphs. We will use a letter like G or H to refer to a graph. Vertices will generally be denoted i, j, etc. We will use the notation i ∈ G to mean the state i from the graph G. Edges will be denoted i → j and edge labels will be denoted ℓ(i → j). So, using the notation for the example graph H in Fig.1a, ℓ((0, 0) → (0, 1)) = ℓ((1, 0) → (1, 1)) = kc (the notation for the vertices in this graph arises from its product structure and will be explained later). If some feature X is being discussed for different graphs, we will sometimes use brackets, as in X(G), or a subscript, as in X_G, to specify the graph in question. We will use the word structure to refer to just the vertices and edges of a graph, ignoring the edge labels; when we say “graph”, we will always be including the labels, even when they are not mentioned explicitly.
The Markov process
A graph G is equivalent to a finite-state, continuous-time, time-homogeneous Markov process [25, 28, 32]. This stochastic behaviour can be understood as follows. If the system is in state i, then for each edge i → j which leaves i, a “firing” time is randomly chosen from the exponential probability distribution, λ exp(−λt), where λ is the transition rate of that edge, λ = ℓ(i → j), and the edge with the lowest firing time is taken, at that time. This generates a stochastic trajectory of states and transitions. If we follow a trajectory up to time T and measure the proportion of time spent in state i, then that ratio stabilises with increasing T to become the steady-state probability of state i [32], which we will denote by Pr_i(G).
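The firing-time description of the Markov process can be simulated directly. The following is a minimal sketch, assuming a graph stored as a dictionary of outgoing edges; the function name `simulate_occupancy`, the rate values and the fixed random seed are illustrative assumptions. For a two-state graph with rates k and l, the fraction of time spent in state 1 should approach k/(k + l).

```python
import random

def simulate_occupancy(edges, start, n_steps, seed=0):
    """Simulate the Markov process on a labelled graph and return the
    fraction of time spent in each state.

    edges maps each state to a dict {target state: transition rate}.
    """
    rng = random.Random(seed)
    time_in_state = {state: 0.0 for state in edges}
    state = start
    for _ in range(n_steps):
        # Draw an exponential firing time for every edge leaving the state.
        firing = {target: rng.expovariate(rate)
                  for target, rate in edges[state].items()}
        # The edge with the lowest firing time is taken, at that time.
        target = min(firing, key=firing.get)
        time_in_state[state] += firing[target]
        state = target
    total = sum(time_in_state.values())
    return {s: t / total for s, t in time_in_state.items()}

# Two-state graph 0 <-> 1 with hypothetical rates k (0 -> 1) and l (1 -> 0).
k, l = 2.0, 1.0
occupancy = simulate_occupancy({0: {1: k}, 1: {0: l}}, start=0, n_steps=200_000)
```

The simulated occupancy fluctuates around the steady-state probability and the fluctuation shrinks as the trajectory gets longer.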
Thermodynamic equilibrium and steady-state probabilities
One of the advantages of the linear framework is that, provided the graph is finite, its steady-state probabilities can be calculated algebraically in terms of the edge labels. (We will encounter infinite graphs below but, as we will see, we do not have to deal with them directly and can work only with finite graphs.) If the graph can reach thermodynamic equilibrium, the algebra can be done quite easily but, importantly, it can also be done when the graph is away from thermodynamic equilibrium, although the formulas become more complicated. A graph can reach thermodynamic equilibrium if, and only if, it satisfies two conditions. First, it must be reversible, so that if there is an edge i → j, then there is also an edge j → i, which represents the reverse of the process that corresponds to i → j. Second, it must satisfy the cycle condition: the product of the label ratios around any cycle of reversible edges must be 1. The graph in Fig.1a is evidently reversible and has only one cycle of reversible edges, (0, 0) ⇋ (1, 0) ⇋ (1, 1) ⇋ (0, 1) ⇋ (0, 0), for which the product of label ratios is
$$ \left(\frac{k_a}{l_a}\right)\left(\frac{k_c}{l_c}\right)\left(\frac{l_a}{k_a}\right)\left(\frac{l_c}{k_c}\right) = 1. \tag{4} $$
For this graph, the independence of activation and communication ensures that the graph can reach thermodynamic equilibrium.
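When a graph can reach thermodynamic equilibrium, its steady-state probabilities can be calculated as follows. First, choose any vertex as a reference; let us call it 1. Second, choose any path of reversible edges from 1 to the state in question, say i: 1 ⇋ i1 ⇋ · · · ⇋ ik = i. The steady-state probability of i, Pr_i(G), is then proportional to the product of the label ratios along this path,
$$ \mathrm{Pr}_i(G) \propto \left(\frac{\ell(1 \to i_1)}{\ell(i_1 \to 1)}\right)\left(\frac{\ell(i_1 \to i_2)}{\ell(i_2 \to i_1)}\right) \cdots \left(\frac{\ell(i_{k-1} \to i_k)}{\ell(i_k \to i_{k-1})}\right). \tag{5} $$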
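It is a simple consequence of the cycle condition that the quantity on the right-hand side of Eqn.5 does not depend on the choice of path from 1 to i. The proportionality constant in Eqn.5 is readily obtained by exploiting the fact that the sum of all the probabilities must be 1, so that, if the vertices are denoted 1, · · ·, N, then
$$ \mathrm{Pr}_i(G) = \frac{\mu_i}{\mu_1 + \cdots + \mu_N}, $$
where μ_i denotes the right-hand side of Eqn.5 for vertex i, with μ_1 = 1 for the reference vertex. For the graph H in Fig.1a, taking (0, 0) as the reference, the steady-state probability of the vertex (1, 1) is
$$ \mathrm{Pr}_{(1,1)}(H) = \frac{(k_a/l_a)(k_c/l_c)}{1 + k_a/l_a + k_c/l_c + (k_a/l_a)(k_c/l_c)} \tag{6} $$
$$ \phantom{\mathrm{Pr}_{(1,1)}(H)} = \left(\frac{k_a}{k_a + l_a}\right) \cdot \left(\frac{k_c}{k_c + l_c}\right). \tag{7} $$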
(We will sometimes use a “·” to denote multiplication to make formulas like this look clearer.) The reorganisation of Eqn.6 into Eqn.7 reveals a product structure in the algebra whose significance will emerge below. The formula in Eqn.6 is the same as would arise from equilibrium statistical mechanics. It is one of the features of the linear framework that it reduces to equilibrium statistical mechanics for systems that are at thermodynamic equilibrium but also yields algebraic formulas for systems away from thermodynamic equilibrium.
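The equilibrium prescription can be checked on the graph H of Fig.1a with exact rational arithmetic. This is a minimal sketch under stated assumptions: the numerical rates are hypothetical, the reference vertex is taken to be (0, 0), and the vertex (1, 1) is used to illustrate the product form.

```python
from fractions import Fraction

# Hypothetical rates for the graph H of Fig.1a.
ka, la = Fraction(3), Fraction(2)   # activation and deactivation
kc, lc = Fraction(1), Fraction(4)   # gain and loss of communication

# Cycle condition: product of label ratios around the single cycle
# (0,0) -> (1,0) -> (1,1) -> (0,1) -> (0,0).
cycle_product = (ka / la) * (kc / lc) * (la / ka) * (lc / kc)

# Products of label ratios along a path from the reference vertex (0, 0).
mu = {
    (0, 0): Fraction(1),
    (1, 0): ka / la,
    (0, 1): kc / lc,
    (1, 1): (ka / la) * (kc / lc),
}

# Normalise so that the probabilities sum to 1.
partition = sum(mu.values())
prob = {vertex: weight / partition for vertex, weight in mu.items()}

# The same probability written in product form.
product_form = (ka / (ka + la)) * (kc / (kc + lc))
```

With these values `prob[(1, 1)]` and `product_form` are both 3/25, which illustrates how the normalised path product factorises.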
Product graphs as models of independence
In studying gene regulation, a very helpful construction is that of a product graph, because it captures the default situation in which two or more genetic systems operate independently of each other. The example graph in Fig.1a is a case in point. This graph H is the product of the graphs Ka and Kc in Fig.1c. Here, Ka is a two-vertex graph that represents just the activation of the enhancer and Kc is a two-vertex graph that represents just the communication.
We will use Ka and Kc to describe the product graph construction. We will do this in two steps. We will first specify the vertices and edges by building the product structure, denoted Ka × Kc, and then we will specify the labels to get the product graph, denoted Ka⊗Kc. As we will see below, product structures underlie other constructions in which the independence of the product graph is broken, which is why it is helpful to distinguish structures and graphs. The vertices in Ka × Kc are ordered pairs, (i, j), of vertices i ∈ Ka and j ∈ Kc. The edges in Ka × Kc arise from the edges in either component Ka or Kc, taken independently of the state of the other component. In other words, if i1 → i2 is any edge in Ka, then (i1, j) → (i2, j) is an edge in Ka × Kc, for all j ∈ Kc; similarly, if j1 → j2 is any edge in Kc, then (i, j1) → (i, j2) is an edge in Ka × Kc, for all i ∈ Ka; these are the only edges in Ka × Kc. This prescription yields the structure of the graph H in Fig.1a.
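The labels of the product graph, Ka ⊗ Kc, are also inherited from those in Ka or Kc, independently of the state of the other component,
$$ \ell((i_1, j) \to (i_2, j)) = \ell_{K_a}(i_1 \to i_2), \qquad \ell((i, j_1) \to (i, j_2)) = \ell_{K_c}(j_1 \to j_2). $$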
We see that Ka ⊗ Kc corresponds exactly to the graph H in Fig.1a. The graph product precisely captures the sense in which the components of the product, here Ka and Kc, operate independently of each other: the transitions in either component are unaffected, as to their occurrence and their rates, by the state of the other component.
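The product graph construction is mechanical enough to sketch in a few lines. The dictionary representation and the function name `product_graph` are assumptions made for illustration; labels are kept as symbolic strings so the result can be compared with Fig.1a by eye.

```python
def product_graph(graph_a, graph_b):
    """Build the product graph of two labelled graphs.

    Each graph maps a vertex to a dict {target vertex: label}.
    Vertices of the product are ordered pairs (i, j).
    """
    product = {(i, j): {} for i in graph_a for j in graph_b}
    for (i, j), out_edges in product.items():
        # Edges inherited from the first component, for every state j.
        for i2, label in graph_a[i].items():
            out_edges[(i2, j)] = label
        # Edges inherited from the second component, for every state i.
        for j2, label in graph_b[j].items():
            out_edges[(i, j2)] = label
    return product

# The two-vertex graphs of Fig.1c, with symbolic labels.
K_a = {0: {1: "ka"}, 1: {0: "la"}}
K_c = {0: {1: "kc"}, 1: {0: "lc"}}

H = product_graph(K_a, K_c)
```

Each of the four vertices of `H` has two outgoing edges, one inherited from each component, and the label on an edge never depends on the state of the other component.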
In the more general case of a product of m graphs, the vertices are naturally indexed as ordered tuples, (i1, · · ·, im).
One of the consequences of the product graph construction is that its steady-state probabilities are easily calculated. If K1, · · ·, KN are any set of N strongly connected graphs, then the steady-state probabilities in the product graph K1 ⊗ · · · ⊗ KN can be computed by multiplying the steady-state probabilities in the individual graphs,
$$ \mathrm{Pr}_{(i_1, \cdots, i_N)}(K_1 \otimes \cdots \otimes K_N) = \mathrm{Pr}_{i_1}(K_1) \times \cdots \times \mathrm{Pr}_{i_N}(K_N). \tag{8} $$
Eqn.8, which is proved in [26], again captures the sense in which the components K1, · · ·, KN are independent of each other. We note that Eqn.8 holds even for graphs which are unable to reach thermodynamic equilibrium.
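The product rule for steady-state probabilities can be checked numerically on a graph that cannot reach thermodynamic equilibrium. The sketch below is illustrative only: the solver `steady_state`, the helper `product_graph`, the irreversible three-state cycle and all rate values are assumptions introduced here, not constructions from [26].

```python
from fractions import Fraction

def steady_state(graph):
    """Exact steady-state probabilities of a strongly connected graph.

    graph maps each vertex to {target: rate}. The steady-state master
    equation is solved by Gauss-Jordan elimination over the rationals,
    with one equation replaced by the normalisation condition.
    """
    verts = list(graph)
    n = len(verts)
    idx = {v: a for a, v in enumerate(verts)}
    # Row j: sum_i rate(i -> j) * p_i - (total outflow of j) * p_j = 0.
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for i, out_edges in graph.items():
        for j, rate in out_edges.items():
            rows[idx[j]][idx[i]] += Fraction(rate)
            rows[idx[i]][idx[i]] -= Fraction(rate)
    rows[-1] = [Fraction(1)] * (n + 1)  # probabilities sum to 1
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return {v: rows[idx[v]][n] for v in verts}

def product_graph(graph_a, graph_b):
    """Product graph with vertices (i, j) and inherited labels."""
    product = {(i, j): {} for i in graph_a for j in graph_b}
    for (i, j), out_edges in product.items():
        for i2, rate in graph_a[i].items():
            out_edges[(i2, j)] = rate
        for j2, rate in graph_b[j].items():
            out_edges[(i, j2)] = rate
    return product

# An irreversible cycle (no thermodynamic equilibrium) and a two-state graph.
K1 = {0: {1: 1}, 1: {2: 2}, 2: {0: 3}}
K2 = {0: {1: 2}, 1: {0: 5}}

p1, p2 = steady_state(K1), steady_state(K2)
joint = steady_state(product_graph(K1, K2))
factorises = all(joint[(i, j)] == p1[i] * p2[j] for (i, j) in joint)
```

Exact fractions are used so that the comparison in `factorises` is an equality rather than a tolerance check.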
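We can see Eqn.8 at work for the graph in Fig.1a, which is the product of the graphs Ka and Kc in Fig.1c. If we follow the prescription in Eqn.5, we see that
$$ \mathrm{Pr}_0(K_a) = \frac{l_a}{k_a + l_a}, \quad \mathrm{Pr}_1(K_a) = \frac{k_a}{k_a + l_a}, \quad \mathrm{Pr}_0(K_c) = \frac{l_c}{k_c + l_c}, \quad \mathrm{Pr}_1(K_c) = \frac{k_c}{k_c + l_c}. \tag{9} $$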
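If we apply Eqn.8 to the formulas above, we see that,
$$ \mathrm{Pr}_{(1,1)}(H) = \mathrm{Pr}_1(K_a) \cdot \mathrm{Pr}_1(K_c) = \left(\frac{k_a}{k_a + l_a}\right) \cdot \left(\frac{k_c}{k_c + l_c}\right), \tag{10} $$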
which recovers the expression in Eqn.7, whose algebraic product structure is now seen to reflect the underlying product graph.
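Eqn.8 for individual vertices has a straightforward extension to subsets of vertices. To explain this, let K be any graph and let S ⊆ K be any subset of vertices in K. The steady-state probability of being in any vertex of S, denoted Pr_S(K), is the sum of the steady-state probabilities of the vertices in S. If S_j ⊆ K_j is a subset of vertices for each j, then summing Eqn.8 over the product subset S_1 × · · · × S_N gives
$$ \mathrm{Pr}_{S_1 \times \cdots \times S_N}(K_1 \otimes \cdots \otimes K_N) = \mathrm{Pr}_{S_1}(K_1) \times \cdots \times \mathrm{Pr}_{S_N}(K_N). \tag{11} $$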
One of the implications of Eqn.11 is that if we take ij to be a coordinate that runs over the vertices of Kj, then the probability that ij has a particular value, say ij = b, remains the same irrespective of the other factors in the graph product,
$$ \mathrm{Pr}_{\{i_j = b\}}(K_1 \otimes \cdots \otimes K_N) = \mathrm{Pr}_b(K_j). \tag{12} $$
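Eqn.12 follows from Eqn.11 because the subset {ij = b} in K1 ⊗ · · · ⊗ KN is the product subset,
$$ \{i_j = b\} = K_1 \times \cdots \times K_{j-1} \times \{b\} \times K_{j+1} \times \cdots \times K_N, $$
and
$$ \mathrm{Pr}_{K_l}(K_l) = 1 \quad \text{for each } l \neq j. $$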
The gene expression response
The graphs we have considered up to now are models of the regulatory state of the gene. We now discuss how to incorporate the production and degradation of mRNA. The standard approach in the literature is known as kinetic modeling and uses a Markovian framework based on the chemical master equation [33]. We follow this same approach within the graph-theoretic setting introduced here.
At any given time, the state of gene expression is specified by a certain number of molecules of the corresponding mRNA. This number increases by 1 each time RNA polymerase transcribes the gene and decreases by 1 each time an mRNA molecule is degraded or lost through transport out of the nucleus. We can represent such an expression system by the (semi)-infinite pipeline structure, P, in which the state p represents the number p of mRNA molecules, from p = 0 onwards, and the edges correspond to mRNA production, p → p + 1, and degradation or loss, p → p − 1 (Fig.2a).

Fig.2. Modeling mRNA production and degradation through copy-number graphs. a) The pipeline structure P represents the number of mRNA molecules and their production and loss. b) An example regulatory graph, G, and the resulting copy-number graph G ⋉ P. In this example G has two production states, X and Y, with corresponding mRNA production rates rx and ry respectively. We note that G ⋉ P is a sub-structure of G × P; it has the same vertices but lacks the edges corresponding to a production rate of zero. G and P also do not operate independently in G ⋉ P because the mRNA production rates depend on the regulatory state. c) A compact way to represent G ⋉ P. The production states are outlined in purple with corresponding mRNA production rates. The degradation rate, δ, is shown above the arrow from purple squiggles (mRNA) to the empty set ∅. d) The compact representation of the graph H ⋉ P, where H is given in Fig.1a. The only production state is the state (1, 1), in which the enhancer is both activated and communicating, which has production rate r.
Given a gene-regulatory graph, G, we represent the overall system of regulation and expression by a copy-number graph, G ⋉ P, that will be derived from the product structure, G × P (Fig.2b). The states of G ⋉ P are identical to those of G × P but G ⋉ P may not have all the edges that are present in G × P. Each state in G ⋉ P keeps track of the regulatory state of the gene and the number of mRNA molecules that are present. We now discuss how to assign labels to this graph (Fig.2b). We assume that each state, i ∈ G, has a corresponding non-negative rate of mRNA production, ri(G) ≥ 0. If rk(G) = 0, so that mRNA production is not possible in state k of G, then the edges (k, p) → (k, p + 1) are removed from G × P for all p ∈ P. (Note that edge labels must always be positive.) This is the only way in which the structure of G ⋉ P differs from that of G × P. If rk(G) > 0, we will assume that the rate of mRNA production does not change with the number of mRNA molecules that have been expressed, so that ℓ((k, p) → (k, p + 1)) = rk(G) for any p ∈ P. As for mRNA degradation or loss, this takes place independently of the regulatory system, so the most parsimonious assumption is that its rate is proportional to the number of mRNAs that are present and is independent of the regulatory state. Accordingly, we may write ℓ((k, p) → (k, p − 1)) = δ(G) · p for any k ∈ G and any positive p ∈ P, where δ(G) is the degradation rate constant. Finally, we assume that regulatory transitions do not depend on gene expression, so that ℓ((i, p) → (j, p)) = ℓG(i → j) for all i, j ∈ G and for all p ∈ P. A compact way to visually represent a copy-number graph is shown in Fig.2c.
At steady state, G ⋉ P gives rise to a probability distribution over the mRNA copy number. We will define the response of the gene, which we will denote by R(G), to be given by the average of this number,
$$ R(G) = \sum_{p \in P} p \sum_{i \in G} \mathrm{Pr}_{(i, p)}(G \ltimes P). \tag{13} $$
We note that R(G) ≥ 0.
Because G ⋉ P is not a finite graph, the prescription given in Eqn.5 for calculating steady-state probabilities no longer works. (G ⋉ P is also not at thermodynamic equilibrium, unless every regulatory state has the same rate of mRNA production, as can be checked by following the cycle condition formula in Eqn.4.) However, we can appeal to a very useful theorem, due to Sanchez and Kondev, which tells us that we do not have to operate on G ⋉ P in order to calculate R(G) [34]. Translating their work into the graph-theory language used here, we find that the response of the gene can be calculated in terms of the average of ri(G) over only the steady-state probabilities of G, normalized by δ(G),
$$ R(G) = \frac{1}{\delta(G)} \sum_{i \in G} r_i(G) \, \mathrm{Pr}_i(G). \tag{14} $$
It follows that, although infinite graphs arise to represent the mRNA expression system, we do not need to work with them to calculate the mean steady-state expression R(G), under the assumptions made above. Sanchez and Kondev did not use graph theory in their work, so we provide an independent graph-based proof of Eqn.14 in the Methods. In subsequent work, we will show how the copy-number graphs introduced here lead to generalizations of the results of [34] but we do not need that for the present paper.
This result provides some justification for reducing the notational clutter from multiple instances of P. We will refer to the regulatory graph G when we mean G on its own, and to the copy-number graph G when we mean G ⋉ P, defined for some specified choice of production rates ri(G) and degradation rate δ(G). These parameters may not be explicitly mentioned when speaking of a copy-number graph but they should be kept in mind.
As an illustration of Eqn.14, we will assign production rates to the graph in Fig.1a and compute its response. We will make the assumption that mRNA is only produced when the enhancer is both activated and communicating (Fig.2d). The mRNA production rates of H are therefore given by
$$ r_{(i,j)}(H) = \begin{cases} r & \text{if } (i, j) = (1, 1), \\ 0 & \text{otherwise.} \end{cases} \tag{15} $$
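We can now use Eqn.14 to calculate the response, R(H), taking advantage of Eqn.10, in which we exploited the product graph decomposition H = Ka ⊗ Kc. Writing δ for δ(H), we see that,
$$ R(H) = \left(\frac{r}{\delta}\right) \cdot \left(\frac{k_a}{k_a + l_a}\right) \cdot \left(\frac{k_c}{k_c + l_c}\right). \tag{16} $$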
It follows from Eqns.9 and 12 that we can interpret ka/(ka + la) as the probability that the enhancer is activated and, similarly, kc/(kc + lc) as the probability that the enhancer is communicating. Eqn.16 tells us that the response of H is the product of the ratio of production to degradation, the probability of activation and the probability of communication.
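The calculation of R(H) can be reproduced numerically by averaging the production rates over the steady-state probabilities of the four states and dividing by the degradation rate. This is a minimal sketch; the function name `response` and all parameter values are hypothetical.

```python
from itertools import product

def response(rates, probs, delta):
    """Mean steady-state mRNA number: sum_i r_i * Pr_i, divided by delta."""
    return sum(rates[s] * probs[s] for s in probs) / delta

# Hypothetical parameter values for the graph H of Fig.1a.
ka, la, kc, lc = 3.0, 2.0, 1.0, 4.0
r, delta = 10.0, 0.5

# Steady-state probabilities of the two components and of the product.
p_act = {0: la / (ka + la), 1: ka / (ka + la)}
p_com = {0: lc / (kc + lc), 1: kc / (kc + lc)}
probs = {(i, j): p_act[i] * p_com[j] for i, j in product((0, 1), repeat=2)}

# mRNA is produced only when the enhancer is activated and communicating.
rates = {state: (r if state == (1, 1) else 0.0) for state in probs}

R_H = response(rates, probs, delta)
closed_form = (r / delta) * (ka / (ka + la)) * (kc / (kc + lc))
```

Both routes give a mean of about 2.4 mRNA molecules for these values: a production-to-degradation ratio of 20, scaled by an activation probability of 0.6 and a communication probability of 0.2.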
This concludes our analysis of a gene regulated by a single enhancer using the activation-communication coarse graining. We now turn to considering how multiple enhancers work together to regulate gene expression.
A default model of how multiple enhancers independently regulate a gene
We now introduce the default model of enhancer-gene regulation. This is a set of assumptions for how multiple enhancers collectively regulate a gene in an independent manner. We previously introduced the product graph construction which represents independence between regulatory graphs. We now broaden those assumptions to also allow for mRNA production. We expect this default model construction to be of general interest. In the next section we will show how a special case of the default model clarifies the ABC Score formula.
Consider a gene, g, that is regulated by N enhancers, e1, · · ·, eN. We will assume that enhancer el is modelled by the graph Gl. We make no assumptions about Gl other than the prevailing assumption that all our graphs are strongly connected. Gl could be substantially more complicated than the graph in Fig.1a and could incorporate, for example, chromatin organisation, nucleosomes, co-regulators, post-translational modifications, chromosome conformation, etc [26]. In particular, there is no requirement that Gl should be able to reach thermodynamic equilibrium. At this point our assumptions are very general and could apply to essentially any enhancer, when considered from a Markovian perspective.
We denote the graph that models the collective regulation of the enhancers by G and describe how G is defined in terms of the Gl.
The first assumption says that each enhancer has its own individual effect.
1. Individuality. Each enhancer el, when acting in the absence of any of the other enhancers, drives gene expression at the rate ri(Gl) ≥ 0 for each state i ∈ Gl, and gives rise to the response R(Gl), as defined by Eqn.13. If the enhancer is unable to drive expression on its own, then ri(Gl) = 0 for every state i ∈ Gl.
The next two assumptions specify how the enhancers work together.
2. Regulatory independence. Each enhancer acts independently of all the others, so that the regulatory graph of G is given by the product graph G1 ⊗ · · · ⊗ GN.
3. Production-rate summation. Each enhancer independently influences mRNA production. Accordingly, if (i1, · · ·, iN) is a state in G, then its mRNA production rate is a sum of the corresponding production rates in each enhancer graph:
$$ r_{(i_1, \cdots, i_N)}(G) = r_{i_1}(G_1) + \cdots + r_{i_N}(G_N). \tag{17} $$
The summation of rates in Assumption 3 arises for the following reason. If each enhancer influences mRNA production independently, then the time at which an mRNA is produced will be the minimum of the times at which each individual enhancer has its effect on production. These individual times are exponentially distributed with rates r_{i_j}(G_j) for state ij in Gj. The minimum of several exponentially distributed random variables is a random variable that is also exponentially distributed, with rate given by the sum of the individual rates. This leads to Eqn.17.
The final assumption specifies the degradation rates.
4. Uniform degradation. Since mRNA degradation is a process separate from gene regulation and gene expression, we consider the characteristic degradation rate to be a property of the gene, not the enhancer. As such, each graph Gl is assumed to have the same degradation rate, δ(Gl) = δ for all l, and the mRNA degradation rate of G is also δ: δ(G) = δ.
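The fact used to justify the summation of production rates, that the minimum of independent exponential random variables is exponential with the summed rate, follows in one line. In the sketch below, T_1, ..., T_N are independent exponential firing times with rates λ_1, ..., λ_N; these symbols are introduced here for illustration.

```latex
\Pr\bigl(\min(T_1, \ldots, T_N) > t\bigr)
  = \prod_{j=1}^{N} \Pr(T_j > t)
  = \prod_{j=1}^{N} e^{-\lambda_j t}
  = e^{-(\lambda_1 + \cdots + \lambda_N)\, t}
```

The right-hand side is the survival function of an exponential distribution with rate λ_1 + ... + λ_N, which is the summed rate that appears in Assumption 3.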
For any set of copy-number graphs G1, …, GN, we denote the copy-number graph which models their collective effect on transcription according to Assumptions 1 to 4 by the graph
$$ G_1 \circledast \cdots \circledast G_N. \tag{18} $$
Example constructions using the default model are given in Fig.3.

Fig.3. Two examples a and b of the default model construction. The copy-number graphs are depicted in compact format, as shown in Fig.2c but omitting the degradation symbols for clarity. Production states are outlined in bold purple with corresponding production rates in purple text. The model in b is adapted from Figure 9 of [26].
Assumptions 1 to 4 specify our default model of how enhancers collectively regulate a gene. Whether any of the default model assumptions hold for an individual gene is a question that has to be addressed experimentally. In particular, we would expect that Assumption 3 would eventually break down as more enhancers are added to a gene since production rates will be limited by the physical processes involved in transcription. Our goal here is to rigorously work out the consequences of these assumptions, so that we know what to expect when the assumptions do hold and can compare these predictions to what is found experimentally. Of particular significance is that the assumptions above imply that the collective response of the enhancers is always the sum of their individual responses,
$$ R(G_1 \circledast \cdots \circledast G_N) = R(G_1) + \cdots + R(G_N). \tag{19} $$
A proof of this fundamental property of the default model is given in the Methods.
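Additivity of the collective response can be checked numerically for a small instance. The sketch below is an illustration, not the proof given in the Methods: it assumes that each enhancer is an activation-communication graph as in Fig.1a, uses product-graph probabilities with summed production rates, and takes hypothetical parameter values. The helper names `enhancer`, `response` and `collective_response` are introduced here.

```python
from itertools import product
from math import prod

def enhancer(ka, la, kc, lc, r):
    """Steady-state probabilities and production rates of the four states
    of an activation-communication enhancer (hypothetical helper)."""
    p_act = {0: la / (ka + la), 1: ka / (ka + la)}
    p_com = {0: lc / (kc + lc), 1: kc / (kc + lc)}
    probs = {(i, j): p_act[i] * p_com[j] for i in (0, 1) for j in (0, 1)}
    rates = {s: (r if s == (1, 1) else 0.0) for s in probs}
    return probs, rates

def response(probs, rates, delta):
    """Response of one enhancer acting alone."""
    return sum(rates[s] * probs[s] for s in probs) / delta

def collective_response(enhancers, delta):
    """Response under regulatory independence (product probabilities)
    and production-rate summation (rates add across enhancers)."""
    total = 0.0
    for states in product(*(probs.keys() for probs, _ in enhancers)):
        p = prod(probs[s] for (probs, _), s in zip(enhancers, states))
        rate = sum(rates[s] for (_, rates), s in zip(enhancers, states))
        total += rate * p
    return total / delta

delta = 0.5
enhancers = [
    enhancer(3.0, 2.0, 1.0, 4.0, 10.0),
    enhancer(1.0, 1.0, 2.0, 2.0, 4.0),
    enhancer(5.0, 1.0, 1.0, 9.0, 8.0),
]

individual = [response(probs, rates, delta) for probs, rates in enhancers]
collective = collective_response(enhancers, delta)
```

The collective response, computed over all 64 joint states, agrees with the sum of the three individual responses up to floating-point rounding.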
A recent commentary has argued that formal definitions and rigorous modeling are necessary to investigate whether a set of enhancers is “greater than the sum of its parts” [35]. We fully agree and suggest that the notion of independence encoded by the default model, which gives rise to Eqn.19, could serve as a definition of what it means for a gene to be the sum of its parts.
Transcription in the default model relies on the presence of enhancers. It is well known that the promoter sequences at some eukaryotic genes are sufficient to drive transcription even in the absence of distal enhancers [36, 37]. It is a future area of research to incorporate the role of core promoter elements and promoter proximal regulatory sequences along with their interactions with distal enhancers.
A clarification of the ABC Score formula
Enhancer perturbation and deletion fidelity
To see how formulas similar to the ABC Score can be derived from the default model, we need to consider how to formally model perturbations to enhancers such as genetic deletions or CRISPRi. As previously, we will assume that the target gene g is collectively regulated by enhancers e1, · · ·, eN. We assume that enhancer ei is modeled by the graph Gi and that g is modeled by G1 ⊛ · · · ⊛ GN. Let us consider what happens when enhancers
It is important to keep in mind that the deletion effect is defined in terms of a model of gene regulation, whereas the fractional change is defined in terms of experimental data. The definition in Eqn.20 implicitly assumes that the system has returned to steady state following the perturbation. Furthermore, Eqn.20 says nothing about how the enhancer is perturbed or whether a CRISPRi perturbation has the same effect as a genetic deletion.
To calculate the deletion effect using Eqn.20, we need to know the perturbed graph,
5. Deletion fidelity. The regulatory graph of
Deletion fidelity ensures that if G obeys Assumptions 1-4, then
Using Eqn.22, it follows from Eqn.19 that,
and so the formula for the deletion effect in Eqn.20 tells us that,
Eqns.23 and 24 are general properties that hold for the default model whenever Assumption 5 of deletion fidelity also holds. They allow us to formalise the notion of enhancer additivity, which we will discuss below, but, first, let us turn to the ABC Score formula.
The Independent-Activation-Communication (IAC) model
In the default model, the graph representing each individual enhancer can be arbitrarily complicated. To show how the ABC Score formula can arise from the default model, we need to impose the further assumption that each enhancer is modeled by the activation-communication coarse graining shown in Figs.1a and 1b.
6. The activation-communication coarse-graining. Enhancer ei is described by the graph Hi, where Hi is the same graph as H in Fig.2d. Specifically, Hi is the graph product of an activation graph, Ka,i, with labels ka,i, la,i, and a communication graph, Kc,i, with labels kc,i, lc,i (Fig.1c), and Hi = Ka,i ⊗ Kc,i. The only non-zero production rate of Hi occurs in the state in which the enhancer is both active and communicating, where the rate is ri.
The overall regulatory system is then described by G = H1 ⊛ · · · ⊛ HN (Fig. 4, Fig.S1). We call the model obeying Assumptions 1-6 the Independent-Activation-Communication (IAC) model. It follows from Eqn.16 that the response of enhancer i in the IAC model is given by

The Independent-Activation-Communication (IAC) model. A gene described by the IAC model follows Assumptions 1-6 for the component graphs H1, …, HN. The ordered pair of binary digits for the vertices in each Hi represent the activation and communication status, respectively, of each enhancer. Each Hi has the same structure as the graph in Fig.2d, but different labels, and represents the independence of activation and communication within each enhancer. Each Hi is assumed to have the same degradation rate which is omitted for clarity. See also Fig.S1.
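The response of a single enhancer graph Hi = Ka,i ⊗ Kc,i can be worked out symbolically. The following SymPy sketch is a minimal illustration under a stated assumption about the labels: we take ka and kc to be the rates of switching on (0 → 1) and la and lc the rates of switching off (1 → 0), which may differ from the convention in the figures. The helper names are ours. The sketch checks that the product of the two-state steady states is the steady state of the product graph and then evaluates the Sanchez-Kondev formula (Eqn.14) with production rate r in the state (1, 1) only.

```python
import sympy as sp

ka, la, kc, lc, r, delta = sp.symbols("k_a l_a k_c l_c r delta", positive=True)

def two_state(k_on, k_off):
    """Laplacian (du/dt = L u) of the graph 0 <-> 1; k_on labels 0 -> 1, k_off labels 1 -> 0."""
    return sp.Matrix([[-k_on, k_off], [k_on, -k_off]])

def kron(A, B):
    """Explicit Kronecker product of two SymPy matrices."""
    return sp.Matrix(A.rows * B.rows, A.cols * B.cols,
                     lambda i, j: A[i // B.rows, j // B.cols] * B[i % B.rows, j % B.cols])

I2 = sp.eye(2)
# Regulatory graph of H = K_a (x) K_c: activation and communication switch independently.
L_H = kron(two_state(ka, la), I2) + kron(I2, two_state(kc, lc))

u_a = sp.Matrix([la, ka]) / (ka + la)    # steady state of K_a: (Pr(inactive), Pr(active))
u_c = sp.Matrix([lc, kc]) / (kc + lc)    # steady state of K_c
u_H = kron(u_a, u_c)                     # states ordered (0,0), (0,1), (1,0), (1,1)

# The product of the marginals lies in the kernel of the product-graph Laplacian.
residual = (L_H * u_H).applyfunc(sp.simplify)

# Only the state (1, 1) produces mRNA, at rate r.
response_H = sp.factor(r * u_H[3] / delta)
print(response_H)
```

Under this label convention the response factors as (r/δ) · ka/(ka + la) · kc/(kc + lc), that is, a production term multiplied by the probability of being active and the probability of communicating.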
Let us define
Furthermore, as a consequence of deletion fidelity (Assumption 5), it follows from Eqn.24 that the deletion effect for enhancer eq is given by,
Eqn.27 shows a striking algebraic similarity to the ABC Score formula in Eqn.2. The quantity
Eqn.27 is our clarification of the algebraic structure of the ABC Score formula. Eqn.27 rigorously follows if enhancers collectively regulate a gene according to the IAC model (Assumptions 1 to 6).
Enhancer additivity and departures from it
The default model, satisfying Assumptions 1 to 4, exhibits response additivity, as shown by Eqn.19: the response of the gene to all the enhancers acting collectively is just the sum of the responses to each individual enhancer. When the default model also obeys Assumption 5 of deletion fidelity, then response additivity has a counterpart in the deletion effect, as defined in Eqn.20. This allows us to rigorously define the properties of super-additivity and sub-additivity. These departures from the properties of the default model may be helpful to interpret the effects of experimental perturbations, such as genetic deletions or CRISPRi, in which subsets of enhancers are prevented from influencing a gene and the effect of these perturbations on the gene expression response is measured.
With Assumptions 1 to 5, if U1, · · ·, Um are pairwise disjoint subsets of enhancers, so that Ui ⊆ {e1, · · ·, eN} and Ui ∩ Uj = ∅ when i ≠ j, then it follows from Eqn.24 that the effect of deleting all the subsets together is just the sum of the individual deletion effects,
∆(G; U1 ∪ · · · ∪ Um) = ∆(G; U1) + · · · + ∆(G; Um). (28)
We refer to this property as deletion additivity. Furthermore, it is evident from Eqn.24 that, if all the enhancers are deleted, so that U1 ∪ · · · ∪ Um = {e1, · · ·, eN}, then the total deletion effect must be 1,
∆(G; U1) + · · · + ∆(G; Um) = 1. (29)
Assuming deletion fidelity, the total deletion effect being 1 is equivalent to the response additivity in Eqn.19. A special case of Eqn.29 arises if all enhancers are deleted individually, when, once again, the total deletion effect is 1,
∆(G; {e1}) + · · · + ∆(G; {eN}) = 1. (30)
Now suppose that a gene g is regulated by N enhancers, e1, · · ·, eN, each enhancer is modeled by the graph Gi and the regulatory graph of g, G, has the product structure, G1 × · · · × GN. The labels in G need not be related to those of the component graphs Gi, so that G need not be the product graph G1 ⊗ · · · ⊗ GN. We can no longer calculate R(G) in terms of R(Gi). However, we can still define through Eqn.20 the deletion effect ∆(G; U) for any collection U ⊆ {e1, · · ·, eN} of enhancers. We say that g exhibits response super-additivity if,
R(G) > R(G1) + · · · + R(GN). (31)
In terms of the deletion effect, this corresponds to when a collective deletion has less effect than the sum of the individual deletions, so that,
∆(G; {e1, · · ·, eN}) < ∆(G; {e1}) + · · · + ∆(G; {eN}). (32)
Similarly, g exhibits response sub-additivity if,
R(G) < R(G1) + · · · + R(GN), (33)
and this corresponds to the collective deletion having more effect than the sum of the individual deletions,
∆(G; {e1, · · ·, eN}) > ∆(G; {e1}) + · · · + ∆(G; {eN}). (34)
Experimentally, response additivity [15, 23, 38–43], super-additivity [15, 21, 39–44] and sub-additivity [38, 40, 41] have all been observed. Because the super-additive and sub-additive findings cannot be accounted for by the default model with deletion fidelity, we next consider some extensions of this model that show how such effects could arise.
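The deletion additivity of the default model, and the classification of departures from it, can be illustrated with a few lines of arithmetic. The sketch below is a hedged illustration: it assumes that the deletion effect is the fractional loss of response, (R(G) − R(G−U))/R(G), where G−U is our shorthand for the graph with the enhancers in U removed, and that, under Assumptions 1 to 5, the response after deletion is the sum of the responses of the remaining enhancers. The function names and the numerical responses are hypothetical.

```python
import math

def deletion_effect(responses, deleted):
    """Fractional loss of response when the enhancers in `deleted` are removed.

    `responses` maps each enhancer name to its individual response R(G_i);
    additivity and deletion fidelity are assumed, so the response after the
    deletion is the sum over the remaining enhancers.
    """
    total = sum(responses.values())
    remaining = sum(R for e, R in responses.items() if e not in deleted)
    return (total - remaining) / total

def classify(R_collective, R_individual, tol=1e-9):
    """Compare a collective response with the sum of the individual responses."""
    s = sum(R_individual)
    if abs(R_collective - s) <= tol:
        return "additive"
    return "super-additive" if R_collective > s else "sub-additive"

responses = {"e1": 2.0, "e2": 5.0, "e3": 3.0}          # hypothetical R(G_i)
d1 = deletion_effect(responses, {"e1"})
d23 = deletion_effect(responses, {"e2", "e3"})
d_all = deletion_effect(responses, {"e1", "e2", "e3"})
print(d1, d23, d_all)                                   # d1 + d23 equals d_all

print(classify(10.0, responses.values()))               # default model behaviour
print(classify(12.0, responses.values()))               # whole exceeds sum of parts
print(classify(7.5, responses.values()))                # whole falls short
```

The disjoint subsets {e1} and {e2, e3} cover all the enhancers, so their deletion effects sum to 1, as expected for the default model with deletion fidelity.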
Mechanisms beyond the default model
In the following sections, we examine two departures from the default model and consider their impact on whether enhancers act additively (Eqn.19), super-additively (Eqn.31) or sub-additively (Eqn.33). This will also illustrate how our modeling framework can be used to reason about different biological mechanisms.
Non-additivity in mRNA production rates
In the default model, the summation of production rates in Assumption 3 is crucial for the property of enhancer additivity in Eqn.19. The production rate is a convenient abstraction that aggregates over many underlying molecular mechanisms, such as RNA Polymerase recruitment, pausing and elongation. It is conceivable that, when multiple enhancers jointly influence transcription, the resulting rate is a more complex function than simple addition [38]. Here, we consider the effect of dropping Assumption 3.
Let us assume that we have two enhancers, e1 and e2, which are described by the graphs H1 = Ka,1 ⊗ Kc,1 and H2 = Ka,2 ⊗ Kc,2, respectively, as specified in Assumption 6 in the coarse-grained version of our default model. The overall regulatory graph is given by H1 ⊗ H2, so that e1 and e2 remain independent (Assumption 2). Note that (Ka,1 ⊗ Kc,1) ⊗ (Ka,2 ⊗ Kc,2) has a product hierarchy and its vertices are therefore indexed by tuples of tuples of the form,
((a1, c1), (a2, c2)). (35)
Here, ai and ci, for i = 1, 2, are coordinates for activation and communication, respectively, which take the values 0 and 1 in all cases. The graphs H1 and H2 have mRNA production rates, r1 and r2, respectively, as specified in Eqn.15 and mRNA degradation rate δ.
We now consider a copy-number graph, G⋄, whose regulatory graph is given by (Ka,1 ⊗Kc,1)⊗(Ka,2 ⊗Kc,2) but whose production rates do not obey Assumption 3. Note that we use the same symbol, G⋄, for the regulatory graph and the copy-number graph and rely on the context to clarify which is meant. There are many ways to assign production rates to the vertices of G⋄ which do not obey Assumption 3; here we consider one of the simplest possible ways. We define 3 subsets of vertices of G⋄ in terms of the coordinates in Eqn.35: W := {a1 = 1, c1 = 1, a2 = 1, c2 = 1}, U := {a1 = 1, c1 = 1} \ W and V := {a2 = 1, c2 = 1} \ W. We assign the production rate of vertices in U to be r1, of vertices in V to be r2 and of the vertex in W to be (1 + μ)(r1 + r2); all other vertices have production rate 0. We can summarise these assumptions in the following table, which gives the production rate for each of the 16 states in G⋄ in the coordinate system described by Eqn.35.

We further assume that μ ≥ −1 to ensure that the production rate of W does not become negative. If μ = 0, then Assumption 3 holds for G⋄ but not otherwise. We also assume that G⋄ has degradation rate δ. We now calculate R(G⋄). Using the Sanchez-Kondev theorem in Eqn.14 we have,
Expanding the term on the right hand side of Eqn.36 and rearranging terms results in,
Using the fact that U and W are disjoint, as are V and W, gives,
We now note that, by definition, U ∪W = {a1 = 1, c1 = 1} and V ∪W = {a2 = 1, c2 = 1}. Given the independence assumption on G⋄, we can apply Eqn.11 and have that
Given that
we see that e1 and e2 act additively for μ = 0 (Eqn.19), act super-additively for μ > 0 (Eqn.31) and act sub-additively for μ ∈ [−1, 0) (Eqn.33).
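The dependence on μ can be made concrete with a short numerical sketch. It assumes only what is stated above: the production rates are r1 on U, r2 on V and (1 + μ)(r1 + r2) on W, the response is given by the Sanchez-Kondev formula (Eqn.14), and, by independence, the probability of W is the product of the probabilities p1 and p2 that each enhancer is both active and communicating. The function name and numerical values are hypothetical.

```python
import math

def response_nonadditive(p1, p2, r1, r2, delta, mu):
    """Response of the two-enhancer graph with modified production rate on W.

    p1, p2: steady-state probabilities that enhancer 1, 2 is active and communicating.
    mu:     departure from additivity in the doubly productive state (mu >= -1).
    """
    pW = p1 * p2              # both enhancers active and communicating
    pU = p1 * (1.0 - p2)      # only enhancer 1 active and communicating
    pV = (1.0 - p1) * p2      # only enhancer 2 active and communicating
    return (r1 * pU + r2 * pV + (1.0 + mu) * (r1 + r2) * pW) / delta

p1, p2, r1, r2, delta = 0.3, 0.5, 2.0, 4.0, 0.5
sum_of_parts = (r1 * p1 + r2 * p2) / delta      # R(H1) + R(H2)

for mu in (-0.5, 0.0, 0.5):
    R = response_nonadditive(p1, p2, r1, r2, delta, mu)
    excess = R - sum_of_parts                   # equals mu * (r1 + r2) * p1 * p2 / delta
    print(mu, R, excess)
```

The excess over the sum of the individual responses is μ(r1 + r2)p1p2/δ, so its sign is the sign of μ, in agreement with the three cases distinguished above.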
Non-independence in regulatory transitions between enhancers
So far all the graphs we have considered obey regulatory independence as defined by Assumption 2: the state of one enhancer does not affect the transitions or the rates of any other enhancer. Let us examine this more closely for the IAC model, with just two enhancers, e1 and e2, described by graphs H1 and H2, respectively, as in the previous subsection. According to Assumption 6, H1 = Ka,1 ⊗ Kc,1 and H2 = Ka,2 ⊗ Kc,2 and the overall regulatory system is therefore described by the graph, G = H1 ⊗ H2 = (Ka,1 ⊗ Kc,1) ⊗ (Ka,2 ⊗ Kc,2). Using Eqn.12 and the coordinate system in Eqn.35, we have that
In order to model non-independence between enhancers, we will consider a gene to be modeled by the graph G♯, where G♯ is the product between an activation graph A♯ and a communication graph C, so that G♯ = A♯ ⊗ C (Fig.5). A♯ has the structure Ka,1 × Ka,2, in which Ka,1 and Ka,2 are both present as the subgraphs (0, 0) ⇌ (1, 0) and (0, 0) ⇌ (0, 1), respectively (Fig.5a). The remaining labels, on the edges (1, 0) ⇌ (1, 1), which specify the rates of activation and deactivation of e2 when e1 is activated, and on the edges (0, 1) ⇌ (1, 1), which specify the rates of activation and deactivation of e1 when e2 is activated, can be arbitrary. For simplicity, we assume that C = Kc,1 ⊗ Kc,2 (Fig.5b), but note that non-independence in communication could be considered similarly. G♯ has the same structure as H1 × H2, and thus still models the activation and communication statuses of e1 and e2, but using the form G♯ = A♯ ⊗ C allows us to clarify the independence relationships in G♯. Under this reorganization, the vertex in Eqn.35 is now described in a new coordinate system as,
((a1, a2), (c1, c2)). (41)

A model of non-independence in activation between enhancers. (a) The graph A♯ represents the activation components of each enhancer. A♯ has the structure of Ka,1 × Ka,2. Labels on the unmarked edges can be arbitrary. (b) The graph C = Kc,1 ⊗ Kc,2 represents the communication components of each enhancer.
We assume that G♯ has the same mRNA production rates as for the IAC model. In terms of the vertex subsets W = {a1 = 1, c1 = 1, a2 = 1, c2 = 1}, U = {a1 = 1, c1 = 1} \ W and V = {a2 = 1, c2 = 1} \ W, the vertices in U have production rate r1, those in V have production r2 and those in W have production rate r1 + r2; all other vertices have production rate 0. We can summarise this in the following table, in which the states are described by the coordinate system in Eqn.41.

As in the previous section, we can use the Sanchez-Kondev theorem in Eqn.14 to calculate,
Given that G♯ = A♯ ⊗ C, the probabilities of the sets {a1 = 1, c1 = 1} and {a2 = 1, c2 = 1} factor according to Eqn.11. Continuing from Eqn.44 we have,
Eqn.46 shows that the labels of A♯ do not directly appear in R(G♯); they only affect R(G♯) through the enhancer activation probabilities
Comparing Eqns.46 and 47, we see that whether the enhancers act additively (Eqn.19), sub-additively (Eqn.33) or super-additively (Eqn.31) depends on the terms
Discussion
In this paper, we have introduced mathematical formulations of a default model and an Independent-Activation-Communication model (IAC model) for how multiple enhancers collectively regulate a gene. The default model encodes the notion that enhancers operate independently of each other (Assumptions 2 and 3). At the same time, the default model imposes no assumptions on how the individual enhancers themselves are working, at the level of transcription factors, co-regulators, chromatin, etc. They can be arbitrarily complicated, so long as they operate within the Markovian setting that is commonly assumed for analysing gene regulation. The default model assumptions imply that the collective response of a gene, as measured by the mean mRNA level, is the sum of the responses coming from each enhancer individually, which we have called response additivity (Eqn.19). The default model explains the mechanistic requirements for a gene to exhibit this property and clarifies how ‘independence implies additivity’. We emphasize that independence refers here to assumptions about gene regulatory mechanisms whereas additivity refers to the consequences of those assumptions on steady-state gene expression. The default model supports the view that response additivity is a reasonable baseline against which to assess the collective action of enhancers in regulating a gene.
One of the advantages of the default model is that, because it is mathematically formulated, it allows mechanistic departures from its assumptions to be systematically analysed. We have shown how departures from Assumptions 2 and 3 can give rise to response super-additivity (Eqn.32) as well as sub-additivity (Eqn.34). As we have noted, response additivity [15, 23, 38–43], super-additivity [15, 21, 39–44] and sub-additivity [38, 40, 41] have all been observed experimentally. The default model suggests the mechanistic assumptions that could be experimentally tested to determine what underlies the observed response behaviour.
The IAC model is a special case of the default model that further assumes deletion fidelity, which allows enhancers to be removed from the collective without influencing the remaining enhancers (Assumption 5), and also assumes that individual enhancers can be described at a coarse-grained level in which they are independently becoming activated and communicating their state to the gene (Assumption 6). Under Assumptions 1 to 6 for the IAC model, we derive a formula for the deletion effect of an individual enhancer (Eqn.27) that shows a striking algebraic relationship to the ABC Score formula in Eqn.2. This relationship suggests that the IAC model has accurately captured in mathematical terms the core intuitions behind the ABC model from which the Score formula emerged [11].
A persistent conceptual theme that underlies the results reported here is that of independence. The default model assumes that enhancers act independently, both in their regulatory state (Assumption 2) and in their effect on mRNA production (Assumption 3). Furthermore, the IAC model assumes that individual enhancers become activated and communicating independently (Assumption 6). Our clarification of the ABC Score formula thus arises from assuming independence between enhancers, along with independence of activation and communication within each enhancer. The product construction on graphs and on graph structures has been the key mathematical tool for rigorously defining independence, illustrating the value of the graph-based linear framework for analysing gene regulation. We note that the concept of deletion fidelity (Assumption 5) is also easily defined in the context of graphs.
Previous work used finite linear framework graphs to describe gene regulation [26]. Here, we have introduced copy-number graphs, which have infinitely many vertices that keep track of both regulatory states as well as the numbers of expressed mRNAs (Fig.2). Copy-number graphs allowed us to exploit the Sanchez-Kondev theorem and calculate the mean mRNA number at steady state in terms solely of the finite regulatory graph (Eqn.14). We therefore avoided dealing with infinite graphs despite relying on them. Importantly, the linear framework also allows the unknown parameters within graphs to be treated symbolically, so that conclusions may be drawn, as we saw above, without the need for assigning numerical values to any of the parameters.
Beyond its utility in clarifying the ABC Score formula, the activation-communication coarse graining in Assumption 6 provides an interesting lens through which to investigate enhancers. Many new experimental technologies have emerged which allow perturbing entire enhancer sequences as a whole (as opposed to small changes to DNA within an enhancer sequence). Such technologies include synthesizing and integrating long DNA sequences [43, 44], modulating the genomic position of an enhancer [47–51], high throughput enhancer perturbations with CRISPRi [11–13] and combining CRISPRi screens with rapid protein degradation [52]. By considering which perturbations affect, and do not affect, activation and communication, it may be possible to probe the validity of the activation-communication coarse graining itself.
As noted in the Introduction, the ABC Score formula has been widely adopted for predicting enhancer-gene connections. It has also been suggested that it could be combined with other predictive methods [53] and that the ABC model could be used as a guiding principle in formulating other quantitative models [54]. We believe the mathematical formulations that we have introduced here provide a foundation for such efforts.
The ABC Score formula is quantitative (Eqn.2) but the ABC model that gave rise to it is not a formal mathematical model but, rather, an informal statement about the features of enhancer activation and contact that are believed to be important in determining the response of a gene. Such informal models play a critical role in biology but have the disadvantage that the underlying mechanistic requirements are not clear. It is therefore difficult to know when the model can be applied and what can be deduced from it when it does apply. In contrast, the mechanistic assumptions underlying our formal mathematical models are precisely stated (Assumptions 1 to 4 for the default model and Assumptions 1 to 6 for the IAC model), making it clear when the model can be applied and suggesting experimental tests to check the assumptions. Moreover, if those assumptions are met, then the conclusions we have drawn, such as the response additivity of the default model (Eqn.19) and the enhancer deletion formula for the IAC model (Eqn.27), are guaranteed to hold as a matter of mathematical logic [55]. If those conclusions are not found experimentally, for example, if response additivity is not found, then we know, as a matter of logic, that at least one of the assumptions underlying the corresponding model does not hold. This understanding can, in turn, inform experiments to determine where the departures from the assumptions occur. Such an approach allows a level of rigorous reasoning about enhancer behaviour in gene regulation that is significantly harder to undertake with only an informal quantitative model.
Mathematical theory has typically been introduced to analyse data, but the conceptual issues underlying gene regulation are sufficiently intricate that theory may be necessary to understand the kinds of experiments that are needed and how the data from them should best be interpreted [56]. Studies of the simple repression motif in bacterial gene regulation may have already reached that point [57–60], as reviewed in [61]. The foundation provided here, based on the linear framework, may offer similar opportunities in the eukaryotic context. We believe our rigorous mathematical approach can play a significant role in investigating the intricate interplay of enhancers in regulating gene expression.
Methods
A graph theory interpretation of the Sanchez and Kondev theorem
In this section we provide a proof of Eqn.14. We follow the approach of Sanchez and Kondev [34] but present it using the graph theory notation and language used in this paper. Sanchez and Kondev provide a recurrence relation for all the moments of the mRNA probability distribution. A graph theory interpretation of these results, together with generalisations, will be presented in a separate paper; here we focus on the first moment only. We use bold face to denote matrices and vectors.
The steady-state distribution of a finite graph
Let G be a finite regulatory graph on the vertex set V (G) = {1, …, N}. We define the Laplacian matrix of G, ℒ = ℒ (G), to be the N × N matrix,
As mentioned in the main text, G is equivalent to a continuous-time Markov process on the state space {1, …, N} [25, 28, 32]. Let ui(t) be the probability that the process occupies state i at time t. Then the time evolution of the probability vector,
u(t) = (u1(t), …, uN(t))ᵀ,
is given by the master equation
du(t)/dt = ℒ(G) u(t).
If G is strongly connected, then the kernel of ℒ(G) is one-dimensional, so there is a unique vector, u∗(G), such that ℒ(G)u∗(G) = 0 and u∗1(G) + · · · + u∗N(G) = 1. The vector u∗(G) is the steady-state probability distribution on G.
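The steady-state vector can be computed symbolically for small graphs. The following SymPy sketch is one possible way to do this and is not the code used for the paper: it assumes the convention du/dt = ℒu, so that the label of an edge from source to target enters the column of the source, and it replaces one redundant row of the Laplacian by the normalisation condition before solving. Vertices are indexed from 0 rather than 1, and the function names and labels are hypothetical.

```python
import sympy as sp

def laplacian(n, edges):
    """Laplacian of a graph on vertices 0..n-1; `edges` maps (source, target) -> label."""
    L = sp.zeros(n, n)
    for (src, tgt), label in edges.items():
        L[tgt, src] += label      # probability flows into the target
        L[src, src] -= label      # and out of the source
    return L

def steady_state(L):
    """Solve L u = 0 with sum(u) = 1 for a strongly connected graph."""
    n = L.rows
    A = L.copy()
    A[n - 1, :] = sp.ones(1, n)   # replace a redundant equation by the normalisation
    b = sp.zeros(n, 1)
    b[n - 1] = 1
    return A.LUsolve(b).applyfunc(sp.simplify)

# Two-state graph 0 <-> 1 with label a on 0 -> 1 and label b on 1 -> 0.
a, b = sp.symbols("a b", positive=True)
L2 = laplacian(2, {(0, 1): a, (1, 0): b})
u2 = steady_state(L2)
print(u2.T)                       # expected to be proportional to (b, a)

# Three-state graph with all six edges and symbolic labels.
k = sp.symbols("k01 k10 k12 k21 k02 k20", positive=True)
L3 = laplacian(3, {(0, 1): k[0], (1, 0): k[1], (1, 2): k[2],
                   (2, 1): k[3], (0, 2): k[4], (2, 0): k[5]})
u3 = steady_state(L3)
```

Because the labels are kept symbolic, the components of `u3` are rational functions of the labels, which is the form in which steady-state probabilities are manipulated throughout the paper.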
The master equation for a copy-number graph
Let G ⋉ P denote a copy-number graph with regulatory graph G, production rate vector r ∈ ℝN and degradation rate δ. Let Π be the diagonal matrix of production rates, Πi,i = ri and Πi,j = 0 when i ≠ j, and let I be the N × N identity matrix. Let
be the vector of probabilities over the regulatory states with mRNA copy number p. It follows from the definition of the copy-number graph in the main text that u(p, t) satisfies the master equation,
in which terms with arguments of p − 1 are appropriately omitted when p = 0. The first term of Eqn.51 arises from mRNA production, the second term from mRNA degradation and the third term from transitions in the regulatory graph. Eqn.51 can be rewritten as,
We now let
be the vector of marginal probabilities for the regulatory states. Proceeding from Eqn.52 we have,
This is a telescoping sum which simplifies to
It follows that the steady-state marginal probability vector, q∗, lies in the kernel of ℒ(G) and must therefore be equal to u∗(G),
In other words, the steady-state marginal distribution of regulatory states in a copy-number graph is identical to the steady-state distribution of regulatory states in a finite regulatory graph.
Proof of Eqn.14
We want to show that,
Let u∗(p) denote the steady-state probability distribution over the copy-number graph. (Note the distinction with the marginal probability distribution over the regulatory states, u∗(G) = ∑p u∗(p).) Let
be the corresponding steady-state average copy number vector. Evidently,
where 1 is the all-ones column vector of dimension N. Now let
be the time-dependent average copy-number vector. It follows from Eqn.52 that,
The first summand in Eqn.55 can be simplified to,
The second summand can be simplified to,
And the third summand is evidently just ℒ (G) μ(t). Combining these three simplifications, we see that,
At steady state this becomes,
Multiplying both sides on the left by 1ᵀ, and recalling that,
we find that,
Using Eqns.53 and 54, we see that Eqn.57 becomes,
as required. This completes the proof of Eqn.14.
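Eqn.14 can also be checked numerically by truncating the infinite copy-number graph at a large maximum copy number. The sketch below is an illustration under stated assumptions: the convention du/dt = ℒu, production from (i, p) to (i, p + 1) at rate ri, degradation from (i, p) to (i, p − 1) at rate δp, and a truncation level p_max chosen large enough that the neglected probability is negligible. The two-state regulatory graph, its rates and the function names are hypothetical.

```python
import numpy as np

def steady_state(M):
    """Probability vector in the kernel of M (irreducible chain assumed)."""
    n = M.shape[0]
    A = np.vstack([M, np.ones(n)])      # append the normalisation sum(u) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def copy_number_generator(L, r, delta, p_max):
    """Generator of the copy-number graph truncated at p_max mRNA molecules."""
    n = L.shape[0]
    M = np.zeros((n * (p_max + 1), n * (p_max + 1)))
    for p in range(p_max + 1):
        for i in range(n):
            s = p * n + i                        # index of state (i, p)
            M[p * n:(p + 1) * n, s] += L[:, i]   # regulatory transitions at fixed p
            if p < p_max:                        # production: (i, p) -> (i, p + 1)
                M[s + n, s] += r[i]
                M[s, s] -= r[i]
            if p > 0:                            # degradation: (i, p) -> (i, p - 1)
                M[s - n, s] += delta * p
                M[s, s] -= delta * p
    return M

L = np.array([[-1.0, 2.0], [1.0, -2.0]])   # 0 -> 1 at rate 1, 1 -> 0 at rate 2
r = np.array([0.0, 3.0])                   # mRNA is produced only in state 1
delta, p_max = 1.0, 40

u = steady_state(copy_number_generator(L, r, delta, p_max)).reshape(p_max + 1, 2)
marginal = u.sum(axis=0)                               # regulatory-state marginals
mean_mrna = float(np.arange(p_max + 1) @ u.sum(axis=1))
formula = float(r @ steady_state(L)) / delta           # right-hand side of Eqn.14
print(marginal, mean_mrna, formula)
```

In this example the marginal over regulatory states coincides with the steady state of the finite regulatory graph, and the mean copy number of the truncated chain agrees with the formula up to truncation and round-off error.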
Proof of response summation in the default model
In this section we prove Eqn.19 which shows that, within the default model, the collective response of all the enhancers is the sum of their individual responses. That is, if G = G1 ⊛ · · · ⊛ GN, then
We consider the case with only two enhancers, N = 2, from which the general case follows easily. Recall from the Sanchez and Kondev formula in Eqn.14 that
Assumption 3 on response summation tells us that
We can perform the summation over (i1, i2) in any order, for instance by first summing over i2 and then summing over i1. This gives,
In the inner left-hand sum over i2, the terms indexed by i1 are constant and may be extracted from that sum to give
Total probability always sums to 1, so that
Similarly, the inner right-hand sum in Eqn.62 may be written as
We recognise from Eqn.14 that the sum in brackets is the response of graph G2, so that Eqn.65 becomes,
We can now substitute Eqns.64 and 66 back into Eqn.62 to get,
We recognise from Eqn.14 that the left-hand sum is δ times the response of graph G1. In the right-hand sum, we can extract the terms that do not depend on i1 and use once again that the total probability is 1. This allows us to rewrite Eqn.67 as,
from which we conclude that, indeed,
as claimed. This completes the proof of Eqn.19.
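For reference, the chain of equalities in this proof can be summarised compactly for N = 2. The notation is an assumption made for this summary: r_{i}(G) denotes the production rate of regulatory state i in G and u^*_{i}(G) its steady-state probability. The first line uses Eqn.14 together with the additivity of production rates and the factorisation of steady-state probabilities on a product graph; the last line uses the fact that total probability sums to 1 and that all graphs share the degradation rate δ.

```latex
\begin{align*}
\delta\, R(G_1 \circledast G_2)
  &= \sum_{i_1}\sum_{i_2}
     \bigl(r_{i_1}(G_1) + r_{i_2}(G_2)\bigr)\,
     u^*_{i_1}(G_1)\, u^*_{i_2}(G_2) \\
  &= \sum_{i_1} r_{i_1}(G_1)\, u^*_{i_1}(G_1)
     \Bigl(\sum_{i_2} u^*_{i_2}(G_2)\Bigr)
   + \Bigl(\sum_{i_1} u^*_{i_1}(G_1)\Bigr)
     \sum_{i_2} r_{i_2}(G_2)\, u^*_{i_2}(G_2) \\
  &= \sum_{i_1} r_{i_1}(G_1)\, u^*_{i_1}(G_1)
   + \sum_{i_2} r_{i_2}(G_2)\, u^*_{i_2}(G_2) \\
  &= \delta\, R(G_1) + \delta\, R(G_2).
\end{align*}
```

Dividing by δ gives R(G1 ⊛ G2) = R(G1) + R(G2), and the case of general N follows by repeating the argument.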
Symbolic computations
We have provided mathematical proofs for all of our results. However, many of our results were originally discovered by exploration using computer algebra systems. Specifically, the Sage [62] computer algebra system, and the SymPy [63] and NetworkX [64] Python packages were crucial for the development of this paper.

IAC model for N = 2 enhancers as a product graph of product graphs. (a) Graphs describing the activation and communication status of enhancer 1. (b) Graphs describing the activation and communication status of enhancer 2. (c) The graph H1 whose underlying regulatory graph is Ka,1 ⊗ Kc,1. (d) The graph H2 whose underlying regulatory graph is Ka,2 ⊗ Kc,2. (e) The graph H1 ⊛ H2 satisfying the default model Assumptions 1-4 with components H1 and H2. The regulatory graph of H1 ⊛ H2 is given by (Ka,1 ⊗ Kc,1) ⊗ (Ka,2 ⊗ Kc,2). Each vertex of this graph corresponds to the activation and communication statuses of both enhancers. For the vertices along the top of the graph, the product-graph binary notation is also provided using the coordinate system ((a1, c1), (a2, c2)). Reverse edges and labels of most edges are omitted for clarity. Production states are highlighted in purple with corresponding production rates also in purple font.
Acknowledgements
JN, K-MN and JG were funded in part by NIH award R01GM122928; JN was also funded by the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard University. We thank Zeba Wunderlich, Rosa Martinez-Corral, Jané Kondev and members of the Gunawardena lab for discussions and comments on the manuscript.
References
- 1.Genes and SignalsCold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory Press
- 2.Expression of a beta-globin gene is enhanced by remote SV40 DNA sequencesCell 27:299–308
- 3.The SV40 72 base repair repeat has a striking effect on gene expression both in SV40 and other chimeric recombinantsNucleic Acids Research 9:6047–6068
- 4.Transcriptional enhancers in animal development and evolutionCurrent biology : CB 20:R754–R763
- 5.Exploring the emerging complexity in transcriptional regulation of energy homeostasisNat. Reviews. Genet 16:665–81
- 6.Evolving new skeletal traits by cis-regulatory changes in bone morphogenetic proteinsCell 164:45–56
- 7.Enhancer variants: evaluating functions in common diseaseGenome Medicine 6:85
- 8.Genomics of long-range regulatory elementsAnnual Review of Genomics and Human Genetics 11:1–23
- 9.Illuminating the noncoding genome in cancerNature Cancer 1:864–872
- 10.Systematic mapping of functional enhancer–promoter connections with CRISPR interferenceScience 354:769–773
- 11.Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbationsNature Genetics 51:1664–1669
- 12.Targeted perturb-seq enables genome-scale genetic screens in single cellsNature Methods 17:629–635
- 13.Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR–FlowFISHNature Genetics 53:1166–1176
- 14.Genome-wide enhancer maps link risk variants to disease genesNature 593:238–243
- 15.An encyclopedia of enhancer-gene regulatory interactions in the human genomebioRxiv
- 16.Inherited causes of clonal haematopoiesis in 97,691 whole genomesNature 586:763–768
- 17.Structural variants drive context-dependent oncogene activation in cancerNature 612:564–572
- 18.The three-dimensional landscape of cortical chromatin accessibility in Alzheimer’s diseaseNature Neuroscience 25:1366–1378
- 19.Integration of 3D genome topology and local chromatin features uncovers enhancers underlying craniofacial-specific cartilage defectsScience Advances 8:eabo3648
- 20.Precise modulation of transcription factor levels identifies features underlying dosage sensitivityNature Genetics :1–11
- 21.Hierarchy within the mammary STAT5-driven Wap super-enhancerNature Genetics 48:904–911
- 22.Interrogation of enhancer function by enhancer-targeting CRISPR epigenetic editingNature Communications 11:485
- 23.Genetic dissection of the α-globin super-enhancer in vivoNature Genetics 48:895–903
- 24.A linear framework for time-scale separation in nonlinear biochemical systemsPLOS One 7:e36321
- 25.Laplacian dynamics on general graphsBulletin of Mathematical Biology 75:2118–2149
- 26.A framework for modelling gene regulation which accommodates non-equilibrium mechanismsBMC Biology 12:102
- 27.The linear framework: using graph theory to reveal the algebra and thermodynamics of biomolecular systemsInterface Focus 12:20220013
- 28.The linear framework II: using graph theory to analyse the transient regime of markov processesFront. Cell Dev. Biol 11:1233808
- 29.Coming full circle: on the origin and evolution of the looping model for enhancer–promoter communicationJournal of Biological Chemistry 298
- 30.The transcription factor activity gradient (TAG) model: contemplating a contact-independent mechanism for enhancer–promoter communicationGenes & Development
- 31.A phase separation model for transcriptional controlCell 169:13–23
- 32.Markov ChainsCambridge, UK: Cambridge University Press
- 33.Markovian modeling of gene-product synthesisTheoretical Population Biology 48:222–234
- 34.Transcriptional control of noise in gene expressionProceedings of the National Academy of Sciences 105:5081–5086
- 35.Is a super-enhancer greater than the sum of its parts?Nature Genetics 49:2–3
- 36.Genome-wide mapping of autonomous promoter activity in human cellsNature Biotechnology 35:145–153
- 37.Systematic interrogation of human promotersGenome Research 29:171–183
- 38. Large-scale analysis of the integration of enhancer-enhancer signals by promoters. eLife 12:RP91994
- 39. Developmental and housekeeping transcriptional programs display distinct modes of enhancer-enhancer cooperativity in Drosophila. bioRxiv
- 40. Enhancer additivity and non-additivity are determined by enhancer strength in the Drosophila embryo. eLife 4:e07956
- 41. Signal Integration by Shadow Enhancers and Enhancer Duplications Varies across the Drosophila Embryo. Cell Reports 26:2407–2418
- 42. Partially redundant enhancers cooperatively maintain mammalian Pomc expression above a critical functional threshold. PLOS Genetics 11:e1004935
- 43. Synthetic regulatory genomics uncovers enhancer context dependence at the Sox2 locus. bioRxiv
- 44. Super-enhancers include classical enhancers and facilitators to fully activate gene expression. Cell 186:5826–5839
- 45. Temporal dissection of an enhancer cluster reveals distinct temporal and functional contributions of individual elements. Molecular Cell 81:969–982
- 46. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346:1373–1377
- 47. Nonlinear control of transcription through enhancer–promoter interactions. Nature :1–7
- 48. Building regulatory landscapes reveals that an enhancer can recruit cohesin to create contact domains, engage CTCF sites and activate distant genes. Nature Structural & Molecular Biology 29:563–574
- 49. Enhancer cooperativity can compensate for loss of activity over large genomic distances. bioRxiv
- 50. Long range regulation of transcription scales with genomic distance in a gene specific manner. bioRxiv
- 51. Stochastic motion and transcriptional dynamics of pairs of distal DNA loci on a compacted chromosome. Science 380:1357–1362
- 52. Cohesin-mediated 3D contacts tune enhancer-promoter regulation. bioRxiv
- 53. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nature Genetics 54:613–624
- 54. Deciphering the multi-scale, quantitative cis-regulatory code. Molecular Cell. Reimagining the Central Dogma 83:373–392
- 55. Models in biology: ‘accurate descriptions of our pathetic thinking’. BMC Biology 12:29
- 56. Theory in Biology: Figure 1 or Figure 7? Trends in Cell Biology 25:723–729
- 57. Quantitative dissection of the simple repression input–output function. Proceedings of the National Academy of Sciences 108:12173–12178
- 58. Promoter architecture dictates cell-to-cell variability in gene expression. Science 346:1533–1536
- 59. The transcription factor titration effect dictates level of gene expression. Cell 156:1312–1323
- 60. Tuning transcriptional regulation through signaling: a predictive theory of allosteric induction. Cell Systems 6:456–469
- 61. Figure 1 theory meets figure 2 experiments in the study of gene expression. Annual Review of Biophysics 48:121–163
- 62. SageMath, the Sage Mathematics Software System
- 63. SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103
- 64. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference, pp. 11–15
Copyright
© 2025, Nasser et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.