This paper presents an important theoretical exploration of how a flexible protein domain with multiple DNA binding sites may simultaneously provide stability to the DNA-bound state and enable exploration of the DNA strand. The authors present compelling evidence that their findings have important implications for the way intrinsically disordered regions (IDRs) of transcription factor (TF) proteins can enhance their ability to efficiently find their binding site on the DNA, from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model.
important: Findings that have theoretical or practical implications beyond a single subfield
Strength of evidence
compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art
During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).
Abstract
Transcription Factors (TFs) are proteins crucial for regulating gene expression. Effective regulation requires the TFs to rapidly bind to their correct target, enabling the cell to respond efficiently to stimuli such as nutrient availability or the presence of toxins. However, the search process is hindered by slow diffusive movement and the presence of ‘false’ targets - DNA segments that are similar to the true target. In eukaryotic cells, most TFs contain an Intrinsically Disordered Region (IDR), which is a long, flexible polymeric tail composed of hundreds of amino acids. Recent experimental findings indicate that the IDR of certain TFs plays a pivotal role in the search process. However, the principles underlying the IDR’s role remain unclear. Here, we reveal key design principles of the IDR related to TF binding affinity and search time. Our results demonstrate that the IDR significantly enhances both of these aspects. Furthermore, our model shows good agreement with experimental results, and we propose further experiments to validate the model’s predictions.
Transcription Factors (TFs) are fundamental proteins that regulate gene expression: they bind to specific DNA sequences to control gene transcription [1]; they respond adaptively to various cellular signals, thereby modulating gene expression [2, 3]; they guide processes such as cell differentiation and development, and their malfunctions are often linked to disease onset [4, 5]. The regulation mechanism is via the binding of a TF to the regulatory DNA regions associated with the genes, the TF’s target. Effective regulation requires TFs to have sufficient binding affinity and a binding rate comparable to cellular processes. In eukaryotic cells, besides having a DNA-binding domain (DBD), approximately 80% of TFs also have one or more intrinsically disordered regions (IDRs), which lack a stable 3D structure and can be considered as flexible, unstructured polymers, hundreds of amino acids (AAs) in length [6–9]. Molecular dynamics simulations suggest that IDRs play a significant role in promoting target recognition [10], by increasing the area of effective interaction between the TF and the DNA. Upon binding, the search problem becomes effectively a one-dimensional diffusion process [11]. Recent in vivo experiments done in yeast also suggest that IDRs could take a key part in guiding TFs to the DNA regions containing their targets [12, 13].
The problem of TF search has been widely studied, theoretically, albeit with the majority of works focusing on parameter regimes relevant for bacteria. One important mechanism is that of facilitated diffusion, in which the TF performs 1D diffusion along the DNA and at any given moment can fall off and perform 3D diffusion until reattaching to the DNA (a 3D excursion). The 1D diffusion along the DNA and the 3D excursions alternate until the TF finds its target. This mechanism is well characterized theoretically [14–20] and supported by experiments in E. coli [21]. However, due to the significant structural differences, including larger genome size and chromatin structure, the facilitated diffusion mechanism cannot explain the fast search times observed in eukaryotes, and a qualitatively different mechanism is required [13, 22–25].
Here, we attempt to address the fundamental question of how IDRs can enhance the binding affinity of the TFs and speed up their search process, and to discover the associated design principles in eukaryotes. The problem of binding affinity (binding probability) exhibits the classical trade-off between energy and entropy: on the one hand, the binding of the IDR to the DNA is energetically favorable; on the other hand, this binding constrains the positions of the IDR (pinned to the DNA at the points of attachment), which reduces the entropy associated with the possible polymer conformations and hence increases the free energy of the system. Therefore the effects of the disordered tail on affinity are non-trivial, leading to physical problems reminiscent of those associated with the well-studied problem of DNA unzipping [26]. We find that the binding probability increases dramatically for TFs having IDRs (see Fig. 1b) for a broad parameter regime. The optimal search process consists of one round of 3D-1D search, where at first the TF performs 3D diffusion and then binds the antenna and performs 1D diffusion until finding its target, rarely unbinding for another 3D excursion. This process is presented in Fig. 1(a, c). We obtain an expression for the search time that shows good agreement with our numerical results.
Model illustration of a TF with the IDR and its search process.
a, Illustration of a TF locating its target site (highlighted in orange) on the DNA strand (depicted as a green curve). Surrounding the target is a region of length L, where the TF’s IDR interacts with the DNA. The TF performs 3D diffusion until it encounters and binds to this “antenna” region. b, Illustration of a TF that includes an IDR composed of AAs. On the right: a model of the TF as a polymer chain comprising a single binding site on the DNA Binding Domain (DBD) and multiple binding sites on the IDR. c, Once bound to the antenna, the TF performs effective 1D diffusion until it reaches its target. The 1D diffusion proceeds via the binding and unbinding of sites along the IDR, a process we coin “octopusing”.
Results
Binding probability of the TF from the equilibrium approach
The experiment described in Ref. [12] studied the binding affinity of wild-type and mutated TFs to their corresponding targets. They found that shortening the IDR length results in weaker affinity. To qualitatively model the binding affinity of a TF to its target, we utilize a well-known physical model of polymers, the Rouse Model [27, 28] (used in the subsequent dynamical section as well). See Materials and Methods. We note that our conclusions do not hinge on this choice, and our results hold also when using other polymer models (see Supplementary S1). Since the IDR is thought to recognize a relatively short flanking DNA sequence of the DBD target [12], we model a DNA segment as a straight-line “antenna” region [17, 29–31], where the IDR targets reside. We assume that the interaction between TF and DNA outside the antenna region is negligible. Note that the interaction between IDR and DNA is a specific interaction, with genomic localization determined by multivalent interactions mediated by hydrophobic residues [32, 33].
To gain intuition into the binding probability of the TF, we first study a TF with a single IDR site. From here on, energies will be measured in units of kBT to simplify our notation. We assume that the IDR has only one corresponding target. There are four possible states: neither the DBD nor the IDR is bound, only the IDR is bound, only the DBD is bound, and both are bound, with probabilities Pfree, PIDR, PDBD, and Pboth, respectively. The probability that the TF is bound at the DBD target is the sum of PDBD and Pboth. In thermal equilibrium, each of the four terms is given by the Boltzmann distribution, which can be evaluated analytically by integrating out the additional degrees of freedom associated with the polymer (Supplementary S2). To better understand the associated design principles, it will be helpful to compare this probability with that of a “simple” TF (i.e., without the IDR), which we denote by Psimple. We find that:
where 𝒫 is an enhancement factor, governing the degree to which the IDR improves the binding affinity. Importantly, we find that:
where f(d, l0) is the probability density of the polymer end-to-end distance (see Materials and Methods), V1 is the target volume, −EB is the binding energy for the IDR target (EB is always positive), d is the distance between the DBD target and the IDR target, and l0 is the distance between the DBD and the IDR binding site on the TF (d and l0 are also the distances between neighboring IDR targets and neighboring IDR sites, respectively, in the case of multiple IDR targets and sites). This shows that the presence of the IDR always increases the binding affinity since 𝒫 is always positive. However, the magnitude of enhancement can be negligible: for instance, even if EB is large, the term f(d, l0), associated with the entropic penalty (because the end-to-end distance is fixed, reducing the degrees of freedom compared to a free polymer), could lead to small values of 𝒫. The exponential decay of 𝒫 with d² (through the term f(d, l0)) indicates that placing the IDR targets close to the DBD target is preferable. However, due to physical limitations, d cannot be arbitrarily small and in reality will have a lower bound. Once d is fixed, 𝒫 is maximal at l0 = d, as can be gleaned from equation (2). This condition minimizes the free energy penalty associated with the entropic contribution of the term f(d, l0). As we shall see, this design principle also holds when multiple IDR sites and targets are involved, which is the scenario we discuss next.
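As a numerical illustration of the four-state picture, the sketch below assigns each state a Boltzmann weight and scans l0 at fixed d. The statistical weights, the Gaussian form of f(d, l0), and the value d = 7 bp are assumptions chosen to be consistent with the definitions above; they are not taken from the paper's equations (1) and (2). The remaining values follow the caption of Fig. 2.

```python
import math

def gaussian_chain_density(r, l):
    """Assumed end-to-end distance density of an ideal chain segment (1/length^3)."""
    return (3.0 / (2.0 * math.pi * l**2)) ** 1.5 * math.exp(-3.0 * r**2 / (2.0 * l**2))

def binding_probabilities(e_dbd, e_b, d, l0, v1, v):
    """Return (P_TF, P_simple) for one DBD site and one IDR site.

    Assumed weights (energies in kBT):
      free: V, IDR only: V1*exp(E_B), DBD only: V1*exp(E_DBD),
      both: V1^2 * exp(E_DBD + E_B) * f(d, l0).
    """
    w_free = v
    w_idr = v1 * math.exp(e_b)
    w_dbd = v1 * math.exp(e_dbd)
    w_both = v1**2 * math.exp(e_dbd + e_b) * gaussian_chain_density(d, l0)
    z = w_free + w_idr + w_dbd + w_both
    p_tf = (w_dbd + w_both) / z          # DBD bound, with or without the IDR
    p_simple = w_dbd / (w_free + w_dbd)  # TF without an IDR
    return p_tf, p_simple

BP = 0.34            # nm per base pair
V = 1.0e9            # nucleus volume: 1 um^3 expressed in nm^3
V1 = BP**3           # target volume
D_TARGET = 7 * BP    # hypothetical target spacing d (not given in the text)

def enhancement(l0):
    """Q = P_TF / P_simple at E_DBD = 15, E_B = 10."""
    p_tf, p_simple = binding_probabilities(15.0, 10.0, D_TARGET, l0, V1, V)
    return p_tf / p_simple

# Scan l0 from 0.2 d to 3.0 d and locate the maximum of Q.
l0_grid = [D_TARGET * (0.2 + 0.01 * i) for i in range(281)]
best_l0 = max(l0_grid, key=enhancement)
print("l0/d at maximum:", best_l0 / D_TARGET, " Q:", enhancement(best_l0))
```

Under these assumptions the maximum of Q sits at l0 ≈ d, because the only l0 dependence enters through f(d, l0), whose logarithm, −3 ln l0 − 3d²/(2 l0²), is stationary at l0 = d.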
A priori, it is unclear whether having more binding sites (nb > 1) on the IDR or more targets (nt > 1) on the DNA will be beneficial or not. We therefore sought to investigate how the binding affinity depends on these parameters, as well as the binding energy and positions of the targets and binding sites. This generalization requires taking into account all the possible number of bindings, n (ranging from 0 to min (nb, nt)), and all possibilities for binding orientations for each n, distinct configurations in total. For each of them, we may analytically integrate-out the polymer degrees of freedom and calculate explicitly the contribution to the partition function (Supplementary S2). Summing these terms together, numerically, allows us to explore how the different parameters affect the binding specificity, which we describe next. To characterize the binding affinity, we resort again to the ratio Q = PTF/ Psimple, measuring the effectiveness of the binding in comparison to the case with no IDRs. The length of TFs in our considerations, which varies with nb, is comparable to that in experiments [12].
We find the design principle l0 = d, identical to what we showed for the case of a single IDR binding site. Specifically, Q is maximal at l0 ~ d for all combinations of nb and nt. Fig. 2a displays this principle for several combinations of nb and nt. All combinations are provided in Supplementary Fig. S3. This is plausible since for the IDR to be bound to the DNA at multiple places without paying a large entropic penalty, the distance between target sites should be compatible with the distance between binding sites. We therefore set l0 = d and search for other design principles. Fig. 2b shows a heatmap of Q varying with nb and nt. At large nt, as nb increases, the value of PTF exhibits an initial substantial increase, followed by a gradual rise until it reaches saturation (shown explicitly in Supplementary Fig. S4). A similar behavior is also observed when varying the value of nt. (When nt exceeds nb, the contribution from the states where the IDR is bound but the DBD is unbound increases with increasing nt, which slightly lowers PTF; see Supplementary Fig. S4.) In fact, the heatmap in Fig. 2b shows that the value of PTF is governed, approximately, by the minimum of nb and nt (this is reflected in the nearly symmetrical “L-shapes” of the contours). This places strong constraints on the optimization, and suggests that there will be little advantage in increasing the value of either nb or nt beyond the point nb = nt, revealing another design principle (under the assumption that it is preferable to have smaller values of nb and nt for a given level of affinity, in order to minimize the cellular costs involved). We verify that this principle holds at additional values of EB. Fig. 2c shows that the contours corresponding to PTF = 0.9 at varying values of EB all take an approximately symmetrical L-shape, thus supporting the design principle nb = nt. For a given EB, we show the optimal values (at which nb = nt) as hexagrams in the plot.
We further find that the optimal value decreases with increasing EB, as displayed in the inset. These relationships also hold at PTF = 0.5 (Supplementary S5).
We examine the variation of Q with respect to EB while keeping nb = nt constant. In Fig. 2d, as the binding energy EB increases, Q increases from a value of ~ 1 to its saturation value, 1/Psimple, in a switch-like manner: i.e., there is a characteristic energy scale, Eth. To estimate Eth, when Q ≪ 1/Psimple (i.e., PTF ≪ 1), we approximate the expression of Q as:
where lorn,i and rorn,i denote the segment length between two adjacent bound sites and the distance between their respective targets, respectively, and “orn” represents the orientation of the given configuration (Supplementary S2). Equation (3) reduces to equation (1) when nt = nb = 1. To substantially increase Q, the configuration that contributes the most to Q would satisfy (𝒫(EB, d, d))^nb ≫ 1. Thus, the characteristic energy Eth can be estimated from the condition 𝒫(Eth, d, d) = 1, resulting in:
where ϕ represents the proportion of IDR targets within the antenna. Equation (4) is a direct consequence of the energy-entropy trade-off. Increasing ϕ decreases the entropy lost upon binding the IDR, and thereby reduces the demands on the binding energy. Given that ϕ ≪ 1, Eth is much greater than 1 kBT. The vertical dashed line in Fig. 2d, obtained from equation (4), effectively captures the region of significant increase for large values of nb. Hence, we obtain the third design principle: EB > Eth, which ensures the significant binding advantage of TFs with IDRs.
Design principles across various parameters: l0/d, nb, nt, and EB.
a, Plots of Q = PTF/Psimple varying with l0/d are shown for several combinations of nt and nb for typical values of the relevant quantities: the nucleus volume is 1 μm³, EDBD = 15, EB = 10, V1 = (0.34 nm)³, and . b, A heatmap of Q vs. nb and nt at EB = 10 and d = l0. c, Contour lines at different EB at PTF = 0.9. Blue hexagrams represent the design principle nb = nt, with the black dashed line as a visual guide. The inset shows the dependence of the optimal value on energy. d, Q for varying EB at d = l0 for several nb = nt. The vertical line represents Eth as determined by equation (4), the solid curves illustrate the approximation of Q obtained from equation (3), and the horizontal line indicates where PTF = 1.
So far, we have found that the binding design principles for TFs with IDRs are as follows: (i) the target distance d should be as small as possible, (ii) the IDR segment binding distance l0 should be comparable to d, (iii) it is preferable to have fewer binding sites and binding targets as the binding strength EB increases, with nb = nt, and (iv) EB should be larger than a threshold that depends solely on the proportion ϕ of IDR targets within the antenna, as indicated by Eq. (4). In our estimation, with ϕ² = 0.02, we obtain Eth ≈ 8.5 kBT. The above principles have also been validated in scenarios involving Poisson statistics of binding site distances and target distances, indicating their robust applicability (Supplementary Fig. S6).
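The threshold estimate can be checked numerically. The sketch below assumes an enhancement factor of the form 𝒫(EB, d, l0) = V1 exp(EB) f(d, l0) with a Gaussian f, and takes ϕ = V1^(1/3)/d; both are assumptions made for illustration rather than the paper's stated definitions. Setting 𝒫(Eth, d, d) = 1 then gives Eth = (3/2)[1 + ln(2π/(3ϕ²))].

```python
import math

def threshold_energy(phi):
    """E_th in units of kBT, from P(E_th, d, d) = 1.

    Assumes P = V1 * exp(E_B) * (3 / (2*pi*d^2))^(3/2) * exp(-3/2)
    and phi = V1^(1/3) / d (an assumed definition of the target proportion).
    """
    return 1.5 * (1.0 + math.log(2.0 * math.pi / (3.0 * phi**2)))

phi = math.sqrt(0.02)  # phi^2 = 0.02, the value quoted in the text
e_th = threshold_energy(phi)
print(f"E_th = {e_th:.2f} kBT")
```

With ϕ² = 0.02 this form gives a value close to the 8.5 kBT quoted above, which is the main reason for choosing it here.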
Search time of the TF from the dynamics
In this section, we study the design principles of the TF search time, a dynamic process. Here, we confine the TF with ñ = nb+1 sites (nb IDR sites and one DBD site) to a sphere of radius R, the cell nucleus, and use over-damped coarse-grained molecular dynamics (MD) simulations. The sites and targets interact via a short-range interaction. The DBD target, with radius a, is placed at the sphere’s center. See the sketch in Fig. 1a. Once the DBD finds its target, the search ends. We denote the mean total search time (i.e., the mean first passage time, MFPT) as ttotal. We estimate it by averaging 500 simulations, each starting from an equilibrium configuration of the TF at a random position in the nucleus. Simulation details are provided in the Materials and Methods section.
By conducting comprehensive simulations of ttotal across various parameter values of L, EB, and ñ, we were able to identify its minimum value, which is approximately at L* = 300bp, EB* = 11, and ñ* = 4 for R = 500bp ≈ 1/6 μm. Here, one base pair (bp) is equal to 0.34 nm. To speed up the simulations, we utilized several computational tricks, including adaptive time steps and treating the TF as a point particle when it is far from the antenna (see Materials and Methods). Despite this, the simulations require 10⁵ CPU hours because the IDR becomes trapped in the binding targets on the antenna, which results in slow dynamics, as found in previous simulation works [10, 34]. The probability distribution function (pdf) of individual search times decays exponentially with a typical time equal to ttotal; see the inset of Fig. 3. This indicates that ttotal provides a good characterization of the search time.
We find that ttotal varies non-monotonically with L at fixed EB* and ñ*, as shown in Fig. 3. Once a TF attaches to one of its targets, it will most likely diffuse along the antenna until reaching the DBD target (Supplementary S7). This means that the optimal search process is a single round of 3D diffusion followed by 1D diffusion along the antenna. In Supplementary S8, by analytically calculating the mean total search time using a coarse-grained model, we find that a single round of 3D-1D search is typically the case in the optimal search process. Note that our model differs profoundly from the conventional facilitated diffusion (FD) model in bacteria: in the latter, the TF is assumed to bind non-specifically to the DNA. Within our model, the antenna length L can be tuned by changing the placement of the IDR targets on the DNA, making it an adjustable parameter rather than the entire genome length. As opposed to the FD model, which requires numerous rounds of DNA binding-unbinding to achieve minimal search time, within our model the minimal search time is achieved for a single round of 3D and 1D search.
Interestingly, when we observe the motion of a TF along the antenna, IDR sites behave like “tentacles”, as illustrated in Fig. 1c, constantly binding and unbinding from the DNA. We refer to this 1D walk as “octopusing”, inspired by the famous “reptation” motion in entangled polymers suggested by de Gennes [35] (see Supplementary Video). We note that previous numerical studies [10, 11] used comprehensive, coarse-grained, MD simulations to demonstrate that including heterogeneously distributed binding sites on the IDR could facilitate intersegmental transfer between different regions of the DNA, which is reminiscent of our octopusing scenario.
The mean 3D search time, t3D, and the mean 1D search time, t1D, are also shown in Fig. 3. To evaluate t3D, we take advantage of several insights: first, if the TF gyration radius, rp, is comparable to or larger than the distance between targets, d, we may consider the chain of targets as, effectively, a continuous antenna. Making the targets denser would hardly reduce the search time, reminiscent of the well-known chemoreception problem where high efficiency is achieved even with low receptor coverage [36]. Second, when the distance between the TF and the antenna is smaller than rp, one of the binding sites on the TF would almost surely attach to the targets. This allows us to approximately map the problem to the MFPT of a particle hitting an extremely thin ellipsoid with long axis of length L and radius rp, which scales as ln(L/rp)/L [37]. We find that this scaling captures well the dependence of t3D on the parameters, with a prefactor α ~ O(1). For the 1D search, we adopt a single-particle picture: since the TF is unlikely to detach once it attaches, it performs a 1D random walk. So, taken together, the mean total search time is
where D is the 3D diffusion coefficient of one site. For a TF with ñ sites, its 3D diffusion coefficient, D3, is D/ñ. α and β are fitting parameters. The semi-analytical formulas of t1D, t3D, and ttotal, denoted by the solid lines in Fig. 3, agree well with the simulation results for α = 0.5 and β = 14.5. The theoretical result for the time spent in 3D was originally calculated for a point searcher [37]. In that case, a value of α = 2/3 was obtained. β is larger at larger EB (see Supplementary S7).
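The competition between the 3D and 1D contributions can be explored with a short numerical sketch. The functional forms below are assumptions: t3D = α R³ ln(L/rp)/(D3 L) with D3 = D/ñ (the thin-ellipsoid scaling, which gives α = 2/3 for a point searcher), and t1D = β L²/D for the 1D random walk. The value rp = 10 bp is hypothetical. Lengths are in bp and times in units of t0, so that D = 1.

```python
import math

ALPHA, BETA = 0.5, 14.5   # fitting parameters quoted in the text
R = 500.0                 # nucleus radius (bp)
N_SITES = 4               # n-tilde: IDR sites plus the DBD site
D = 1.0                   # diffusion coefficient of one site (bp^2 / t0)
R_P = 10.0                # hypothetical TF gyration radius (bp)
A_TARGET = 0.5            # DBD target radius a (bp), from the Fig. 3 caption

def t_3d(length):
    """Assumed 3D search time: thin-ellipsoid scaling with D3 = D / n_sites."""
    d3 = D / N_SITES
    return ALPHA * R**3 * math.log(length / R_P) / (d3 * length)

def t_1d(length):
    """Assumed 1D search time: random walk over an antenna of the given length."""
    return BETA * length**2 / D

def t_total(length):
    return t_3d(length) + t_1d(length)

# Scan antenna lengths and locate the minimum of the total search time.
lengths = range(50, 1001)
best_length = min(lengths, key=t_total)
t_min = t_total(best_length)
t_simple = R**3 / (3.0 * D * A_TARGET)   # simple TF, as given in the text

print("optimal L (bp):", best_length)
print("speed-up over a simple TF:", t_simple / t_min)
```

Under these assumptions the minimum falls at a few hundred base pairs and the speed-up over a simple TF exceeds a factor of ten, in line with the values reported for R = 500bp. The result depends on the assumed rp only logarithmically.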
We also examine the dependence of ttotal on the key parameters EB and ñ, in the vicinity of the optimal parameters which minimize the total search time (Supplementary S7). We find that the affinity of TFs to the antenna decreases rapidly with decreasing IDR length [12] and EB. Consequently, when ñ < ñ* or EB < EB*, multiple search rounds are needed to reach the DBD target, resulting in a significant increase of ttotal. When ñ > ñ*, ttotal, t3D, and t1D agree with equation (5). When EB > EB*, since t3D is independent of EB, the variation in ttotal stems from t1D, which increases exponentially with EB.
Mean search time varying with the antenna length L.
ttotal, t3D and t1D vs. L at EB* and ñ*. The solid curves correspond to ttotal, t3D and t1D in equation (5). Inset: the probability density function (pdf) decays exponentially at large t, pdf ≈ exp(−t/ttotal)/ttotal (black solid line). The unit t0 represents the time for one AA to diffuse 1bp. a = 0.5bp, d = 10bp, and l0 = 5bp.
The minimal search time for a simple TF is tsimple = R³/(3Da) [37]. For the TF with an IDR, by minimizing the expression in equation (5), we find that the optimal length approximately scales as L* ∝ R, and therefore the minimal search time, tmin, is approximately proportional to R². At R = 500bp ≈ 1/6 μm, the search time is more than ten times shorter than that of a simple TF, despite the latter’s diffusion coefficient being several times larger. Based on our analytic result, we expect a ~50-fold enhancement for the experimental parameters of yeast (note that this formula is approximate since it does not take into account the persistence length of DNA [38], which is severalfold shorter than the optimal antenna length L*). We have verified that the mean total search time does not change significantly for a very curved antenna, as detailed in Supplementary S9. We also verified that the 3D diffusion coefficient changes only slightly under a complex DNA configuration, where a point-searcher TF non-specifically interacts with the entire DNA, indicating the robustness of our optimal search process (see Supplementary S10).
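The scaling L* ∝ R and tmin ∝ R² can be seen from a short calculation. The sketch below assumes that the total search time is the sum of a 3D term of the thin-ellipsoid form and a 1D random-walk term proportional to L²; A and B are hypothetical shorthand for the prefactors, which absorb α, β and the diffusion coefficients.

```latex
\begin{align}
t_{\mathrm{total}}(L) &\approx A\,\frac{R^{3}\,\ln(L/r_p)}{L} + B\,L^{2}, \\
\frac{\partial t_{\mathrm{total}}}{\partial L}
  &= A\,R^{3}\,\frac{1-\ln(L/r_p)}{L^{2}} + 2B\,L = 0
  \;\;\Rightarrow\;\;
  (L^{*})^{3} = \frac{A\,R^{3}}{2B}\left[\ln\frac{L^{*}}{r_p} - 1\right], \\
L^{*} &\propto R, \qquad
t_{\min} = t_{\mathrm{total}}(L^{*}) \propto R^{2}
  \quad \text{(up to logarithmic corrections)}.
\end{align}
```

By contrast, tsimple = R³/(3Da) grows as R³, so under these assumptions the advantage of the antenna increases with the size of the nucleus.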
Next, we characterize the dynamics of the octopusing walk (Supplementary S11). During this process, different IDR binding sites constantly bind and unbind from the antenna, leading to an effective 1D diffusion. A priori, one may think that the TF faces conflicting demands: it has to rapidly diffuse along the DNA while also maintaining strong enough binding to prevent detaching from the DNA too soon. Indeed, within the context of TF search in bacteria, this is known as the speed-stability paradox [16]. Importantly, the simple picture of the octopusing walk explains an elegant mechanism to bypass this paradox, achieving a high diffusion coefficient with a low rate of detaching from the DNA. This comes about due to the compensatory nature of the dynamics: there is, in effect, a feedback mechanism whereby if only a few sites are bound to the DNA, the on-rate increases, and similarly, if too many are bound (thus prohibiting the 1D diffusion) the off-rate increases. Remarkably, this mechanism remains highly effective even when the typical number of binding sites is as low as two. Further details are provided in Supplementary S11.
Comparing with experimental results
In Ref. [12], the binding probability of the TF Msn2 in yeast was measured, where the native IDR was shortened by truncating an increasingly large portion of it. The results indicate that the binding probability decreases with increasing truncation of the IDR, as illustrated in Fig. 4(a). We utilized our model to calculate the binding probability relative to PTF at zero truncation length. Using the DBD binding energy EDBD as our only fitting parameter, our model achieves quantitative agreement with the experimental data. In Fig. 4(b), we estimate the search time as a function of L for R = 1 μm and D0 = 5 × 10⁻¹⁰ m²/s for a single AA. The estimated timescales are comparable to empirical results, which are 1 to 10 seconds [13, 39], as indicated by the shaded area. We have also estimated the on-rate kon and off-rate koff of the TF as they vary with IDR length, as shown in Fig. 4(c). In the kinetic proofreading mechanism [40], specificity is associated with the off-rate, but recent studies [41] suggest that in other scenarios, specificity is encoded in the on-rate. Therefore, it is important to explore how specificity is encoded within this mechanism. Our results indicate that the variation in the dissociation constant koff/kon is dominated by koff, suggesting that specificity is encoded in the off-rate. The magnitudes and variations of these rates could be directly compared to experimental results in future studies.
(a) Relative binding probability varies with the truncation length of the IDR, in quantitative agreement with the experimental data in Ref. [12], where we only vary EDBD (binding energy of the DBD) to ensure that the maximal value reaches 1 at zero truncation length. Antenna length L = 1000 nm, EDBD = 22 and EB = 11. Other parameters are set the same as in Fig. 2. (b) The search time estimated by our model (from Eq. (5)) is quantitatively comparable with the experimental results, as guided by the shaded area. (c) The on-rate kon = Vc/ttotal and off-rate koff = (1 − PTF) / (PTFttotal) varying with the IDR length, indicating that kD = koff/kon ranges from 0.01 to 10 nM, with its variation primarily dominated by the off-rate.
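To make the relation between the rates and the model outputs concrete, the sketch below computes kon, koff and kD from a binding probability and a mean search time. It assumes that the on-rate is the nuclear volume Vc divided by the mean search time, so that kD carries units of concentration. The input values are hypothetical, and the conversion to nM uses 1 μm³ = 10⁻¹⁵ L.

```python
import math

AVOGADRO = 6.02214076e23
# One molecule per um^3 expressed in nM (1 um^3 = 1e-15 L).
NM_PER_INVERSE_UM3 = 1.0e9 / (AVOGADRO * 1.0e-15)

def rates(p_tf, t_total, volume_um3):
    """Return (k_on [um^3/s], k_off [1/s], k_D [nM]).

    Assumes k_on = V_c / t_total and k_off = (1 - P_TF) / (P_TF * t_total),
    so that k_D = k_off / k_on = (1 - P_TF) / (P_TF * V_c).
    """
    k_on = volume_um3 / t_total
    k_off = (1.0 - p_tf) / (p_tf * t_total)
    k_d = (k_off / k_on) * NM_PER_INVERSE_UM3
    return k_on, k_off, k_d

# Hypothetical inputs: nucleus of radius 1 um, 5 s search time, P_TF = 0.5.
v_c = 4.0 / 3.0 * math.pi * 1.0**3
k_on, k_off, k_d = rates(p_tf=0.5, t_total=5.0, volume_um3=v_c)
print(f"k_on = {k_on:.3f} um^3/s, k_off = {k_off:.3f} 1/s, k_D = {k_d:.3f} nM")
```

In this form kD does not depend on the search time, only on PTF and Vc, so any change in ttotal shifts kon and koff together.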
In summary, we addressed the design principles of TFs containing IDRs, with respect to binding affinity and search time. We find that both characteristics are significantly enhanced compared to simple TFs lacking IDRs.
The simplified model we presented provides general design principles for the search process and testable predictions. One possible experiment that could shed light on one of our key assumptions is “scrambling” the DNA (randomizing the order of nucleotides). Our results depend greatly on our assumption that there exists a region on the DNA, close to the DBD target, that contains targets for the IDR binding sites. It is possible to shuffle, or even remove, regions of different lengths around the DBD target and to measure the resulting changes in binding affinity. According to our results, we would expect that shuffling a region smaller in length than the IDR should have minimal effect on the binding affinity and search time, but that shuffling a region longer than the IDR should have a noticeable effect.
Another useful test of our model would be to probe the dynamics of the TF search process by varying the waiting time before TF-DNA interactions are measured. Within the ChEC-seq technique [42] employed in Ref. [12], an enzyme capable of cutting the DNA is attached to the TF and activated at will (using an external calcium signal). Sequencing then informs us of the places on the DNA at which the TF was bound at the point of activation. Measuring the binding probabilities at various waiting times (defined as the time between adding calcium ions and adding transcription factors, and ranging from seconds to an hour) would inform us of both the TF search dynamics and the equilibrium properties: shorter time scales would capture the dynamics of the search process, while longer time scales would reveal the equilibrium properties of the binding. At short waiting times (within several seconds), we would expect the binding probability to increase with the waiting time. By analyzing a normalized binding probability curve as a function of the waiting time t, we could fit the curve using the function 1−exp(−t/tsearch) and thereby infer the search time. Our results suggest that the presence of the IDR in a TF leads to increased binding probabilities over various waiting times and a reduced waiting time to reach a higher saturation binding probability. Furthermore, as shown in Fig. 4 obtained from our model, kon can be determined from the measurements of tsearch. Using the binding probability, which is associated with kD = koff/kon, koff can also be calculated.
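The fitting step proposed above can be sketched as follows. The example linearizes p(t) = 1 − exp(−t/tsearch) as −ln(1 − p) = t/tsearch and fits the slope by least squares through the origin. The data points are synthetic, generated from an assumed tsearch of 4 s; no experimental values are used, and noisy data would call for a weighted or nonlinear fit instead.

```python
import math

def fit_search_time(times, probabilities):
    """Estimate t_search from normalized binding probabilities.

    Uses the linearization y = -ln(1 - p) = t / t_search and a
    least-squares line through the origin; points with p >= 1 are skipped.
    """
    num = 0.0
    den = 0.0
    for t, p in zip(times, probabilities):
        if 0.0 <= p < 1.0:
            y = -math.log(1.0 - p)
            num += t * y
            den += t * t
    slope = num / den          # estimate of 1 / t_search
    return 1.0 / slope

# Synthetic, noise-free data for illustration only.
TRUE_T_SEARCH = 4.0                                   # seconds (assumed)
waiting_times = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]       # seconds
binding = [1.0 - math.exp(-t / TRUE_T_SEARCH) for t in waiting_times]

t_search_fit = fit_search_time(waiting_times, binding)
print(f"fitted t_search = {t_search_fit:.3f} s")
```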
While our work primarily focused on coarse-grained modeling of the search process and the hypothesized octopusing dynamics, higher-resolution molecular dynamics simulations could provide more detailed molecular insights and further validate our proposed design principles [10, 11]. Since these principles are grounded in fundamental physical concepts, such as the energy-entropy trade-off, they are likely to be robust across diverse molecular systems. We anticipate their applicability to other biophysical contexts, such as nonspecific RNA polymerase binding [43] and multivalent antibody binding [44], where the associated dynamic processes would also be of significant interest for future research.
Materials and methods
Model: The Rouse Model describes a polymer as a chain of point masses connected by springs and characterized by a statistical segment length l0 set by the spring stiffness k and the thermal noise level kBT: l0² = 3kBT/k [27, 28]. We coarse-grain the TF by representing it with a few binding sites for the IDR, and an additional site for the DBD, as illustrated in Fig. 1b. The targets are assumed to be positioned at equal spacing, with one target for the DBD and a few targets for the IDR, corresponding to the orange and black squares in Fig. 1c. We have also verified that relaxing this assumption, and using randomly positioned targets, does not affect our results (Supplementary S6). In the Rouse Model, the equilibrium 3D probability density of the relative distance r between any two sites can be derived through Gaussian integration [28], leading to:
$$f(r, l) = \left(\frac{3}{2\pi l^{2}}\right)^{3/2} \exp\left(-\frac{3 r^{2}}{2 l^{2}}\right),$$
where l² = nseg l0² and nseg is the number of segments between the two sites.
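As a quick consistency check of the Gaussian chain statistics, the sketch below integrates the standard ideal-chain density f(r, l) = (3/(2πl²))^(3/2) exp(−3r²/(2l²)), with l² = nseg l0², over all space and evaluates the mean-square distance. The integration grid and cutoff are arbitrary choices.

```python
import math

def rouse_density(r, n_seg, l0):
    """Ideal-chain density of the distance r between two sites n_seg segments apart."""
    l_sq = n_seg * l0**2
    return (3.0 / (2.0 * math.pi * l_sq)) ** 1.5 * math.exp(-3.0 * r**2 / (2.0 * l_sq))

def radial_moment(n_seg, l0, power, n_grid=20000):
    """Midpoint-rule integral of 4*pi*r^2 * r^power * f(r) from 0 to 10*l."""
    r_max = 10.0 * math.sqrt(n_seg) * l0
    dr = r_max / n_grid
    total = 0.0
    for i in range(n_grid):
        r = (i + 0.5) * dr
        total += 4.0 * math.pi * r**2 * r**power * rouse_density(r, n_seg, l0) * dr
    return total

L0 = 5.0      # segment length in bp, as in the simulations
N_SEG = 3     # three segments between the two sites
norm = radial_moment(N_SEG, L0, power=0)
mean_sq = radial_moment(N_SEG, L0, power=2)
print("normalization:", norm)
print("<r^2> / (n_seg * l0^2):", mean_sq / (N_SEG * L0**2))
```

Both printed values should be close to 1: the density is normalized, and the mean-square distance equals nseg l0².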
Dynamical simulations: Each binding site of the TF at position ri follows over-damped dynamics, which reads:
where the random variable ξ follows the standard normal distribution. D is the diffusion coefficient of the coarsegrained binding site. The force acting on each site is f ({ri}) = −∂Utotal/∂ri, and
where Rj are the positions of the targets. wd = 1bp is the typical width of the short-range interaction between sites and their corresponding targets. The typical distance l0 between coarse-grained binding sites is 5bp. The stiffness of the springs is thus . To speed up the simulations, when the sites’ minimal distance to the antenna is larger than the threshold 10bp, the TF is regarded as a rigid body, and all sites move in sync with their center of mass. We verified that the resulting search time is insensitive to the value of this threshold. When the sites’ minimal distance to the antenna is less than 10bp, we consider the excluded volume effect to ensure that different binding sites are not trapped at the same target. Specifically, we use the WCA interaction [45]:
where σ = 1 bp. In this region, we also adopted an adaptive time step: . Note that this choice ensures that the displacement of the sites at every step is much smaller than the binding site size wd. Because the IDR becomes trapped at the binding targets on the antenna, the dynamics slow down and the simulation time becomes significantly longer, as found in previous simulation studies [10, 34]. This slowdown is similar to the behavior observed in glass-forming liquids [46]. As a result, the number of time steps in an individual simulation ranges from 10⁹ to 10¹¹.
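The following is a minimal sketch of one over-damped (Euler-Maruyama) update for a short chain, together with the textbook WCA pair energy. It is not the authors' code: only the spring part of the force is included, the short-range site-target attraction is omitted, and the WCA energy scale `eps`, the fixed time step, and all function names are assumptions made for illustration.

```python
import math
import random

def wca_energy(r, sigma=1.0, eps=1.0):
    """Textbook WCA energy: Lennard-Jones shifted up by eps and cut at its minimum."""
    r_cut = 2.0 ** (1.0 / 6.0) * sigma
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6) + eps

def spring_forces(positions, k):
    """Forces from zero-rest-length springs connecting consecutive sites."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions) - 1):
        for a in range(3):
            f = k * (positions[i + 1][a] - positions[i][a])
            forces[i][a] += f       # site i is pulled toward site i+1
            forces[i + 1][a] -= f   # equal and opposite force on site i+1
    return forces

def overdamped_step(positions, k, D, dt, kBT=1.0, rng=random):
    """One update r -> r + (D/kBT) f dt + sqrt(2 D dt) xi, with xi standard normal."""
    forces = spring_forces(positions, k)
    amp = math.sqrt(2.0 * D * dt)
    return [[positions[i][a] + D * forces[i][a] * dt / kBT + amp * rng.gauss(0.0, 1.0)
             for a in range(3)]
            for i in range(len(positions))]

# Example: three sites, l0 = 5 bp, so k = 3 kBT / l0^2 = 0.12 (in units of kBT / bp^2)
chain = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 5.0, 0.0]]
new_chain = overdamped_step(chain, k=0.12, D=1.0, dt=1e-3, rng=random.Random(0))
```

In a fuller simulation the binding and excluded-volume forces would be added to `spring_forces`, and `dt` would be chosen so that each displacement stays well below wd, as described above.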
Data and code availability
The data supporting this study’s findings and the code used in this study are available from W. Ji and the corresponding author.
Additional information
Authors’ contributions: All authors contributed to the design of the work, analysis of the results, and the writing of the paper.
Acknowledgements
We thank Samuel Safran, David Mukamel, Yariv Kafri, Yaakov Levy, Luyi Qiu, and Zily Burstein for insightful discussions. We thank the Clore Center for Biological Physics for their generous support.
Supplementary materials
S1. Broad applicability of probability density f (r, l)
We verify the broad applicability of Eq. [1] in the main text, regardless of the specific details of microscopic interactions. To demonstrate this, we only require that the density function of the end-to-end distance r for a segment containing m amino acids (AAs), denoted as f(r, l0), complies with Eq. [1]. Specifically,
$$f(r, l_0) = \left(\frac{3}{2\pi l_0^2}\right)^{3/2} \exp\left(-\frac{3r^2}{2l_0^2}\right), \qquad [S1]$$
where l0 is the typical length of the segment. When m is large, according to the central limit theorem, r should conform to the distribution in Eq. [S1] with a variance of $l_0^2 = m\,a_{eff}^2$, where aeff represents the effective length of an individual AA.
Next, we will consider a specific form of interaction, of springs with a finite rest-length. In one limit (where the rest length vanishes), this corresponds to the Rouse model (where Eq. [S1] is exact for any value of m, not only asymptotically) and in another limit (where the springs are very stiff) this corresponds to a chain of rods. As we shall see, Eq. [S1] provides an excellent approximation already for moderate values of m.
To proceed, we consider two adjacent AAs at positions $\vec{r}_i$ and $\vec{r}_{i+1}$ with an interaction denoted by $U(b)$, where $b = |\vec{r}_{i+1} - \vec{r}_i|$. Their probability density is $p(\vec{b}) \propto \exp[-U(b)/k_BT]$. $a_{eff}$ can be calculated using $p(\vec{b})$:
$$a_{eff}^2 = \langle b^2 \rangle = \frac{\int_0^\infty b^4\, e^{-U(b)/k_BT}\, db}{\int_0^\infty b^2\, e^{-U(b)/k_BT}\, db}. \qquad [S2]$$
Specifically, for a spring-like interaction with a stiffness constant k0 and a rest length b0, the interaction is given by $U(b) = k_0 (b - b_0)^2/2$. This formula can also be interpreted as a harmonic approximation of the interaction around the equilibrium distance b0. In this case, $a_{eff}$ is derived analytically:
where $a_0 = \sqrt{3k_BT/k_0}$ and erf is the error function. When b0 = 0, we have aeff = a0, and Eq. [S1] holds for any m. When b0 ≫ a0, then aeff ≈ b0 corresponds to a chain of rods where the angle between consecutive rods is free to rotate. Fig. S1(a) shows that the numerical results obtained from this harmonic interaction are in good agreement with Eq. [S1] even at moderate values of m. The inset displays the variation of aeff/a0 with b0/a0 obtained from Eq. [S3]. In Fig. S1(b), with b0 = a0, f(r, l0) rapidly converges to Eq. [S1].
(a) f(r, l0) aligns well with Eq. [S1] (solid curves) for moderate m at various b0. Inset: aeff/a0 vs. b0/a0 from Eq. [S3]. (b) Eq. [S1] serves as a good approximation (solid lines) even at small m.
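The two limits quoted above (aeff = a0 at b0 = 0 and aeff ≈ b0 for b0 ≫ a0) can be checked numerically. The sketch below assumes that aeff is the root-mean-square bond length under the Boltzmann weight of a harmonic spring with rest length b0, and that a0² = 3kBT/k0; it evaluates the integrals by quadrature rather than using the closed-form expression, and the function name is hypothetical.

```python
import math

def a_eff_over_a0(b0_over_a0, steps=40000):
    """RMS bond length (in units of a0) for a bond vector with density
    proportional to exp(-k0 (b - b0)^2 / (2 kBT)), assuming a0^2 = 3 kBT / k0."""
    a0 = 1.0
    b0 = b0_over_a0 * a0
    b_max = b0 + 10.0 * a0  # the Boltzmann weight is negligible beyond this point
    db = b_max / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        b = (i + 0.5) * db
        # b^2 is the radial Jacobian in 3D; k0 / (2 kBT) = 3 / (2 a0^2)
        w = b * b * math.exp(-3.0 * (b - b0) ** 2 / (2.0 * a0 ** 2))
        num += b * b * w
        den += w
    return math.sqrt(num / den)

rouse_limit = a_eff_over_a0(0.0)          # expected to be close to 1
rod_limit = a_eff_over_a0(20.0) / 20.0    # a_eff / b0, expected to be close to 1
```

Intermediate values of `b0_over_a0` interpolate smoothly between the two limits under these assumptions.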
S2. Derivation of the binding probability
In this section, we derive the generalized binding probability PTF for nb IDR binding sites and nt IDR targets. To achieve this generalization, we consider every possible number of bound IDR sites, n from 0 to min(nb, nt), and all possible binding orientations for each n. Note that the trade-off between energy and entropy is naturally included in our model, and the generalized formula for PTF is also applicable to non-uniformly distributed binding targets and binding sites.
Based on the Rouse model, the system’s energy can be expressed as , where represents the position of the binding site i. The probability density in the absence of binding reads:
where .
Now we consider a given configuration with n IDR sites bound, where IDR sites q1 < … < qn are bound to targets s1, …, sn. The positions of these bound targets are at , …, . The DBD site is denoted by q0 = 0 and its target is denoted by s0 = 0. There are nb − n unbound IDR sites, indicated by . The corresponding binding probability is given by:
where “orn” stands for the orientation of the given configuration, Z is the normalization factor, and EDBD is the DBD site binding energy for its target. There are possible binding orientations at n. The key insight in evaluating this integral is to realize that we can use the result of Eq. [1] of the main text to integrate out the degrees of freedom associated with the sites between qi-1 and qi :
Since and are confined to a very small volume V1, to an excellent approximation we may simply evaluate in Eq. [S6] at and , to obtain:
where , rorn,i = |si — si-1| d, , and .
To give a specific example, consider the binding topology corresponding to Fig. S2. In this case, n = 3, lorn,1 = l0 (since there is one spring between the DBD and the first binding site), $l_{orn,2} = \sqrt{3}\,l_0$ (since there are 3 springs between q1 and q2) and similarly lorn,3 = l0. Eq. [S6] implies that the contribution of this diagram to the binding probability is: . Note that while in this example, the order of the target sites is monotonic (i.e., as we increase the index of the binding sites, we progressively move further away from the DBD), in principle, we should sum over all diagrams, including those where the order is non-monotonic. At n = 0, .
A diagram of the bound configurations at n = 3.
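To make the sum over diagrams concrete, the sketch below enumerates binding orientations under one natural reading of the text: the n bound sites are chosen in increasing order, and they are assigned to n distinct targets in any order (so non-monotonic assignments are included). Under that assumption the number of orientations is C(nb, n) · nt!/(nt − n)!, which reduces to nb·nt for n = 1. The function names are hypothetical.

```python
from itertools import combinations, permutations
from math import comb, factorial

def binding_orientations(n_b, n_t, n):
    """List every way to bind n of the n_b IDR sites to n distinct targets out of n_t.

    Each orientation is a tuple of (site, target) pairs with sites in increasing order.
    """
    result = []
    for sites in combinations(range(1, n_b + 1), n):        # q_1 < ... < q_n
        for targets in permutations(range(1, n_t + 1), n):  # s_1, ..., s_n in any order
            result.append(tuple(zip(sites, targets)))
    return result

def count_orientations(n_b, n_t, n):
    """Closed-form count matching the enumeration above."""
    return comb(n_b, n) * factorial(n_t) // factorial(n_t - n)

orientations = binding_orientations(3, 4, 2)
```

Each entry of `orientations` fixes the sequence of spring counts between consecutive bound sites and the distances between their targets, which is the information needed to evaluate one diagram.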
Similarly, for the state where the DBD is unbound and n ≥ 2 IDR sites are bound, the binding probability reads:
where . For n = 1, . Here ntnb represents the number of possibilities for one of the nb IDR sites bound to one of the nt targets. For n = 0, .
Since the sum of all the possibilities is equal to 1, as given by: , where nmin = min{nb, nt}, we can determine the normalization factor. Therefore, the binding probability PTF for the DBD bound in its target can be expressed as:
When PTF ≪ 1, we can approximate PTF as
where we have considered the facts that rIDR < rDBD because EB < EDBD, and Psimple = rDBD/(1 + rDBD) ≈ rDBD due to rDBD ≪ 1. For instance, taking EDBD = 15, V1 = 1 bp³ ≈ (0.34 nm)³, and Vc = 1 μm³, we find that rDBD ~ 10⁻⁴.
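The order of magnitude quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that rDBD is the Boltzmann factor of the DBD binding energy (in units of kBT) multiplied by the volume ratio V1/Vc; this form is an assumption chosen because it is consistent with the numbers given in the text, and the function names are hypothetical.

```python
import math

def r_dbd(e_dbd, v1, vc):
    """Assumed statistical weight of the DBD-bound state relative to the unbound state."""
    return (v1 / vc) * math.exp(e_dbd)

def p_simple(r):
    """Binding probability of a TF without an IDR, P_simple = r / (1 + r)."""
    return r / (1.0 + r)

V1 = (0.34e-9) ** 3  # 1 bp^3 expressed in m^3
VC = (1.0e-6) ** 3   # 1 um^3 expressed in m^3
r = r_dbd(15.0, V1, VC)
p = p_simple(r)
```

With these inputs `r` comes out slightly above 10⁻⁴, and `p` differs from `r` only at relative order `r`, which is why Psimple ≈ rDBD in this regime.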
At nb=nt = 1, the generalized Eq. [S10] simplifies to the following expression:
Therefore, PTF can be approximated as (1+𝒫)Psimple, as illustrated in Eq. [6] of the main text. This approximation is also derivable from Eq. [S11].
S3. Q varying with l0/d for all combinations of nb and nt
Maximal Q is observed at l0 ~d for all combinations of nb and nt.
S4. PTF varying with nb and nt
(a) PTF varying with nb at nt = 12. PTF initially increases rapidly, until eventually saturating. (b) PTF varies non-monotonically with nt. PTF slightly decreases with increasing nt at nt > nb. Three examples at nb = 7, 8, 9 are displayed. PTF is denoted by pentagrams at nt = nb.
S5. Robustness of the relationships: and decreases with increasing EB
Fig. S5 shows the contour lines at PTF =0.9 and PTF =0.5 for various EB. All take an approximate symmetrical L-shape, indicating that for a given EB, the optimal values and are comparable, as guided by the black dashed line y = x. As EB increases, the contour lines shift towards lower values of nb and nt, indicating a decrease in as EB increases.
Contour lines at PTF =0.9 and PTF =0.5 on the plane defined by the number of binding sites, nb, and the number of binding targets, nt.
The black dashed line is added to guide the relationship nb = nt.
S6. Robustness of the binding design principles
In the main text, we employed uniform binding site distances in TFs, represented as l0 , and uniform binding target distances on the DNA, denoted as d, to obtain the design principles. Here, we also explore the scenario where both IDR binding sites and their targets are distributed as a Poisson process (i.e., the distance between neighboring sites is exponentially distributed), with mean values of l0 and d, respectively. Figure S6 is similar to Figure 2 in the main text. We observe that the design principles identified in the main text are applicable under these conditions as well.
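A Poisson-process arrangement of sites or targets can be generated by accumulating exponentially distributed gaps. The sketch below is illustrative only; the function name, the seeds, and the number of points are arbitrary choices.

```python
import random

def poisson_positions(n, mean_spacing, rng):
    """Positions of n points whose nearest-neighbor gaps are exponentially distributed."""
    positions = []
    x = 0.0
    for _ in range(n):
        x += rng.expovariate(1.0 / mean_spacing)  # rate = 1 / mean gap
        positions.append(x)
    return positions

rng = random.Random(1)
targets = poisson_positions(20000, mean_spacing=10.0, rng=rng)  # mean gap d = 10 bp
sites = poisson_positions(12, mean_spacing=10.0, rng=rng)       # mean gap l0 = d
empirical_mean_gap = targets[-1] / len(targets)
```

For a long sequence `empirical_mean_gap` approaches the requested mean spacing, while individual gaps fluctuate widely, which is the feature that distinguishes this case from the uniform spacing used in the main text.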
Plots of the affinity enhancement, quantified by the ratio Q = PTF/Psimple, for the case where IDR sites as well as their targets are randomly positioned on the IDR and DNA, respectively.
See the analogous Figure 2 in the main text for the case of homogeneously distributed IDR sites and targets. (a) Q is maximal when l0 ~ d (shaded region) for different values of nt = nb. The average distance between IDR sites is l0 , and the average distance between IDR targets is d. (b) A heatmap of Q at EB = 10 and l0 =d. (c) Contour lines at different EB at PTF = 0.5 exhibit an approximate symmetrical L-shape, which indicates , as shown by the red triangles. decreases with increasing EB. (d) Q varies with EB for several different constants of nt = nb. The characteristic energy Eth, as predicted by Eq. [5] in the main text, corresponds to the region where a notable increase is observed.
S7. TF walking along the antenna once it attaches, and ttotal vs. EB and vs.
We show a typical example of the trajectory of a TF (red) in the cell nucleus (light blue) in Fig. S7(a) where the TF walks along the antenna to the DBD target once it attaches the IDR targets at and . The DBD target is at the origin. In Fig. S7(b), we find that the average number of rounds the TF hits the antenna, nhits, is close to 1, so it is fair to describe the overall search process as two distinct steps with the first step being purely a 3D search and the second a purely 1D search. This is expressed by Eq. [5] in the main text.
(a) A trajectory (red) of the TF in the cell nucleus (light blue). rDNA is the average distance of the TF to the antenna, and Xc is the position of the center of mass along the antenna direction. The trajectory is displayed with a time interval of 10⁶ t0. (b) The average number of rounds the TF hits the antenna, nhits, before reaching the DBD target. The optimal mean total search time is obtained at L* = 300 bp.
The minimal search time, tmin, is obtained at L*, , and . Fig. S8(a) shows ttotal vs. EB at fixed L* and . When , t3D does not depend on EB as we predict (red solid line) and t1D exponentially increases with increasing EB. The solid blue line shows the exponential dependence of t1D on EB, also shown in the inset. When , the number of rounds of hits on the antenna, nhits, is larger than 1, as displayed in the inset. In this scenario, equation (5) in the main text becomes ineffective, resulting in a significantly longer search time than tmin.
Fig. S8(b) shows ttotal vs. at fixed L* and . When , t3D and t1D agree with the prediction of Eq. [5] (red curve and blue curve). Interestingly, we find that t1D only weakly depends on . This arises due to the compensating effects of attaching and detaching: As increases, the TF attaches to new targets more easily, speeding up 1D motion. However, it also makes detaching from the current site more difficult, thereby slowing down 1D motion. When , Eq. [5] breaks down because of multiple rounds of hits, which is verified in the inset. As a result, both ttotal and t3D increase with decreasing , and both become significantly longer than tmin. If each round of 3D search is independent, the 3D search time should be nhits times the single-round 3D search time, as shown by the black curve.
(a) ttotal, t3D, and t1D vs. EB at fixed L* and . The red solid line is the prediction of t3D from Eq. [5], and the blue line (also in the inset) is the exponential fit for t1D. The black curve in the inset is the number of rounds the TF hits the antenna, nhits, vs. EB. (b) ttotal, t3D, and t1D vs. at fixed L* and . The red and blue solid lines represent the predictions of t3D and t1D from Eq. [5], respectively. nhits varying with is shown in the inset.
S8. A single round of 3D-1D search is typically the case in the optimal search process.
In this section, we will show that the optimal search process (i.e., minimizing the mean first passage time) generically takes place via a single round of 3D-1D search. To do so, we first need to derive the expression for the mean total search time, ttotal, and its dependence on the relevant parameters, including, importantly, EB and L. According to our model presented in the main text: A TF is initially in the 3D space of the nucleus; it can attach to the antenna and perform 1D diffusion with the coefficient denoted by D1. We assume the DBD target is placed in the middle of the antenna, as shown in Fig. 1 of the main text. The mean search time taken for a TF to move from 3D space and attach to the antenna is denoted by t3D. The TF can also detach from the antenna at a rate we denote by Γ and perform a 3D random walk until ultimately attaching again to the antenna. In the following, we will make the assumption (often made in the context of the facilitated diffusion (FD) model [14–20]) that the binding position after such a 3D excursion is uniformly distributed on the antenna. Both D1 and Γ are functions of EB. When the TF is on the antenna, its mean search time to reach the DBD target, located a distance x away on the antenna, is represented by 𝒯(x). Naturally, 𝒯(x) is zero when x = 0, the location of the DBD target. The mean total search time ttotal is thus given by
$$t_{total} = t_{3D} + \frac{1}{L}\int_{-L/2}^{L/2} \mathcal{T}(x)\,dx. \qquad [S13]$$
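Consider a time interval Δt. With probability ΓΔt the TF falls off and the mean first passage time to reach the target is ttotal + Δt. Otherwise, with probability 1 − ΓΔt, the TF remains on the antenna and moves to x + Δx or x − Δx with equal probability, so the mean first passage time is the average of the two possible outcomes, namely, (𝒯(x+Δx) + 𝒯(x−Δx))/2 + Δt. This recursive reasoning leads us to the following equation:
$$\mathcal{T}(x) = \Gamma\Delta t\,(t_{total} + \Delta t) + (1 - \Gamma\Delta t)\left[\frac{\mathcal{T}(x+\Delta x) + \mathcal{T}(x-\Delta x)}{2} + \Delta t\right]. \qquad [S14]$$
In the continuous limit, we have
$$D_1 \frac{d^2\mathcal{T}}{dx^2} - \Gamma\,\mathcal{T}(x) + \Gamma\,t_{total} + 1 = 0, \qquad [S15]$$
where D1 = (Δx)²/(2Δt). By applying the boundary conditions, specifically the absorbing boundary condition at x = 0 and the reflecting boundary conditions at x = ±L/2, 𝒯(x) is determined as
$$\mathcal{T}(x) = \left(t_{total} + \Gamma^{-1}\right)\left[1 - \frac{\cosh\left((L - 2|x|)/L_s\right)}{\cosh\left(L/L_s\right)}\right], \qquad [S16]$$
where Ls, representing the 1D diffusion distance, is defined by $L_s = 2\sqrt{D_1/\Gamma}$ [47]. Therefore, ttotal can be self-consistently solved by substituting 𝒯(x) into Eq. [S13], presented as follows:
$$t_{total} = \frac{L}{L_s}\coth\left(\frac{L}{L_s}\right)\left(t_{3D} + \Gamma^{-1}\right) - \Gamma^{-1}. \qquad [S17]$$
We can obtain ttotal in two limiting cases:
When L ≫ Ls, where multiple rounds of 3D-1D excursions are needed to find the target, we have $t_{total} \approx (L/L_s)\,(t_{3D} + \Gamma^{-1})$. This is the same as the results of the facilitated diffusion model [19, 20, 47, 48].
When L ≪ Ls, where a single round of 3D-1D search can find the target, we get $t_{total} \approx t_{3D} + L^2/(12 D_1)$. This matches Eq. [5] in the main text when L < 2R.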
Note that the calculations above did not assume an optimal or efficient search process, but simply characterized the dependence of the mean first passage time on the model parameters. Next, based on the expression for ttotal in Eq. [S17], we will prove that the search process involving multiple rounds of 3D-1D search is not optimal. This conclusion is reached by showing that the minimum ttotal always occurs in the regime where L < 2Ls (since Ls is the typical distance covered by 1D diffusion on the antenna before falling off, and the maximal distance needed to be traversed on the antenna is L/2). We emphasize an important distinction between our model and the well-studied facilitated diffusion model (where it is established that multiple rounds of search are needed for efficient search): in our case the antenna length L is a variable that can be optimized over (since we assume evolution has optimized over the size of the region flanking the DBD in which binding sites for the IDR are located). In the facilitated diffusion model, L is assumed to be the length of the genome, since the interactions of the TF and the DNA are assumed non-specific.
To proceed, let us first note that the factor coth(L/Ls) in Eq. [S17] quickly converges to 1 as the ratio L/Ls increases. For instance, when the ratio is 2, we have coth(2) ≈ 1.04. Thus, for L > 2Ls, we could approximate ttotal as
$$t_{total} \approx \frac{L}{L_s}\,t_{3D} + \left(\frac{L}{L_s} - 1\right)\Gamma^{-1}. \qquad [S18]$$
Then, we consider how t3D varies with L. If the rate to hit the antenna after 3D diffusion was the sum of the rates to hit every infinitesimal segment of it, we would expect the rate to hit the antenna to be linear in L, and the mean first passage time to scale as 1/L. However, these processes are not uncorrelated. Interestingly, when L < 2R, and assuming a straight antenna, t3D is proportional to ln(L)/L (see Eq. [5] in the main text), implying that for a straight antenna the effect of these correlations is mild. However, when L > 2R, the antenna is curved rather than straight (due to the limitations imposed on the DNA conformation by the dimensions of the nucleus), and the dependence of t3D on L is weaker [49]. Thus, for any given L greater than 2Ls, t3D varies more weakly than 1/L, making the first term of ttotal in Eq. [S18] an increasing function of L. Clearly, the second term increases as well. Therefore, when L > 2Ls, regardless of the functional forms of D1 and Γ-1 on EB, ttotal decreases with decreasing L. Thus, the minimum ttotal occurs in the regime where L < 2Ls, and the optimal search typically consists of a single search round.
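The argument of this section can be checked numerically. The sketch below evaluates a closed-form solution of the self-consistent problem described above (uniform rebinding, absorbing target at x = 0, reflecting ends at ±L/2), derived here under the assumption that the 1D diffusion distance is Ls = 2√(D1/Γ); the normalization of Ls in the original equations may differ. The function name and the parameter values are hypothetical.

```python
import math

def t_total(L, d1, gamma, t3d):
    """Mean total search time for 1D diffusion (coefficient d1) with detachment rate
    gamma, uniform rebinding after a 3D excursion of mean duration t3d, and a target
    at the center of an antenna of length L."""
    l_s = 2.0 * math.sqrt(d1 / gamma)  # assumed definition of the 1D diffusion distance
    x = L / l_s
    return x / math.tanh(x) * (t3d + 1.0 / gamma) - 1.0 / gamma

# Single-round regime (L << Ls): expect roughly t3d + L^2 / (12 d1)
short = t_total(L=1.0, d1=1.0, gamma=1.0e-4, t3d=10.0)

# Multi-round regime (L >> Ls): expect roughly (L / Ls) * (t3d + 1 / gamma)
long_ = t_total(L=1000.0, d1=1.0, gamma=1.0, t3d=10.0)

coth_2 = 1.0 / math.tanh(2.0)  # the value quoted as about 1.04 in the text
```

Scanning `L` at fixed `d1`, `gamma` and a chosen dependence of `t3d` on L gives a direct way to see where the minimum of the search time falls relative to 2Ls under these assumptions.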
S9. Robustness of mean total search time with changing the antenna geometry
We checked that the mean total search time (Mean First Passage Time) for IDR targets, generated via the worm-like chain model - a random walk with a persistence length [50], does not significantly differ from the scenario where targets are arranged linearly, as presented in the main text. See Fig. S9 for details.
Left: an example of the target configuration (the antenna) generated by the worm-like chain model, where L = 300bp, d = 10bp, and the persistence length also equals 10bp. Right: Mean total search times weakly vary with persistence length when targets are generated using the worm-like chain model. The values are represented by the black dots, with parameters set to EB = 11, L = 300 bp, and = 4. For each persistence length, we obtained ten mean total search times by considering ten different target configurations. The mean total search times do not exhibit significant differences compared to that for targets arranged linearly (indicated by the horizontal orange line).
S10. MSD in a complex DNA configuration
In Fig. S10, we run simulations to examine how the Mean Square Displacement (MSD) of a point searcher is influenced by the binding energy (Enonsp) between the TF and the entire DNA in a complex DNA configuration. Fig. S10(a) shows the DNA configuration generated via the worm-like chain model with a persistence length of 150 bp [50], where the volume fraction v = 0.008 is comparable to the experimental yeast DNA volume fraction. Fig. S10(b) shows the MSD measured at different values of Enonsp for v = 0.008. When Enonsp ~ 1kBT, which corresponds to the non-specific binding energy regime in experiments [51], the effective 3D diffusion coefficient is slightly modified, which does not affect our conclusion on the optimal search process. Additionally, we observed that the MSD curve bends when Enonsp exceeds the typical range of non-specific binding energies.
(a) Configuration of DNA (in red) at v = 0.008 with a complex structure and a trajectory of TF (in black). A TF acts as a point searcher that diffuses in 3D space and becomes trapped once it hits the DNA, which is 1nm wide. It detaches at a rate of . (b) MSD at different Enonsp.
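As an illustration of how an effective 3D diffusion coefficient can be read off from an MSD curve, the sketch below simulates free Brownian walkers and uses MSD(t) = 6Dt. It deliberately omits the DNA and the non-specific binding described above, so it only reproduces the reference case of unhindered diffusion; the function name, walker count and step size are arbitrary choices.

```python
import math
import random

def brownian_msd(n_walkers, n_steps, d_coeff, dt, rng):
    """Ensemble-averaged mean square displacement of free 3D Brownian walkers,
    recorded after each time step."""
    amp = math.sqrt(2.0 * d_coeff * dt)  # per-axis step standard deviation
    pos = [[0.0, 0.0, 0.0] for _ in range(n_walkers)]
    msd = []
    for _ in range(n_steps):
        total = 0.0
        for p in pos:
            for a in range(3):
                p[a] += amp * rng.gauss(0.0, 1.0)
            total += p[0] ** 2 + p[1] ** 2 + p[2] ** 2
        msd.append(total / n_walkers)
    return msd

msd = brownian_msd(2000, 50, d_coeff=1.0, dt=0.01, rng=random.Random(0))
d_eff = msd[-1] / (6.0 * 50 * 0.01)  # effective diffusion coefficient from MSD = 6 D t
```

Transient trapping on DNA would lower `d_eff` relative to the free value, and a curve that bends away from a straight line on a linear plot signals a departure from simple diffusion.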
S11. Properties of the octopusing walk
Once an IDR site binds to the antenna, the TF performs an octopusing walk, whereby different IDR binding sites constantly bind and unbind from the DNA. As we shall see, this will lead to an effective 1D diffusion of the TF. This indicates that during the octopusing walk, the TF neither gets trapped on a single target nor fully detaches from the antenna, allowing for rapid dynamics (in fact, the 1D diffusion coefficient can be comparable to the free 3D diffusion coefficient). We shall also show that the number of bound sites on the DNA is described by a probability distribution that is in steady-state and obeys detailed balance.
We denote the time-dependent probability distribution of the number of bound sites by P(n, t) (with n ranging from 0 to ). The dynamics of this probability distribution is given by:
$$\frac{\partial P(n, t)}{\partial t} = \sum_{n'} P(n', t)\, W_{n',n}, \qquad [S19]$$
where W is the transition rate matrix, and Wn,n = −Wn,n-1 − Wn,n+1. Intuitively, we anticipate that Wn,n-1, corresponding to the detachment of one of the sites, is linear in n, since we expect the different sites to be approximately independent of each other. Similarly, if we consider the different IDR sites as independent “searchers”, we expect the on-rate (corresponding to Wn,n+1) to be linear in (). Indeed, Fig. S11(a) corroborates this intuition to an excellent approximation. As the dynamics proceeds, Eq. [S19] reaches a steady-state distribution governed by the matrix W. Fig. S11(b) shows the excellent agreement between the steady state associated with the matrix W and that found directly from the MD simulation. This confirms that TF movement along the antenna is in a dynamic steady state. We also note that since our transition matrix is tri-diagonal, it satisfies the detailed balance condition, i.e., in the steady state reached, we have: P(n)Wn,n-1 = P(n−1)Wn-1,n, where P(n) is the steady-state distribution.
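A toy version of this birth-death picture can be solved in a few lines. The sketch below assumes an off-rate k_off·n and an on-rate k_on·(n_max − n), where n_max, k_on and k_off are hypothetical parameters and complete detachment from the antenna is ignored; the steady state then follows from the detailed balance recursion and is a binomial distribution peaked at an intermediate number of bound sites.

```python
import math

def steady_state(n_max, k_on, k_off):
    """Steady-state distribution of a tri-diagonal chain with on-rate k_on * (n_max - n)
    and off-rate k_off * n, built from P(n+1) W_{n+1,n} = P(n) W_{n,n+1}."""
    weights = [1.0]
    for n in range(n_max):
        on = k_on * (n_max - n)   # W_{n,n+1}: one more site binds
        off = k_off * (n + 1)     # W_{n+1,n}: one of the n+1 bound sites detaches
        weights.append(weights[-1] * on / off)
    z = sum(weights)
    return [w / z for w in weights]

def net_flux(p, n, n_max, k_on, k_off):
    """Right-hand side of the master equation for state n; zero in steady state."""
    gain = 0.0
    if n > 0:
        gain += p[n - 1] * k_on * (n_max - (n - 1))
    if n < n_max:
        gain += p[n + 1] * k_off * (n + 1)
    loss = p[n] * (k_on * (n_max - n) + k_off * n)
    return gain - loss

P = steady_state(12, k_on=1.0, k_off=3.0)
most_probable_n = max(range(len(P)), key=lambda n: P[n])
```

With these illustrative rates `most_probable_n` is 3: when few sites are bound the total on-rate dominates, and when many are bound the total off-rate dominates, which is the compensating behavior described in this section.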
A priori, one may think that the TF is facing two contradictory demands: on the one hand, it has to perform rapid diffusion along the DNA. On the other hand, it needs to bind sufficiently strongly so that it does not fall off the DNA too soon. Indeed, this is known as the speed-stability paradox within the context of TF search in bacteria. Importantly, the simple picture we described above explains an elegant mechanism to bypass this paradox, achieving a high diffusion coefficient with a low rate of detaching from the DNA. This comes about, due to the compensatory nature of the dynamics outlined above: there is, in effect, a feedback mechanism whereby if only few sites are attached to the DNA, the on-rate increases, and, similarly, if too many are attached (thus prohibiting the 1D diffusion) the off-rate increases. This is manifested by the peaked steady-state distribution shown in Fig. S11(b). In fact, this mechanism is remarkably effective even when the typical (i.e., most probable) number of binding sites is as small as 2! (note that in this case, as shown in the inset of Fig. S11(a), the TF may cover a distance of more than 100bp before falling off).
(a) Wn,n+1 (or Wn,n-1) denoted by “on” (or “off”) varies with n for EB = 6 and = 12. Inset: Mean square displacement (MSD) suggests diffusive motion, as indicated by the dashed line following MSD ∝ t. (b) Steady probability distribution of bound sites.
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
For correspondence: wencheng.ji@weizmann.ac.il
Ori Hachmo
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
Naama Barkai
Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
Ariel Amir
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
For correspondence: ariel.amir@weizmann.ac.il
Author Notes
Competing interests: No competing interests declared
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.