Design principles of transcription factors with intrinsically disordered regions

eLife Assessment

This paper presents an important theoretical exploration of how a flexible protein domain with multiple DNA binding sites may simultaneously provide stability to the DNA-bound state and enable exploration of the DNA strand. The authors propose a mechanism ("octopusing") by which a protein performs a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and enhances the stability of the bound state. This study presents compelling evidence that these findings have implications for the way intrinsically disordered regions (IDRs) of transcription factor (TF) proteins can enhance their ability to efficiently find their binding site on the DNA, from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model.

https://doi.org/10.7554/eLife.104956.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence:

Compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Transcription factors (TFs) are proteins crucial for regulating gene expression. Effective regulation requires the TFs to rapidly bind to their correct target, enabling the cell to respond efficiently to stimuli such as nutrient availability or the presence of toxins. However, the search process is hindered by slow diffusive movement and the presence of ‘false’ targets – DNA segments that are similar to the true target. In eukaryotic cells, most TFs contain an intrinsically disordered region (IDR), which is commonly assumed to behave as a long, flexible polymeric tail composed of hundreds of amino acids. Recent experimental findings indicate that the IDR of certain TFs plays a pivotal role in the search process. However, the principles underlying the IDR’s role remain unclear. Here, we reveal key design principles of the IDR related to TF binding affinity and search time. Our results demonstrate that the IDR significantly enhances both of these aspects. Furthermore, our model shows good agreement with experimental results, and we propose further experiments to validate the model’s predictions.

Introduction

Transcription factors (TFs) are fundamental proteins that regulate gene expression: they bind to specific DNA sequences to control gene transcription (Latchman, 1997); they respond adaptively to various cellular signals, thereby modulating gene expression (Mitchell and Tjian, 1989; Ptashne and Gann, 1997); they guide processes such as cell differentiation and development, and their malfunctions are often linked to disease onset (Lambert et al., 2018; Stadhouders et al., 2019). The principles of gene regulation were first formulated in the 1960s through pioneering studies of the E. coli Lac operon (Jacob and Monod, 1961), establishing that regulation occurs via the binding of a TF to its target, the regulatory DNA regions associated with the genes. Effective regulation requires TFs to have sufficient binding affinity and a binding rate comparable to cellular processes. Despite decades of research, the factors determining the binding strength and search efficiency of eukaryotic TFs remain incompletely understood (Ferrie et al., 2022; Jonas et al., 2025). In eukaryotic cells, besides having a DNA-binding domain (DBD), approximately 80% of TFs also have one or more intrinsically disordered regions (IDRs), which lack a stable 3D structure and can be regarded as flexible, unstructured polymers, hundreds of amino acids (AAs) in length (Liu et al., 2006; Fuxreiter et al., 2011; Guo et al., 2011; Wright and Dyson, 2015). Eukaryotic DBDs often exhibit weaker affinity to their recognition sequences compared to their bacterial counterparts, indicating that IDRs are necessary for achieving stable binding (Garcia et al., 2021). However, IDRs are challenging to study because their functions can be maintained despite rapid sequence divergence, complicating traditional sequence-based analyses (Brown et al., 2010; Zarin et al., 2019).
Molecular dynamics simulations suggest that IDRs play a significant role in promoting target recognition (Vuzman and Levy, 2010), by increasing the area of effective interaction between the TF and the DNA. Upon binding, the search problem becomes effectively a one-dimensional diffusion process (Vuzman and Levy, 2012). Recent in vivo experiments done in yeast also suggest that IDRs could take a key part in guiding TFs to the DNA regions containing their targets (Brodsky et al., 2020; Kumar et al., 2023). Truncation studies of IDRs in various TFs have demonstrated that removing these regions often results in reduced binding affinity, suggesting they contribute significantly to stabilizing TF-DNA interactions (Brodsky et al., 2020). Different mechanisms have been proposed in the literature to explain IDR contributions: some suggest non-specific electrostatic interactions (Iwahara et al., 2006), others propose phase separation that concentrates TFs near their targets (Boija et al., 2018), while a third view suggests specific IDR-DNA interactions analogous to those of structured domains (Chappleboim et al., 2024).

The problem of TF search has been widely studied, theoretically, albeit with the majority of works focusing on parameter regimes relevant for bacteria. One important mechanism is that of facilitated diffusion, in which the TF performs 1D diffusion along the DNA and at any given moment can fall off and perform 3D diffusion until reattaching to the DNA (a 3D excursion). The 1D diffusion along the DNA and the 3D excursions alternate until the TF finds its target. This mechanism is well characterized theoretically (Berg et al., 1981; Berg and von Hippel, 1987; Slutsky and Mirny, 2004; Mirny et al., 2009; Li et al., 2009; Bénichou et al., 2011; Sheinman et al., 2012) and supported by experiments in E. coli (Hammar et al., 2012). However, due to the significant structural differences, including larger genome size and chromatin structure, the facilitated diffusion mechanism cannot explain the fast search times observed in eukaryotes, and a qualitatively different mechanism is required (Dror et al., 2016; Jana et al., 2021; Ferrie et al., 2022; Kumar et al., 2023; Wagh et al., 2023).

Here, we attempt to address the fundamental question of how IDRs can enhance the binding affinity of the TFs and speed up their search process and to discover the associated design principles in eukaryotes. The problem of binding affinity (binding probability) exhibits the classical trade-off between energy and entropy: on one hand, the binding of IDR to the DNA is energetically favorable. On the other hand, this binding leads to constraints on the positions of the IDR (pinned to the DNA at the points of attachment), which leads to a reduction in the entropy associated with the possible polymer conformations – hence an increase in the free energy of the system. Therefore, the effects of the disordered tail on affinity are non-trivial, leading to physical problems reminiscent of those associated with the well-studied problem of DNA unzipping (Danilowicz et al., 2004). We find that the binding probability increases dramatically for TFs having IDRs for a broad parameter regime. The optimal search process consists of one round of 3D-1D search, where at first the TF performs 3D diffusion and then binds the antenna and performs 1D diffusion until finding its target, rarely unbinding for another 3D excursion. This process is presented in Figure 1a and c. We obtain an expression for the search time that shows good agreement with our numerical results.

Figure 1

Model illustration of a TF with the IDR and its search process.

(a) Illustration of a TF locating its target site (highlighted in orange) on the DNA strand (depicted as a green curve). Surrounding the target is a region of length $L$, where the TF’s IDR interacts with the DNA. The TF performs 3D diffusion until it encounters and binds to this ‘antenna’ region. (b) Illustration of a TF that includes an IDR composed of AAs. On the right: a model of the TF as a polymer chain, comprising a single binding site on the DNA-binding domain (DBD) and multiple binding sites on the IDR (also known as short linear motifs; Jonas et al., 2025). (c) Once bound to the antenna, the TF performs effective 1D diffusion until it reaches its target. The 1D diffusion proceeds via the binding and unbinding of sites along the IDR, a process we coin ‘octopusing’. $V_{1}$ is the targets’ volume, $d$ the separation, and $E_{B}$ the binding energy per binding site, where each site corresponds to a short linear motif.

Results

Binding probability of the TF from the equilibrium approach

The experiment described in Brodsky et al., 2020 studied the binding affinity of wild-type and mutated TFs to their corresponding targets. They found that shortening the IDR length results in weaker affinity. To qualitatively model the binding affinity of a TF to its target, we utilize a well-known physical model of polymers, the Rouse Model (Prince, 1953; Doi and Edwards, 1988; used in the subsequent dynamical section as well). See Materials and methods. We note that our conclusions do not hinge on this choice, and our results hold also when using other polymer models (see Appendix 1. Broad applicability of probability density $f (r, l)$). Since the IDR is thought to recognize a relatively short flanking DNA sequence of the DBD target (Brodsky et al., 2020), we model a DNA segment as a straight-line ‘antenna’ region (Shimamoto, 1999; Hu and Shklovskii, 2007; Mirny et al., 2009; Castellanos et al., 2020), where the IDR targets reside. We assume that the interaction between TF and DNA outside the antenna region is negligible. Note that the interaction between IDR and DNA is a specific interaction, with genomic localization determined by multivalent interactions mediated by hydrophobic residues (Jonas et al., 2023; Mindel et al., 2024).

To gain an intuition into the binding probability of TF, we first study a TF with a single IDR site. From here on, energies will be measured in units of $k_{B} T$ , to simplify our notation. We assume that the IDR has only one corresponding target. There are four possible states: neither the DBD nor the IDR is bound, the IDR is bound, the DBD is bound, and both are bound, which correspond to $P_{f r e e}$ , $P_{I D R}$ , $P_{D B D}$ , and $P_{b o t h}$ , respectively. The probability that the TF is bound at the DBD target is the sum of $P_{D B D}$ and $P_{b o t h}$ . In thermal equilibrium, each of the 4 terms is given by the Boltzmann distribution, which can be solved analytically by integrating out the additional degrees of freedom associated with the polymer (Appendix 2. Derivation of the binding probability). To better understand the associated design principles, it will be helpful to compare this probability with that of a ‘simple’ TF (i.e. without the IDR), which we denote by $P_{s i m p l e}$ . We find that:

$$\frac{P_{TF}}{P_{simple}} \approx 1 + P, \tag{1}$$

where $P$ is an enhancement factor, governing the degree to which the IDR improved the binding affinity. Importantly, we find that:

$$P(E_{B}, d, l_{0}) = e^{E_{B}} V_{1} f(d, l_{0}), \tag{2}$$

where $f (d, l_{0}) = {(\frac{3}{2 π l_{0}^{2}})}^{3 / 2} \exp (- \frac{3 d^{2}}{2 l_{0}^{2}})$ is the probability density, with units of inverse volume; $V_{1}$ is the targets’ volume, on the order of $1$ to $10\ {bp}^{3}$, as shown in Figure 1c; $- E_{B}$ is the binding energy per IDR target ($E_{B}$ is always positive); $d$ is the distance between the DBD target and the IDR target; and $l_{0}$ is the distance between the DBD and the IDR binding site on the TF (in the case of multiple IDR targets and sites, $d$ and $l_{0}$ are also the distances between neighboring IDR targets and between neighboring IDR sites, respectively).

This shows that the presence of the IDR always increases the binding affinity since $P$ is always positive. However, the magnitude of enhancement can be negligible: For instance, even if $E_{B}$ is large, the term $f (d, l_{0})$ , associated with the entropic penalty (because the end-to-end distance is fixed, reducing the degrees of freedom compared to a free polymer), could lead to small values of $P$ .
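As a rough numerical check of Equation 1, the four-state picture can be sketched in a few lines of Python. The statistical weights used here are an assumption of this sketch, not the derivation of Appendix 2: the free state is weighted by the nucleus volume, each singly bound state by $V_{1}$ times its Boltzmann factor, and the doubly bound state by the product of both factors times $f (d, l_{0})$. The function names are hypothetical; the parameter values are those quoted in the Figure 2 caption.

```python
import math

BP_NM = 0.34  # one base pair in nm (value used in the text)

def chain_density(d, l0):
    """Gaussian end-to-end probability density f(d, l0), in inverse volume."""
    return (3.0 / (2.0 * math.pi * l0**2)) ** 1.5 * math.exp(-1.5 * d**2 / l0**2)

def enhancement(e_b, d, l0, v1):
    """Enhancement factor P = exp(E_B) * V_1 * f(d, l0) (Equation 2)."""
    return math.exp(e_b) * v1 * chain_density(d, l0)

def binding_ratio(e_dbd, e_b, d, l0, v1, v_nucleus):
    """P_TF / P_simple from the assumed four-state Boltzmann weights."""
    w_free = v_nucleus                            # assumed: free TF explores the nucleus
    w_dbd = v1 * math.exp(e_dbd)                  # only the DBD is bound
    w_idr = v1 * math.exp(e_b)                    # only the IDR site is bound
    w_both = w_dbd * enhancement(e_b, d, l0, v1)  # both bound, chain ends pinned
    p_tf = (w_dbd + w_both) / (w_free + w_dbd + w_idr + w_both)
    p_simple = w_dbd / (w_free + w_dbd)
    return p_tf / p_simple

# Parameters from the Figure 2 caption, expressed in nm.
V1 = BP_NM**3
D_TARGET = math.sqrt(50) * BP_NM
V_NUCLEUS = 1000.0**3  # 1 um^3 in nm^3

p = enhancement(10.0, D_TARGET, D_TARGET, V1)
ratio = binding_ratio(15.0, 10.0, D_TARGET, D_TARGET, V1, V_NUCLEUS)
print(f"1 + P = {1 + p:.3f}, P_TF / P_simple = {ratio:.3f}")
```

Under these assumed weights the two printed numbers nearly coincide because the unbound state dominates the partition function; the approximation in Equation 1 would degrade once the bound states carry most of the weight.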

The exponential decay of $P$ with $d^{2}$ (through the term $f (d, l_{0})$ ), indicates that placing the IDR targets close to the DBD target is preferable. However, due to physical limitations, $d$ cannot be arbitrarily small and in reality will have a lower bound. Once $d$ is fixed, $P$ is maximal at $l_{0} = d$ , as can be gleaned from Equation 2. This condition minimizes the free energy penalty associated with the entropic contribution of the term $f (d, l_{0})$ .
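The step from Equation 2 to the condition $l_{0} = d$ can be made explicit with a short worked calculation. Since only $f (d, l_{0})$ depends on $l_{0}$, it suffices to maximize $\ln f$ at fixed $d$:

$$
\frac{\partial \ln f(d, l_{0})}{\partial l_{0}} = -\frac{3}{l_{0}} + \frac{3 d^{2}}{l_{0}^{3}} = 0
\quad \Rightarrow \quad l_{0} = d,
\qquad
\left. \frac{\partial^{2} \ln f}{\partial l_{0}^{2}} \right|_{l_{0} = d} = -\frac{6}{d^{2}} < 0 .
$$

The stationary point is therefore a maximum, and the value there is $f (d, d) = {(\frac{3}{2 π d^{2}})}^{3 / 2} e^{- 3 / 2}$.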

As we shall see, this design principle is also intact when multiple IDR sites and targets are involved, which is the scenario we discuss next.

A priori, it is unclear whether having more binding sites ( $n_{b} > 1$ ) on the IDR or more targets ( $n_{t} > 1$ ) on the DNA will be beneficial or not. We therefore sought to investigate how the binding affinity depends on these parameters, as well as on the binding energy and the positions of the targets and binding sites. This generalization requires taking into account every possible number of bound sites, $n$ (ranging from 0 to $\min (n_{b}, n_{t})$ ), and all possible binding orientations for each $n$, which gives $\binom{n_{b}}{n} \binom{n_{t}}{n} n!$ distinct configurations for each $n$. For each of them, we may analytically integrate out the polymer degrees of freedom and calculate explicitly the contribution to the partition function (Appendix 2. Derivation of the binding probability). Summing these terms together numerically allows us to explore how the different parameters affect the binding specificity, which we describe next. To characterize the binding affinity, we resort again to the ratio $Q \equiv P_{T F} / P_{s i m p l e}$ , measuring the effectiveness of the binding in comparison to the case with no IDRs. The length of TFs in our considerations, which varies with $n_{b}$ , is comparable to that in experiments (Brodsky et al., 2020).
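To give a feel for how quickly the number of terms grows, the counting rule above can be evaluated directly. This is a minimal sketch that only counts IDR binding configurations (it does not compute their Boltzmann weights, and it ignores the separate bound/unbound state of the DBD); the function name is hypothetical.

```python
from math import comb, factorial

def n_configurations(n_b, n_t):
    """Number of ways to pair n IDR sites with n targets, for each possible n.

    Choose which n of the n_b sites are bound, which n of the n_t targets
    are occupied, and one of the n! ways to match them.
    """
    return {
        n: comb(n_b, n) * comb(n_t, n) * factorial(n)
        for n in range(min(n_b, n_t) + 1)
    }

for n_b, n_t in [(1, 1), (2, 2), (3, 3), (6, 6)]:
    counts = n_configurations(n_b, n_t)
    print(f"n_b = {n_b}, n_t = {n_t}: per-n counts {counts}, total {sum(counts.values())}")
```

Even for modest $n_{b}$ and $n_{t}$ the total grows quickly, which is consistent with summing the terms numerically rather than by hand.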

We find the design principle: $l_{0} = d$ , identical to what we showed for the case of a single IDR binding site. Specifically, $Q$ is maximal at $l_{0} \sim d$ for all combinations of $n_{b}$ and $n_{t}$ . Figure 2a displays this principle for several combinations of $n_{b}$ and $n_{t}$ . All combinations are provided in Appendix 3—figure 1. This is plausible since for the IDR to be bound to the DNA at multiple places without paying a large entropic penalty, the distance between target sites should be compatible with the distance between binding sites.

Figure 2

Design principles across various parameters: $l_{0} / d$ , $n_{b}$ , $n_{t}$ , and $E_{B}$ .

(a) Plots of $Q \equiv P_{T F} / P_{s i m p l e}$ varying with $l_{0} / d$ are shown for several combinations of $n_{t}$ and $n_{b}$ for typical values of the relevant quantities: the nucleus volume is $1 {μ m}^{3}$ , $E_{D B D} = 15$ , $E_{B} = 10$ , $V_{1} = (0.34 n m)^{3}$ , and $d = \sqrt{50} \cdot 0.34 n m$ . (b) A heatmap of $Q$ vs. $n_{b}$ and $n_{t}$ at $E_{B} = 10$ and $d = l_{0}$ . (c) Contour lines at different $E_{B}$ at $P_{T F} = 0.9$ . Blue hexagrams represent the design principle $n_{b}^{*} = n_{t}^{*}$ , with the black dashed line as a visual guide. The inset shows the dependence of $n_{b}^{*}$ on energy. (d) $Q$ for varying $E_{B}$ at $d = l_{0}$ for several $n_{b} = n_{t}$ . The vertical line represents $E_{t h}$ as determined by Equation 4, the solid curves illustrate the approximation of $Q$ obtained from Equation 3, and the horizontal line indicates where $P_{T F} = 1$ .

We therefore set $l_{0} = d$ and search for other design principles. Figure 2b shows a heatmap of $Q$ varying with $n_{b}$ and $n_{t}$ . At large $n_{t}$ , as $n_{b}$ increases, the value of $P_{T F}$ exhibits an initial substantial increase, followed by a gradual rise until it reaches saturation (shown explicitly in Appendix 4—figure 1).

A similar behavior is also observed when varying the value of $n_{t}$ . (When $n_{t}$ exceeds $n_{b}$ , the contribution from the states where the IDR is bound but the DBD is unbound increases with increasing $n_{t}$ , which slightly lowers $P_{T F}$ ; see Appendix 4—figure 1.)

In fact, the heatmap shown in Figure 2b shows that the value of $P_{T F}$ is governed, approximately, by the minimum of $n_{b}$ and $n_{t}$ (this is reflected in the nearly symmetrical ‘L-shapes’ of the contours). This places strong constraints on the optimization and suggests that there will be little advantage in increasing the value of either $n_{b}$ or $n_{t}$ beyond the point $n_{b} = n_{t}$ , revealing another design principle (under the assumption that it is preferable to have smaller values of $n_{b}$ and $n_{t}$ for a given level of affinity, in order to minimize the cellular costs involved).

We verify this principle holds at additional values of $E_{B}$ . Figure 2c shows that the contours corresponding to $P_{T F} = 0.9$ at varying values of $E_{B}$ all take an approximate symmetrical L-shape, thus supporting the design principle $n_{b} = n_{t}$ . For a given $E_{B}$ , we denote the optimal values as $n_{b}^{*} = n_{t}^{*}$ , and show them as hexagrams in the plot.

We further find that $n_{b}^{*}$ decreases with increasing $E_{B}$ , as displayed in the inset. These relationships also hold at $P_{T F} = 0.5$ (Appendix 5. Robustness of the relationships: $n_{b}^{*} \sim n_{t}^{*}$ and $n_{b}^{*}$ decreases with increasing $E_{B}$ ).

We examine the variation of $Q$ with respect to $E_{B}$ while keeping $n_{b} = n_{t}$ constant. In Figure 2d, as the binding energy $E_{B}$ increases, $Q$ increases from a value of $\sim 1$ to its saturation value, $1 / P_{s i m p l e}$ , in a switch-like manner: that is, there is a characteristic energy scale, $E_{t h}$ . To estimate $E_{t h}$ , when $Q ≪ 1 / P_{s i m p l e}$ (i.e. $P_{T F} ≪ 1$ ), we approximate the expression of $Q$ as:

$$Q \approx 1 + \sum_{n = 1}^{n_{b}} \sum_{orn} \prod_{i = 1}^{n} P(E_{B}, r_{orn, i}, l_{orn, i}), \tag{3}$$

where $l_{o r n, i}$ and $r_{o r n, i}$ denote the segment length between two adjacent bound sites and the distance associated with their respective targets, respectively, and ‘orn’ represents the orientation of the given configuration (Appendix 2. Derivation of the binding probability). Equation 3 reduces to Equation 1 when $n_{t} = n_{b} = 1$ . To substantially increase $Q$ , the configuration that contributes the most to $Q$ , would satisfy $(P (E_{B}, d, d))^{n_{b}} ≫ 1$ . Thus, the characteristic energy $E_{t h}$ could be estimated using the formula $P (E_{t h}, d, d) = 1$ , resulting in:

$$E_{th} = \frac{3}{2} \left[\ln \left(\frac{2 \pi}{3 \phi^{2}}\right) + 1\right], \tag{4}$$

where $ϕ = V_{1}^{1 / 3} / d$ represents the proportion of IDR targets within the antenna. Equation 4 is a direct consequence of the energy-entropy trade-off. Increasing $ϕ$ leads to a decrease in entropy loss due to binding the IDR, subsequently reducing the demands on binding energy. Given that $ϕ^{2} = 0.02$ , $E_{t h}$ is about $8.5 k_{B} T$ . The vertical dashed line in Figure 2d, obtained from Equation 4, effectively captures the region of significant increase for large values of $n_{b}$ . Hence, we obtain the third design principle: $E_{B} > E_{t h}$ , which ensures the significant binding advantage of TFs with IDRs.
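The threshold can be checked numerically. The sketch below evaluates Equation 4 for $ϕ^{2} = 0.02$ and confirms that it satisfies the defining condition $P (E_{t h}, d, d) = 1$, using Equation 2 rewritten in terms of $ϕ$. The function names are hypothetical.

```python
import math

def threshold_energy(phi_sq):
    """E_th = (3/2) [ln(2 pi / (3 phi^2)) + 1] (Equation 4), in units of k_B T."""
    return 1.5 * (math.log(2.0 * math.pi / (3.0 * phi_sq)) + 1.0)

def enhancement_matched(e_b, phi_sq):
    """P(E_B, d, d) from Equation 2, using V_1 / d^3 = phi^3."""
    phi_cubed = phi_sq ** 1.5
    return math.exp(e_b) * phi_cubed * (3.0 / (2.0 * math.pi)) ** 1.5 * math.exp(-1.5)

e_th = threshold_energy(0.02)
print(f"E_th = {e_th:.2f} k_B T")  # about 8.48
print(f"P(E_th, d, d) = {enhancement_matched(e_th, 0.02):.6f}")
```

Because $P$ depends exponentially on $E_{B}$, each additional $k_{B} T$ above the threshold multiplies the single-site enhancement by $e \approx 2.7$, which is one way to read the switch-like behavior in Figure 2d.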

So far, we have found that the binding design principles for TFs with IDRs are as follows: (i) target distance $d$ should be as small as possible, (ii) IDR segment binding distance $l_{0}$ should be comparable to $d$ , (iii) it is preferable to have fewer binding sites $n_{b}^{*}$ and binding targets $n_{t}^{*}$ as the binding strength $E_{B}$ increases, with $n_{b}^{*} \sim n_{t}^{*}$ , and (iv) $E_{B}$ should be larger than a threshold that solely depends on the proportion $ϕ$ of IDR targets within the antenna, as indicated by Equation 4. In our estimation, with $ϕ^{2} = 0.02$ , we obtain $E_{t h} \approx 8.5 k_{B} T$ . The above principles have also been validated in scenarios involving Poisson statistics of binding site distances and target distances, thus indicating their robust applicability (Appendix 6—figure 1).

Search time of the TF from the dynamics

In this section, we study the design principles of the TF search time, a dynamic process. Here, we confine the TF with $\tilde{n} \equiv n_{b} + 1$ sites ( $n_{b}$ IDR sites and one DBD site) to a sphere of radius $R$ , the cell nucleus, and use over-damped coarse-grained molecular dynamics (MD) simulations. The sites and targets interact via a short-range interaction. The DBD target with a radius $a$ is placed at the sphere’s center. See the sketch in Figure 1a. Once the DBD finds its target, the search ends. We denote the mean total search time (i.e. the mean first passage time, MFPT), as $t_{t o t a l}$ . We estimate it by averaging 500 simulations, each starting from an equilibrium configuration of the TF at a random position in the nucleus. Simulation details are provided in the Materials and Methods section.

By conducting comprehensive simulations of $t_{t o t a l}$ across various parameter values of $L$ , $E_{B}$ , and $\tilde{n}$ , we were able to identify its minimum value, which is approximately at $L^{*} = 300 b p$ , $E_{B}^{*} = 11$ , and ${\tilde{n}}^{*} = 4$ for $R = 500 b p \approx 1 / 6 μ m$ . Here, one base pair ( $b p$ ) is equal to $0.34 n m$ . To speed up the simulations, we utilized several computational tricks, including adaptive time steps and treating the TF as a point particle when it is far from the antenna (see Materials and methods). Despite this, the simulations require 10⁵ CPU hours due to the IDR trapped in the binding targets on the antenna, which results in the slow dynamics as found in the previous simulation works (Vuzman and Levy, 2010; Marcovitz and Levy, 2013). The probability distribution function (pdf) of individual search times exponentially decays with a typical time equal to $t_{t o t a l}$ , see inset of Figure 3. This indicates that $t_{t o t a l}$ provides a good characterization of the search time.

Figure 3

Mean search time varying with the antenna length $L$, and $t_{3 D}$ and $t_{1 D}$ vs. $L$, at $E_{B}^{*}$ and ${\tilde{n}}^{*}$ .

The solid curves correspond to $t_{total}$, $t_{3D}$ and $t_{1D}$ in Equation 5. Inset: the probability density function (pdf) decays exponentially at large $t$: $\mathrm{pdf} \approx \exp (- t / t_{total}) / t_{total}$ (black solid line). The unit $t_{0}$ represents the time for one AA to diffuse $1\ \mathrm{bp}$. Parameters: $a = 0.5\ \mathrm{bp}$, $d = 10\ \mathrm{bp}$, and $l_{0} = 5\ \mathrm{bp}$.

We find that $t_{t o t a l}$ varies non-monotonically with $L$ at fixed $E_{B}^{*}$ and ${\tilde{n}}^{*}$ as shown in Figure 3. Once a TF attaches to one of its targets, it will most likely diffuse along the antenna until reaching the DBD target (Appendix 7. TF walking along the antenna once it attaches, and $t_{t o t a l}$ vs. $E_{B}$ and vs. $\tilde{n}$ ). This means that the optimal search process is a single round of 3D diffusion followed by 1D diffusion along the antenna. By analytically calculating the mean total search time using a coarse-grained model (Appendix 8. A single round of 3D-1D search is typically the case in the optimal search process), we confirm that a single round of 3D-1D search is typical of the optimal search process. Note that our model differs profoundly from the conventional facilitated diffusion (FD) model in bacteria: in the latter, the TF is assumed to bind non-specifically to the DNA. Within our model, the antenna length $L$ can be tuned by changing the placement of the IDR targets on the DNA, making it an adjustable parameter rather than the entire genome length. As opposed to the FD model, which requires numerous rounds of DNA binding-unbinding to achieve minimal search time, within our model the minimal search time is achieved for a single round of 3D and 1D search.

Interestingly, when we observe the motion of a TF along the antenna, IDR sites behave like ‘tentacles’, as illustrated in Figure 1c, constantly binding and unbinding from the DNA. We refer to this 1D walk as ‘octopusing’, inspired by the famous ‘reptation’ motion in entangled polymers suggested by de Gennes (de Gennes and Leger, 1982; see Video 1). We note that previous numerical studies (Vuzman and Levy, 2010; Vuzman and Levy, 2012) used comprehensive, coarse-grained MD simulations to demonstrate that including heterogeneously distributed binding sites on the IDR could facilitate intersegmental transfer between different regions of the DNA, which is reminiscent of our octopusing scenario.

Video 1

Video of the 1D octopusing walk simulation.

The mean 3D search time, $t_{3 D}$ , and the mean 1D search time, $t_{1 D}$ , are also shown in Figure 3. To evaluate $t_{3 D}$ , we take advantage of several insights: first, if the TF gyration radius, $r_{p} = \sqrt{\tilde{n} / 6} l_{0}$ , is comparable to or larger than the distance between targets, $d$ , we may consider the chain of targets as, effectively, a continuous antenna. Making the targets denser would hardly accelerate the search, reminiscent of the well-known chemoreception problem where high efficiency is achieved even with low receptor coverage (Berg and Purcell, 1977). Second, when the distance between the TF and the antenna is smaller than $r_{p}$ , one of the binding sites on the TF would almost surely attach to the targets. This allows us to approximately map the problem to the MFPT of a particle hitting an extremely thin ellipsoid with long axis of length $L$ and radius $r_{p}$ , the solution of which scales as $\ln (L / r_{p}) / L$ (Berg, 2018). We find that this scaling captures well the dependence of $t_{3 D}$ on the parameters, with a prefactor $α \sim O (1)$ .

For the 1D search from a single particle picture, since the TF is unlikely to detach once it attaches, it performs a 1D random walk. So, taken together, the mean total search time is

$$t_{total} = t_{3D} + t_{1D} \approx \alpha \frac{\tilde{n} R^{3}}{D L} \ln (L / r_{p}) + \beta \frac{L^{2}}{D}, \tag{5}$$

where $D$ is the 3D diffusion coefficient of one site. For a TF with $\tilde{n}$ sites, its 3D diffusion coefficient, $D_{3}$ , is $D / \tilde{n}$ . $α$ and $β$ are fitting parameters.

The semi-analytical formulas of $t_{1 D}$ , $t_{3 D}$ , and $t_{t o t a l}$ , denoted by the solid lines in Figure 3, agree well with the simulation results for $α = 0.5$ and $β = 14.5$ .

The theoretical result for the time spent in 3D was originally calculated for a point searcher (Berg, 2018). In that case, a value of $α = 2 / 3$ was obtained. $β$ is larger at larger $E_{B}$ (see Appendix 7. TF walking along the antenna once it attaches, and $t_{t o t a l}$ vs. $E_{B}$ and vs. $\tilde{n}$ ).

We also examine the dependence of $t_{t o t a l}$ on the key parameters $E_{B}$ and $\tilde{n}$ , in the vicinity of the optimal parameters which minimize the total search time (Appendix 7. TF walking along the antenna once it attaches, and $t_{t o t a l}$ vs. $E_{B}$ and vs. $\tilde{n}$ ). We find that the affinity of TFs to the antenna decreases rapidly with decreasing IDR length (Brodsky et al., 2020) and decreasing $E_{B}$ . Consequently, when $\tilde{n} < {\tilde{n}}^{*}$ or $E_{B} < E_{B}^{*}$ , multiple search rounds are needed to reach the DBD target, resulting in a significant escalation of $t_{t o t a l}$ . When $\tilde{n} > {\tilde{n}}^{*}$ , $t_{t o t a l}$ , $t_{3 D}$ , and $t_{1 D}$ agree with Equation 5. When $E_{B} > E_{B}^{*}$ , since $t_{3 D}$ is independent of $E_{B}$ , the variation in $t_{t o t a l}$ stems from $t_{1 D}$ , which increases exponentially with $E_{B}$ .

The minimal search time for a simple TF is $t_{s i m p l e} = R^{3} / (3 D a)$ (Berg, 2018).
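The two point-searcher results quoted above, $α = 2 / 3$ and $t_{s i m p l e} = R^{3} / (3 D a)$ , can be recovered from one standard estimate, stated here as an assumption of this note rather than as the derivation used in the cited reference: a particle diffusing with coefficient $D$ in a volume $V = 4 π R^{3} / 3$ reaches a small absorbing target of electrostatic capacitance $C$ after a mean time of roughly $V / (4 π D C)$ . A sphere of radius $a$ has $C = a$ , and a thin prolate ellipsoid with long axis $L$ and radius $r_{p} ≪ L$ has $C \approx (L / 2) / \ln (L / r_{p})$ , so that

$$
t_{sphere} \approx \frac{V}{4 \pi D a} = \frac{R^{3}}{3 D a},
\qquad
t_{ellipsoid} \approx \frac{V \ln (L / r_{p})}{2 \pi D L} = \frac{2}{3} \frac{R^{3}}{D L} \ln (L / r_{p}) .
$$

The first expression is $t_{s i m p l e}$ . The second has the form of the 3D term in Equation 5 with $α = 2 / 3$ , once the searcher's diffusion coefficient is identified with $D_{3} = D / \tilde{n}$ , which is where the factor $\tilde{n}$ in Equation 5 comes from.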

For the TF with an IDR, by minimizing the expression in Equation 5, we find that the optimal length approximately scales as $L^{*} \propto R$ , and therefore the minimal search time, $t_{min}$ , is approximately proportional to $R^{2}$ .
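The minimization can be reproduced numerically. The following is a minimal sketch, not the authors' code: it evaluates Equation 5 with the fitted values $α = 0.5$ and $β = 14.5$ , the Figure 3 parameters ( $\tilde{n} = 4$ , $l_{0} = 5 b p$ , $a = 0.5 b p$ ), and $D = 1$ in units of $b p^{2} / t_{0}$ (an assumption about units), then locates the minimum by golden-section search. Function names and the search bracket are hypothetical choices.

```python
import math

def total_search_time(L, R, n_sites, l0, alpha=0.5, beta=14.5, D=1.0):
    """Equation 5: t_3D + t_1D, with lengths in bp and D = 1 (assumed units)."""
    r_p = math.sqrt(n_sites / 6.0) * l0  # TF gyration radius
    t_3d = alpha * n_sites * R**3 * math.log(L / r_p) / (D * L)
    t_1d = beta * L**2 / D
    return t_3d + t_1d

def optimal_antenna_length(R, n_sites, l0, tol=1e-6):
    """Golden-section search for the L that minimizes Equation 5.

    The bracket starts at 10 r_p because the thin-ellipsoid estimate is only
    meaningful for L >> r_p (an assumption of this sketch); on this bracket
    the function has a single minimum.
    """
    r_p = math.sqrt(n_sites / 6.0) * l0
    lo, hi = 10.0 * r_p, 2.0 * R
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol * hi:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if total_search_time(x1, R, n_sites, l0) < total_search_time(x2, R, n_sites, l0):
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    return 0.5 * (lo + hi)

def simple_tf_time(R, a, D=1.0):
    """Search time of a TF without an IDR: R^3 / (3 D a)."""
    return R**3 / (3.0 * D * a)

for R in (500.0, 1000.0, 2000.0):
    L_opt = optimal_antenna_length(R, n_sites=4, l0=5.0)
    t_min = total_search_time(L_opt, R, n_sites=4, l0=5.0)
    speedup = simple_tf_time(R, a=0.5) / t_min
    print(f"R = {R:6.0f} bp: L* = {L_opt:7.1f} bp, L*/R = {L_opt / R:.2f}, "
          f"t_min/R^2 = {t_min / R**2:.1f}, speed-up = {speedup:.1f}")
```

For $R = 500 b p$ the minimum falls close to the simulated optimum of about $300 b p$ . The ratios $L^{*} / R$ and $t_{min} / R^{2}$ drift slowly with $R$ because of the logarithm in Equation 5, so the scalings $L^{*} \propto R$ and $t_{min} \propto R^{2}$ hold only up to logarithmic corrections.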

At $R = 500 b p \approx 1 / 6 μ m$ , the search time is more than ten times shorter than that of a simple TF, despite the latter’s diffusion coefficient being several times larger. Based on our analytic result, we expect a $\sim 50$ -fold enhancement for the experimental parameters of yeast (note that we assume the scaling $t_{min} \sim R^{2}$ still holds in this regime). This formula is approximate since it does not take into account the persistence length of DNA (Guilbaud et al., 2019), which is several-fold shorter than the optimal antenna length $L^{*}$ . We have verified that reducing the DNA persistence length, which promotes increased DNA coiling, results in only a modest increase in mean search time. Even under extreme coiling conditions, the increase remains below 30% of the baseline value, as detailed in Appendix 9. Robustness of mean total search time with changing the antenna geometry. Converting $t_{min}$ into a second-order on-rate yields approximately $10^{8} - 10^{9} M^{- 1} s^{- 1}$ , assuming $D \approx 10^{- 13} m^{2} / s$ and a TF copy number of 100. We also verified that the 3D diffusion coefficient changes only slightly under a complex DNA configuration, where a point-searcher TF non-specifically interacts with the entire DNA, indicating the robustness of our optimal search process (see Appendix 10. MSD in a complex DNA configuration).

Next, we characterize the dynamics of the octopusing walk (Appendix 11. Properties of the octopusing walk). During this process, different IDR binding sites constantly bind and unbind from the antenna, leading to an effective 1D diffusion. A priori, one may think that the TF faces conflicting demands: it has to rapidly diffuse along the DNA while also maintaining strong enough binding to prevent detaching from the DNA too soon. Indeed, within the context of TF search in bacteria, this is known as the speed-stability paradox (Slutsky and Mirny, 2004). Importantly, the simple picture of the octopusing walk explains an elegant mechanism to bypass this paradox, achieving a high diffusion coefficient with a low rate of detaching from the DNA. This comes about due to the compensatory nature of the dynamics: there is, in effect, a feedback mechanism whereby if only a few sites are bound to the DNA, the on-rate increases, and similarly, if too many are bound (thus prohibiting the 1D diffusion) the off-rate increases. Remarkably, this mechanism remains highly effective even when the typical number of binding sites is as low as two. Further details are provided in Appendix 11. Properties of the octopusing walk.

Comparing with experimental results

In Brodsky et al., 2020, the binding probability of the TF Msn2 in yeast was measured, where the native IDR was shortened by truncating an increasingly large portion of it. The results indicate that binding probability decreases with increasing truncation of the IDR, as illustrated in Figure 4a. We utilized our model to calculate the binding probability relative to $P_{T F}$ at zero truncation length. Using the DBD binding energy $E_{D B D}$ as our only fitting parameter, our model achieves quantitative agreement with the experimental data. In Figure 4b, we estimate the search time as a function of $L$ for $R = 1 μ m$ and $D_{0} = 10^{- 10} m^{2} / s$ for a single AA. The estimated timescales are comparable to empirical results, which are 1 to 10 seconds (Kumar et al., 2023; Larson et al., 2011), as indicated by the shaded area. We have also estimated the on-rate $k_{o n}$ and off-rate $k_{o f f}$ of the TF as they vary with IDR length, as shown in Figure 4c. In the kinetic proofreading mechanism (Hopfield, 1974), specificity is associated with the off-rate, but recent studies (Marklund et al., 2022) suggest that in other scenarios, specificity is encoded in the on-rate. Therefore, it is important to explore how specificity is encoded within this mechanism. Our results indicate that the variation in the dissociation constant $k_{o f f} / k_{o n}$ is dominated by $k_{o f f}$ , suggesting that specificity is encoded in the off-rate. The magnitudes and variations of these rates could be directly compared to experimental results in future studies.

Figure 4

Comparisons with experimental results.

(a) The relative binding probability as a function of the truncation length of the IDR agrees quantitatively with the experimental data in Brodsky et al., 2020; we vary only $E_{D B D}$ (binding energy of the DBD) to ensure that the maximal value reaches 1 at zero truncation length. Antenna length $L = 1000 n m$ , $E_{D B D} = 22$ and $E_{B} = 11$ . Other parameters are set the same as in Figure 2. (b) The search time estimated by our model (from Equation 5 and divided by a TF copy number of 100) is quantitatively comparable with the experimental results (Larson et al., 2011), as guided by the shaded area. (c) The on-rate $k_{o n} = V_{c} / t_{t o t a l}$ and off-rate $k_{o f f} = (1 - P_{T F}) / (P_{T F} t_{t o t a l})$ as functions of the IDR length, indicating that $k_{D} \equiv k_{o f f} / k_{o n}$ ranges from $0.01$ to 10 nM, with its variation primarily dominated by the off-rate.

© 2020, Elsevier. Panel A was reprinted from Brodsky et al., 2020 with permission from Elsevier. It is not covered by the CC-BY 4.0 license and further reproduction of this panel would need permission from the copyright holder.

Discussion

In summary, we addressed the design principles of TFs containing IDRs, with respect to binding affinity and search time. We find that both characteristics are significantly enhanced compared to simple TFs lacking IDRs.

The simplified model we presented provides general design principles for the search process and testable predictions. One possible experiment that could shed light on one of our key assumptions is ‘scrambling’ the DNA (randomizing the order of nucleotides). Our results depend greatly on our assumption that there exists a region on the DNA, close to the DBD target, that contains targets for the IDR binding sites. It is possible to shuffle, or even remove, regions of different lengths around the DBD target and to measure the resulting changes in binding affinity. According to our results, we would expect that shuffling a region smaller in length than the IDR should have minimal effect on the binding affinity and search time, but that shuffling a region longer than the IDR should have a noticeable effect.

Another useful test of our model would be to probe the dynamics of the TF search process by varying the waiting time before TF-DNA interactions are measured. Within the ChEC-seq technique (Zentner et al., 2015) employed in Brodsky et al., 2020, an enzyme capable of cutting the DNA is attached to the TF and activated at will (using an external calcium signal). Sequencing then informs us of the places on the DNA at which the TF was bound at the point of activation. Measuring the binding probabilities at various waiting times (defined as the time between adding calcium ions and adding transcription factors, and ranging from seconds to an hour) would inform us of both the TF search dynamics and the equilibrium properties: shorter time scales would capture the dynamics of the search process, while longer time scales would reveal the equilibrium properties of the binding. At short waiting times (within several seconds), we would expect the binding probability to increase with waiting time. A normalized binding probability curve as a function of the waiting time $t$ could be fitted with the function $1 - \exp (- t / t_{s e a r c h})$ , from which the search time can be inferred. Our results suggest that the presence of the IDR in a TF leads to increased binding probabilities at all waiting times and a shorter waiting time needed to reach a higher saturation binding probability. Furthermore, as shown in Figure 4 obtained from our model, $k_{o n}$ can be determined from the measurements of $t_{s e a r c h}$ . Using the binding probability, which is associated with $k_{D} = k_{o f f} / k_{o n}$ , $k_{o f f}$ can also be calculated.
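As an illustration of the proposed fit, the following minimal sketch estimates $t_{s e a r c h}$ from normalized binding probabilities measured at several waiting times. It is not part of the original analysis: the function name `fit_search_time` is hypothetical, and the fit is done by linearizing $1 - \exp (- t / t_{s e a r c h})$ into $- \ln (1 - p) = t / t_{s e a r c h}$ and using least squares through the origin, which is only one of several reasonable fitting choices.

```python
import math


def fit_search_time(times, probs):
    """Estimate t_search from normalized binding probabilities p(t).

    Assumes p(t) = 1 - exp(-t / t_search). The model is linearized as
    y = -ln(1 - p) = t / t_search and fitted by least squares through the
    origin, so the slope is sum(t * y) / sum(t * t) and t_search = 1 / slope.
    """
    num = 0.0  # running sum of t * y
    den = 0.0  # running sum of t * t
    for t, p in zip(times, probs):
        if not 0.0 <= p < 1.0:
            continue  # saturated or invalid points cannot be linearized
        y = -math.log(1.0 - p)
        num += t * y
        den += t * t
    if num <= 0.0:
        raise ValueError("data contain no usable points")
    return den / num


# Synthetic, noise-free example (hypothetical numbers, waiting times in seconds).
waiting_times = [0.5, 1.0, 2.0, 4.0, 8.0]
binding = [1.0 - math.exp(-t / 3.0) for t in waiting_times]
estimate = fit_search_time(waiting_times, binding)
```

With noisy data, points close to saturation dominate the linearized fit, so a direct nonlinear fit may be preferable in practice.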

While our work primarily focused on coarse-grained modeling of the search process and the hypothesized octopusing dynamics, higher-resolution molecular dynamics simulations could provide more detailed molecular insights and further validate our proposed design principles (Vuzman and Levy, 2010; Vuzman and Levy, 2012). Since these principles are grounded in fundamental physical concepts, such as the energy-entropy trade-off, they are likely to be robust across diverse molecular systems. We anticipate their applicability to other biophysical contexts, such as nonspecific RNA polymerase binding (Tenenbaum et al., 2023) and multivalent antibody binding (Einav et al., 2019), where the associated dynamic processes would also be of significant interest for future research. Beyond our octopusing mechanism, other mechanisms may also contribute to the search and binding process. IDRs could facilitate complex formation and cooperativity between multiple transcription factors, as suggested by studies showing co-localization at common target promoters (Morgunova and Taipale, 2017). IDRs might also enhance efficiency by directing transcription factors to specific nuclear compartments or biomolecular condensates, thereby reducing the effective search space and time (Sabari et al., 2018; Boija et al., 2018; Kent et al., 2020). These complementary mechanisms (Ferrie et al., 2022; Jonas et al., 2025; Holehouse and Kragelund, 2024) suggest that our findings represent one component of a broader regulatory framework governing transcriptional control in eukaryotic cells.

Materials and methods

Model

The Rouse Model describes a polymer as a chain of point masses connected by springs and characterized by a statistical segment length $l_{0}$ resulting from the rigidity of springs $k$ and the thermal noise level $k_{B} T$ , $l_{0}^{2} = 3 k_{B} T / k$ (Prince, 1953; Doi and Edwards, 1988). We coarse-grain the TF by representing it with a few binding sites for the IDR, and an additional site for the DBD, as illustrated in Figure 1b. The targets are assumed to be positioned at equal spacing, with one target for the DBD and a few targets for the IDR, corresponding to the orange and black squares in Figure 1c. We have also verified that relaxing this assumption, and using randomly positioned targets, does not affect our results (Appendix 6. Robustness of the binding design principles). In the Rouse Model, the 3D probability density describing the relative distance in the equilibrium state, denoted as $r$ , between any two sites can be derived through Gaussian integration (Doi and Edwards, 1988), leading to:

f (r, l) = {(\frac{3}{2 π l^{2}})}^{\frac{3}{2}} \exp (- \frac{3 r^{2}}{2 l^{2}}),

where $l = \sqrt{n_{s e g}} l_{0}$ and $n_{s e g}$ is the number of segments between the two sites.
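A short numerical sketch can make the meaning of this density concrete. The code below evaluates $f (r, l)$ and checks, by simple trapezoidal integration over spherical shells, that it is normalized and that the mean squared separation equals $l^{2}$ . The function names and the integration cutoff of $10 l$ are illustrative choices, not taken from the paper.

```python
import math


def rouse_density(r, n_seg, l0):
    """3D density f(r, l) of the separation between two sites, l = sqrt(n_seg) * l0."""
    l = math.sqrt(n_seg) * l0
    return (3.0 / (2.0 * math.pi * l * l)) ** 1.5 * math.exp(-3.0 * r * r / (2.0 * l * l))


def radial_moment(n_seg, l0, power, n_steps=20000):
    """Trapezoidal estimate of the integral of r**power * f(r, l) * 4*pi*r**2 over [0, 10 l]."""
    l = math.sqrt(n_seg) * l0
    h = 10.0 * l / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        r = i * h
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end-point weights
        total += weight * r ** power * rouse_density(r, n_seg, l0) * 4.0 * math.pi * r * r
    return total * h


# Example: four segments of length l0 = 5 bp give l = 10 bp.
norm = radial_moment(4, 5.0, 0)     # close to 1
mean_sq = radial_moment(4, 5.0, 2)  # close to l**2 = 100 bp^2
```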

Dynamical simulations

Each binding site of the TF at position $r_{i}$ follows over-damped dynamics, which reads:

Δ r_{i} = \frac{D}{k_{B} T} f ({r_{i}}) Δ t + \sqrt{2 D Δ t} ξ,

where the random variable $ξ$ follows the standard normal distribution. $D$ is the diffusion coefficient of the coarse-grained binding site. The force acting on each site is $f ({r_{i}}) \equiv - \partial U_{t o t a l} / \partial r_{i}$ , and

U_{t o t a l} = \sum_{i = 0}^{n_{b} - 1} \frac{1}{2} k (r_{i + 1} - r_{i})^{2} + \sum_{i, j} E_{B} \exp [- \frac{{(r_{i} - R_{j})}^{2}}{2 w_{d}^{2}}],

where $R_{j}$ are the positions of the targets. $w_{d} = 1 b p$ is the typical width of the short-range interaction between sites and their corresponding targets. The typical distance $l_{0}$ between coarse-grained binding sites is $5 b p$ . The stiffness of the springs is thus $3 k_{B} T / l_{0}^{2} = 3 k_{B} T / 25 b p^{2}$ . To speed up the simulations, when the sites’ minimal distance to the antenna is larger than the threshold $10 b p$ , the TF is regarded as a rigid body, and all sites move in sync with their center of mass. We verified that the resulting search time is insensitive to the value of this threshold. When the sites’ minimal distance to the antenna is less than $10 b p$ , we consider the excluded volume effect to ensure that different binding sites are not trapped at the same target. Specifically, we use the WCA interaction (Weeks et al., 1971):

U_{W C A} = \begin{cases} \frac{σ^{12}}{{| r_{i} - r_{j} |}^{12}} - \frac{2 σ^{6}}{{| r_{i} - r_{j} |}^{6}} + 1, & | r_{i} - r_{j} | < σ \\ 0, & | r_{i} - r_{j} | \geq σ \end{cases}

where $σ = 1 b p$ . In this region, we also adopted an adaptive time step: $Δ t / (b p^{2} / D) = \min \{1, \frac{k_{B} T}{\max | f_{i} | \, b p}\} \times 0.01$ . Note that this choice ensures that the displacement of the sites at every step is much smaller than the binding site size $w_{d}$ . Because the IDR becomes trapped at the binding targets on the antenna, the dynamics slow down and the simulation time becomes significantly longer, as found in previous simulation studies (Vuzman and Levy, 2010; Marcovitz and Levy, 2013). This slowdown is similar to the behavior observed in glass-forming liquids (Berthier and Biroli, 2011). As a result, an individual simulation requires between $10^{9}$ and $10^{11}$ time steps.
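The update rule, the WCA repulsion, and the adaptive time step can be summarized in a few lines of code. The sketch below is a simplified one-dimensional illustration under stated assumptions, not the simulation code used in this work: energies are in units of $k_{B} T$ , lengths in bp, and the function names (`adaptive_dt`, `wca_force`, `overdamped_step`) are hypothetical.

```python
import math
import random


def adaptive_dt(forces, kT=1.0, bp=1.0, D=1.0):
    """Time step dt = 0.01 * min(1, kT / (max|f_i| * bp)) * bp**2 / D."""
    f_max = max(abs(f) for f in forces)
    ratio = 1.0 if f_max == 0.0 else min(1.0, kT / (f_max * bp))
    return 0.01 * ratio * bp * bp / D


def wca_force(r, sigma=1.0):
    """Repulsive force -dU/dr for U = sigma^12/r^12 - 2 sigma^6/r^6 + 1 (zero for r >= sigma)."""
    if r >= sigma:
        return 0.0
    return 12.0 * sigma ** 12 / r ** 13 - 12.0 * sigma ** 6 / r ** 7


def overdamped_step(positions, forces, dt, D=1.0, kT=1.0, rng=random):
    """One over-damped update: dx = (D / kT) * f * dt + sqrt(2 * D * dt) * xi."""
    noise_amplitude = math.sqrt(2.0 * D * dt)
    return [
        x + D / kT * f * dt + noise_amplitude * rng.gauss(0.0, 1.0)
        for x, f in zip(positions, forces)
    ]


# Example: two sites 0.9 bp apart repel each other through the WCA force.
x = [0.0, 0.9]
f_pair = wca_force(abs(x[1] - x[0]))
forces = [-f_pair, f_pair]  # equal and opposite forces push the sites apart
dt = adaptive_dt(forces)
x_new = overdamped_step(x, forces, dt, rng=random.Random(1))
```

With this time step, the deterministic displacement per step is at most $0.01 b p$ , which is the sense in which large forces do not produce large jumps.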

Appendix 1

Broad applicability of probability density $f (r, l)$

We verify the broad applicability of Equation 6 in the main text, regardless of the specific details of microscopic interactions. To demonstrate this, we only require that the density function of the end-to-end distance $r$ for a segment containing $m$ amino acids (AAs), denoted as $f (r, l_{0})$ , complies with Equation 6. Specifically, we require:

f (r, l_{0}) \approx {(\frac{3}{2 π l_{0}^{2}})}^{3 / 2} \exp (- \frac{3 r^{2}}{2 l_{0}^{2}}),

where $l_{0}$ is the typical length of the segment. When $m$ is large, according to the central limit theorem, $r$ should conform to the distribution in Equation 10 with a variance of $l_{0}^{2} = m a_{e f f}^{2}$ , where $a_{e f f}$ represents the effective length of an individual AA.

Next, we will consider a specific form of interaction, of springs with a finite rest length. In one limit (where the rest length vanishes), this corresponds to the Rouse model (where Equation 10 is exact for any value of $m$ , not only asymptotically) and in another limit (where the springs are very stiff) this corresponds to a chain of rods. As we shall see, Equation 10 provides an excellent approximation already for moderate values of $m$ .

To proceed, we consider two adjacent AAs at positions ${\vec{r}}_{i}$ and ${\vec{r}}_{i + 1}$ with an interaction denoted by $u (| {\vec{r}}_{i} - {\vec{r}}_{i + 1} |)$ . Their probability density is $ρ ({\vec{r}}_{i}, {\vec{r}}_{i + 1}) \propto \exp (- u (| {\vec{r}}_{i} - {\vec{r}}_{i + 1} |) / (k_{B} T))$ . $a_{e f f}^{2}$ can be calculated using $ρ ({\vec{r}}_{i}, {\vec{r}}_{i + 1})$ :

a_{e f f}^{2} = \int d \vec{r} \int d {\vec{r}}_{i} \int d {\vec{r}}_{i + 1} | {\vec{r}}_{i} - {\vec{r}}_{i + 1} |^{2} ρ ({\vec{r}}_{i}, {\vec{r}}_{i + 1}) δ (\vec{r} - ({\vec{r}}_{i} - {\vec{r}}_{i + 1})) .

Specifically, for a spring-like interaction with a stiffness constant $k_{0}$ and a rest length $b_{0}$ , the interaction is given by $u = \frac{1}{2} k_{0} {(| {\vec{r}}_{i} - {\vec{r}}_{i + 1} | - b_{0})}^{2}$ . This formula can also be interpreted as a harmonic approximation of the interaction around the equilibrium distance $b_{0}$ . In this case, $a_{e f f}^{2}$ is derived analytically:

a_{e f f}^{2} = a_{0}^{2} + b_{0}^{2} + \frac{2 b_{0}^{2} a_{0}^{2}}{a_{0}^{2} + 3 b_{0}^{2}} (1 + \frac{2 a_{0}^{3}}{6 b_{0}^{2} a_{0} + \sqrt{6 π} b_{0} (a_{0}^{2} + 3 b_{0}^{2}) (1 + {\rm Erf} (\sqrt{\frac{3}{2}} \frac{b_{0}}{a_{0}})) \exp (\frac{3 b_{0}^{2}}{2 a_{0}^{2}})}),

where $a_{0}^{2} \equiv 3 k_{B} T / k_{0}$ and ${\rm Erf} (z) = \frac{2}{\sqrt{π}} \int_{0}^{z} \exp (- x^{2}) d x$ is the error function. When $b_{0} = 0$ , we have $a_{e f f} = a_{0}$ , and Equation 10 holds for any $m$ . When $b_{0} ≫ a_{0}$ , we have $a_{e f f} \approx b_{0}$ , which corresponds to a chain of rods in which the angle between consecutive rods is free to rotate.
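The closed-form expression for $a_{e f f}^{2}$ can be checked against a direct numerical average. The sketch below assumes the Boltzmann weight $\exp (- 3 (r - b_{0})^{2} / (2 a_{0}^{2}))$ for the bond length $r$ , multiplied by the spherical-shell factor $r^{2}$ , and uses the standard error function as provided by Python's `math.erf`. Function names, the integration cutoff, and the step count are illustrative choices.

```python
import math


def a_eff_sq_closed(a0, b0):
    """Closed-form a_eff^2 for a harmonic bond with rest length b0."""
    if b0 == 0.0:
        return a0 * a0  # Rouse limit
    x = math.sqrt(1.5) * b0 / a0
    denom = (6.0 * b0 ** 2 * a0
             + math.sqrt(6.0 * math.pi) * b0 * (a0 ** 2 + 3.0 * b0 ** 2)
             * (1.0 + math.erf(x)) * math.exp(x * x))
    prefactor = 2.0 * b0 ** 2 * a0 ** 2 / (a0 ** 2 + 3.0 * b0 ** 2)
    return a0 ** 2 + b0 ** 2 + prefactor * (1.0 + 2.0 * a0 ** 3 / denom)


def a_eff_sq_numeric(a0, b0, n_steps=20000):
    """<r^2> with weight r^2 * exp(-3 (r - b0)^2 / (2 a0^2)), by the trapezoidal rule."""
    r_max = b0 + 12.0 * a0  # the weight is negligible beyond this cutoff
    h = r_max / n_steps
    num = 0.0
    den = 0.0
    for i in range(n_steps + 1):
        r = i * h
        w = 0.5 if i in (0, n_steps) else 1.0
        boltzmann = math.exp(-3.0 * (r - b0) ** 2 / (2.0 * a0 ** 2))
        num += w * r ** 4 * boltzmann
        den += w * r ** 2 * boltzmann
    return num / den


# Example: compare both estimates for b0 = a0.
closed = a_eff_sq_closed(1.0, 1.0)
numeric = a_eff_sq_numeric(1.0, 1.0)
```

For $b_{0} ≫ a_{0}$ both estimates approach $b_{0}^{2} + 5 a_{0}^{2} / 3$ , consistent with the rod-like limit $a_{e f f} \approx b_{0}$ .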

Appendix 1—figure 1a shows that the numerical results obtained from this harmonic interaction are in good agreement with Equation 10 even at moderate values of $m$ . The inset displays the variation of $a_{e f f}^{2} / a_{0}^{2}$ with $b_{0} / a_{0}$ obtained from Equation 12. In Appendix 1—figure 1b, with $b_{0} = a_{0}$ , $f (r, l_{0})$ rapidly converges to Equation 10.

Appendix 1—figure 1

$f (r, l_{0})$ aligns well with Equation 10 (solid curves) for moderate $m$ at various $b_{0}$ .

(a) Inset: $a_{e f f}^{2} / a_{0}^{2}$ vs. $b_{0} / a_{0}$ from Equation 12. (b) Equation 10 serves as a good approximation (solid lines) even at small $m$ .

Appendix 2

Derivation of the binding probability

In this section, we derive the generalized binding probability $P_{T F}$ for $n_{b}$ IDR binding sites and $n_{t}$ IDR targets. To achieve this generalization, we consider every possible number of bound IDR sites, $n$ from 0 to $\min (n_{b}, n_{t})$ , and all possible binding orientations for each $n$ . Note that the energy and entropy trade-off is naturally included in our model, and the generalized formula of $P_{T F}$ is also applicable for non-uniformly distributed binding targets and binding sites.

Based on the Rouse model, the system’s energy can be expressed as $\frac{k}{2} \sum_{i = 1}^{n_{b}} ({\vec{r}}_{i} - {\vec{r}}_{i - 1})^{2} = \frac{3 k_{B} T}{2 l_{0}^{2}} \sum_{i = 1}^{n_{b}} ({\vec{r}}_{i} - {\vec{r}}_{i - 1})^{2}$ , where ${\vec{r}}_{i}$ represents the position of the binding site $i$ . The probability density in the absence of binding reads:

ρ ({\vec{r}}_{0}, . . ., {\vec{r}}_{n_{b}}) = \frac{1}{V_{c}} {(\frac{3}{2 π l_{0}^{2}})}^{\frac{3}{2} n_{b}} e^{- \frac{3}{2 l_{0}^{2}} \sum_{i = 1}^{n_{b}} ({\vec{r}}_{i} - {\vec{r}}_{i - 1})^{2}},

where $\int d^{3} {\vec{r}}_{0} . . . \int d^{3} {\vec{r}}_{n_{b}} ρ ({\vec{r}}_{0}, . . ., {\vec{r}}_{n_{b}}) = 1$ .

Now we consider a given configuration with $n$ IDR sites bound, where IDR sites $q_{1} < . . . < q_{n}$ are bound to targets $s_{1}, . . ., s_{n} .$ The positions of these bound targets are at ${\vec{R}}_{s_{1}}, . . ., {\vec{R}}_{s_{n}}$ . The DBD site is denoted by $q_{0} = 0$ and its target is denoted by $s_{0} = 0$ . There are $n_{b} - n$ unbound IDR sites, indicated by ${\bar{q}}_{1} < . . . < {\bar{q}}_{n_{b} - n}$ . The corresponding binding probability is given by:

P_{bound} (orn, n) = \frac{1}{Z} \prod_{i = 0}^{n} \int_{{\vec{r}}_{q_{i}} \in V_{1}} d^{3} {\vec{r}}_{q_{i}} \prod_{ℓ = 1}^{n_{b} - n} \int_{{\vec{r}}_{{\bar{q}}_{ℓ}} \notin V_{1}} d^{3} {\vec{r}}_{{\bar{q}}_{ℓ}} ρ ({\vec{r}}_{0}, \dots, {\vec{r}}_{n_{b}}) e^{E_{DBD} + n E_{B}}

where ‘ $o r n$ ’ stands for the orientation of the given configuration, $Z$ is the normalization factor, and $E_{D B D}$ is the DBD site binding energy for its target. There are $(\binom{n_{b}}{n}) (\binom{n_{t}}{n}) n!$ possible binding orientations at $n$ . The key insight in evaluating this integral is to realize that we can use the result of Equation 6 of the main text to integrate out the degrees of freedom associated with the sites between $q_{i - 1}$ and $q_{i}$ :

\prod_{{\bar{q}}_{ℓ} = q_{i - 1} + 1}^{q_{i} - 1} [\int d^{3} {\vec{r}}_{{\bar{q}}_{ℓ}} {(\frac{3}{2 π l_{0}^{2}})}^{3 / 2} \exp (- \frac{3 ({\vec{r}}_{{\bar{q}}_{ℓ}} - {\vec{r}}_{{\bar{q}}_{ℓ} - 1})^{2}}{2 l_{0}^{2}})] = f (| {\vec{r}}_{q_{i}} - {\vec{r}}_{q_{i - 1}} |, \sqrt{q_{i} - q_{i - 1}} l_{0}) .

Since $r_{q_{i - 1}}$ and $r_{q_{i}}$ are confined to a very small volume $V_{1}$ , to an excellent approximation we may simply evaluate $f (| {\vec{r}}_{q_{i}} - {\vec{r}}_{q_{i - 1}} |, \sqrt{q_{i} - q_{i - 1}} l_{0})$ in Equation 15 at ${\vec{r}}_{q_{i - 1}} = {\vec{R}}_{s_{i - 1}}$ and ${\vec{r}}_{q_{i}} = {\vec{R}}_{s_{i}}$ , to obtain:

P_{b o u n d} (o r n, n) ≃ \frac{1}{Z} \frac{V_{1}^{n + 1}}{V_{c}} \prod_{i = 1}^{n} f (| s_{i} - s_{i - 1} | d, \sqrt{q_{i} - q_{i - 1}} l_{0}) e^{E_{D B D} + n E_{B}} = \frac{1}{Z} r_{D B D} \prod_{i = 1}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i}),

where $r_{D B D} \equiv \frac{V_{1}}{V_{c}} e^{E_{D B D}}$ , $r_{o r n, i} = | s_{i} - s_{i - 1} | d$ , $l_{o r n, i} = \sqrt{q_{i} - q_{i - 1}} l_{0}$ , and $P (E_{B}, r_{o r n, i}, l_{o r n, i}) = e^{E_{B}} V_{1} f (r_{o r n, i}, l_{o r n, i})$ .

To give a specific example, consider the binding topology corresponding to Appendix 2—figure 1. In this case, $n = 3$ , $l_{o r n, 1} = l_{0}$ (since there is one spring between the DBD and the first binding site), $l_{o r n, 2} = \sqrt{3} l_{0}$ (since there are three springs between $q_{1}$ and $q_{2}$ ) and similarly $l_{o r n, 3} = l_{0}$ . Equation 15 implies that the contribution of this diagram to the binding probability is: $\frac{1}{Z} \frac{V_{1}^{4}}{V_{c}} \frac{27}{{(2 π l_{0}^{2})}^{9 / 2}} e^{3 E_{B} + E_{D B D} - 5 d^{2} / l_{0}^{2}}$ . Note that while in this example, the order of the target sites is monotonic (i.e. as we increase the index of the binding sites, we progressively move further away from the DBD), in principle, we should sum over all diagrams, including those where the order is non-monotonic. At $n = 0$ , $P_{b o u n d} (o r n, 0) = \frac{1}{Z} r_{D B D}$ .
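The arithmetic of this example can be reproduced with a short sketch. The code below evaluates the weight $Z \cdot P_{b o u n d} (o r n, n)$ as the product of Gaussian factors $f$ for an arbitrary assignment of bound sites to targets. The function name `diagram_weight` is hypothetical, and the target indices $(1, 3, 4)$ are an assumption: they give separations $d$ , $2 d$ , and $d$ , which is what the exponent $- 5 d^{2} / l_{0}^{2}$ quoted above implies for site indices $(1, 4, 5)$ . The numerical parameter values are arbitrary.

```python
import math


def f(r, l):
    """Gaussian density f(r, l) of Equation 6."""
    return (3.0 / (2.0 * math.pi * l * l)) ** 1.5 * math.exp(-3.0 * r * r / (2.0 * l * l))


def diagram_weight(sites, targets, e_dbd, e_b, l0, d, v1, v_c):
    """Z * P_bound(orn, n) for bound IDR sites q_1 < ... < q_n on targets s_1, ..., s_n.

    The DBD is site 0 bound to target 0. Energies are in units of k_B T.
    """
    n = len(sites)
    weight = v1 ** (n + 1) / v_c * math.exp(e_dbd + n * e_b)
    q_prev, s_prev = 0, 0
    for q, s in zip(sites, targets):
        weight *= f(abs(s - s_prev) * d, math.sqrt(q - q_prev) * l0)
        q_prev, s_prev = q, s
    return weight


# Illustrative parameters (lengths in bp, volumes in bp^3).
e_dbd, e_b, l0, d, v1, v_c = 15.0, 10.0, 5.0, 5.0, 1.0, 2.5e10
w = diagram_weight((1, 4, 5), (1, 3, 4), e_dbd, e_b, l0, d, v1, v_c)
closed_form = (v1 ** 4 / v_c * 27.0 / (2.0 * math.pi * l0 ** 2) ** 4.5
               * math.exp(3.0 * e_b + e_dbd - 5.0 * d ** 2 / l0 ** 2))
```

The two numbers `w` and `closed_form` agree to floating-point precision, confirming the prefactor $27 / (2 π l_{0}^{2})^{9 / 2}$ .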

Appendix 2—figure 1

A diagram of the bound configurations at $n = 3$ .

Similarly, for the state where the DBD is unbound and $n \geq 2$ IDR sites are bound, the binding probability reads:

\begin{aligned} P_{u n b o u n d} (o r n, n) = & \frac{1}{Z} \int_{{\vec{r}}_{0} \notin V_{1}} d^{3} {\vec{r}}_{0} \prod_{ℓ = 1}^{n} \int_{{\vec{r}}_{q_{ℓ}} \in V_{1}} d^{3} {\vec{r}}_{q_{ℓ}} \prod_{ℓ = 1}^{n_{b} - n} \int_{{\vec{r}}_{{\bar{q}}_{ℓ}} \notin V_{1}} d^{3} {\vec{r}}_{{\bar{q}}_{ℓ}} ρ ({\vec{r}}_{0}, . . ., {\vec{r}}_{n_{b}}) e^{n E_{B}} \end{aligned}

\begin{aligned} ≃ & \frac{1}{Z} r_{I D R} \prod_{i = 2}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i}), \end{aligned}

where $r_{I D R} \equiv \frac{V_{1}}{V_{c}} e^{E_{B}}$ . For $n = 1$ , $P_{u n b o u n d} (o r n, 1) = \frac{n_{t} n_{b}}{Z} r_{I D R}$ . Here $n_{t} n_{b}$ represents the number of possibilities for one of the $n_{b}$ IDR sites bound to one of the $n_{t}$ targets. For $n = 0$ , $P_{u n b o u n d} (o r n, 0) = \frac{1}{Z} \prod_{i = 0}^{n_{b}} \int_{{\vec{r}}_{i} \notin V_{1}} d^{3} {\vec{r}}_{i} ρ ({\vec{r}}_{0}, . . ., {\vec{r}}_{n_{b}}) ≃ \frac{1}{Z}$ .

Since the probabilities of all configurations sum to 1, $\sum_{n = 0}^{n_{min}} \sum_{o r n} (P_{b o u n d} + P_{u n b o u n d}) = 1$ , where $n_{min} = \min \{n_{b}, n_{t}\}$ , we can determine the normalization factor $Z$ . Therefore, the probability $P_{T F}$ that the DBD is bound to its target can be expressed as:

P_{T F} = \frac{r_{D B D} (1 + \sum_{n = 1}^{n_{min}} \sum_{o r n} \prod_{i = 1}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i}))}{r_{D B D} (1 + \sum_{n = 1}^{n_{min}} \sum_{o r n} \prod_{i = 1}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i})) + 1 + r_{I D R} (n_{t} n_{b} + \sum_{n = 2}^{n_{min}} \sum_{o r n} \prod_{i = 2}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i}))} .

When $P_{T F} ≪ 1$ , we can approximate $P_{T F}$ as

P_{T F} \approx (1 + \sum_{n = 1}^{n_{min}} \sum_{o r n} \prod_{i = 1}^{n} P (E_{B}, r_{o r n, i}, l_{o r n, i})) P_{s i m p l e},

where we have considered the facts that $r_{I D R} < r_{D B D}$ because $E_{B} < E_{D B D}$ , and $P_{s i m p l e} = r_{D B D} / (1 + r_{D B D}) \approx r_{D B D}$ due to $r_{D B D} ≪ 1$ . For instance, taking $E_{D B D} = 15$ , $V_{1} = 1 {b p}^{3} \approx (0.34 n m)^{3}$ , and $V_{c} = 1 μ m^{3}$ , we find that $r_{D B D} \sim 10^{- 4}$ .

At $n_{b} = n_{t} = 1$ , the generalized Equation 19 simplifies to the following expression:

P_{T F} = \frac{r_{D B D} [1 + e^{E_{B}} V_{1} f (d, l_{0})]}{r_{D B D} [1 + e^{E_{B}} V_{1} f (d, l_{0})] + r_{I D R} + 1} .

Therefore, $P_{T F}$ can be approximated as $(1 + P) P_{s i m p l e}$ , as illustrated in Equation 1 of the main text. This approximation is also derivable from Equation 20.
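For small $n_{b}$ and $n_{t}$ , the sum over binding orientations in the generalized expression for $P_{T F}$ can be evaluated by brute-force enumeration. The sketch below is an illustration under explicit assumptions, not the code used for the figures: IDR targets are taken to sit at positions $s d$ with $s = 1, \dots, n_{t}$ on one side of the DBD target at $s = 0$ , IDR sites are indexed $1, \dots, n_{b}$ along the chain from the DBD, lengths are in bp, and the function name `p_tf` is hypothetical.

```python
import math
from itertools import combinations, permutations


def f(r, l):
    """Gaussian density f(r, l) of Equation 6."""
    return (3.0 / (2.0 * math.pi * l * l)) ** 1.5 * math.exp(-3.0 * r * r / (2.0 * l * l))


def p_tf(n_b, n_t, e_dbd, e_b, l0, d, v1, v_c):
    """Binding probability of the DBD, summing over all binding orientations."""
    r_dbd = v1 / v_c * math.exp(e_dbd)
    r_idr = v1 / v_c * math.exp(e_b)

    def link(q_prev, q, s_prev, s):
        # P(E_B, r, l) = e^{E_B} V_1 f(r, l) for one additional bound IDR site
        return math.exp(e_b) * v1 * f(abs(s - s_prev) * d, math.sqrt(q - q_prev) * l0)

    bound = 1.0                 # n = 0: only the DBD is bound
    unbound = float(n_t * n_b)  # n = 1: one IDR site bound, DBD unbound
    for n in range(1, min(n_b, n_t) + 1):
        for sites in combinations(range(1, n_b + 1), n):
            for targets in permutations(range(1, n_t + 1), n):
                # DBD bound: the chain of links starts at site 0 on target 0.
                w = 1.0
                q_prev, s_prev = 0, 0
                for q, s in zip(sites, targets):
                    w *= link(q_prev, q, s_prev, s)
                    q_prev, s_prev = q, s
                bound += w
                # DBD unbound: the product starts from the second bound site.
                if n >= 2:
                    w = 1.0
                    for i in range(1, n):
                        w *= link(sites[i - 1], sites[i], targets[i - 1], targets[i])
                    unbound += w
    numerator = r_dbd * bound
    return numerator / (numerator + 1.0 + r_idr * unbound)


# Illustrative parameters: V_1 = 1 bp^3 and V_c = 1 um^3 expressed in bp^3.
v1, v_c = 1.0, (1000.0 / 0.34) ** 3
p_single = p_tf(1, 1, 15.0, 10.0, 5.0, 5.0, v1, v_c)
```

For $n_{b} = n_{t} = 1$ the function reduces to the single-site expression above. The number of orientations grows factorially, so this enumeration is only practical for small $n_{b}$ and $n_{t}$ .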

Appendix 3

$Q$ varying with $l_{0} / d$ for all combinations of $n_{b}$ and $n_{t}$

Appendix 3—figure 1

Maximal $Q$ is observed at $l_{0} \sim d$ for all combinations of $n_{b}$ and $n_{t}$ .

Appendix 4

$P_{T F}$ varying with $n_{b}$ and $n_{t}$

Appendix 4—figure 1

Binding probability varying with $n_{b}$ and $n_{t}$ .

(a) $P_{T F}$ varying with $n_{b}$ at $n_{t} = 12$ . $P_{T F}$ initially increases rapidly, until eventually saturating. (b) $P_{T F}$ varies non-monotonically with $n_{t}$ . $P_{T F}$ slightly decreases with increasing $n_{t}$ at $n_{t} > n_{b}$ . Three examples at $n_{b} = 7, 8, 9$ are displayed. $P_{T F}$ is denoted by pentagrams at $n_{t} = n_{b}$ .

Appendix 5

Robustness of the relationships: $n_{b}^{*} \sim n_{t}^{*}$ and $n_{b}^{*}$ decreases with increasing $E_{B}$

Appendix 5—figure 1 shows the contour lines at $P_{T F} = 0.9$ and $P_{T F} = 0.5$ for various $E_{B}$ . All take an approximate symmetrical L-shape, indicating that for a given $E_{B}$ , the optimal values $n_{b}^{*}$ and $n_{t}^{*}$ are comparable, as guided by the black dashed line $y = x$ . As $E_{B}$ increases, the contour lines shift towards lower values of $n_{b}$ and $n_{t}$ , indicating a decrease in $n_{b}^{*}$ as $E_{B}$ increases.

Appendix 5—figure 1

Contour lines at $P_{T F} = 0.9$ and $P_{T F} = 0.5$ on the plane defined by the number of binding sites, $n_{b}$ , and the number of binding targets, $n_{t}$ .

The black dashed line is added to guide the relationship $n_{b} = n_{t}$ .

Appendix 6

Robustness of the binding design principles

In the main text, we employed uniform binding site distances in TFs, represented as $l_{0}$ , and uniform binding target distances on the DNA, denoted as $d$ , to obtain the design principles. Here, we also explore the scenario where both IDR binding sites and their targets are distributed as a Poisson process (i.e. the distance between neighboring sites is exponentially distributed), with mean values of $l_{0}$ and $d$ , respectively. Appendix 6—figure 1 is similar to Figure 2 in the main text. We observe that the design principles identified in the main text are applicable under these conditions as well.
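Randomly positioned sites of this kind are straightforward to generate. The following minimal sketch draws positions along a line from a Poisson process by accumulating exponentially distributed gaps; the function name `poisson_positions`, the seed, and the numerical values are illustrative and are not taken from the paper.

```python
import random


def poisson_positions(n, mean_spacing, rng):
    """Return n positions whose successive gaps are exponential with the given mean."""
    positions = []
    x = 0.0
    for _ in range(n):
        x += rng.expovariate(1.0 / mean_spacing)  # expovariate takes the rate, 1 / mean
        positions.append(x)
    return positions


# Example: 12 IDR targets with a mean spacing of d = 10 bp (hypothetical values).
rng = random.Random(0)
targets = poisson_positions(12, 10.0, rng)
```

The same function can be used for the IDR binding sites by passing $l_{0}$ as the mean spacing.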

Appendix 6—figure 1

Plots of the affinity enhancement, quantified by the ratio $Q \equiv P_{T F} / P_{s i m p l e}$ , for the case where IDR sites as well as their targets are randomly positioned on the IDR and DNA, respectively.

See the analogous Figure 2 in the main text for the case of homogeneously distributed IDR sites and targets. (a) $Q$ is maximal when $l_{0} \sim d$ (shaded region) for different values of $n_{t} = n_{b}$ . The average distance between IDR sites is $l_{0}$ , and the average distance between IDR targets is $d$ . (b) A heatmap of $Q$ at $E_{B} = 10$ and $l_{0} = d$ . (c) Contour lines at different $E_{B}$ at $P_{T F} = 0.5$ exhibit an approximate symmetrical L-shape, which indicates $n_{b}^{*} = n_{t}^{*}$ , as shown by the red triangles. $n_{b}^{*}$ decreases with increasing $E_{B}$ . (d) $Q$ varies with $E_{B}$ for several different constants of $n_{t} = n_{b}$ . The characteristic energy $E_{t h}$ , as predicted by Equation 4 in the main text, corresponds to the region where a notable increase is observed.

Appendix 7

TF walking along the antenna once it attaches, and $t_{t o t a l}$ vs. $E_{B}$ and vs. $\tilde{n}$

Appendix 7—figure 1a shows a typical trajectory of a TF (red) in the cell nucleus (light blue) at ${\tilde{n}}^{*}$ and $E_{B}^{*}$ : once the TF attaches to the IDR targets, it walks along the antenna to the DBD target. The DBD target is at the origin. In Appendix 7—figure 1b, we find that the average number of rounds the TF hits the antenna, $n_{h i t s}$ , is close to 1, so it is fair to describe the overall search process as two distinct steps, the first being a purely 3D search and the second a purely 1D search. This is expressed by Equation 5 in the main text.

Appendix 7—figure 1

Demonstration of a single round of 3D-1D search in simulations.

(a) A trajectory (red) of TF in the cell nucleus (light blue). is the average distance of the TF to the antenna, and $X_{c}$ is the position of the center of mass along the antenna direction. The trajectory is displayed with a time interval of $10^{6} t_{0}$ . (b) The average number of rounds the TF hits the antenna, $n_{h i t s}$ , before reaching the DBD target. The optimal mean total search time is obtained at $L^{*} = 300 b p$ .

The minimal search time, $t_{min}$ , is obtained at $L^{*}$ , $E_{B}^{*}$ , and ${\tilde{n}}^{*}$ . Appendix 7—figure 2a shows $t_{t o t a l}$ vs. $E_{B}$ at fixed $L^{*}$ and ${\tilde{n}}^{*}$ . When $E_{B} > E_{B}^{*}$ , $t_{3 D}$ does not depend on $E_{B}$ as we predict (red solid line) and $t_{1 D}$ exponentially increases with increasing $E_{B}$ . The solid blue line shows the exponential dependence of $t_{1 D}$ on $E_{B}$ , also shown in the inset. When $E_{B} < E_{B}^{*}$ , the number of rounds of hits on the antenna, $n_{h i t s}$ , is larger than 1, as displayed in the inset. In this scenario, Equation 5 in the main text becomes ineffective, resulting in a significantly longer search time than $t_{min}$ .

Appendix 7—figure 2b shows $t_{t o t a l}$ vs. $\tilde{n}$ at fixed $L^{*}$ and $E_{B}^{*}$ . When $\tilde{n} > {\tilde{n}}^{*}$ , $t_{3 D}$ and $t_{1 D}$ agree with the prediction of Equation 5 (red curve and blue curve). Interestingly, we find that $t_{1 D}$ only weakly depends on $\tilde{n}$ . This arises due to the compensating effects of attaching and detaching: As $\tilde{n}$ increases, the TF attaches to new targets more easily, speeding up 1D motion. However, it also makes detaching from the current site more difficult, thereby slowing down 1D motion.

When $\tilde{n} < {\tilde{n}}^{*}$ , Equation 5 breaks down because of multiple rounds of hits, which is verified in the inset. As a result, both $t_{t o t a l}$ and $t_{3 D}$ increase with decreasing $\tilde{n}$ and become significantly longer than $t_{min}$ .

If each round of 3D search is independent, the 3D search time should be $n_{h i t s}$ times the search time of a single 3D round, which is shown as the black curve.

Appendix 7—figure 2

Search times as a function of $E_{B}$ and $\tilde{n}$ .

(a) $t_{t o t a l}$ , $t_{3 D}$ , and $t_{1 D}$ vs. $E_{B}$ at fixed $L^{*}$ and ${\tilde{n}}^{*}$ . The red solid line is the prediction of $t_{3 D}$ from Equation 5, and the blue line (also in the inset) is the exponential fit for $t_{1 D}$ . The black curve in the inset is the number of rounds the TF hits the antenna, $n_{h i t s}$ , vs. $E_{B}$ . (b) $t_{t o t a l}$ , $t_{3 D}$ and $t_{1 D}$ vs. $\tilde{n}$ at fixed $L^{*}$ and $E_{B}^{*}$ . The red and blue solid lines represent the predictions of $t_{3 D}$ and $t_{1 D}$ from Equation 5, respectively. $n_{h i t s}$ varying with $\tilde{n}$ is shown in the inset.

Appendix 8

A single round of 3D-1D search is typically the case in the optimal search process

In this section, we will show that the optimal search process (i.e. minimizing the mean first passage time) generically takes place via a single round of 3D-1D search. To do so, we first need to derive the expression for the mean total search time, $t_{t o t a l}$ , and its dependence on the relevant parameters, including, importantly, $E_{B}$ and $L$ . According to our model presented in the main text, a TF is initially in the 3D space of the nucleus; it can attach to the antenna and perform 1D diffusion with the coefficient denoted by $D_{1}$ . We assume the DBD target is placed in the middle of the antenna, as shown in Figure 1 of the main text. The mean search time taken for a TF to move from 3D space and attach to the antenna is denoted by $t_{3 D}$ . The TF can also detach from the antenna at a rate we denote by $Γ$ and perform a 3D random walk until ultimately attaching again to the antenna. In the following, we will make the assumption (often made in the context of the FD model; Berg et al., 1981; Berg and von Hippel, 1987; Slutsky and Mirny, 2004; Mirny et al., 2009; Li et al., 2009; Bénichou et al., 2011; Sheinman et al., 2012) that the binding position after such a 3D excursion is uniformly distributed on the antenna.

Both $D_{1}$ and $Γ$ are functions of $E_{B}$ . When the TF is on the antenna, its mean search time to reach the DBD target, located a distance $x$ away on the antenna, is represented by $T (x)$ . Naturally, $T (x)$ is zero when $x = 0$ , the location of the DBD target. The mean total search time $t_{t o t a l}$ is thus given by

t_{t o t a l} = \frac{1}{L} \int_{- L / 2}^{L / 2} T (x) d x + t_{3 D} .

Consider a time interval $Δ t$ . With probability $Γ Δ t$ the TF falls off, and the mean first passage time to reach the target is $t_{t o t a l} + Δ t$ . Otherwise, with probability $1 - Γ Δ t$ , the mean first passage time would be the average of the two possible outcomes, namely, $(T (x + Δ x) + T (x - Δ x)) / 2 + Δ t$ . This recursive reasoning leads us to the following equation:

T (x) = Δ t + \frac{T (x - Δ x) + T (x + Δ x)}{2} (1 - Γ Δ t) + t_{t o t a l} Γ Δ t .

In the continuous limit, we have

D_{1} \frac{\partial^{2} T (x)}{\partial x^{2}} - Γ T (x) + Γ t_{t o t a l} + 1 = 0,

where $D_{1} \equiv (Δ x)^{2} / (2 Δ t)$ . By applying the boundary conditions, specifically the absorbing boundary condition at $x = 0$ and the reflecting boundary conditions at $x = \pm L / 2$ , $T (x)$ is determined as

T (x) = (t_{t o t a l} + Γ^{- 1}) [1 - \frac{\cosh (\frac{L - 2 | x |}{L_{s}})}{\cosh (\frac{L}{L_{s}})}],

where $L_{s}$ , representing the 1D diffusion distance, is defined by $L_{s} \equiv 2 \sqrt{D_{1} Γ^{- 1}}$ (Wunderlich and Mirny, 2008). Therefore, $t_{t o t a l}$ can be self-consistently solved by substituting $T (x)$ into Equation 22, presented as follows:

t_{t o t a l} = \frac{L}{L_{s}} \coth (\frac{L}{L_{s}}) t_{3 D} + [\frac{L}{L_{s}} \coth (\frac{L}{L_{s}}) - 1] Γ^{- 1} .

We can obtain $t_{t o t a l}$ in two limiting cases:

When $L ≫ L_{s}$ , where multiple rounds of 3D-1D excursions are needed to find the target, we have $t_{t o t a l} = \frac{L}{L_{s}} (t_{3 D} + Γ^{- 1})$ . This is the same as the results of the facilitated diffusion model (Wunderlich and Mirny, 2008; Bénichou et al., 2011; Sheinman et al., 2012; Hachmo and Amir, 2023).
When $L ≪ L_{s}$ , where a single round of 3D-1D search can find the target, we get $t_{t o t a l} = t_{3 D} + \frac{L^{2}}{12 D_{1}}$ . This matches Equation 5 in the main text when $L < 2 R$ .
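Both limits can be checked numerically from the general expression for $t_{t o t a l}$ . The sketch below evaluates that expression for given $L$ , $D_{1}$ , $Γ$ , and $t_{3 D}$ ; the function name `total_search_time` and all numerical values are illustrative, in arbitrary but consistent units.

```python
import math


def total_search_time(L, D1, gamma, t3d):
    """t_total = x coth(x) t_3D + (x coth(x) - 1) / Gamma, with x = L / L_s.

    L_s = 2 sqrt(D1 / Gamma) is the 1D diffusion distance; L must be positive.
    """
    L_s = 2.0 * math.sqrt(D1 / gamma)
    x = L / L_s
    g = x / math.tanh(x)  # x * coth(x)
    return g * t3d + (g - 1.0) / gamma


# Single-round regime (L << L_s): t_total approaches t_3D + L^2 / (12 D1).
short = total_search_time(L=1.0, D1=100.0, gamma=1.0, t3d=0.0)

# Multi-round regime (L >> L_s): t_total approaches (L / L_s) (t_3D + 1 / Gamma).
long_antenna = total_search_time(L=1000.0, D1=1.0, gamma=1.0, t3d=10.0)
```

In the first example `short` is close to $L^{2} / (12 D_{1}) = 1 / 1200$ , and in the second `long_antenna` is close to $500 \times 11 = 5500$ .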

Note that the calculations above did not assume an optimal or efficient search process, but simply characterized the dependence of the mean first passage time on the model parameters. Next, based on the expression for $t_{t o t a l}$ in Equation 26, we will prove that the search process involving multiple rounds of 3D-1D search is not optimal. This conclusion is reached by showing that the minimum $t_{t o t a l}$ always occurs in the regime where $L < 2 L_{s}$ (since $L_{s}$ is the typical distance covered by 1D diffusion on the antenna before falling off, and the maximal distance needed to be traversed on the antenna is $L / 2$ ). We emphasize an important distinction between our model and the well-studied facilitated diffusion model (where it is established that multiple rounds of search are needed for efficient search): in our case, the antenna length $L$ is a variable that can be optimized over (since we assume evolution has optimized over the size of the region flanking the DBD in which binding sites for the IDR are located). In the facilitated diffusion model, $L$ is assumed to be the length of the genome, since the interactions of the TF and the DNA are assumed non-specific. To proceed, let us first note that the factor $\coth (\frac{L}{L_{s}})$ in Equation 26 quickly converges to 1 as the ratio of $L / L_{s}$ increases. For instance, when the ratio is 2, we have $\coth (2) \approx 1.04$ . Thus, for $L > 2 L_{s}$ , we could approximate $t_{t o t a l}$ as

$$t_{total} \approx \frac{L}{L_{s}} t_{3D} + \left(\frac{L}{L_{s}} - 1\right) \Gamma^{-1}. \tag{27}$$
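The quality of the replacement $\coth (L / L_{s}) \to 1$ used above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the helper names `coth` and `coth_excess` are introduced here for illustration only. It relies on the identity $\coth (x) - 1 = 2 / (e^{2 x} - 1)$, so the error decays exponentially with the ratio $L / L_{s}$.

```python
import math


def coth(x: float) -> float:
    """Hyperbolic cotangent, coth(x) = cosh(x) / sinh(x), for x > 0."""
    return math.cosh(x) / math.sinh(x)


def coth_excess(ratio: float) -> float:
    """Error made by replacing coth(L / L_s) with 1 at a given ratio L / L_s."""
    return coth(ratio) - 1.0


if __name__ == "__main__":
    # Illustrative ratios L / L_s; the text uses the ratio 2 as its example.
    for ratio in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(f"L/L_s = {ratio:3.1f}: coth = {coth(ratio):.4f}, "
              f"excess over 1 = {coth_excess(ratio):.2e}")
```

At a ratio of 2 the excess is below 4%, consistent with the value $\coth (2) \approx 1.04$ quoted above, and it shrinks rapidly for larger ratios.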

Next, we consider how $t_{3D}$ varies with $L$. If the rate of hitting the antenna after 3D diffusion were the sum of the rates of hitting each of its infinitesimal segments, this rate would be linear in $L$, and the mean first passage time would scale as $1 / L$. However, these processes are not uncorrelated. When $L < 2R$, and assuming a straight antenna, $t_{3D}$ is proportional to $\ln (L) / L$ (see Equation 5 in the main text), implying that for a straight antenna the effect of these correlations is mild. When $L > 2R$, however, the antenna is curved rather than straight (due to the constraints imposed on the DNA conformation by the dimensions of the nucleus), and the dependence of $t_{3D}$ on $L$ is weaker (Hu et al., 2006). Thus, for any given $L$ greater than $2 L_{s}$, $t_{3D}$ decreases with $L$ more slowly than $1 / L$, making the first term of $t_{total}$ in Equation 27 an increasing function of $L$. The second term clearly increases with $L$ as well. Therefore, when $L > 2 L_{s}$, regardless of how $D_{1}$ and $\Gamma^{-1}$ depend on $E_{B}$, $t_{total}$ decreases with decreasing $L$. Thus, the minimum of $t_{total}$ occurs in the regime $L < 2 L_{s}$, and the optimal search typically consists of a single search round.
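The monotonicity argument above can be illustrated numerically. The sketch below evaluates the approximation of Equation 27 for two assumed forms of $t_{3D}(L)$ that decay more slowly than $1 / L$: a power law $L^{-\alpha}$ with $\alpha < 1$ and the straight-antenna scaling $\ln (L) / L$. The prefactors, the value of $L_{s}$, the value of $\Gamma^{-1}$, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math


def t_total_approx(L, L_s, t3d, gamma_inv):
    """Equation 27: t_total ~ (L / L_s) * t_3D(L) + (L / L_s - 1) * Gamma^{-1}."""
    x = L / L_s
    return x * t3d(L) + (x - 1.0) * gamma_inv


def power_law_t3d(prefactor, alpha):
    """Assumed form t_3D(L) = prefactor * L**(-alpha); alpha < 1 is slower than 1/L."""
    return lambda L: prefactor * L ** (-alpha)


def log_over_length_t3d(prefactor):
    """Assumed form t_3D(L) = prefactor * ln(L) / L (straight-antenna scaling)."""
    return lambda L: prefactor * math.log(L) / L


def is_increasing(values):
    """True if every value is strictly larger than the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    L_s, gamma_inv = 50.0, 1.0  # illustrative units, not fitted to any data
    lengths = [2.0 * L_s + 10.0 * k for k in range(1, 40)]  # all satisfy L > 2 L_s
    for name, t3d in (("power law, alpha = 0.5", power_law_t3d(100.0, 0.5)),
                      ("ln(L) / L", log_over_length_t3d(100.0))):
        times = [t_total_approx(L, L_s, t3d, gamma_inv) for L in lengths]
        print(f"{name}: t_total increases with L -> {is_increasing(times)}")
```

In both cases the product $(L / L_{s})\, t_{3D}(L)$ grows with $L$, so shortening the antenna lowers $t_{total}$ throughout the regime $L > 2 L_{s}$.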

Appendix 9

Robustness of the mean total search time to changes in the antenna geometry

We checked that the mean total search time (the mean first passage time) obtained when the IDR targets are generated via the worm-like chain model, a random walk with a persistence length (Milstein and Meiners, 2013), does not differ significantly from that in the scenario where the targets are arranged linearly, as presented in the main text. See Appendix 9—figure 1 for details.
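A target configuration of this kind can be sketched with a discrete worm-like chain. The code below is one possible construction under stated assumptions, not necessarily the procedure used for the figure: the chain consists of fixed-length segments, each bending angle $\theta$ is drawn with weight $\exp (\kappa \cos \theta)$, and the stiffness $\kappa$ is chosen so that consecutive tangents satisfy $\langle \cos \theta \rangle = \exp (-b / l_{p})$ for segment length $b$ and persistence length $l_{p}$. Treating the chain vertices as target positions, and all function names, are assumptions made for illustration.

```python
import math
import random


def langevin(kappa):
    """Mean cos(theta) when cos(theta) is weighted by exp(kappa * cos(theta)) in 3D."""
    if kappa < 1e-3:
        return kappa / 3.0  # leading series term; avoids numerical cancellation
    return 1.0 / math.tanh(kappa) - 1.0 / kappa


def solve_kappa(target_cos, lo=1e-6, hi=1e6, iters=200):
    """Bisection for the stiffness kappa with langevin(kappa) = target_cos."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if langevin(mid) < target_cos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def bend(tangent, cos_theta, phi):
    """Tilt the unit tangent by theta, with azimuth phi around the old tangent."""
    helper = (1.0, 0.0, 0.0) if abs(tangent[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = unit(cross(tangent, helper))  # unit vector perpendicular to the tangent
    e2 = cross(tangent, e1)            # completes the orthonormal frame
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return unit(tuple(cos_theta * t + sin_theta * (math.cos(phi) * a + math.sin(phi) * b)
                      for t, a, b in zip(tangent, e1, e2)))


def wlc_chain(n_segments, segment_length, persistence_length, rng):
    """Return the n_segments + 1 vertices of a discrete worm-like chain."""
    kappa = solve_kappa(math.exp(-segment_length / persistence_length))
    tangent, points = (0.0, 0.0, 1.0), [(0.0, 0.0, 0.0)]
    for _ in range(n_segments):
        points.append(tuple(p + segment_length * t for p, t in zip(points[-1], tangent)))
        xi = 1.0 - rng.random()  # uniform on (0, 1], keeps the logarithm finite
        # Inverse-CDF sample of cos(theta) from the density proportional to exp(kappa * u).
        cos_theta = 1.0 + math.log(xi + (1.0 - xi) * math.exp(-2.0 * kappa)) / kappa
        cos_theta = max(-1.0, min(1.0, cos_theta))
        tangent = bend(tangent, cos_theta, rng.uniform(0.0, 2.0 * math.pi))
    return points


if __name__ == "__main__":
    # 30 segments of 10 bp with a 10 bp persistence length, echoing the figure parameters.
    chain = wlc_chain(30, 10.0, 10.0, random.Random(0))
    print("end-to-end distance (bp):", math.dist(chain[0], chain[-1]))
```

Because successive bends have independent, uniformly distributed azimuths, the tangent correlation decays as $\exp (-k b / l_{p})$ after $k$ segments, which is the defining property of the persistence length in this discretization.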

Appendix 9—figure 1

Left: an example of the target configuration (the antenna) generated by the worm-like chain model, where $L = 300$ bp, $d = 10$ bp, and the persistence length also equals 10 bp.

Right: Mean total search times vary only weakly with persistence length when targets are generated using the worm-like chain model. The values are shown as black dots, with parameters set to $E_{B} = 11$, $L = 300$ bp, and $\tilde{n} = 4$. For each persistence length, we obtained ten mean total search times from ten different target configurations. These values do not differ significantly from the mean total search time for linearly arranged targets (indicated by the horizontal orange line).

Appendix 10

MSD in a complex DNA configuration

In Appendix 10—figure 1, we use simulations to examine how the mean square displacement (MSD) of a point searcher is influenced by the binding energy ($E_{nonsp}$) between the TF and the entire DNA in a complex DNA configuration. Appendix 10—figure 1a shows the DNA configuration generated via the worm-like chain model with a persistence length of 150 bp (Milstein and Meiners, 2013), where the volume fraction $\nu = 0.008$ is comparable to the experimental yeast DNA volume fraction. Appendix 10—figure 1b shows the MSD measured at different values of $E_{nonsp}$ for $\nu = 0.008$. When $E_{nonsp} \sim 1 k_{B} T$, which corresponds to the non-specific binding energy regime in experiments (Afek et al., 2014), the effective 3D diffusion coefficient is only slightly modified, so our conclusion on the optimal search process is unaffected. Additionally, we observed that the MSD curve bends when $E_{nonsp}$ exceeds the typical range of non-specific binding energies.

Appendix 10—figure 1

Search in a complex DNA configuration.

(a) Configuration of DNA (in red) at $\nu = 0.008$ with a complex structure and a trajectory of the TF (in black). The TF acts as a point searcher that diffuses in 3D space and becomes trapped once it hits the DNA, which is 1 nm wide. It detaches at a rate of $\frac{D_{3}}{1\,\mathrm{nm}^{2}} \exp (- E_{nonsp} / k_{B} T)$. (b) MSD at different $E_{nonsp}$.
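The two quantities used in this appendix, the detachment rate and the MSD, can be written down compactly. The sketch below is illustrative only: it assumes that energies are given in units of $k_{B} T$ and that the MSD is computed as a time average along a single trajectory (the original analysis may instead average over many trajectories). The function names and the lattice random walk in the usage example are assumptions.

```python
import math
import random


def detachment_rate(d3, e_nonsp, width=1.0):
    """Off-rate D_3 / width**2 * exp(-E_nonsp), with E_nonsp in units of k_B T."""
    return d3 / width ** 2 * math.exp(-e_nonsp)


def time_averaged_msd(trajectory, lag):
    """Average of |r(t + lag) - r(t)|^2 over all start times t of one trajectory."""
    n_pairs = len(trajectory) - lag
    if lag <= 0 or n_pairs <= 0:
        raise ValueError("lag must be positive and shorter than the trajectory")
    total = 0.0
    for start in range(n_pairs):
        a, b = trajectory[start], trajectory[start + lag]
        total += sum((bi - ai) ** 2 for ai, bi in zip(a, b))
    return total / n_pairs


if __name__ == "__main__":
    # Free 3D lattice random walk with unit steps, used only as a stand-in trajectory.
    rng = random.Random(1)
    position, walk = [0, 0, 0], [(0, 0, 0)]
    for _ in range(20000):
        axis = rng.randrange(3)
        position[axis] += rng.choice((-1, 1))
        walk.append(tuple(position))
    for lag in (1, 10, 100):
        print(f"lag = {lag:3d}: MSD = {time_averaged_msd(walk, lag):.2f}")
    print("off-rate at E_nonsp = 1 k_B T (D_3 = 1):", detachment_rate(1.0, 1.0))
```

For free diffusion the MSD grows linearly with the lag; trapping on the DNA with a small off-rate would show up as a reduced slope or a bend in this curve.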

Appendix 11

Properties of the octopusing walk

Once an IDR site binds to the antenna, the TF performs an octopusing walk, whereby different IDR binding sites constantly bind to and unbind from the DNA. As we shall see, this leads to an effective 1D diffusion of the TF: during the octopusing walk, the TF neither gets trapped on a single target nor fully detaches from the antenna, allowing for rapid dynamics (in fact, the 1D diffusion coefficient can be comparable to the free 3D diffusion coefficient). We shall also show that the number of sites bound to the DNA is described by a probability distribution that reaches a steady state and obeys detailed balance. We denote the time-dependent probability distribution of the number of bound sites by $P (n, t)$ (with $n$ ranging from 0 to $\tilde{n}$). The dynamics of this probability distribution are given by:

$$\frac{d P (n, t)}{d t} = P (n - 1, t) W_{n - 1, n} + P (n + 1, t) W_{n + 1, n} + P (n, t) W_{n, n}, \tag{28}$$

where $W$ is the transition rate matrix, and $W_{n, n} = - W_{n, n - 1} - W_{n, n + 1}$. Intuitively, we anticipate that $W_{n, n - 1}$, corresponding to the detachment of one of the sites, is linear in $n$, since we expect the different sites to be approximately independent of each other. Similarly, if we consider the different IDR sites as independent ‘searchers’, we expect the on-rate (corresponding to $W_{n, n + 1}$) to be linear in $(\tilde{n} - n)$. Indeed, Appendix 11—figure 1a corroborates this intuition to an excellent approximation. As the dynamics proceed, the solution of Equation 28 reaches a steady-state distribution governed by the matrix $W$. Appendix 11—figure 1b shows the excellent agreement between the steady state associated with the matrix $W$ and that found directly from the MD simulation. This confirms that the TF movement along the antenna is in a dynamic steady state. We also note that since our transition matrix is tri-diagonal, it satisfies the detailed balance condition; that is, in the steady state we have $P (n) W_{n, n - 1} = P (n - 1) W_{n - 1, n}$, where $P (n)$ is the steady-state distribution.

A priori, one may think that the TF faces two contradictory demands: on the one hand, it has to diffuse rapidly along the DNA; on the other hand, it needs to bind sufficiently strongly that it does not fall off the DNA too soon. Indeed, this is known as the speed-stability paradox in the context of TF search in bacteria. Importantly, the simple picture described above provides an elegant mechanism to bypass this paradox, achieving a high diffusion coefficient together with a low rate of detachment from the DNA. This comes about due to the compensatory nature of the dynamics outlined above: there is, in effect, a feedback mechanism whereby if only a few sites are attached to the DNA, the on-rate increases, and, similarly, if too many are attached (thus hindering the 1D diffusion), the off-rate increases. This is manifested by the peaked steady-state distribution shown in Appendix 11—figure 1b. In fact, this mechanism is remarkably effective even when the typical (i.e., most probable) number of bound sites is as small as 2 (note that in this case, as shown in the inset of Appendix 11—figure 1a, the TF may cover a distance of more than 100 bp before falling off).
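The steady state implied by linear on- and off-rates can be worked out explicitly. The sketch below is a simplified illustration rather than the simulation used in the paper: it assumes exactly linear rates $W_{n, n + 1} = k_{on} (\tilde{n} - n)$ and $W_{n, n - 1} = k_{off}\, n$, with hypothetical rate constants `k_on` and `k_off`, and it treats $n = 0$ as an ordinary state rather than as detachment. Under these assumptions, detailed balance gives a binomial steady-state distribution with per-site binding probability $k_{on} / (k_{on} + k_{off})$, which is peaked, as described above.

```python
import math


def linear_rates(n_tilde, k_on, k_off):
    """Assumed rates: up[n] = W_{n,n+1} = k_on * (n_tilde - n), down[n] = W_{n,n-1} = k_off * n."""
    up = [k_on * (n_tilde - n) for n in range(n_tilde + 1)]
    down = [k_off * n for n in range(n_tilde + 1)]
    return up, down


def steady_state(up, down):
    """Build P(n) from detailed balance, P(n) W_{n,n-1} = P(n-1) W_{n-1,n}, then normalize."""
    weights = [1.0]
    for n in range(1, len(up)):
        weights.append(weights[-1] * up[n - 1] / down[n])
    total = sum(weights)
    return [w / total for w in weights]


def master_rhs(p, up, down):
    """Right-hand side of Equation 28 evaluated for a distribution p."""
    last = len(p) - 1
    rhs = []
    for n in range(len(p)):
        gain = (p[n - 1] * up[n - 1] if n > 0 else 0.0)
        gain += (p[n + 1] * down[n + 1] if n < last else 0.0)
        loss = p[n] * (up[n] + down[n])  # -P(n) W_{n,n}
        rhs.append(gain - loss)
    return rhs


if __name__ == "__main__":
    n_tilde, k_on, k_off = 12, 1.0, 2.0  # illustrative values only
    up, down = linear_rates(n_tilde, k_on, k_off)
    p = steady_state(up, down)
    q = k_on / (k_on + k_off)
    for n, prob in enumerate(p):
        binom = math.comb(n_tilde, n) * q ** n * (1.0 - q) ** (n_tilde - n)
        print(f"n = {n:2d}: P(n) = {prob:.4f}, binomial = {binom:.4f}")
    print("max |dP/dt| at steady state:", max(abs(r) for r in master_rhs(p, up, down)))
```

The feedback described in the text is visible in the rates themselves: the total on-rate is largest when few sites are bound, and the total off-rate is largest when many are bound, so the distribution is pushed toward its peak from both sides.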

Appendix 11—figure 1

Properties of the 1D octopusing walk.

(a) Dependence of $W_{n, n + 1}$ (denoted by ‘on’) and $W_{n, n - 1}$ (denoted by ‘off’) on $n$ for $E_{B} = 6$ and $\tilde{n} = 12$. Inset: The mean square displacement (MSD) suggests diffusive motion, as indicated by the dashed line following $\mathrm{MSD} \propto t$. (b) Steady-state probability distribution of the number of bound sites.

Data availability

The data supporting the findings of this study and the code used in it are available from https://github.com/wenchengJi/code_TF (copy archived at Ji, 2024).

References

Afek A, Schipper JL, Horton J, Gordân R, Lukatsky DB (2014) Protein−DNA binding in the absence of specific base-pair recognition. PNAS 111:17140–17145. https://doi.org/10.1073/pnas.1410569111
Bénichou et al. (2011) Intermittent search strategies. Reviews of Modern Physics 83:81–129. https://doi.org/10.1103/RevModPhys.83.81
Berg HC, Purcell EM (1977) Physics of chemoreception. Biophysical Journal 20:193–219. https://doi.org/10.1016/S0006-3495(77)85544-6
(1981) Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. models and theory. Biochemistry 20:6929–6948. https://doi.org/10.1021/bi00527a028
Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins. Journal of Molecular Biology 193:723–743. https://doi.org/10.1016/0022-2836(87)90354-8
Berg HC (2018) Random Walks in Biology. Princeton University Press. https://doi.org/10.2307/j.ctv7r40w6
Berthier L, Biroli G (2011) Theoretical perspective on the glass transition and amorphous materials. Reviews of Modern Physics 83:587–645. https://doi.org/10.1103/RevModPhys.83.587
Boija A, Klein IA, Sabari BR, Dall’Agnese A, Coffey EL, Zamudio AV, Li CH, Shrinivas K, Manteiga JC, Hannett NM, Abraham BJ, Afeyan LK, Guo YE, Rimel JK, Fant CB, Schuijers J, Lee TI, Taatjes DJ, Young RA (2018) Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 175:1842–1855. https://doi.org/10.1016/j.cell.2018.10.042
Brodsky S, Jana T, Mittelman K, Chapal M, Kumar DK, Carmi M, Barkai N (2020) Intrinsically disordered regions direct transcription factor in vivo binding specificity. Molecular Cell 79:459–471. https://doi.org/10.1016/j.molcel.2020.05.032
(2010) Comparing models of evolution for ordered and disordered proteins. Molecular Biology and Evolution 27:609–621. https://doi.org/10.1093/molbev/msp277
(2020) Eukaryotic transcription factors can track and control their target genes using DNA antennas. Nature Communications 11:540. https://doi.org/10.1038/s41467-019-14217-8
(2024) Ordered and disordered regions of the origin recognition complex direct differential in vivo binding at distinct motif sequences. Nucleic Acids Research 52:5720–5731. https://doi.org/10.1093/nar/gkae249
Danilowicz C, Kafri Y, Conroy RS, Coljee VW, Weeks J, Prentiss M (2004) Measurement of the phase diagram of DNA unzipping in the temperature-force plane. Physical Review Letters 93:078101. https://doi.org/10.1103/PhysRevLett.93.078101
de Gennes PG, Leger L (1982) Dynamics of entangled polymer chains. Annual Review of Physical Chemistry 33:49–61. https://doi.org/10.1146/annurev.pc.33.100182.000405
Doi M, Edwards SF (1988) The Theory of Polymer Dynamics. Oxford University Press.
(2016) How motif environment influences transcription factor search dynamics: finding a needle in a haystack. BioEssays 38:605–612. https://doi.org/10.1002/bies.201600005
Einav T, Yazdi S, Coey A, Bjorkman PJ, Phillips R (2019) Harnessing avidity: quantifying the entropic and energetic effects of linker length and rigidity for multivalent binding of antibodies to HIV-1. Cell Systems 9:466–474. https://doi.org/10.1016/j.cels.2019.09.007
Ferrie JJ, Karr JP, Tjian R, Darzacq X (2022) “Structure”-function relationships in eukaryotic transcription factors: The role of intrinsically disordered regions in gene regulation. Molecular Cell 82:3970–3984. https://doi.org/10.1016/j.molcel.2022.09.021
(2011) Dynamic protein-DNA recognition: beyond what can be seen. Trends in Biochemical Sciences 36:415–423. https://doi.org/10.1016/j.tibs.2011.04.006
Garcia DA, Johnson TA, Presman DM, Fettweis G, Wagh K, Rinaldi L, Stavreva DA, Paakinaho V, Jensen RAM, Mandrup S, Upadhyaya A, Hager GL (2021) An intrinsically disordered region-mediated confinement state contributes to the dynamics and function of transcription factors. Molecular Cell 81:1484–1498. https://doi.org/10.1016/j.molcel.2021.01.013
(2019) Dependence of DNA persistence length on ionic strength and ion type. Physical Review Letters 122:028102. https://doi.org/10.1103/PhysRevLett.122.028102
(2011) Intrinsic disorder within and flanking the DNA-binding domains of human transcription factors. Biocomputing 2012: Proceedings of the Pacific Symposium. pp. 104–115. https://doi.org/10.1142/9789814366496_0011
Hachmo O, Amir A (2023) Conditional probability as found in nature: facilitated diffusion. American Journal of Physics 91:653–658. https://doi.org/10.1119/5.0123866
Hammar P, Leroy P, Mahmutovic A, Marklund EG, Berg OG, Elf J (2012) The lac repressor displays facilitated diffusion in living cells. Science 336:1595–1598. https://doi.org/10.1126/science.1221648
Holehouse AS, Kragelund BB (2024) The molecular basis for cellular function of intrinsically disordered protein regions. Nature Reviews. Molecular Cell Biology 25:187–211. https://doi.org/10.1038/s41580-023-00673-0
Hopfield JJ (1974) Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity. PNAS 71:4135–4139. https://doi.org/10.1073/pnas.71.10.4135
Hu et al. (2006) How proteins search for their specific sites on DNA: the role of DNA conformation. Biophysical Journal 90:2731–2744. https://doi.org/10.1529/biophysj.105.078162
Hu T, Shklovskii BI (2007) Kinetics of viral self-assembly: role of the single-stranded RNA antenna. Physical Review E 75:051901. https://doi.org/10.1103/PhysRevE.75.051901
(2006) NMR structural and kinetic characterization of a homeodomain diffusing and hopping on nonspecific DNA. PNAS 103:15062–15067. https://doi.org/10.1073/pnas.0605868103
Jacob F, Monod J (1961) François Jacob and Jacques Monod: genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology 3:318–356. https://doi.org/10.1016/S0022-2836(61)80072-7
(2021) Speed-specificity trade-offs in the transcription factors search for their genomic binding sites. Trends in Genetics 37:421–432. https://doi.org/10.1016/j.tig.2020.12.001
Ji W (2024) Code_TF, version swh:1:rev:12f0237a8a1903ff573593fc0ea800274bc64262. Software Heritage. https://archive.softwareheritage.org/swh:1:dir:e2f82b046ba75fed9ba8c3106b890a701df2358d;origin=https://github.com/wenchengJi/code_TF;visit=swh:1:snp:a2c2dc75e15d41c84fa3eb4b6d977e0e7981bfb6;anchor=swh:1:rev:12f0237a8a1903ff573593fc0ea800274bc64262
Jonas F, Carmi M, Krupkin B, Steinberger J, Brodsky S, Jana T, Barkai N (2023) The molecular grammar of protein disorder guiding genome-binding locations. Nucleic Acids Research 51:4831–4844. https://doi.org/10.1093/nar/gkad184
Jonas F, Navon Y, Barkai N (2025) Intrinsically disordered regions as facilitators of the transcription factor target search. Nature Reviews. Genetics 26:424–435. https://doi.org/10.1038/s41576-025-00816-3
Kent S, Brown K, Yang C-H, Alsaihati N, Tian C, Wang H, Ren X (2020) Phase-separated transcriptional condensates accelerate target-search process revealed by live-cell single-molecule imaging. Cell Reports 33:108248. https://doi.org/10.1016/j.celrep.2020.108248
Kumar DK, Jonas F, Jana T, Brodsky S, Carmi M, Barkai N (2023) Complementary strategies for directing in vivo transcription factor binding through DNA binding domains and intrinsically disordered regions. Molecular Cell 83:1462–1473. https://doi.org/10.1016/j.molcel.2023.04.002
Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT (2018) The human transcription factors. Cell 172:650–665. https://doi.org/10.1016/j.cell.2018.01.029
Larson DR, Zenklusen D, Wu B, Chao JA, Singer RH (2011) Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science 332:475–478. https://doi.org/10.1126/science.1202142
Latchman DS (1997) Transcription factors: an overview. The International Journal of Biochemistry & Cell Biology 29:1305–1312. https://doi.org/10.1016/S1357-2725(97)00085-X
Li GW, Berg OG, Elf J (2009) Effects of macromolecular crowding and DNA looping on gene regulation kinetics. Nature Physics 5:294–297. https://doi.org/10.1038/nphys1222
Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK (2006) Intrinsic disorder in transcription factors. Biochemistry 45:6873–6888. https://doi.org/10.1021/bi0602718
Marcovitz A, Levy Y (2013) Weak frustration regulates sliding and binding kinetics on rugged protein-DNA landscapes. The Journal of Physical Chemistry. B 117:13005–13014. https://doi.org/10.1021/jp402296d
Marklund E, Mao G, Yuan J, Zikrin S, Abdurakhmanov E, Deindl S, Elf J (2022) Sequence specificity in DNA binding is mainly governed by association. Science 375:442–445. https://doi.org/10.1126/science.abg7427
Milstein JN, Meiners JC (2013) Worm-Like Chain (WLC) Model. Springer. https://doi.org/10.1007/978-3-642-16712-6_502
Mindel V, Brodsky S, Cohen A, Manadre W, Jonas F, Carmi M, Barkai N (2024) Intrinsically disordered regions of the Msn2 transcription factor encode multiple functions using interwoven sequence grammars. Nucleic Acids Research 52:2260–2272. https://doi.org/10.1093/nar/gkad1191
Mirny L, Slutsky M, Wunderlich Z, Tafvizi A, Leith J, Kosmrlj A (2009) How a protein searches for its site on DNA: the mechanism of facilitated diffusion. Journal of Physics A 42:434013. https://doi.org/10.1088/1751-8113/42/43/434013
Mitchell PJ, Tjian R (1989) Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245:371–378. https://doi.org/10.1126/science.2667136
Morgunova E, Taipale J (2017) Structural perspective of cooperative transcription factor binding. Current Opinion in Structural Biology 47:1–8. https://doi.org/10.1016/j.sbi.2017.03.006
Prince ER (1953) A theory of the linear viscoelastic properties of dilute solutions of coiling polymers. The Journal of Chemical Physics 21:1272–1280. https://doi.org/10.1063/1.1699180
Ptashne M, Gann A (1997) Transcriptional activation by recruitment. Nature 386:569–577. https://doi.org/10.1038/386569a0
Sabari BR, Dall’Agnese A, Boija A, Klein IA, Coffey EL, Shrinivas K, Abraham BJ, Hannett NM, Zamudio AV, Manteiga JC, Li CH, Guo YE, Day DS, Schuijers J, Vasile E, Malik S, Hnisz D, Lee TI, Cisse II, Roeder RG, Sharp PA, Chakraborty AK, Young RA (2018) Coactivator condensation at super-enhancers links phase separation and gene control. Science 361:eaar3958. https://doi.org/10.1126/science.aar3958
Sheinman et al. (2012) Classes of fast and specific search mechanisms for proteins on DNA. Reports on Progress in Physics 75:026601. https://doi.org/10.1088/0034-4885/75/2/026601
Shimamoto N (1999) One-dimensional diffusion of proteins along DNA: its biological and chemical significance revealed by single-molecule measurements. The Journal of Biological Chemistry 274:15293–15296. https://doi.org/10.1074/jbc.274.22.15293
Slutsky M, Mirny LA (2004) Kinetics of protein-DNA interaction: facilitated target location in sequence-dependent potential. Biophysical Journal 87:4021–4035. https://doi.org/10.1529/biophysj.104.050765
(2019) Transcription factors and 3D genome conformation in cell-fate decisions. Nature 569:345–354. https://doi.org/10.1038/s41586-019-1182-7
Tenenbaum D, Inlow K, Friedman LJ, Cai A, Gelles J, Kondev J (2023) RNA polymerase sliding on DNA can couple the transcription of nearby bacterial operons. PNAS 120:2301402120. https://doi.org/10.1073/pnas.2301402120
Vuzman D, Levy Y (2010) DNA search efficiency is modulated by charge composition and distribution in the intrinsically disordered tail. PNAS 107:21004–21009. https://doi.org/10.1073/pnas.1011775107
Vuzman D, Levy Y (2012) Intrinsically disordered regions as affinity tuners in protein-DNA interactions. Molecular bioSystems 8:47–57. https://doi.org/10.1039/c1mb05273j
(2023) Transcription factor dynamics: one molecule at a time. Annual Review of Cell and Developmental Biology 39:277–305. https://doi.org/10.1146/annurev-cellbio-022823-013847
(1971) Role of repulsive forces in determining the equilibrium structure of simple liquids. The Journal of Chemical Physics 54:5237–5247. https://doi.org/10.1063/1.1674820
Wright PE, Dyson HJ (2015) Intrinsically disordered proteins in cellular signalling and regulation. Nature Reviews. Molecular Cell Biology 16:18–29. https://doi.org/10.1038/nrm3920
Wunderlich Z, Mirny LA (2008) Spatial effects on the speed and reliability of protein-DNA search. Nucleic Acids Research 36:3570–3578. https://doi.org/10.1093/nar/gkn173
(2019) Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife 8:e46883. https://doi.org/10.7554/eLife.46883
Zentner GE, Kasinathan S, Xin B, Rohs R, Henikoff S (2015) ChEC-seq kinetics discriminates transcription factor binding sites by DNA sequence and shape in vivo. Nature Communications 6:8733. https://doi.org/10.1038/ncomms9733

Article and author information

Author details

Wencheng Ji
1. Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
2. John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
Contribution: Investigation, Writing – original draft, Writing – review and editing
For correspondence: wencheng.ji@weizmann.ac.il
Competing interests: No competing interests declared
ORCID iD: 0000-0003-1465-3953

Ori Hachmo
1. Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
Contribution: Investigation, Writing – original draft, Writing – review and editing
Competing interests: No competing interests declared

Naama Barkai
1. Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
Contribution: Investigation, Writing – original draft, Writing – review and editing
Competing interests: No competing interests declared
ORCID iD: 0000-0002-2444-6061

Ariel Amir
1. Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
2. John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
Contribution: Investigation, Writing – original draft, Writing – review and editing
For correspondence: ariel.amir@weizmann.ac.il
Competing interests: Reviewing editor, eLife
ORCID iD: 0000-0003-2611-0139

Funding

No external funding was received for this work.

Acknowledgements

We thank Samuel Safran, David Mukamel, Yariv Kafri, Yaakov Levy, Luyi Qiu, and Zily Burstein for insightful discussions. We thank the Clore Center for Biological Physics for their generous support.

Version history

Preprint posted: November 26, 2024
Sent for peer review: November 26, 2024
Reviewed Preprint version 1: April 14, 2025
Reviewed Preprint version 2: August 27, 2025
Version of Record published: September 30, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.104956. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Wencheng Ji, Ori Hachmo, Naama Barkai, Ariel Amir (2025) Design principles of transcription factors with intrinsically disordered regions. eLife 14:RP104956. https://doi.org/10.7554/eLife.104956.3

Categories and tags

Research organism

S. cerevisiae
