Author response:
Public Review
Joint Public Review:
This manuscript presents an algorithm for identifying network topologies that exhibit a desired qualitative behaviour, with a particular focus on oscillations. The approach is first demonstrated on 3-node networks, where results can be validated through exhaustive search, and then extended to 5-node networks, where the search space becomes intractable. Network topologies are represented as directed graphs, and their dynamical behaviour is classified using stochastic simulations based on the Gillespie algorithm. To efficiently explore the large design space, the authors employ reinforcement learning via Monte Carlo Tree Search (MCTS), framing circuit design as a sequential decision-making process.
This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.
Strengths
The evidence presented is strong and compelling. The authors validate their results for 3-node networks through exhaustive search, and the findings for 5-node networks are consistent with previously reported motifs, lending credibility to the approach. The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.
Weaknesses
The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters. However, in many cases, this persistence is due to the presence of partially redundant oscillatory motifs within the network. While this observation is interesting and of clear value for circuit design, framing it as evidence of evolutionary robustness may be misleading. The "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.
A secondary limitation is that, despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.
Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.
We thank the reviewers for their careful consideration of our work and for the interesting feedback and scientific discussion. We are working on a revised text based on their recommendations, which will include some of the discussion below.
This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.
We thank the reviewers for their positive assessment of our work’s impact!
The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.
We appreciate these kind comments about our work’s merits. We are excited to share our reinforcement learning (RL) based method with the fields of systems and synthetic biology, and we consider it a valuable tool for the systematic analysis and design of larger-scale regulatory networks!
The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters… [However, these] "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.
We thank the reviewers for their attention to this point. In the large-scale circuit search, summarized in Figures 4A and 4B, we ran a search for 5-component oscillators that can spontaneously oscillate even when subjected to the deletion of a random gene. Some of the best performing circuits under these conditions exhibited a design feature we call “motif multiplexing,” in which multiple smaller motifs are interleaved in a way that makes oscillation possible under many different mutational scenarios. Interestingly, despite not selecting for preservation of frequency, the 3Ai+3Rep circuit (a 5-gene circuit highlighted in Figure 5) anecdotally appears to have a natural frequency that is robust to partial gene knockdowns, although not to complete gene deletions. As shown in Figure 5C, this circuit has a natural frequency of 6 cycles/hr (with one particular parameterization), and it can sustain a knockdown of any of its 5 genes to 50% of the wild-type transcription rate without altering the natural frequency by more than 20%.
However, we agree that there are salient differences between this training scenario and natural evolution. The revised text will clarify that these differences limit what conclusions can be drawn about biological evolution by analogy. As the reviewers point out, we use the presence of spontaneous oscillations (with or without the deletion) as a measure of fitness, regardless of frequency, so as to screen for designs with promising behavior. Also, the deletion mutations introduced during training likely represent larger perturbations to the system than a typical mutation encountered during genome replication (for example, a point mutation in a response element leading to a moderate change in binding affinity). Finally, we do not introduce any entrainment. Real circadian oscillators are aligned to a 24-hour period (“entrained”) by environmental inputs such as light and temperature. For this reason, natural circadian clocks may have natural frequencies that are slightly shorter or longer than 24 hours, although a close proximity to the 24-hour period does seem to be an important selective factor [1].
...despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.
We thank the reviewers for their attention to this point. The main limitation we encountered to exploring circuits with more than 5 nodes in this work was the poor computational scaling of the Gillespie stochastic simulation algorithm, rather than a limitation of MCTS itself. While the average runtime of a 3-node circuit simulation was roughly 7 seconds, this number increased to 18-20 seconds with 5-node circuits. For this reason, we limited the search to topologies with ≤15 interaction arrows (15 sec/simulation). In general, the simulation time was proportional to the square of the number of transcription factors (TFs). We will revise the text to include the reason for stopping at 5 nodes, which is significant for understanding CircuiTree’s scaling properties.
With regards to scaling, an important advantage of CircuiTree is its ability to generate useful candidate designs after exploring only a portion of the search space. Like exhaustive search, given enough time, MCTS will comprehensively explore the search space and find all possible solutions. However, for large search spaces, RL-based agents are generally given a finite number of simulations (or time) to learn as much as possible.
Across machine learning (ML) applications [2] and particularly with RL models [3], this training time tends to obey a power law with respect to the underlying complexity of the problem. Thus we can use the complexity of the 3-node and 5-node searches to infer the current scaling limits of CircuiTree. The first oscillator topology was discovered after 2,280 simulations for the 3-node search, and in the 5-node search, the first oscillator using 5 nodes appeared at ~8e5 simulations, resulting in a power law of Y ~ 84.4 X0.333. Thus, useful candidate designs may be found for 6-node and 7-node searches after 4.5e7 and 5.26e9 simulations, respectively, even though these spaces contain 1.5e17 and 2.5e23 topologies, respectively. Thus, running a 7-node search with the current implementation of CircuiTree would require resources close to the current boundaries of computation, requiring roughly 1.8 million CPU-hours, or 2 weeks on 5,000 CPUs, assuming a 1-second simulation. These points will be incorporated into both the results and discussion sections in our revised text.
However, we are optimistic about CircuiTree’s potential to scale to much larger circuits with modifications to its algorithm. CircuiTree uses the original (so-called “vanilla”) implementation of MCTS, which has not been used in professional game-playing AIs in over a decade. Contemporary RL-based game-playing engines leverage deep neural networks to dramatically reduce the training time, using value networks to identify game-winning positions and policy networks to find game-winning moves. AlphaZero, developed by Google DeepMind to learn games by self-play and without domain knowledge, outperformed all other chess AIs after 44 million training games, much smaller than the 10^43 possible chess states [4]. Similarly, the game of go has 10170 possible states, but AlphaZero outperformed other AIs after only 140 million games [4]. Large circuits live in similarly large search spaces; for example, 19-node and 20-node circuits represent spaces of 10172 and 10190 possible topologies. The revised text will include this discussion and identify value and policy networks, as well as more scalable simulation paradigms such as ODEs and neural ODEs, as our future directions for improving CircuiTree’s scalability.
Finally, our revised discussion will note some important differences between game-playing and biological circuit design. Unlike deterministic games like chess, the final value of a circuit topology is determined stochastically, by running a simulation whose fitness depends on the parameter set and initial conditions. Thus, state-for-state, it is possible that training an agent for circuit design may inherently require more simulations to achieve the same level of certainty compared to classical games. Additionally, while we often possess a priori knowledge about a game such as its overall difficulty or certain known strategies, we lack this frame of reference when searching for circuit designs. Thus, it remains challenging to know if and when a large space of designs has been “satisfactorily” or “comprehensively” searched, since the answer depends on data that are unknown, namely the quantity, quality, and location of solutions residing in the search space.
Not accounting for redundancy due to structural symmetries
Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.
In our dynamical modeling of transcription factor (TF) networks, we do not rely on continuum assumptions about promoter occupancy such as Hill functions. Rather, we model each reaction - transcription, translation, TF binding/unbinding, and degradation - explicitly, and individual molecules appear and disappear via stochastic birth and death events. Many natural TFs are homodimers that bind cooperatively to regulate transcription; similarly, we assume that pairs of TFs bind more stably to their response element than individual TFs. Thus, our model has similar cooperativity to a Hill function, and it can be shown that in the continuum limit, the effective Hill coefficient is always ≤2. Our revision will clarify this aspect of the modeling and include a derivation of this property. Currently, the parameter values used in the figures are shown in Table 2. In the revised text, these will be displayed in the body of the text as well for clarity.
Bibliography
(1) Spoelstra, K., Wikelski, M., Daan, S., Loudon, A. S. I., & Hau, M. (2015). Natural selection against a circadian clock gene mutation in mice. PNAS, 113(3), 686–691. https://doi.org/https://doi.org/10.1073/pnas.1516442113
(2) Neumann, O., & Gros, C. (2023). Scaling Laws for a Multi-Agent Reinforcement Learning Model. The Eleventh International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=ZrEbzL9eQ3W
(3) Jones, A. L. (2021). Scaling Scaling Laws with Board Games. arXiv [Cs.LG]. Retrieved from http://arxiv.org/abs/2104.03113
(4) Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general reinforcement learning algorithm that Masters Chess, Shogi, and go through self-play. Science, 362(6419), 1140–1144. https://doi.org/10.1126/science.aar6404