Overview of the proposed framework.

(a) MOTIVATION: We often focus on studying the navigation and behavior of organisms in conventional three-dimensional environments, neglecting the intelligence underlying competencies at sub-organismal scales [32]. To better understand navigation competencies in unconventional organisms solving problems in unconventional spaces (e.g., embryos in morphological space), it is essential to construct comprehensive “behavioral catalogs” for these novel entities, which in turn requires sophisticated exploration methods to discover the extent of possible behaviors. Images are taken and adapted from [80][85]. (b) EXPERIMENTAL DESIGNS: We formalize GRN behavior as a navigation task and propose to investigate it by defining abstract and observer-dependent “problem spaces” that we use to organize the observed biological behaviors and their exploration in practice. (c) AUTOMATED EXPERIMENTATION: Pseudo-code of the curiosity-driven goal exploration process we use to automate the discovery of behavioral abilities that the GRN can exhibit in behavior space. (d) EMPIRICAL TESTS: We use a battery of empirical tests to identify the robust goal states of the systems, i.e. the one that can be attained under a wide variety of perturbation (including noise in gene expression, and pushes or walls during traversal of transcription space). (e) PERSPECTIVES: We explore several potential reuses of the discovered “behavioral catalog” and proposed framework across evolutionary biology, biomedicine and bioengineering contexts.

Illustration of the experimental setup and chosen problem spaces on an example GRN model which has 10 nodes and models the influence of RKIP on the ERK Signaling Pathway [99].

(a) Time-course evolution of the different nodes y1, …, y10 (one color per node) when starting from the default initial conditions (as provided in [99]). The observation captures the states taken through time o=[y(t=0), …, y(t=T)] where y=[y1, …, y10]. (b) Corresponding trajectory in transcriptional space (phase space), for two target nodes (ERK, RKIPP_RP), from t=0 (A, in red) to T=1000 seconds (B, in cyan). We can see that the trajectory converges to endpoint B in less than 100 seconds, and then stay there. The behavior (or reached goal state) is the endpoint B = [yERK(T), yRKIPRP(T)], where T is chosen big enough to ensure convergence. (c) The intervention is setting the initial state of the system trajectory (for all nodes): i = [y1(t=0), …, y10(t=0)]. (d-e) Example of perturbations used in this paper. (d) Noise perturbation, here applied to all 10 nodes every 5 secs until t=80 secs. (e) Push perturbation, here applied to the two target nodes (ERK, RKIPP_RP) at t=3 seconds. (f) Wall perturbation, also applied to the two target nodes (ERK, RKIPP_RP), here at 10% and 90% of the total distance traveled. Supplementary Figure S1 shows examples of other possible “drug” or “genome” interventions that can be implemented in the accompanying software, as well as the possibility to perform interventions (or perturbations) in parallel using batched computations.

Glossary of terms used in this paper, with the proposed isomorphism which generalizes concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective.

Problem spaces used in this study.

Curiosity search uncovers a wide spectrum of reachable states in behavior space Z.

(a) Diversity of endpoints discovered by random search (pink) and curiosity search (blue) for the 432 systems. Diversity is measured as the volume of the union of the set of hyperballs of radius ϵ that have for centers the discovered endpoints {zZ} as depicted by the shaded area in (b-c) with ϵ = 0. 05. (a-left) Mean and standard deviation curves of the diversity of behaviors discovered throughout exploration (with random search having twice more experiments n=900). Dots indicate significance (p<0.05) when testing curiosity search (n) against random search (n) in brown, and against random search (n=900) in black, with a Welch's t-test. Standard deviation is divided by 4 for visibility. (a-right) Detail of the diversity obtained in the left plot for all 432 systems at n=450 and n=900, where *** indicate significance (p<0.001). (b-c) Discovered endpoints at the end of exploration (n=450) by random search (pink) and curiosity search (blue) for 6 example systems of our database. (b) Examples of systems for which curiosity search is much more sample-efficient than random search in finding a diversity of reachable states in behavior space Z. (c) Examples of systems with low-redundancy mapping I -> Z such that random search in I is already quite efficient in covering behavior space Z, and curiosity search performs equivalently.

Illustration of the non linearity and redundancy of the I->Z mapping, and of the interest of using goal-directed exploration strategies.

Plot shows the reachable points discovered by curiosity search (a) and by random search (b) in the behavior space Z and their corresponding starting points in the intervention space I, for the RKIP-ERK signaling pathway system [99]. The intervention space is 10-dimensional, and here we show the TSNE reduction in 2D. We apply HDBSCAN clustering [106] on the points discovered in Z, which produced 4 clusters for curiosity search (displayed in gray, green, purple and orange; non-assigned points are displayed in light blue) and 2 clusters for random search (displayed in light and dark orange). We then visualize where those regions in behavior space mapped back in the intervention space, by applying the same coloring. (a) Looking at the curiosity search discoveries, we can see the non-linearity of the I->Z mapping, where small regions of intervention space can map to large regions of the behavior space (like the orange area) and reversely (gray area). We can also see the redundancy of the behavior space which is clearly concentrated in the left border of the space (ERK close to zero) which can seemingly be reached from very large portions of the intervention space (gray area). (b) Looking at random search discoveries, we can understand that it is very inefficient as it spends most of its exploration budget in the region of intervention space that converges to the left border in Z, and fails to explore the orange, purple and green regions discovered by curiosity search which seemingly lead to the more novelty in Z.

Identification of robust traversal strategies in transcriptional space.

(a) Violin plots show, for each of the 432 systems (one point per system), the median sensitivity (over the K representative goal states) to the noise (green), push (gray) and wall (yellow) perturbation families. Violin plots on the right detail the median sensitivity for the 18 sub-families. (b-g) Each row provides examples of perturbed trajectories of either extremely-robust or extremely-sensitive example (GRN, Z) system (on average over the K goal states) for the three families of perturbations, as shown by annotations in (a). For instance, the first row (b) shows perturbed trajectories of the (model #10, nodes (3,7)) system which has the highest sensitivity to noise whereas the last row (g) shows trajectories of the (model #272, nodes (2,3)) system which has a nearly perfect robustness to walls. Each image contains an example trajectory for a given (i, u), and one u per sub-family is shown per column. For instance, in the first row (b), the trajectories are perturbed with the different sub-families of noise (σ ∈ [0. 001, 0. 005, 0. 1], pn ∈ [10, 5, 1]) which can be seen as various levels of difficulty. For each trajectory we annotate the starting position (A), endpoint prior perturbation (B), and endpoint after perturbation (B’), and show the original trajectory in black. The perturbed trajectory is shown in colorscale (from red at t=0 to cyan at t=3000 secs). (b) Except for few cases (trajectory #43), the system (model #10, nodes (3,7)) system is not robust to noise as its trajectories are easily deviated from the original endpoint. (c) The (model #52, nodes (4,7)) system however, except for rare cases (trajectory #35), consistently reaches its original target despite encountering various amounts of noise. Interestingly, trajectories #36 and #40 consistently follows a complex up->right-down->left path, despite the induced noise. (d) The (model #647, nodes (2,10)) system, except for few cases (trajectory #0), is typically deviated from its original trajectory when being pushed away. Interestingly though, it seems to follow similar (parallel) trajectories. (e) The (model #284, nodes (4,6)) system, is an example of an extremely robust system which, despite many push configurations (in magnitude and number), consistently returns to its original trajectory. Interestingly, the trajectories of this system are relatively complex with several loops and detours. (f) The (model #84, nodes (4,6)) system is not very robust to walls, and typically deviates or blocked when it encounters a wall. (g) The (model #272, nodes (2,3)) system is another example of an extremely robust system which, despite many wall configurations (in length and number), consistently returns to its original path. Once again interestingly, the trajectories of this system are relatively complex with several loops and detours.

Energy landscape visualization based on the trajectory-based landscape generation method [113], and constructed from different set of GRN trajectories, respectively trajectories generated (a) by the random search exploration, (b) by the curiosity-driven exploration, and (c) by the robustness tests experiments.

Analysis and comparison of the degree of sophistication, in terms of versatility and robustness, between different classes of GRN.

We categorize the GRNs by class of organism they belong to: plant, bacteria, slime mold, amphibian, rodent, homo sapiens, or generic. “n/a” refers to network models for which this information is not available. (a) Violin plots show the versatility of the 432 systems (one point per system) for each class. Versatility of one system is measured as the area covered by all the goal states discovered by curiosity search (equivalent to what we call diversity in Figure 3). (b) Trade-off (aka Pareto) mean and standard deviation curves that represent the trade-off among versatility and wall robustness performances as taken by the different classes of GRNs (standard deviation is divided by 4 for visibility). For each system, versatility (y-value) is measured as the area covered by the set of robustly achieved goal states, where the criterion of goal-achievement is a binary which tests whether the goal-reaching sensitivity (on average overall wall perturbations) is below a certain threshold (x-values). Violin plots in (a) are ordered in ascending order according to the class mean y-value at x=0.4 in (b).

Identification of stimuli-based stepwise intervention triggering robust re-set of disease states into healthy physiological states.

(a) The 10 most robust identified goal states (average sensitivity <0.05) and the corresponding reaching trajectories are displayed for the example RKIP-ERK signaling pathway [99]. We can see that most of them converge toward attractors in the “disease” region (orange). (b) Discovered stepwise stimuli intervention on MEKPP which we apply on states stuck in the “disease” region for 100 seconds. (c) The discovered intervention successfully brings back all points from the “disease” region closer to the target endpoint in the “healthy” region, and this under various tested perturbations (as shown in Supplementary Figure S2). The optimization procedure that led to the discovery of this intervention is described in the main text.

Comparison of three alternative strategies for the design of oscillator circuits: curiosity search (blue), random search (pink), and gradient descent (orange).

(a-c) Given a budget of 5000 experiments, curiosity search is able to find 1167 oscillator circuits (ones showing sustained oscillations), whereas random search only finds 42 oscillators and gradient descent does not discover any (starting from a single random initialization). (a) 3D scatter plot of the 42 random search discoveries (pink) and 1167 curiosity search ones (blue) in the (amplitude, main frequency, offset) analytic behavior space. (b) Box plots projecting points from the 3D scatter plot into the respective (amplitude, main frequency, offset) axes. (c) Diversity discovered throughout exploration, where diversity is measured with a binning-based space coverage metric (20 bins per dimension). (d) Evolution of the training loss L for the three exploration strategies. (e-f-g) Corresponding best discoveries (for which L is minimal) for the three exploration strategies. (h-i) Local training loss and resulting finetuning of the best discoveries with gradient descent.

Examples of interventions that can be implemented within the accompanying AutodiscJax software.

All those examples can be reproduced in the accompanying tutorial 1. (a) Numerical simulations with interventions can be performed in parallel by vectorizing simulations over different intervention parameters, simply using the jax vmap operator. This offers a convenient (and fast) way to test several interventions in the biological network, as shown here for testing the network under various initial conditions in batch mode. Examples of other possible “drug” or “genome” interventions that can be implemented in the accompanying software, as well as the possibility to perform interventions (or perturbations) in parallel using batched computations. In this example, despite the numerous interventions, the GRN trajectories still converge to the same endpoint B. (b) Example intervention where species amounts are clamped to specific values. In this example the node MEKPP is clamped to 2. 5µM for 10 seconds at t=0 and then to 1µM for 10 additional seconds at t=400. In this example, after the first clamping the GRN trajectory still follows a similar S-shape curve and arrives close to the original endpoint B but after the second clamping, ERK expression levels are shifted to a higher steady state B’. (c) Example intervention where the numerical value of one kinematic parameter of the model (k5) is changed from 0.0315 to 0.1. In this example we can see that changing the parameter k5 shifts the trajectory end point quite significantly, but qualitatively the trajectory seems to preserve a similar S-shape.

List of biological networks from Biomodels used in this study. The resulting database includes 30 biological networks (one row per network) and a total of 432 systems, which is defined as a (GRN model, behavior space (Z)) tuple and where the pairs of observed nodes (used as behavior spaces) per network are given in the last column.

Additional results complementing Figure 8 of the main paper.

This figure shows the resulting trajectories after applying the discovered stimuli-based intervention (shown in Figure 8-b) to the example RKIP-ERK signaling pathway [99] for the 6 “disease” trajectories originally discovered in the behavioral catalog (shown in Figure 8-a). (a) For each trajectory (one per row), we see that the intervention successfully re-sets the disease setpoint (startpoint of the trajectory shown in red in the orange region) to a healthy set-point (endpoint of the trajectory shown in cyan in the green region). (b-c) Similar results are achieved despite adding push perturbations (b) or wall perturbations (c) in addition to the stimuli-based intervention.

Wall implementation.

Walls are implemented within the 2D space spanned by the 2 observed nodes. Within that space, we can interpret the node activation levels y(0), ···, y(t) as the trajectory of a particle moving. In order to simulate the interaction with “walls” in that space, several implementations could be envisaged. Within the accompanying software AutoDiscJax we propose two possible variants: perfectly elastic collision (equivalent to a discontinuous force field) and some continuous force field variant. The second variant (continuous force field) is employed for the main results of this paper. (a) For the first variant, we consider a perfectly elastic collision without loss of kinetic energy. In that case, when the trajectory is touching the wall at position p with speed ν = ν + νǁ we deviate the trajectory in such a way that is “bouncing” against the wall such that νǁ is unchanged and ν ← − ν. To implement it, we simply check whether the segment [y(t), y(t + Δ t)] intersect the wall at each time step. It it does, we compute the intersection point p and time t1, and set the activation level y(t + Δt) to p + (Δtt1) · [− ν + νǁ]. (b) For the second variant, we implement walls as energy barriers acting as a new force field in the environment, constraining the GRN traversal of the space. This time, instead of having a discontinuous effect on the perpendicular speed ν we define a wall force ƒ = ± αν (+ if ν is going toward wall, - otherwise ) and use it to update the perpendicular component of the trajectory speed as ν ← ν + ƒ · ΔT. Here α ∈ [0, − 2] and is calculated as a function of the distance between the point and the wall. As illustrated in the figure, this basically results in a stadium-shaped force field around the wall.