Introduction

Let us consider a scenario in which, during brain development, hypothesis 1 - h1. it becomes possible to manipulate, prior to the formation of synapses, the gene expression profiles of individual neurons. Furthermore, let us assume that h2. we possess the capability to influence these expression profiles in a manner that directs synaptogenesis towards a specific neuronal network topology. In this context, h3. we might be able to obtain the optimal computational graph, expressed as a composition of functions that represent the behaviour of neurons, required to solve a task of interest. Small living organisms, or organoids (Bhaduri et al., 2020), could be, in principle, genetically programmed to develop into neuronal networks capable of solving pre-specified tasks. Such technology would lead to disruptive applications-e.g., extreme low-power computing, micro-devices for the control of biological systems, or novel biological testing platforms capable of accelerating drug discovery. To date, hypothesis h1. seems to be verified, (Nishikawa et al., 2014) while in the case of h3. we can partially rely on artificial neural networks and optimization techniques (Kingma and Ba, 2015; Graves, 2014).

In this work, we take a step toward the realization of the joint technology conceptualized in h2. and h3. by proposing SynaptoGen (Figure 1), a model that links, by means of differentiable functions, vector representations of gene expression profiles (i.e., the pattern of genes actively being transcribed into RNA in a cell) and genetic rules (i.e., interaction probabilities of protein pairs involved in synaptogenesis) to the average number of synaptic connections between pairs of neurons as well as their synaptic conductance values. We substantiate our work through theoretical development, which hinges on novel propositions and related mathematical proofs. SynaptoGen is compatible with backpropagation and can be inserted in learning frameworks where optimization is performed through gradient descent, enabling management of network sizes and task complexities beyond the capabilities of other optimization techniques. Finally, SynaptoGen is designed with flexibility in mind, allowing practitioners to choose which biological quantities to optimize (e.g., genetic rules, expression profiles, or both).

Multi-scale overview of the SynaptoGen model.

A. In neural networks, synaptogenesis-the formation of synapses-can be approximated as the outcome of interactions (e.g., molecular binding) between proteins translated from gene pairs. SynaptoGen models this process as the realization of a random variable defined as the sum of multiple binomial random variables (e.g., BAB + BBC), one for each potentially interacting gene pair. In the notation used, the uv subscript indicates the neuronal pair taken into consideration. B. The use of binomial random variables stems from the idea of modeling synapse formation for each gene pair, as a process akin to flipping nij biased coins, each representing a Bernoulli(pij) random variable. C. When working with matrix representations of the genetic factors in the panel, it is mathematically provable (see Section Methods) that a series of matrix multiplications and point-wise operations yields a matrix where the entry in the u-th row and v-th column contains the expected number of synapses formed between neurons u and v, multiplied by their average synaptic conductance. D. This resulting matrix, W ( in Section Methods), can be interpreted as a weight matrix and integrated into architectures such as multilayer perceptrons (MLPs). Here, x denotes an MLP layer’s input, while a the resulting activations.

Related Work

Interest in leveraging organoid production for computational purposes has surged since 2022, following the introduction of DishBrain (Kagan et al., 2022) by Cortical Labs. DishBrain combines in-vitro neural networks, cultured from human or rodent sources, with a simulated environment-specifically the game “Pong”-via a high-density multielectrode array. The system employs the free energy principle, which suggests that neural networks adapt by minimizing the unpredictability of their sensory inputs through belief updating and environmental interactions. Although the neuron cultures in DishBrain exhibited statistically significant improvements in gameplay performance (e.g., average rally length, % of aces and % of long-rallies) compared to the control conditions defined by the authors (i.e., random agents), it is difficult to conclude that the neuronal networks fully mastered the task.

In contrast to DishBrain, our research aims to “train” networks of neurons by influencing synap-togenesis through genetic manipulations at the individual neuron level, a largely unexplored task in the literature, with only a limited number of closely related studies available. The basis for our work is outlined in (Barabási and Czégel, 2021; Barabási and Barabási, 2020), which introduces methods for constructing networks based on genetic encodings inspired by the wiring rules of the brain. These methods were further elaborated in the Connectome Model (CM) (Kovács et al., 2020), where the authors decomposed the adjacency matrix of a connectome into the product of three matrices representing specific genetic quantities. Another development was presented in (Barabási et al., 2023), where the CM’s matrix entries were treated as learnable parameters, resulting in the weight matrix of a Multilayer Perceptron (MLP) within the context of training neural networks. While this methodology has proven effective in producing parameter-efficient neural networks, it maintains a notable distance from the biological intricacies of real neuronal networks. A distinct generalization of the CM has also been proposed for the computational inference of synaptic polarities (Harris et al., 2022), a quantity not considered in the original CM.

Similarly, our work draws inspiration from the CM but is geared towards a more bio-plausible computational modeling of synaptogenesis, with the novel elements extensively discussed in Section Methods.

Methods

In 2020, Kovács et al. proposed the CM, a novel strategy to link a brain’s connectome (B) to the expression patterns of individual neurons (X)and existing biological mechanisms- or genetic rules- O:

In the CM’s first interpretation, each row of X referred to a specific neuron while the i-th entry of the row described the binary expression (1-”expressed”-or 0-”not expressed”) of gene i, one of the genes involved in synapse formation. Matrix O, instead, represented interaction compatibility, in the synapse formation process, for proteins translated from all gene pairs. Hence, X ∈ {0, 1}N × G and O ∈ {0, 1}G×3 were defined as binary matrices while the entries of B, of shape N × N, belonged to +; with N and G denoting the number of neurons and genes, respectively. When G ≪ N however, a very common scenario in nature (Koulakov et al., 2022), not all possible connectomes can be decomposed through (1). For this reason, the authors of the CM went on to relaxing the genetic rules matrix to O ∈ [0, 1]G×G, interpreting its entries as probabilities, which generates the following approximation:

where || • || is intended as the Frobenius norm.

In this paper, we formulate a more general alternative to this framework by building a model that takes into account synaptic multiplicity and conductance. Starting from (1), we design two novel interpretations tightly linked to the formalism with which the quantities of interest (i.e., the number of synapses between neurons and their conductances) have been represented in our model. Our theoretical framework is as follows.

Let the number of synaptic connections between two neurons be represented by the following:

where Bij is a binomial random variable that expresses the contribution of the (i,j) gene pair to the total synaptic count:

And let x ∈ ℝ+G and y ∈ ℝ+G be vector representations of gene expression in the pre- and post- synaptic neurons, respectively.

Proposition 1

If the product between the i-th entry of x and the j-th entry of y denotes the number of independent experiments that characterizes Bij-i.e., xiyj = nij-and entry Oij corresponds to probability pij, then the expected number of synapses between two neurons can be calculated as:

Proof. From probability theory,

and due to the linearity of expectation, we have

On the other hand,

Recalling that xiyj = nij and Oij = pij, the proof is concluded.

In different terms, if the hypotheses of Proposition 1 are verified, gene expression in a pair of genes tells us how many attempts we can make to place a synapse between a pre- and a post-synaptic neuron; the genetic rule, instead, describes the probability of success, conditioned on the interaction between the proteins translated from the corresponding genes, for each attempt. It is worth noting that (6) represents the average number of links between two specific nodes of the connectome. Keeping in mind that genetic rules are shared across neurons, the equation can easily be generalized to the whole connectome:

where X is obtained by stacking the expression profiles of all neurons (e.g., XT = […, x, … ,y,…]).

In order to model synaptic conductances, a slightly more complex formalism is required. We restrict ourselves to chemical synapses, which are the result of the interplay between neurotransmitters released by pre-synaptic neurons and receptors in post-synaptic neurons. We also account for the fact that a chemical synapse can also have an excitatory or inhibitory effect depending on the nature of the receptor that receives a specific neurotransmitter (Fenyves et al., 2020; Harris et al., 2022). The way in which synapses work in our framework is described by the following equation:

where Iv is the current injected into post-synaptic neuron v while Vu is an input voltage from pre-synaptic neuron u; Guv, is the equivalent conductance that takes care of all the synapses formed between u and v. To model the possibility of characterizing synapses by different neurotransmitterreceptor pairs, we rely again on random variables as follows:

Let τ be a multinomial random variable representing the process of randomly picking, from u, a synaptic vesicle filled with a specific neurotransmitter. And let R be a multinomial random variable representing the process of randomly selecting a specific receptor from the membrane of v. We define vectors q ∈ [0, 1]L and r ∈ [0, 1]M as the probability distributions associated with τ and R; where L denotes the total number of neurotransmitters while M the number of receptors. We also define A ∈ {−1, 0, 1}L×M as the polarity matrix (for further details, refer to Appendix The Polarity Matrix) and K ∈ ℝ+L×M as the conductance matrix. In detail, entry Aij defines the polarity of synapses derived from the interaction of the i-th neurotransmitter with the j-th receptor (Aij = 0 if the considered neurotransmitter and receptor are not compatible) while Kij stores its linked conductance. We finally set 𝒢 = f (τ, R), with f (i, j) = AijKij. In other words, 𝒢 represents the “signed” conductance of a synapse randomly selected from the ones that connect neurons u and v.

Proposition 2

If τ and R are independent (i.e., the distribution of receptors in the post-synaptic neuron does not depend on the neurotransmitters synthesized by the pre-synaptic neuron), the expected “signed” conductance of a randomly picked synapse can be calculated as:

Proof. By expanding the matrix multiplications in (9), we have:

Thanks to the independence of τ and R:

where ℙ[•] stands for “probability of”. Hence,

that corresponds exactly to the definition of 𝔼[𝒢].

As for (7), also (9) can be generalized by stacking the neurotransmitter distributions of all pre-synaptic neurons in Q = […, q,…]T and the receptor distributions of post-synaptic neurons in R = […,r,…]T:

As a next step, (7) and (10) can be inserted into the core equation of our model, which follows:

Summarizing, through (11) we are able to express the average equivalent conductance between all pairs of neurons as a differentiable function of their gene expression profiles and distributions of synthesized neurotransmitters and receptors, which, in turn, depend on gene expression. Furthermore, in our formalism, synaptogenesis can be simulated by sampling from the random variables we have defined. For instance, the simplest approximation of synaptogenesis can be obtained as follows (further details in Appendices Simulating Synaptogenesis, A Problem of Quantization):

with

where ~ stands for “sampled from”.

Results

To validate the proposed framework and assess its applicability in real-world scenarios, we conducted a series of experiments that involved simulating synaptogenesis in small populations of neurons. In simple terms, we simulated the formation of synapses in neuron populations where genes and gene expression were manipulated at the level of individual neurons. These manipulations followed the genetic rules and expression profiles optimized by our model with the aim of enabling the resulting fully-developed neuronal networks to perform effectively in a pre-specified task.

Our approach begins with a simplified setup, where a neuronal population comprises spatially-separated layers. We assumed that neurons in one layer could only attempt to form synapses with neurons in adjacent layers. This restriction allowed us to focus on multipartite network topologies and implement SynaptoGen as a customized MLP, where its weight matrices are decomposed according to (11), with genetic rules shared across layers.

Regarding the task for our proof-of-concept experiments, we chose reinforcement learning (RL) as a bio-plausible benchmark. Biological organisms equipped with neuronal networks, indeed, exist immersed in an external environment from which they receive stimuli and upon which they can perform actions. These same characteristics are also inherent to the tasks addressed by RL. Our goal was to create virtual neural agents capable of solving the control tasks defined by the CartPole-v1, MountainCar-v0, LunarLander-v2 and Acrobot-v1 environments from the OpenAI Gym library (Brockman et al., 2016), among the most famous benchmarks in the RL community. In Cart Pole, a pole is attached to a cart that moves along a frictionless track. The objective is to balance the pole by applying forces to move the cart left or right. In Mountain Car, instead, the goal is to strategically accelerate a car to climb a sinusoidal valley-and-hill terrain. In Lunar Lander, agents must operate the main and orientation engines ofa lander to ensure it lands smoothly on a designated landing pad without damage. Finally, in the Acrobot game, the aim is to apply torques to the actuated joint of a two-link chain to make the free end of the chain surpass a given height.

Initially, we separately trained SynaptoGen-16/32/64 genes, 3 neurotransmitters, neuronal population sizes of ~128 (see Appendix Training Details for the training details)-on the four selected environments using the DQN algorithm (Mnih et al., 2015). The training provided the learned genetic rules, gene expression profiles, neurotransmitter and receptor distributions, and synaptic conductances that enabled the agents to perform the tasks. Notably, SynaptoGen is built on the average equivalent conductances introduced in (10), and can be interpreted as an average agent that reflects the effects of the underlying genetically-derived quantities.

Successively, for each environment, we sampled 100 neural networks from the trained Synap- toGen models, simulating the development, based on the computed synaptogenesis rules, of multiple populations of neurons into neuronal networks. We then measured their performance and compared the obtained metrics with those from two carefully designed baselines. The first baseline (Appendix The Separable Natural Evolution Strategy (SNES) Baseline) adopts the optimization technique employed in (Stöckl et al., 2022), which has been applied to optimize related models, the probabilistic skeletons, for similar tasks, i.e., the optimization of connection probabilities between neuronal types. This technique, known as SNES (Schaul, 2012), is an evolutionary algorithm specifically designed for medium-to-large problem dimensions. Given the structure of Synapto- Gen, which is explicitly designed for gradient-based optimization but remains general and flexible enough to accommodate other optimizers, SNES could serve as both a benchmark we aim to surpass and a potential alternative worth exploring. The second baseline (Appendix The Bio-Plausible Baseline) leveraged the SynaptoGen’s biology, using the optimized genetic rules O and conductance matrix K, while initializing gene expression profiles in a bio-plausible manner following the procedure identified by Kerstjens et al. (Kerstjens et al., 2022, 2024). As before, 100 neuronal networks were sampled from each baseline.

The results obtained from experiments with agents characterized by a genetic profile of 16 genes are shown in Figure 2. Table 1 provides summary metrics of the outcomes of each experiment. The first metric is the average reward calculated across groups of 100 sampled agents. The second metric is also the average reward, but calculated for the top-10 best-performing agents in each group. This metric aims to assess the models’ effectiveness in scenarios where the entity responsible for synthesizing biological agents prioritizes achieving a small subset of highly performing agents, even at the cost of the overall process yield-the number of agents capable of solving their task divided by the total number of agents. Finally, the third metric describes the percentage of simulated agents capable of solving the tasks defined by the four selected environments (see Appendix Reward Thresholds for Considering a Task Solved for further discussion). This metric represents the global yield of each methodology under consideration.

Mean reward distributions from the model families, characterized by a 16-gene profile, tested on the four selected RL environments.

The green crosses represent the mean rewards, averaged over 10 episodes, achieved by the trained SynaptoGen models. Each scatterplot point represents the mean reward obtained by a specific agent. The black crosses denote instead the distribution means. Model families are color-coded. The dashed horizontal lines indicate the reward threshold beyond which the task associated with an environment is considered solved.

Summary metrics computed over the reward distributions obtained from the model families characterized by a 16-gene profile.

Three metrics are reported: the distribution mean, a mean computed over the top-10 agents, and the percentage of simulated agents that solved the task. Each group of rows refers to one of the selected RL environments and the best scores are highlighted in bold.

In terms of average reward SynaptoGen (trained via gradient descent) emerged as the bestperforming model, outperforming agents optimized with SNES in three out of four environments. It also consistently surpassed the baseline characterized by biologically plausible initialization of gene expression. The SNES agents also performed well, exceeding the bio-plausible baseline-the one characterized by the bio-plausible expression initialization-in three out of four environments. When examining the average rewards of the top-10 agents, the superiority of SynaptoGen and its derived agents becomes even more evident. They achieved the highest performance across all environments studied, sharing the leading position only in Cart Pole. By contrast, the SNES top-10 agents achieved average rewards indicative of task resolution in only 50% of the cases. For the bio-plausible baseline, nearly all top-10 agents failed to achieve sufficient performance. Finally, regarding the percentage of agents capable of solving the tasks their genetic profiles were optimized/initialized for, SynaptoGen was the only approach to achieve non-zero percentages in all four environments. Moreover, it achieved the highest percentage in three out of four cases. SNES agents also performed reasonably well, reaching a 100% resolution rate in half the environments, though failing entirely in the other half. For the bio-plausible baseline, only 2 out of 400 sampled agents successfully solved their respective tasks. Notably, in 10 out of 12 comparisons, the differences between reward distributions were statistically significant based on the Mann-Whitney test with Bonferroni correction (for the multiple environment-specific comparisons performed). For brevity and clarity, results for agents derived from genetic profiles with 32 or 64 genes are presented in Appendix Additional Results. These results exhibit minimal differences from those reported here and lead to equivalent conclusions.

To provide additional evidence of SynaptoGen’s effectiveness, we designed an additional set of simulations aimed at validating the framework in a context closer to the biological reality of neuronal networks. In these experiments, we used genetic rules derived from experimental data collected on the nerve ring of the nematode C. elegans. This region was selected due to the availability of both transcriptomic data (Taylor et al., 2021) and connectomic data-including membrane contacts and synapses (Brittin et al., 2021; Cook et al., 2019; Witvliet et al., 2020)-collected from it. The genetic rules were constructed based on a restricted set of 141 cell adhesion molecules (CAMs) with a documented role in synapse formation (Colón-Ramos et al., 2007; Kim and Emmons, 2017), following a procedure adapted from the Network Differential Gene Expression Analysis (nDGE) framework proposed in (Taylor et al., 2021). The procedure is described in Appendix nDGE-based Bio-Plausible Rules and summarized in Figure 3. Briefly, we first identified pairs of genes responsible for CAM production that were co-expressed in nerve ring neurons with synaptic connections but not co-expressed in cells sufficiently close to form synapses yet lacking connections. We then constrained the (learnable) genetic linked to these pairs to assume high probabilities (>0.5), while assigning low probabilities (<0.5) to all other rules.

A. Co-expression data computed following our extended nDGE variant from the expression patterns released with (Taylor et al., 2021). In the matrix, rows correspond to genes expressed in pre-synaptic neurons, while columns represent genes utilized in post-synaptic neurons. The green circles indicate pairs of genes that are co-expressed in neurons that are connected but not co-expressed in neurons that could be connected but lack synapses. Genes involved in co-expressed pairs are highlighted in bold. B. Visualization of the two sigmoids used to map the learnable parameters associated with the genetic rules into the probabilities in O. The green sigmoid is applied to the parameters corresponding to co-expressed pairs, while the pink sigmoid is applied to the remaining ones. We also show in blue the parameter distribution after initialization and in green (co-expressed pairs) and pink (non-co-expressed pairs) the distributions of the probabilities obtainable in an example scenario where 50% of the gene pairs are co-expressed. C. Genetic rules learned by SynaptoGen in our bio-plausible validations. The emerging patterns are fully consistent with the co-expression data in A. (correlation coefficients: 0.92, 0.74, 0.44, 0.84; same order as in C.) and demonstrate how the model assigned high probabilities (~1) specifically to the genetic rules corresponding to the pairs of co-expressed genes in the C. elegans nerve ring.

The results of these experiments are presented in Figure 4 and Table 2. In this context, the SNES agents improved their competitiveness in terms of average reward, surpassing SynaptoGen in Acrobot. Nevertheless, SynaptoGen maintained the highest average reward in Mountain Car and Lunar Lander. As in the previous set of experiments, the bio-plausible baseline consistently ranked last across all environments. The average reward calculated for the top-10 agents followed a similar trend, with SynaptoGen managing to achieve a tie in Cart Pole. Finally, in terms of resolution percentages, agents derived from SynaptoGen once again were the only ones to achieve non-zero percentages across all tested environments. Results for SNES agents remained polarized, with all simulated agents resolving the tasks in two environments but none performing adequately in the remaining two. The bio-plausible baseline again failed to produce agents capable of solving any task.

Mean reward distributions from the model families, characterized by a 39-gene profile which includes the genetic rules derived from C. elegans, tested on the four selected RL environments.

The green crosses represent the mean rewards, averaged over 10 episodes, achieved by the trained SynaptoGen models. Each scatterplot point represents the mean reward obtained by a specific agent. The black crosses denote instead the distribution means. Model families are color-coded. The dashed horizontal lines indicate the reward threshold beyond which the task associated with an environment is considered solved.

Summary metrics computed over the reward distributions obtained from the model families characterized by a 39-gene profile which includes the genetic rules derived from C. elegans.

Three metrics are reported: the distribution mean, a mean computed over the top-10 agents, and the percentage of simulated agents that solved the task. Each group of rows refers to one of the selected RL environments and the best scores are highlighted in bold.

Discussion

In this section, we will discuss the contributions of SynaptoGen to neuroscience, with a particular focus on the “genomic bottleneck” field (Zador, 2019), and its potential impact on technologies aimed at creating synthetic biological intelligence. We will also address the limitations of SynaptoGen in its current form and propose strategies to mitigate these challenges.

Firstly, SynaptoGen represents the first model capable of rigorously explaining synaptic multiplicity based on genes, their interaction probabilities, and gene expression, achieving (by design) unprecedented granularity in modeling synaptogenesis. Secondly, SynaptoGen establishes a direct relationship between the average number of synapses between neurons, their average synaptic weights, and vector representations of genetic expression and protein interaction probabilities through a differentiable function. This is a key feature, as it ensures compatibility with gradientbased optimization techniques, bringing several advantages discussed further below. Notably, since SynaptoGen can be deployed as a standard neural network, it broadens the accessibility of the simulation tools used in computational neuroscience to deep learning practitioners, whose contributions could significantly advance the field. Additionally, SynaptoGen can be integrated with any differentiable AI model.

Arguably, one of the most significant features of SynaptoGen is its performance. Simulations confirm that SynaptoGen generates genetic profiles capable of guiding the development of biological agents that perform effectively, in terms of mean reward and yield, in the environments for which they have been pre-wired. Agents derived from SynaptoGen consistently outperform the bio-plausible baseline across all experiments. This is not surprising, as populations of neurons left to interact without regulation are unlikely to form the connections necessary for task-specific behavior. More unexpectedly, SynaptoGen also outperforms agents optimized through SNES in most scenarios. Unlike SNES, SynaptoGen does not rely on evaluating the simulated agents’ performance during optimization, highlighting the strategic advantage of gradient-based methods in our framework. We hypothesize that gradient descent scales better with problem size and task complexity. Supporting this hypothesis are results from the Mountain Car and Lunar Lander environments. In these environments, generally regarded by the RL community as more complex than inverted pendulum tasks, the performance gap between the SynaptoGen agents and the SNES-derived ones was greater. Nonetheless, SynaptoGen remains flexible and general enough to support alternative optimization techniques if advantageous. Notably, SynaptoGen demonstrates strong effectiveness in “biologically realistic” simulations (i.e., simulations in which rules derived from co-expression data are utilized), maintaining its superiority despite the additional constraints introduced by biological genetic rules.

Despite these advances, there remains a considerable gap between our model and the biological reality of neuronal networks before it can be applied to real-world contexts. One major difference lies in the event-based nature of processing in biological neural networks. SynaptoGen can be adapted into a spiking neural network without sacrificing differentiability by using libraries like snnTorch (Eshraghian et al., 2023), which leverages surrogate gradients. This does not preclude exploring more complex neuron models, although such complexity may be unnecessary, as recent studies indicate that networks trained with simplified neuron models can be effectively deployed on neuromorphic hardware with negligible performance loss (Cakal et al., 2024). This result indeed hints at the fact that even simplified neuron models could represent the essential computational dynamics of biological networks. Another significant assumption in our model is that synapses are modeled as simple resistors. Whether this level of abstraction suffices for developing functional biological networks remains an open question.

Moreover, the current implementation of SynaptoGen assumes a feedforward multipartite network topology, which can be achieved by genetically inhibiting, through an additional set of genes, all synapses that could form between compatible neurons in non-adjacent layers. Another possiblity would be to employ an external pruning process. However, biological neuronal networks exhibit more complex, often cyclic wiring. SynaptoGen can be extended to account for this complexity by integrating its weight matrix decomposition into frameworks like 4Ward (Boccato et al., 2024a,b), which are designed to convert arbitrary graphs into neural networks trainable via backpropagation. Incorporating more complex topologies would also enable compliance with constraints derived from contactomic data (i.e., information about the physical contacts between neurons), as synapses can only form between neurons in close proximity. Using such topologies, SynaptoGen could learn the average number of synapses only for neuron pairs which are actually in contact. Another phenomenon which his not modeled in the current version of SynaptoGen is the modulation of genetic rules based on the distance between neurons. As highlighted in (Stöckl et al., 2022), connection probabilities decay exponentially with increasing distance. This behavior can be incorporated into SynaptoGen without compromising differentiability by assigning each neuron a positional embedding to calculate distances. This embedding could be learnable, given as a prior, or derived from genetic expression. We recall that a function such as , where pu and pυ are the positions of neurons u and υ, is differentiable w.r.t. the positions except at coincident points.

Finally, the bio-plausibility of SynaptoGen could be enhanced by integrating experimental data on the circuit-level properties of synapses formed by specific gene interactions-parameters - these are currently treated as learnable. While the data available for integration is limited, promising progress by various research groups suggests this limitation may soon be alleviated (Taylor et al., 2021; Kovács et al., 2020; Fenyves et al., 2020; Harris et al., 2022). Additionally, developing more faithful synaptogenesis simulation techniques-such as replacing average conductances with further sampling from the learned distributions-could make SynaptoGen even more biologically accurate.

Conclusions

In this work, we introduced SynaptoGen, a novel computational framework aimed at advancing the field of synthetic biological intelligence by addressing a critical challenge: controlling the genetic factors underlying synaptogenesis to develop neural agents pre-wired for task-specific behavior. By modeling synaptic multiplicity through gene expression and protein interaction probabilities, SynaptoGen extends the existing frameworks by offering a granular approach to understanding neural connectivity. SynaptoGen is supported by an innovative use of the formalism of random variables and complemented by rigorous mathematical proves.

The differentiable nature of SynaptoGen makes it uniquely suited for integration with gradientbased optimization techniques. This capability enables the joint optimization of genetic quantities of interest and the performance of resulting agents in selected tasks, thus bridging a critical gap between biological plausibility and computing. Out validation experiments demonstrated the effectiveness of SynaptoGen across multiple RL benchmarks, with agents derived from the framework consistently outperforming the considered baselines. These results highlight the potential of SynaptoGen as a foundational tool for generating task-specific biological agents with unparalleled precision.

While SynaptoGen marks significant advancements over the existing methodologies, important challenges remain before its application in real-world contexts can be conceptualized. Key gaps include the event-driven nature of biological neural networks, the employed synapse model, and the reliance on feedforward topologies that do not fully capture the complexity of biological wiring. We proposed several mitigations to address these limitations, such as adapting the framework for spiking neural networks, incorporating positional embeddings to account for distance-dependent connectivity, and leveraging experimental data to refine circuit-level properties. These extensions will not only enhance the biological plausibility of SynaptoGen but also expand its applicability to more complex and realistic scenarios.

By enabling the precise manipulation of genetic rules governing neural circuit formation, we expect SynaptoGen to lay the foundation for groundbreaking advancements in the development of biological agents and the realization of applications that are currently beyond our reach.

Appendix 1

The Polarity Matrix

As outlined in (Fenyves et al., 2020), synapse polarity in C. elegans, a well-studied small nematode, is described by the interplay among 3 neurotransmitters-glutamate, acetylcholine, and GABA-and their corresponding receptors. Specifically, each neurotransmitter can be associated with receptors capable of exerting both excitatory and inhibitory effects on synaptic connections. This relationship can be represented by a 3 × (2 · 3) polarity matrix:

Here, each neurotransmitter synthesized in pre-synaptic neurons can be bind to either a positive (+) or negative (−) receptor in post-synaptic neurons. The 0s in the matrix signify that receptors attuned to a specific neurotransmitter are incapable of receiving different ones. This formalism readily extends to accommodate an arbitrary number of neurotransmitters by setting M = 2L and expanding A to an L × 2L block diagonal matrix, where each block is represented as [1,-1]. It is worth noting that, in the experiments described in Section Results, the entries of A do not belong to the set of learnable parameters.

Simulating Synaptogenesis

Algorithm 1 outlines one possible implementation of the synaptogenesis simulation introduced at the end of Section Methods, which we used in our experiments.

Our synaptogenesis simulation procedure.

In this algorithm, G represents the number of genes involved, notation ·:,i represents selection of the i-th column, ⊗ denotes the outer product, the method round() performs pointwise rounding to the nearest integer, while is the contribution of gene pair (i, j) to the number of synapses of neuron pair (u,υ). All quantities mentioned in Algorithm 1 are restricted to a specific neural layer.

The coefficient α is a general correction factor used to control the average degree of nodes in the generated networks. The necessity of this correction is discussed in Appendix A Problem of Quantization. Here, we show that this correction does not impact the average weight matrices learned by SynaptoGen:

Proposition 3

The average weight matrix resulting from the correction shown in Algorithm 1, with α> 0, coincides with the average weight matrix learned by SynaptoGen, .

Proof.

In all our simulations, we chose α such that the resulting networks had an average degree of 105. While this value is quite high (the typical number of synapses per neuron in the human brain is on the order of magnitude of 104 (Zhang, 2019)), it aligns with the connectivity levels observed in certain types of cells in specific brain regions, e.g., Purkinje cells in the cerebellar cortex (Napper and Harvey, 1988).

A Problem of Quantization

The synaptogenesis simulation introduces a subtle “quantization” issue. When the order of magnitude of the input provided to a SynaptoGen network and the average conductances associated with each pair of neurons are fixed, the number of synapses determines the range of values each weight can take. Conversely, when the number of synapses is fixed, the average conductances dictate the granularity with which a given range can be represented. Thus, the interplay between these factors significantly affects the degree of synaptic weight quantization. Additionally, the role of rounding, introduced in Algorithm 1 to ensure that the parameters nij of the binomial random variables are integers, cannot be overlooked. The inherently discrete nature of these random variables also plays a fundamental role in the quantization process.

It is therefore interesting to study the error , specifically the contribution to this error made by a single gene pair for a specific pair of neurons. It can be shown that:

Proposition 4

When considering an optimal simulated agent-characterized by a number of synapses as close as possible to the mean learned by SynaptoGen-and assuming the correction introduced in Algorithm 1 is applied, the error e between the mean number of synapses predicted by SynaptoGen and the actual number of synapses in the agent is bounded by .

Proof.

In this proof, , the outer round in step (15) helps account for the nearest integer approximation ofthe mean number of synapses learned by SynaptoGen, and steps (17) and (18) are derived from bounds applied to the two rounding operations.

It is evident that choosing a sufficiently large α-and thus achieving a sufficiently high average neuronal degree-can bring the weight matrices ofthe best simulated agents closer to the optimal matrices identified by SynaptoGen. This reduces the performance loss of these agents on the task for which they were pre-wired.

Training Details

This section provides all the details related to the training of the SynaptoGen models that were not explicitly mentioned in Section Methods Neural networks were implemented as custom MLPs featuring a single hidden layer with 128 neurons, a number that allowed us to easily conduct experiments with the computational resources available. The input and output layers were configured with a number of neurons equal to the dimensionality of the observation space and the number of possible actions in the selected RL environments, respectively. Consequently, the size of the input layer ranged from 2 to 8, while the size of the output layer ranged from 2 to 4. Details about the RL environments can be found in the Gym library documentation: https://www.gymlibrary.dev/index.html. Learnable parameters associated with genetic rules and conductances were initialized following the procedure described in (Barabási et al., 2023). All other parameters were initialized using a Kaiming normal distribution (He et al., 2015).

Trainings were conducted using the Adam optimizer (Kingma and Ba, 2015), with its default parameters unchanged except for the learning rate. Training consisted of 500000 environment steps. To minimize randomness in the results and mitigate sub-optimality associated with specific hyperparameter choices, the genetic profiles used in simulations were selected through a hyperparameter grid-search. This search explored three different learning rates (0.03, 0.003, 0.0003) and three random seeds for various sampling processes. Specifically, after hyperparameter optimization, we retained the checkpoint corresponding to the best validation performance achieved by the models. Validation was conducted every 10000 training steps. During both validation and subsequent simulations, agent rewards were computed as the average scores over 10 different episodes. The remaining hyperparameters related to the DQN algorithm were sourced from: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/dqn.yml. Additional information can be found in the released code.

The Separable Natural Evolution Strategy (SNES) Baseline

As mentioned in Section Results, the SNES baseline was constructed by replacing gradient descent in our SynaptoGen framework with SNES, an optimization technique introduced in (Schaul, 2012) and employed by the authors of (Stöckl et al., 2022) for a related task, namely the optimization of probabilistic skeletons. The core idea behind SNES is to sample λ offsprings from a normal distribution of the type Ɲ(μ, Iσ), where μ is built by concatenating the flattened learnable matrices of SynaptoGen. These offsprings are evaluated based on their fitness, allowing the distribution to adapt and better capture those regions of the parameter space where offspring fitness is higher. The implementation details of SNES are provided in the pseudo-code of Algorithm 2.

The SNES optimization technique.

Inside the algorithm, the vectors sk contain the “noise” used in generating the offsprings, σ is a vector of variances initialized in accordance with (Stöckl et al., 2022), F is the fitness function, ημ and ησ are two learning rates > 0, and the method steps_performed() returns the number of actions taken during the current iteration on the RL environment being optimized. Specifically, we defined the fitness function F(·) as the average performance achieved by m simulated biological agents based on the genetic profile provided as input to the function. Individual scores were calculated by averaging the rewards obtained by an agent over 10 different episodes.

To ensure fair comparisons with the profiles obtained through gradient descent, we limited the number of actions executable on a RL environment to 500000 in the context of a single optimization. Additionally, for this baseline, we conducted a hyperparameter search similar to the one described in Appendix Training Details, varying m in the set {10,20, 30} and assigning λ, during the search, the values 8,16, and the default value of the SNES implementation used (https://github.com/pybrain/pybrain).

The Bio-Plausible Baseline

As a control, we designed an additional baseline, referred to as bio-plausible in the main text, intended to model a scenario where a neural population with a predefined number of neurons is left free to form synapses without any manipulation of genes or gene expression in individual neurons tailored to a specific task. In other words, this baseline was designed to assess whether there is a performance difference between agents produced by SynaptoGen and neuronal networks that develop without guidance, or equivalently, to determine whether the scores achieved by SynaptoGen agents are due to chance.

To do this, we fixed the genetic rules learned by SynaptoGen for each environment as the biological ground truth and initialized the gene expression of the individual neurons in the tested agents according to the lineal model proposed in (Kerstjens et al., 2022). This model is simple yet profound and elegant, and it is capable of predicting phenomena later validated by existing experimental data (Kerstjens et al., 2024). According to the lineal model, the expression inherited by two cells u and υ resulting from mitosis is calculated by adding a differential expression vector, drawn from a normal distribution, to the expression of the parent cell p. Formally:

Here, the genetic expression pattern c = [xT, qT, rT]T of a generic neuron is understood as the concatenation of genetic expression responsible for synaptic multiplicity, neurotransmitter distribution, and receptor distribution. With a slight abuse of notation, we are actually using these variables to refer to the previously mentioned vectors before they are normalized or otherwise mapped into the domain of the genetic quantities of interest.

In more detail, for each agent, we sampled the genetic expression profile of a virtual zygote from a normal distribution and simulated a lineage tree with as many leaves as the number of neurons in the agent. This allowed us to assign gene expression profiles to individual cells following the equations above. We finalized the procedure by randomly assigning each neural layer expression profiles corresponding to leaves from the same complete subtree, simulating spatial contiguity for neurons within the same layer. The procedure we implemented is detailed in Algorithm 3. In the pseudo-code, the first line initializes the zygote’s genetic expression, N corresponds to the target number of cells to be produced, the notation ·1…N selects the first N columns ofthe respective matrix, function rand_roll () circularly permutes the columns ofthe input matrix, and Cin, chidden and Cout represent the matrices associated with the genetic expressions of the various layers of the agent being initialized.

Our lineal model-based initialization procedure for gene expression.

Reward Thresholds for Considering a Task Solved

There is no consensus within the RL community on the definition of “solving” when it comes to the most popular benchmark tasks used for evaluating new algorithms. Therefore, we derived reward thresholds, based on the guidelines provided at https://www.gymlibrary.dev, to serve as a reference for distinguishing agents capable of solving their respective RL environments from those unable to achieve satisfactory performance.

For Cart Pole, the threshold was set to 195, which represents the number of consecutive time steps an agent must keep the pole balanced. The environment provides a reward of +1 for every step the pole remains upright. For Mountain Car, we used a threshold of −200. The objective of this task is to propel the car to the top of a hill within the environment. We consider the task solved in all cases where the car successfully reaches the goal, and consider it unsolved only when the car fails to reach the goal within the simulation time, which results in a reward of-200. In Lunar Lander, the threshold is set to 200, a reward that Gym documentation associates with a soft landing on the designated landing pad, without any crashes. Finally, for Acrobot, we consider the task solved if the agent manages to swing the chain above the target height before the simulation ends. This corresponds to achieving a reward greater than −500.

Additional Results

We provide here, for the sake of completeness, the results obtained by optimizing models characterized by genetic profiles containing 32 and 64 genes. These results are shown in Figures 1, 2 and Tables 1,2.

Mean reward distributions from the model families, characterized by a 32-gene profile, tested on the four selected RL environments.

The green crosses represent the mean rewards, averaged over 10 episodes, achieved by the trained SynaptoGen models. Each scatterplot point represents the mean reward obtained by a specific agent. The black crosses denote instead the distribution means. Model families are color-coded. The dashed horizontal lines indicate the reward threshold beyond which the task associated with an environment is considered solved.

Summary metrics computed over the reward distributions obtained from the model families characterized by a 32-gene profile.

Three metrics are reported: the distribution mean, a mean computed over the top-10 agents, and the percentage of simulated agents that solved the task. Each group of rows refers to one of the selected RL environments and the best scores are highlighted in bold.

Mean reward distributions from the model families, characterized by a 64-gene profile, tested on the four selected RL environments.

The green crosses represent the mean rewards, averaged over 10 episodes, achieved by the trained SynaptoGen models. Each scatterplot point represents the mean reward obtained by a specific agent. The black crosses denote instead the distribution means. Model families are color-coded. The dashed horizontal lines indicate the reward threshold beyond which the task associated with an environment is considered solved.

Summary metrics computed over the reward distributions obtained from the model families characterized by a 64-gene profile.

Three metrics are reported: the distribution mean, a mean computed over the top-10 agents, and the percentage of simulated agents that solved the task. Each group of rows refers to one ofthe selected RL environments and the best scores are highlighted in bold.

nDGE-based Bio-Plausible Rules

To produce the genetic rules used in the bio-plausible validation in Section Results, we first computed the gene co-expressions of the C. elegans nerve ring by performing a generalized nDGE analysis on the dataset provided in (Taylor et al., 2021). This generalization was necessary because the nDGE implementation available at https://github.com/cengenproject/connectivity_analysis is designed to calculate local co-expressions, specifically limited to a set of cells consisting of a pre-synaptic neuron and its corresponding post-synaptic partners.

To compute global co-expressions, we identified all pairs of neurons that shared synapses and all the pairs that were in contact without being connected. Then, for each possible pair of genes, we calculated co-expressions for both sets and conducted a T-test with Bonferroni correction. Pairs with a p-value < 0.05 were considered co-expressed. The detailed procedure is outlined in Algorithm 4. In this pseudo-code, B and C represent the connectome and the contactome ofthe neuronal network under analysis, respectively, ⊕denotes a pointwise XOR operation, and G corresponds to the number of genes analyzed. The third-level “for” loops, instead, separate the indices of pre- and post-synaptic neurons involved in the pairs being evaluated in each iteration. The co-expression of a specific pair of genes for all neuron pairs in a given set is calculated as the pointwise product , where C is the matrix containing gene expression data for all neurons, and the notation ·i,u selects the i-th row (corresponding to gene i) and the columns associated with the neurons in u. The method ttest() performs the T-test, while pij and gij represent the p-value for the gene pair (i,j) and its corresponding binary co-expression, respectively.

Our global nDGE variant.

After obtaining the co-expressions (illustrated in Figure 3), and to avoid constructing overly artificial rules-since co-expression reflects correlation while genetic rules imply causation-while still maintaining sufficient bio-plausibility, we allowed the models to learn the matrix O but constrained the domain of its entries. Specifically, starting from Synap- toGen’s learnable parameters, we computed the genetic rules matrix O differently for coexpressed and non-co-expressed gene pairs. The approach involved pushing the rules for co-expressed pairs toward probabilities close to 1, while those for non-co-expressed pairs were pushed toward probabilities close to 0. This was implemented using two sigmoid functions, also shown in Figure 3:

Here, Ô represents the parameter matrix from which the genetic rules O are derived, σÔ is the standard deviation of the Ô values’ distribution, and T is the temperature used. The learned genetic rule matrices shown in Figure 3 confirm that SynaptoGen assigned high probabilities (i.e., approximately 1) to co-expressed pairs, supporting the utility of coexpression data for optimization purposes.

Acknowledgements

This work is supported and funded by: #NEXTGENERATIONEU (NGEU); the Ministry of University and Research (MUR); the National Recovery and Resilience Plan (NRRP); project MNESYS (PE0000006, to NT) - A Multiscale integrated approach to the study of the nervous system in health and disease (DN. 1553 11.10.2022); the MUR-PNRR M4C2I1.3 PE6 project PE00000019 Heal Italia (to NT); the NATIONAL CENTRE FOR HPC, BIG DATA AND QUANTUM COMPUTING, within the spoke “Multiscale Modeling and Engineering Applications” (to NT); the European Innovation Council (Project CROSSBRAIN - Grant Agreement 101070908, Project BRAINSTORM - Grant Agreement 101099355); the Horizon 2020 research and innovation Programme (Project EXPERIENCE - Grant Agreement 101017727). Tommaso Boccato is a PhD student enrolled in the National PhD in Artificial Intelligence, XXXVII cycle, course on Health and Life Sciences, organized by Università Campus BioMedico di Roma.