When large groups of neurons exhibit joint activity, they are often assumed to form a functional unit, referred to as a neural assembly (Harris, 2005). Neural assemblies are thought to form elementary computational units that are essential for cognitive functions such as short-term memory, sensorimotor computation, and decision-making (Harris, 2005; Hebb, 1949; Gerstein et al., 1989). Recent advancements in neuroimaging methods now enable us to study the role of these neural assemblies in more detail. For example, a large breakthrough is the introduction of light-sheet microscopy which enables functional recordings of whole-brain volumes, thereby allowing the study of how complex computation emerges in the brain (Ahrens et al., 2013). It, however, remains a computational challenge to extract neural activation patterns from such datasets comprising ∼100.000 neurons or more.

Recent work introduced an unsupervised statistical method using the compositional Restricted Boltzmann Machine (cRBM), which combines both methods and can learn both a latent representation and a distribution of the data by observing samples from that distribution (Tubiana et al., 2019a). The cRBM is an extension of the Restricted Boltzmann Machine (RBM) (Smolensky, 1986), an undirected graphical model that consists of two layers of random variables, representing the data itself (through a set of visible units) and its lower-dimensional latent representation (through a set of hidden units). This model can learn low-dimensional representations of a data set by using maximum likelihood optimization, matching the model and empirical distributions (Salakhutdinov et al., 2007). The cRBM extends the classical RBM by adhering to a set of structural conditions (see Methods and Materials), pushing it to operate in a state referred to as the compositional phase (Tubiana and Monasson, 2017; Tubiana et al., 2019a,b). In the compositional phase, the visible-to-hidden connections in the RBM are sparse, and in our previous work (van der Plas et al., 2023) this Research Advance is based on, we show that the associated neural assemblies of the hidden units are localized and span the entire space of the visible units. Furthermore, only a small fraction of the hidden units are active at any point in time, making the model more interpretable.

Although the cRBM can accurately reproduce neural statistics and produce a low-dimensional representation of the high-dimensional data, this model is limited by capturing only static dependencies, without any specific account of temporal dependence. Neural activity driving animal behavior is expressed in both stochastic and deterministic states. For this reason, we here include temporal dependencies directly into the model by using the recurrent temporal RBM (RTRBM) (Sutskever et al., 2008). This model retains the advantages of the cRBM, but additionally accounts for deterministic dynamics underlying neural activity.

The RTRBM is a recurrent neural network constructed by chaining multiple RBMs in time (Mittelman et al., 2014). Each RBM has a hidden state that is conditioned on the expected hidden state of the RBM at the preceding time step. While temporal connections are constrained to single time steps, the recurrency in the model indirectly accounts for multi-timestep dependencies. Previous studies using RTRBMs in other domains have highlighted the value of including such temporal dependencies in extracting spatiotemporal features from high-dimensional data (Boulanger-Lewandowski et al., 2012; Li et al., 2018; Zhang et al., 2018).

In this work, we apply the RTRBM to both simulated and real data. First, we show that the model is capable of retrieving the artificial neural assemblies and their temporal connections in a fully simulated network with only a few hidden populations. We compare the resulting RTRBM with an RBM that is trained on the same data, and show that it outperforms in terms of generalization error, pairwise moments, and time-shifted pairwise moments. We then use a combined approach of the RTRBM and the cRBM to model the temporal connections of different neural assemblies in whole-brain neuronal zebrafish data through transfer learning by initializing RTRBM weights with their cRBM counterparts. The resulting RTRBM model extends upon the neural assemblies identified by the cRBM models by additionally capturing their temporal dependencies. We demonstrate that this extension improves the reconstructive power in terms of the estimated moments and provides additional temporal information regarding the underlying structure of the brain.


In this work, we investigated whether the inclusion of temporal dependencies between neural assemblies improves the representation of neuronal activity in the context of simulated data and whole-brain light-sheet recordings from zebrafish larvae (n = 8, 40709±13854 neurons, (van der Plas et al., 2023)). For this purpose, we first introduce and compare the recurrent temporal restricted Boltzmann machine (RTRBM) to the compositional RBM (Tubiana et al., 2019a) and then use two-step transfer learning to arrive at an estimate of the RTRBM that stays in the compositional phase and maintains the locally restricted assemblies identified by the cRBM.

The RTRBM extends the RBM by temporal dynamics on the assembly level

The principal structural difference between the RBM and the RTRBM is the addition of recurrent connections through a set of weights U in the RTRBM that connect the hidden unit states at time steps t − 1 and t. These connections allow the RTRBM to incorporate temporal dynamics from the data, while the (c)RBM is only able to represent time-independent, statistical relationships. This difference is illustrated in a small example of neural assemblies in Figure 1.

The RTRBM extends the RBM by additionally accounting for temporal interactions of the neural assemblies. (A) A schematic depiction of an RBM with visible units (neurons) on the left, and hidden units (neural assemblies) on the right. The visible and hidden units are connected through a set of weights W. (B) An example W matrix where a subset of visible units is connected to one hidden unit. Details of the equations in panel B and E are given in Methods and Materials. (C) Hidden and visible activity traces generated by sampling from the RBM. Due to its static nature, the RBM samples do not exhibit any sequential activation pattern, but merely show a stochastic exploration of the population activity patterns. (D) Schematic depiction of an RTRBM. The RTRBM formulation matches the static connectivity of the RBM, but extends it with the weight matrix U to model temporal dependencies between the hidden units. (E) In the present example, assembly 1 excites assembly 2, assembly 2 excites assembly 3, and assembly 3 excites assembly 1, while the remaining connections were set to 0. (F) Hidden and visible activity traces generated by sampling from the RTRBM. In contrast to the RBM samples, the RTRBM generates samples featuring a sequential firing pattern. It is able to do so due to the temporal weight matrix U which enables modeling temporal dependencies.

When applying the RBM to neural data, the neurons are represented by the visible units, while the underlying neural assemblies are represented by the hidden units (see Figure 1 for a visualization). In this basic example, we connect each assembly to an exclusive set of neurons for simplicity (Figure 1A). In the case of the RTRBM, there are additional, direct connections between the assemblies (Figure 1D, red arrows, defined by the weights U). To emphasize the resulting difference in temporal dynamics, we set up an RBM and an RTRBM with matching hidden to visible connections W (Figure 1B,E). We set these connections up so that each neural assembly is connected to a single hidden unit. In this manner, the hidden unit dynamics act as assembly dynamics, where each hidden unit represents a distinct assembly. In addition, the RTRBM has a weight matrix U that models the inter-assembly temporal dynamics. The activation function applied to the sum of the inputs is a sigmoid function, which enforces the output to be in the range [0, 1]. The weights that connect the visible to the hidden units outside their assigned assembly must therefore be negative to prevent their participation in other assemblies.

To demonstrate the resulting difference in temporal dynamics, we sample from the RBM and RTRBM and compare the resulting activity traces. In the RBM, as expected from the model definition, the stochastic sampling between the hidden and visible units does not lead to systematic sequential activations of the assemblies occur, aside from some persistence due to the reactivation of similar ensembles through the weights W.

Conversely for the RTRBM, a combination of temporal sequences and stochastic exploration can be realised: In this simple example of an RTRBM, assembly 1 excites assembly 2, assembly 2 excites assembly 3 and assembly 3 again excites assembly 1 (see Figure 1E). As a result, the hidden activity traces of data sampled from the RTRBM show matching sequential activation of hidden units (Figure 1F). Because each hidden unit is connected to a subset of visible units, this results in a sequential activation of the assemblies, where typically only one assembly is strongly active at each time step. As the representation of the RTRBM is still probabilistic, the dynamics display a mixture between dynamic, and stochastic properties, which we consider a hallmark of neural activity.

In summary, the RTRBM maintains the features of the RBM to provide an interpretable and probabilistic representation of neural data, but extends it to include temporal dependencies between neural assemblies.

RTRBMs learn assembly dynamics from simulated neural data

In the above example, the neural assembly connectivity was predefined. Next, we demonstrate that the RTRBM can be trained on simulated neural data to learn a set of weights W and U that correctly capture the underlying temporal dynamics on the assembly level. In this test case, we indeed find the RTRBM to outperform the RBM in the representation of the underlying moments.

We devised a method for generating artificial data sets mimicking neural population activity using a simplified neural network model. Here, neural activity is driven by the population activity of underlying neural assemblies. These activities of assemblies were determined by two factors: endogenous, assembly specific activations, and recurrent activations through the connections between assemblies Figure 2A, left). The activity of the neurons was then generated from a Poisson process whose time-dependent rate was given by the activations of a single assembly population. For clarity of the presentation, we here again implement a direct match between assemblies and neurons, and thus expect the estimated weight matrix to be a ‘diagonal’ matrix between assemblies and neurons.

The RTRBM outperforms the RBM on sequential statistics on simulated data. (A) Simulated data generation: Hidden Units interact over time to generate firing rate traces which are used to sample a Poisson train. For example, assembly 1 drives assembly 2 and inhibits assembly 10, both at a single time step delay. (B) Schematic depiction of the RBM and RTRBM trained on the simulated data. (C) For the RBM, the aligned weight matrix Ŵ contains spurious off-diagonal weights, while the RTRBM identifies the correct diagonal structure (top). For the assembly weights U (left), the RTRBM also converges to similar aligned temporal weights Û (right). (D) The RTRBM attributes only a single strong weight to each visible unit ((wi,j > 0.5σ, where σ is the standard deviation of W)), consistent with the specification in W, while in the RBM multiple significant weights get assigned per visible units. (E) The RBM and RTRBM perform similarly for concurrent (⟨vi⟩, ⟨vi vj⟩) statistics, but the RTRBM provides more accurate estimates for sequential statistics. In all panels, the abscissa refers to the data statistics in the test set, while the ordinate shows data sampled from the two models respectively. (F) The trained RTRBM and the RBM yield similar concurrent moments, but the RTRBM significantly outperformed the RBM on time-shifted moments (see text for details on statistics). (G) The RTRBM achieved significantly higher accuracy when predicting ahead in time from the current state in comparison to RBMs for up to 4 time steps.

Figure 2—figure supplement 1. Alignment of weight matrices after learning.

An RBM (Figure 2B) trained on the simulated data recovers a non-diagonal weight matrix (Figure 2C), which is composed of both the true assembly-to-neuron weights on the diagonal, but in addition has multiple off-diagonal weights, which partially account for the dependencies between the assemblies.

In contrast, the RTRBM correctly segments all ten assemblies, recovering a clean ‘diagonal’ connectivity matrix Ŵ (Figure 2C, right), in addition to providing a close estimate Û to the true assembly connectivity matrix U, i.e. it recovers the underlying hidden connections from the activation patterns of neurons alone. Consistently, each visible neuron has only a single dominant weight (|wij |) in the RTRBM and thus produces a diagonal weight matrix, while the RBM assigns multiple strong weights to an RBM to address the time-dependencies (Figure 2D).

As expected, the RBM performs very well in capturing the average activations of the visible units (⟨vi⟩) and their correlations (⟨vi vj⟩), referred to as first and second order moments, respectively (Figure 2E, top), however, cannot accurately capture the time-shifted moments of the visible or hidden units (Figure 2E, bottom). The RTRBM performs similarly for the simultaneous moments (Figure 2E, top), however, the RTRBM provides a more accurate account of the time-shifted moments (Figure 2E, bottom). This behavior is consistent for the different moments across multiple runs on independent simulated data sets and model estimates (N = 10, Figure 2F), with significant improvements of the RTRBM observed for the time-shifted moments (p-values 0.907 for vi, 0.485 for hi, 9.13 ⋅ 105 for and 2.91 ⋅ 104 for one-sided Mann-Whitney U test).

Lastly, the RTRBM also exhibited a significantly higher accuracy (p = 1.479 ⋅ 10−14, 2-way ANOVA with time steps and model type as factors, N = 10, Figure 2G) when predicting ahead in time inside the simulated data (not used in training). The RTRBM’s advantage in prediction stayed significant up to 4 time steps (p < 0.05, two group t-test per time step with Bonferroni correction for the number of time steps). This decay of differences between the models is expected, as the probabilistic basis of the RBM/RTRBM by design leads to non-deterministic trajectories, similar to the divergence of trajectories in non-linear dynamic systems where small noise eventually leads to large differences (Strogatz, 2000) (see Discussion for a relation to animal behavior).

These results indicate that the RTRBM provides a more accurate account of the model structure and data statistics in particular for sequential activations. The RBM can only account for the temporal structure partially, however, thereby conflating it with non-temporal weights in W.

The RTRBM outperforms the cRBM on whole-brain zebrafish data

Next, we trained the RTRBM on whole-brain recorded in zebrafish larvae (n = 8, same data as in van der Plas et al. (2023)). To obtain binarized spike traces that can be used by the RTRBM, the individual fluorescence traces are deconvolved by means of blind sparse deconvolution (Figure 3A). Model training was performed using a trained cRBM as the basis for the assembly-to-neuron weights W, and then training the temporal assembly-to-assembly weights U, while allowing W to only change slightly (the learning rate for these weights is reduced by two orders of magnitude, see Methods and Materials for more details on model training). For each animal, model training was successful and the weight changes converged to small values. This approach of using pre-learned weights can be seen as a variant of transfer learning (Tan et al., 2018). This choice of training procedure was used as the weight matrix W inferred by the RTRBM is rarely able to identify localized receptive fields for a large portion of hidden units within its present, non-compositional, formulation.

RTRBM often outperforms the cRBM on zebrafish data (A) Whole-brain neural activity of larval zebrafish was imaged via Calcium-indicators using light sheet microscopy at single neuron resolution (left). Calcium activity (middle, blue) is deconvolved by blind, sparse deconvolution to obtain a binarized spike train (middle, black). The binarized neural activity of 1000 randomly chosen neurons (right). (B) Left: Distribution of all visible-to-hidden weights. Here, a strong weight is determined by proportional thresholding, wi,j > wthr. Here wthr is set such that 5000 neurons have a strong connection towards the hidden layer. Right: log-weight distribution of the visible to hidden connectivity. (C) The RTRBM extracts sample assemblies (color indicates assembly) by selecting neurons based on the previously mentioned threshold. Visible units with stronger connections than this threshold for a given hidden unit are included. Temporal connections (inhibitory: blue, excitatory: red) between assemblies are depicted across timesteps. (D) Temporal connections between the assemblies are sorted by agglomerative clustering (dashed lines separate clusters, colormap is clamped on the range [1, 1] for visibility). Details on the clustering method can be found in Methods and Materials. (E) Corresponding receptive fields of the clusters identified in (D), where the visible units with strong weights are selected similarly to (B). (F) Comparative analysis between the cRBM (bottom row) and RTRBM (top row) on inferred model statistics and data statistics (test dataset).Compared in terms of Spearman correlations and sum square difference. From left to right: the RTRBM significantly outperformed the cRBM on the mean activations ⟨< vi> (p < ϵ), pairwise neuron-neuron interactions, ⟨vivj ⟩ (p < ϵ) time-shifted pairwise neuron-neuron interactions and time-shifted pairwise hidden-hidden interactions for example fish 4. (G) The methodology in panel F is extended to analyze datasets from eight individual fish. Spearman correlation and the assessment of significant differences between both models are determined using a bootstrap method (see Methods and Materials for details).

The reconstruction error in the initial phase of model training was ∼ 0.40, the RTRBM was able to reduce this to ∼ 0.072, which it achieves predominantly by adjusting the temporal weights. The trained RTRBM model maintained the localised neural assemblies inherited from the cRBM (Figure 3C) as quantified by a sparse weight distribution (Figure 3B, left) and a comparable, lower number of typically 1-3 strong weights per neuron as in the cRBM (Figure 3B, right).

Most hidden units showed self-excitation, i.e. indicated as a positive value on the diagonal of Û. The overall pattern of temporal connections between the assemblies in Û could be divided into several groups. To this end, agglomerative clustering (see Methods and Materials for details) can be applied to the incoming (row) or outgoing (columns) connections of the matrix Û to identify assemblies with similar temporal structures. We here focus on the incoming connections/receptive fields, as their grouping was more clear (Figure 3D, dashed lines indicate boundaries between clusters). The clustering for outgoing connections was similar, however, not identical as Û are generally not symmetric, due to the directedness of the temporal connections. The identified clusters showed characteristic patterns of connectivity, e.g. clusters 1, 2 and 6 predominantly show recurrent connectivity with themselves and a wide variety of inter-cluster connections (Figure 3D). Cluster 4 shows even stronger recurrent intra-cluster connectivity, but also strongly inhibits all other clusters. Cluster 5 is dominated by strong inhibitory connectivity to itself and to cluster 2. Cluster 3 shows a more diverse connectivity pattern overall and does not show strong intra-cluster recurrent connectivity. By using a lower clustering threshold, an even further refined clustering structure appears (data not shown). The sets of strongly connected visible units corresponding to the hidden unit clusters furthermore formed spatially localized sets of neurons (Figure 3E).

To compare the performance between the inferred RTRBM and cRBM models, we analysed the reconstruction quality. To this end, we sampled data from both models and again compared the model statistics to the data statistics, using an unseen test set (matched to the test set in (van der Plas et al., 2023), see Methods and Materials for Details). For the example fish in Figure 3B-F, the first order moments between neurons ⟨ vi⟩ strongly correlated for the RTRBM (rp = 0.90, p < ϵ, where < ϵ denotes machine precision), that is close to the performance of the cRBM (rp = 0.91, p < ϵ). The second order moments between neurons ⟨vivj ⟩ of the cRBM (rp = 0.73, p < ϵ) correlate better compared to the RTRBM (rp = 0.70, p < ϵ). To establish how well both models can capture the temporal dynamics of the data, we compared the time shifted moments of the visible and hidden units. The time shifted moments of the visible units of the RTRBM (rp = 0. 74, p < ϵ) correlates better than the cRBM (rp = 0.69, p < ϵ). Furthermore, the time shifted moments of the hidden units of the RTRBM (rp = 0.96, p < ϵ) also correlates better than the cRBM (rp = 0.83, p < ϵ).

The Pearson correlation is scale-free, i.e. even if one variable is doubled, the correlation can stay the same. However, in many cases, the sampled RTRBM was much closer to the test data in absolute terms (indicated by densities in Figure 3D that are closer to the diagonal). To quantify this difference, we also compared the Sum Square Difference (SSD) between the sampled statistics of the RTRBM and cRBM with the statistics of the test set, to determine how well both models accounted for the real data on a quantitative, absolute level. The RTRBM had a lower SSD for all first-, second- and time-shifted moments compared to the cRBM. This suggests that the statistics of the RTRBM better matched in an absolute sense, and can be considered as better behaved than the statistics of the cRBM when compared to the test set.

Over the whole dataset, the RTRBM outperforms the cRBM on 5 out of 8 fish for ⟨vi⟩, and on 4 out of 8 fish for ⟨vivj⟩, while the performance was similar on the remaining fish, except for fish 3. To establish how well both models can capture the temporal dynamics of the data, we compared the and. Here, the RTRBM outperformed the cRBM on 4 out of 8 fish for the visible units and on 6 out of 8 fish for the hidden units. Note, that the second order statistics are statistics the algorithms are not explicitly trained on to replicate (see also Methods and Materials).

In summary, the transfer learning approach was able to successfully expand the cRBM model to include the temporal connections, while maintaining a high level of sparsity in the hidden-to-visible layer connections. It is beyond the scope of this study to evaluate the detailed differences between the two models, but this transfer learning approach appears a promising avenue to enable the RTRBM to be estimated on large scale data sets.

Identification of the underlying time-scale of assembly interactions

The sampling rate in an experiment will generally not match the effective interaction time between neural assemblies. If the mismatch is too large, it may prevent the RTRBM from making a correct estimate of the temporal connections. It is therefore important to be able to estimate the interaction time of assemblies in relation to the sampling rate.

To investigate this issue, simulated data was generated where the interaction time between assemblies was set to ΔtA = 4 time steps (relative to the sampling rate of the simulated data, Figure 4A, left). This simulated data was then down-sampled to different rates using integer steps on the range [1, 10] (Figure 4A, middle). Down-sampling was performed by selecting the value of the data at the sampling interval, rather than averaging or interpolating over all points in the interval. This choice was motivated by the fact that light-sheet imaging only has access to the neural activity at particular time-points and cannot average the entire duration between these time-points (as it is imaging at different planes in depth in between). As the size of the system should not make a qualitative difference for this analysis, we generated simulated data with only Nh = 10 hidden assemblies and Nv = 20 visible units per assembly. Different runs (N = 10) were inferred on independently drawn assembly dynamics and subsequently drawn spike-times, but with identical temporal and static weight matrices W and U.

Neural interaction timescale can be identified via RTRBM estimates over multiple timescales. (A) Training paradigm. Simulated data is generated as in Figure 2, but with temporal interactions between populations at a delay of ΔtA = 4 time steps. This data is downsampled according to a downsampling rate ΔtD by taking every ΔtD-th time step (shown here is ΔtD = 4), and used for training different RTRBMs. (B) Performance of the RTRBM for various down-sampling rates measured as the accuracy in predicting the visible units one time step ahead (N = 10 models per ΔtD). Dotted line shows a the mean estimate of the upper bound on the accuracy ± SEM (N = 10, 000) due to inherent variance in the way the data is generated (see Methods and Materials). Dashed gray line indicates the theoretical performance of an uninformed, unbiased estimator (C) Cosine similarity between the interaction matrix U and the aligned learned matrices Û, both z-scored. Bars and errorbars show mean and standard deviation respectively across the N = 10 models per ΔtD. Dark lines show absolute values of the mean cosine similarity. Shown above are the Û matrices with the largest absolute cosine similarity per down-sampling rate.

To evaluate the reconstruction performance of the trained RTRBMs, the reconstruction accuracy one time step ahead is calculated on a single test data set and compared for different down-sampling rates (Figure 4B, see Methods and Materials for details). The trained RTRBM performs significantly better than an unbiased random estimator, i.e. , for all down-sampling rates (p = 0.0098 for ΔtD ϵ {1, 3, 4, 5, 8, 9}, p ≤ 0.0488 for others, one-sided Wilcoxon signed rank test, N = 10 with Bonferroni correction for the number of down-sampling rates, effect size 1.71). The performance is highest and near-optimal when the down-sampling factor matches the simulated interaction time at ΔtD = 4 (p = 0.00082, one-sided Wilcoxon rank-sum test, N = 10, effect size 5.9). Optimal is here defined as a mean upper bound in accuracy obtained from knowing the underlying model precisely, only limited by the unpredictable variance from the random factors of Poisson sampling and intrinsic assembly dynamics (see Methods and Materials for details). While the absolute difference in accuracy between the down-sampling factors may seem small, visual inspection shows that only the sampled data from ΔtD = 4 contains the characteristic temporal sequences generated by the connections in U (see Discussion for an interpretation of the overall accuracy range).

To verify that the correct temporal connections between the hidden units are found, the temporal weights Û after alignment of assemblies (see Methods and Materials) are compared with the true temporal connections U of the simulated data using cosine similarity (Figure 4C). Correspondingly, the correlation peaks when the down-sampling rate matches the interaction time and the absolute value of the similarity falls off rather smoothly with distance from the correct step size. While neighboring values around the simulation step size have similar absolute correlations, the correct step size still outperforms them. Conversely, this indicates that it may be sufficient to be close to the true step size in order to correctly estimate the temporal dependence in U.

We applied this analysis as a refinement to the training of the RTRBM on zebrafish data, and found that the the temporal correlation of neural activity to model activity was highest at the natural sampling rate (data not shown). Therefore, the analysis in Figure 3 was conducted without down-sampling.


Here, we introduced the RTRBM as a powerful dynamical statistical model for the analysis of large-scale neural data, demonstrating that it can uncover temporal dependencies between neural assemblies. We achieve this through transfer learning on the basis of the static assembly structure estimated by a cRBM trained on the same data (van der Plas et al., 2023). The estimated RTRBM models are structurally more fitting and provide more accurate accounts of the activity dynamics than those of the cRBM, as we demonstrate on simulated and experimentally acquired, whole-brain zebrafish data. The resulting temporal connectivity structure on the assembly level provides a compact description of the neural dynamics, which decomposes into dynamical networks of assemblies. Training of an RTRBM/cRBM model can be completed in a few hours on current hardware, and could thus lend itself for within experiment, interventional studies. The RTRBM therefore provides an effective and practically feasible model formalism for accounting for temporal dynamics as well as stochastic properties of whole-brain zebrafish activity.

Relation with previous studies on large-scale assemblies

Estimating functional divisions and connectivity from large-scale activity data can be considered one of the key objectives of computational neuroscience, as it would allow to automatically extract interpretable structure from datasets of (human-level) uninterpretable complexity. While it is generally recognized that this poses a difficult analytical challenge, in particular in highly connected systems (Das and Fiete, 2020), whole-brain recordings have brought a critical advance to this endeavor. However, due to the relatively recent introduction of whole-brain measurements in zebrafish larvae (Ahrens et al., 2013), surprisingly few studies exist in this system that have attempted the investigation of neural activity in this type of data at the whole-brain scale (Nguyen et al., 2018; Chen et al., 2018; Betzel, 2020; van der Plas et al., 2023).

These studies have all focused on extracting functional groupings from the neural activity, without directly attempting to perform temporal predictions of neural activity. In Chen et al. 2018, a clustering approach was introduced that identified a set of clusters of neurons, which showed responses to specific visual stimuli or motor behaviors. Betzel (2020) estimates instantaneous functional connectivity from spontaneous activity and identifies groups of local nodes that form a hierarchical, modular structure, however, without the possibility of using this model in a generative way. In our previous study (van der Plas et al., 2023) we identified neural assemblies from spontaneous activity using a generative, probabilistic approach, i.e. the cRBM, but without explicitly modeling any time-dependencies. Lastly, Nguyen et al. (2018) uses Gaussian mixture modelling to cluster on the activity level, but again, this method does not yield any insight into the time-dependencies between the identified clusters.

Many other studies have focused on the analysis of subsystems, but also without directly modelling the temporal dependence, e.g. sensorimotor transformations in the visual Bianco and Engert (2015) and the auditory system (Privat et al., 2019), the representation and maintenance of spatial position (Yang et al., 2022), decision making (Bahl and Engert, 2020), or the neural circuit underlying heading direction (Petrucco et al., 2023).

Since the current temporal resolution of light-sheet imaging is rather low, i.e. limited of a few volumes per second (but alternative scanning approaches might change this soon, see e.g. Bouchard et al. (2015)), estimating the temporal dynamics on the level of individual neurons is still difficult. Therefore, our present approach focused on the dynamics between assemblies, which are expected to develop on slower timescales. Temporal connections are generally more insightful than instantaneous function connections, as they are directed and therefore provide a better basis for separating correlation from causation.

We demonstrate that the RTRBM was able to capture functional temporal connections between neural assemblies while maintaining localized receptive fields of the hidden units. Additionally, as the RTRBM yields insight into the temporal dynamics of these identified neural assemblies, it provides a way of identifying which assemblies are similar in their dynamics and thereby can suggest distinct large-scale dynamical networks spanning one or multiple brain areas (see Fig. 3). Moreover, the RTRBM outperforms the classical RBM on the artificial data in terms of reconstruction statistics, and also outperforms cRBM in accounting for temporal dynamics.

To our knowledge, only two earlier studies have attempted to predict dynamics or estimate dynamical relations between neural assemblies on the whole brain level (Watanakeesuntorn et al., 2020; Pao et al., 2021). In both studies Empirical Dynamical Modelling/Convergent Cross Mapping is used to estimate neural dynamics, however, the zebrafish data is mostly utilized as a usage case for demonstrating that the methods scale to large datasets, without providing insight into the resulting ensembles or prediction quality. Another approach used is to apply dynamical modeling of the behavioral level and then use concurrently acquired whole-brain activity to identify corresponding structures in the zebrafish brain Dunn et al. (2016).

Limitations and future improvements

The RTRBM introduces temporal dependencies in a constrained way that effectively limit the number of additional parameters. This feature is important to avoid overfitting on the limited amount of data generally available in each experiment. However, the increased complexity involved with the addition of these temporal dependencies limits the analytical tractability of the model. This added complexity had a number of consequences on the estimation procedure, which should ideally be resolved in future work.

Specifically, the RTRBM in its current form is not intrinsically driven towards the compositional phase, which is an important property that pushes the model to identify localized neural assemblies. Specifically, the use of dReLU hidden unit potentials within the RTRBM framework was not analytically tractable in our hands. We therefore opted for a transfer learning approach, where the assemblies are first estimated by the cRBM and initializing with these identified assemblies. The RTRBM then infers the temporal connectivity between the identified localized assemblies, while only slightly modifying their assembly structure. This approach could be limiting in multiple ways. For example, the cRBM-estimated assembly structure could contain an amalgam of static and dynamic connectivities (see Figure 2 for simulated data). Furthermore, it might be necessary to estimate the assembly structure jointly with the temporal connectivities between them for optimal decomposition. Extending the current work, we aim to refine the RTRBM by introducing other sparsity constraints on the hidden-hidden connections (similar to Mittelman et al. (2014)), or by realizing the compositional properties in the RTRBM to allow single-step, direct estimation.

Another limitation of the current RTRBM framework is that all assemblies are interacting on a single time-scale. While we have demonstrated (Figure 4) that a single timescale can be identified through the estimation of and subsequent selection from multiple RTRBMs on different timescales, the more general case of multiple interaction time-scales between different assembles remains un-addressed. Partly, this issue is alleviated by the compounding effect of temporal interactions over multiple time-steps, which therefore suggests to err on the side of shorter time-steps in estimation. In preliminary explorations we noticed that the estimated RTRBM generates alternating dynamics between clusters of assemblies (see Figure 3) on time-scales that are much longer than the single interaction step. In subsequent work, the RTRBM could be generalized to include multiple time-scales of interaction for different assemblies.

Related to the time-scale issue, light-sheet imaging is currently limited to ∼100 Hz, which means that the ∼30-40 imaging planes are sampled at only 2-4 Hz, depending on the specific system. At these low imaging rates it is likely that some assembly dynamics are missed or appear simultaneous. Improvements in the speed of stepping between imaging planes will increase the sampling rate per cell. Together with brighter flourescent indicators (Zhang et al., 2023), this will provide a more reliable basis for estimating models that incorporate temporal dependencies.


The RTRBM formalism is the logical next steps in the analysis of whole-brain recordings, as it accounts for the static and dynamic aspects using a probabilistic formalism, which captures both the stochastic and deterministic aspects that are hallmarks of neural activity. Followup studies need to attempt to extend the RTRBM into the compositional phase directly, thus speeding up learning and ensuring matched assemblies and temporal connectivities (Bargmann and Marder, 2013). Recordings at higher temporal resolutions and for longer durations will be instrumental in allowing convergence of the cRTRBM (Helmstaedter, 2015). Together with advancements of computing hardware this should allow for interventional studies based on the estimated dynamics to directly verify the estimated temporal connectivity through modulation techniques such as optogenetic control or laser ablation.

Methods and Materials

The Restricted Boltzmann Machine (RBM)

The Rstricted Boltzmann Machine (RBM) (Salakhutdinov et al., 2007) is an undirected graphical model that defines a probability distribution over a set of binary visible units carrying the data configurations , and real-valued latent representations given by a set of hidden units. In contrast to the definition of the classical Boltzmann Machine, the RBM has no direct couplings between pairs of units within the same layer, making it a bipartite graph that allows for more efficient model training. The joint probability distribution of the model is defined as:

where Wij are visible-to-hidden unit weights, is the global energy of the RBM,Z = Σvd h e−E(v,h) the partition sum, and υj hj the hidden unit potential. The choice of the hidden unit potential shapes the energy landscape of the model and thus the states the hidden units in the the model can take. It is therefore an important model choice. For a more detailed discussion on the choice of the shape of the hidden unit potential, we refer to our previous paper (van der Plas et al., 2023). In this work, we use the Bernoulli hidden unit potential, which is defined as , where bh is the hidden bias and . Here, i is used to index the visible units and j to index the hidden units. The full set of model parameters is given by the visible bias , the hidden bias and weights. Note that, while the visible units v are observed variables, the hidden units h are unobserved (latent) variables and must therefore be sampled, conditioned on the state of the visible units v.

The posterior distributions over the hidden and visible units allow sequential sampling of the hidden and visible unit states. As both layers contain no within-layer dependencies, the posterior distributions over v and h are conditioned only on the other variable and factorize as:

The choice of the Bernoulli hidden unit potential reduces these equations to:

where σ denotes the logistic function, defined as .

Training the RBM

The RBM model parameters are learned by maximizing the log-likelihood of the target data, denoted as ℒ = ⟨log(P (v)) ⟩ data (Ackley et al., 1985). This learning procedure ensures an accurate representation of the underlying distribution of the target dataset through the Boltzmann-Gibbs energy distribution (Boltzmann, 1868). Stochastic gradient-based methods are employed to minimize the Kullback–Leibler divergence (Kullback and Leibler, 1951) between the model and data distributions. These steps write:

where E is the energy of the RBM, as given in Equation 1. We here let⟨…⟩denote the expectation value over the data and the model distributions respectively. As the visible states of the data are given, ⟨E(v, h) ⟩ data is straightforward to compute. In contrast, the computation ⟨ E(v, h) ⟩model is in-tractable for sufficiently large systems due to the exponentially large state space over the visible unit states in the partition sum. To address this problem of intractability, a common solution is the use of a K-step Markov Chain Monte Carlo sampling scheme (Gibbs sampling) (Hinton, 2002). This approach starts from an initial configuration and aims to approximate the expectation values of the model distribution. This is also known as Contrastive Divergence (CD). In this learning scheme, samples are sequentially drawn from the visible and hidden posterior distributions respectively (Eq. Equation 4), up to K time. For sufficiently large values of K, this procedure yields unbiased samples of the underlying model distribution (Hinton, 2002). In practice, a value of K = 1 during training is generally sufficient to obtain reasonable expectation values (Carreira-Perpinan and Hinton, 2005). Increasing the number of samples will result in better approximations, and thus generally in improved model estimation.

The compositional phase

In the classical definition of the RBM training scheme, there is no regularization on the hidden-unit activation sparsity. This means that a visible unit can have a proportionally strong connection to a large set of hidden units. Such a non-localized visible-hidden unit connectivity can hinder the interpretation of the model’s learned latent representation. Previous work (Tubiana and Monasson, 2017; Tubiana et al., 2019a) introduced a method to regularize the RBM in such that it is pushed towards a sparse visible-hidden unit connectivity, termed the compositional phase. The resulting compositional RBM (cRBM) was employed to discover compact neural assemblies in our previous study (van der Plas et al., 2023). This sparsity in connections allows for a direct and interpretable relationship between the weights in the cRBM and the generated data configurations. The compositional phase is observed when RBMs are constrained to a specified set of structural conditions (Tubiana and Monasson, 2017):

  1. The hidden units are unbound and real-valued with ReLU like activation function.

  2. The weight matrix W is sparse.

  3. The columns of the W have similar norm.

A key implementation detail of this cRBM model is the use of the double-Rectified Linear Unit (dReLU) defining the hidden-unit potential (Tubiana and Monasson, 2017; Tubiana et al., 2019a,b). Detailed analyses of RBMs operating in the compositional phase have exemplified the dynamical consequences and the advantages of this regime for learning complex data manifolds. In-depth discussions on the cRBM and the compositional phase can be found in related literature (Tubiana and Monasson, 2017; Tubiana et al., 2019a,b). In this work, we use the implementation from van der Plas et al., 2023 for our cRBM model. A detailed description of the implementation and accompanying code can be found there.

The Recurrent Temporal Restricted Boltzmann Machine (RTRBM)

The RTRBM can be conceptualized as a series of RBMs unfolded across the temporal dimension, i.e. a recurrent network. The model state of the RTRBM at each time step t is in essentially an RBM, where the hidden state is conditioned on the contextual hidden state of the RBM at time step t − 1. Mathematically, this involves augmenting the hidden bias of the RBM at time t with an additional term dependent on the previous expected hidden states r[t1] = ⟨ h[t1] ⟩. Furthermore, the weights are introduced which directly connect the previous to the current hidden unit states. This setup allow the model to directly capture latent-state temporal dependencies that cross multiple time steps. The statistical distribution of the RTRBM at time step t is given by:

where we dropped the indexing subscripts ij for conciseness. At t = 1, the term Ur[t−1] is replaced by , which denotes the learnable initial bias for the hidden units. The joint probability distribution of an RTRBM model with length T is found by factoring over the RBM stack over all time steps t, … T. This procedure is written as:


Importantly, instead of using binary hidden unit states h[t−1], sampled from the expected real-valued hidden states r[t−1], the RTRBM propagates these real-valued hidden unit states directly. This approach constitutes the mean-field approximation of the hidden states of the temporally preceding RBMs, resulting in an efficient approximation of the temporal state at each time step t (Hinton, 2002). The inputs r[t] of the RTRBM at time step t, given the visible unit state v, are then calculated as:

Training the RTRBM

The model parameters θ ∈ {W, U, bv, binit, bh } are learned by maximization of the log-likelihood using (stochastic) gradient ascent:

where η denotes the learning rate and where

are the partial derivatives of the log-likelihood with respect to the model parameters. The energy function of the RTRBM can be split up into two components: a static component ℋ and a temporal component 𝒬. The gradients of the static component ℋ with respect to θ are calculated by summing over the gradients of the RBM at each time step. The calculation of the gradients of 𝒬 with respect to θ is more complex. First, observe that 𝒬 can be computed recursively as:

Hence, the backpropagation-through-time algorithm (Rumelhart et al., 1986) can be employed to recursively compute of gradients for 𝒬 with respect to r[t]:

where ⊙ denotes the element-wise product. The final step to obtain the derivative with respect to θ involves applying the chain rule:

For a more detailed derivation of these equations, we refer to Mittelman et al. (2014).

Inference in the RTRBM

The RTRBM’s sequential sampling scheme, using preceding hidden- and visible states to predict subsequent time steps, enables generating data from its learned model distribution. Initially, r[t] is calculated using the previous expected hidden states and current visible states. The contrastive divergence sampling scheme is then used to sample v[t+1]. Consequently, v[t+1] is utilized to get the following expected hidden unit states r[t+1], and so forth. This sequential process enables the RTRBM to statistically predict events in future time steps.

During model training, all time steps T are available and it is not required to infer longer sequences. At the start of each training epoch r[t] can be computed recursively up to time point T, followed by contrastive divergence performed for all time steps in parallel. At model inference, we initialize r[t] on the first time step of each test batch and sample multiple time steps into the future using the Gibbs sampling scheme (see Figure 2G). By comparing the inferred data with the remainder of the test data batch, we quantify the extent to which the RTRBM accurately captures the statistics of neural data across time (see also Performance metrics).

Model-generated simulated data

The simulated data is generated according to the following principles: Nh hidden populations are instantiated that are temporally connected through a set of weights U. The activity of a single hidden population is represented as a time-varying firing rate. Each hidden population activity is the combination of two sources: (1) An intrinsic, time-varying firing rate, generated by randomly timed peaks (where the inter-peak interval is randomly drawn as ΔtU (t1, t2)) convolved with Gaussian-shaped signals (ϕ(x; t, σ), where σU (σ1, σ2)). The resulting signal is subsequently renormalized to be in the range [fmin, fmax]. The final signal represents the firing rates of the hidden unit populations. (2) Recurrent interactions of firing rates between the hidden populations at a delay of ΔtA. The strength of the temporal interactions between hidden populations is given by U, with positive and negative entries representing excitatory and inhibitory interactions respectively. Each hidden population connects to a distinct set of Nv visible units, and their activity is independently drawn from a Poisson distribution whose rate over time is given by the rate of the hidden population, scaled by a different constant for each visible unit. The parameters for the data generating model were set to Nh = 10, Nv = 20, t1 = 3, t2 = 10, σ1 = 0.1, σ1 = 0.5, fmin = 0.3, and fmax = 0.8. The temporal connectivity matrix U was configured such that each hidden population is both excited and inhibited by one of the other hidden populations (Figure 2 shows a graphical representation of the data generation pipeline).

For model training and evaluation, the generated simulated data was divided into a train- and test-set. Model inference is used for generating samples from trained RBM/RTRBM models to evaluate their performance. At model inference, the models are initialized on the first time step of the test batch, and the subsequent time steps are inferred through Gibbs sampling. The inferred data is then compared to the ground-truth test data through several statistics: (1) The accuracy between inferred and test data, which measures how well the models can reproduce unseen activity of visible units. (2) The mean activation of the visible units ⟨vi⟩, also referred to as the first order moments. (3) Pairwise moments between the visible units ⟨ v i v j⟩, also denoted as second-order statistics. These pairwise moments assess the model’s ability to capture data statistics it is not explicitly trained on. (4) Time-shifted pairwise moments of the visible and hidden units, i.e. and respectively. More details on how these statistics are calculated can be found in Performance metrics.

Zebrafish data

Whole-brain single-cell functional recordings for 15 zebrafish larvae were recorded by means of light-sheet microscopy in the lab of G. Debregeas, 8 of which are used in this study (see van der Plas et al., 2023 for a more detailed description of both the data acquisition process and subsequent data processing). These datasets are publicly available and can be found at In summary, the datasets consist on average of 40, 709 ± 13, 854 neurons, recorded for 5836 ± 1183 time steps, at a frequency of 3.9 ± 0.8 Hz. The zebrafish larvae are 5-7 days post-fertilization and expressed GCaMP6s or GCaMP6f calcium reporters for imaging. The experimental procedure is described in more detail in van der Plas et al. (2023). The process from data segregation to analysis is displayed in Figure 3. To obtain binarized spike traces that can be used by the RTRBM the individual fluorescence traces are deconvolved by blind sparse deconvolution (Figure 3A) (van der Plas et al., 2023).

RTRBM initialization through transfer learning

Through empirical evaluation, we found de novo learning of the RTRBM to not converge to a solution that satisfies the compositional-phase criteria (see The compositional phase). This is to be expected as the formulation of the RTRBM as used in this work does not feature the required elements necessary to push the model solution towards such a solution. To overcome this issue, we make use of a transfer learning strategy (Tan et al., 2018). More specifically, the visible-to-hidden weights W of the RTRBM models are initialized by their estimated counterparts from trained cRBM models (van der Plas et al., 2023). During subsequent training of the RTRBM, we let the model to update the values of these weights with a reduced learning rate (see below). This initialization strategy is able to bias the resulting weight matrix W inferred by the RTRBM to contain a strong prior towards the localized receptive fields of the hidden units of the cRBM.

For model training and evaluation, the functional data for each animal was divided into a train- and test-set. To this end, the functional recordings of each animal were subdivided into 10 segments of consecutive time steps. In all cases, temporal segments 2, 6, and 7 are labeled as test sets, the remaining temporal segments are labeled as training sets. This data division strategy mirrors that of van der Plas et al. 2023. In practice, each segment was further subdivided into smaller batches of size 16 ± 4, enabling their computation on an Nvidia GeForce RTX 3090 24GB GPU. This resulted on average in 243 ± 67 train batches and 104 ± 29 test batches. Using transfer learning, we trained each RTRBM for 10, 000 epochs with a learning rate of 103. For each epoch we calculated the gradients for 20 batches in parallel before updating the model parameters. This strategy reduced the training time to a relatively short duration of 3 hours for 10, 000 epochs calculated on an NVIDIA RTX3090 GPU. To maintain neural assemblies we enforced an L1-norm sparsity regulator with its constant set to λ = 106 (with λ = 107 for two of the animals), and a reduced learning rate for the matrix W by two orders of magnitude (i.e. 105).

Performance evaluation and significance testing

The performance of both the cRBM and RTRBM models after training was evaluated through inspection of the learning moments, similar to that of the simulated data (see Performance metrics for more details). Furthermore, to test the significance of the difference in performance between the RTRBM and the cRBM, we made use of a bootstrap method. To this end, we randomly selected 1000 neurons (∼ 2% of the total population) and calculated the neural statistics for each model. This process was repeated n times with non-overlapping subsets such that each neuron only gets sampled once. For each repetition, the Spearman correlation between data and model sampled moments was calculated for each ⟨fkmodel (see Figure 2). We then calculated a confidence interval given by 2 standard deviations from the mean for the Spearman correlation. If the confidence intervals for the RTRBM and the cRBM did not overlap, the difference in performance was considered significant.

Clustering of hidden-hidden unit weights

To identify neural assemblies with similar temporal connectivity, agglomerative clustering is employed on the matrix Û after model training. To this end, we utilize the function AgglomerativeClustering from scikit-learn and use Ward’s method to evaluate the distance dendrogram (Pedregosa et al., 2011). Clustering can be applied to the incoming (row) or outgoing (column) connections of the matrix Û, in this work we apply it to the outgoing connections only. For fish 4, as shown in figure (Figure 3D, E), a distance threshold of 20 was used to threshold the resulting distance dendrogram, which identified 6 functional clusters. Receptive fields were determined by assigning neurons to the identified clusters based on the strength of their weight (Figure 3E). Here, a strong weight is determined by proportional thresholding, wi,j > wthr. Here wthr is set so that at least 5000 neurons have a strong connection towards the hidden layer. This mirrors the thresholding strategy used in figure Figure 3B.

Alignment of the estimated temporal weight matrix

In the context of the simulated data, the activity of the visible units are systematically generated in assemblies of size. Nevertheless, during the training process of the RTRBM, which solely relies on the state of the visible units, the link between each assembly and a particular hidden unit becomes arbitrary. This is, of course, under the assumption that all assemblies are retrieved correctly during the training process. Consequently, the estimated temporal weights Û can have any arbitrary ordering and is often not matched with that of the original matrix U used to generate the data, generally resulting in an invalid match between both matrices. To enable a valid comparison, the ordering of the hidden units in the estimated matrix Û is matched to those in the original matrix U by leveraging the learned visible-to-hidden unit weight matrix Ŵ. Here, as the assemblies of the visible units are known, assemblies are matched with hidden units under the assumption that an assembly of visible units is most linked to the hidden unit with the largest mean absolute visible-to-hidden weights between them (Figure 2—figure Supplement 1A, left), calculated using the row-wise correlations with the ideal (‘diagonal’) weight matrix W. The ordering of the hidden units can then be set such that it is aligned with the ordering of the assemblies of visible units. Cases where two assemblies have the same initially matched hidden unit are resolved by sequentially assigning assemblies by order of their correlation strength to the hidden unit. These cases however generally indicate that the assemblies of visible units were not correctly captured by the model.

In addition to any arbitrary ordering of hidden units in the estimated matrix Û, it is possible for assemblies to form an inverse but correct match with a hidden unit. This means that the corresponding hidden unit is inactive when the corresponding assembly is active and vice versa. This phenomenon is enabled through the anti-symmetry of the activation function, and generally occurs when the weights between the assembly and the hidden unit are negative. An inverse match is identified based on the sign of the mean weight between the hidden unit and the matched assembly of visible units. In the case of an inverse match with a hidden unit, the temporal weights between this hidden unit and all the other units are inverted, resulting in sign switches in the estimated matrix Û (Figure 2—figure Supplement 1, right). In the case of two inverted hidden units, the two temporal weights between them are unaffected as the sign switches cancel each other out.

Together, these corrections yield an aligned estimated temporal weight matrix Û that is more similar to the original matrix U, allowing element-wise comparisons (Figure 2—figure Supplement 1B). All presented Û matrices for simulated data are aligned according to this procedure.

Performance metrics

We evaluated model performance using multiple metrics throughout this study. More specifically, we compared between first and second order model and data statistics for both the visible and hidden units and we used model accuracy on the visible units.

Comparison of model and data statistics

The cRBM and RTRBM models are trained to optimize multiple statics. These include the mean activity of the visible units (neurons) ⟨vi⟩, the mean activity of the hidden units ⟨hμ⟩and their paired Interaction ⟨vi hμ⟩. Furthermore, one can evaluate the second-order statistics ⟨vi vj⟩, ⟨ hμhv⟩, and ⟨vihj ⟩ that are not directly optimized by the models.

To evaluate model performance, we take these statistics and compare then between those of the empirical data and the mode . Data statistics are calculated based on a withheld test-data set consisting of ∼30% of the data. To evaluate hidden-unit statistics for the empirical data, we use the expected value of ht conditioned on the visible unit state vt at time point t for the model under evaluation. Model statistics are not explicitly available and must be sampled from the model under study. To this end, model statistics are approximated through Gibbs sampling of the visible and hidden unit states. For the simulated data, 15 steps of Gibbs sampling are used with a burn-in period of 4000 steps. This is done for 100 time points of data with a chain length of 20 time steps. For the zebrafish data, again 15 steps of Gibbs sampling are used with a burn-in period of 4000 time steps. Here 68-150 random time points in the test set were chosen each with a sampling chain of 14-16 time steps. The number of time points and chain length depended on the length of the functional recording of a single data-batch, and was empirically chosen by comparing resulting performance.

In all cases, we subsequently measure correspondence between pairs of data and model statistics and by evaluating their Spearman correlation.

Model accuracy

We define the model accuracy as the number of correctly predicted visible unit-states for some time point t:

Where Ncorr denotes the number of correctly predicted visible units and Nv denotes the number of visible units.

Determining accuracy bounds for simulated data

For the simulated data, the model accuracy would be directly interpretable if the model generating the data was completely deterministic. However, the model used in this work features multiple stochastic components. To gain insight into model performance for simulated data, it is thus necessary to infer statistically meaningful bounds on performance with respect to the generating model. To this end, we here describe methods used for estimating both an upper- and a lower-bound for the generating model used in this work.

Upper-bound estimation

As described in the methods above, the state of the visible units is sampled through a Poisson distribution from the assembly activity consisting of (1) an intrinsic, time-varying firing rate, generated by randomly timed, Gaussian-shaped variations of firing rate, and (2) recurrent, delayed interactions of firing rates between the hidden populations with interaction time ΔtA. It is straightforward to estimate the variance in a Poisson sampling process under a known distribution of assembly traces. However, the intrinsic firing rates of the artificial neural assemblies are complicated to describe with theoretical distributions. The exact calculation of this theoretical bound is thus rather complex. Furthermore, one must make a distinction between an estimator with perfect knowledge of the assembly state at the previous time step, an ideal estimator, and a real estimator that must approximate this assembly state, such as the RTRBM. A representative upper-bound thus includes this source of stochasticity.

To obtain an upper-bound that includes these notions, we turn to empirical methods. To this end, the assembly activity in a previous time step is initialized to a fixed, known state. Then, the temporal interactions are calculated to obtain the deterministic assembly activity after an interval ΔtA time steps, which are added to different samples from the intrinsic assembly activity to obtain the full assembly state. Poisson sampling is performed on two such states and the accuracy is calculated between them. This corresponds to the performance of an ideal model in predicting the state of the visible units ahead by a time equivalent to the interaction delay, with perfect knowledge of the underlying dynamics, and yields sets of accuracy samples incorporating all aforementioned sources of stochasticity.

For a representative distribution of the possible states, 10, 000 known assembly activity states are randomly sampled from the original test data, which form the fixed previous time step. Next, 2×200 random instances of the intrinsic assembly activities are added as described above per initial state, and samples are taken twice from each of the 200 pairs of states. As the accuracy is a linear metric, the mean accuracy is taken over all initial states and all random intrinsic states. The error in this estimate is quantified as the standard error of the mean, and is calculated by seeing each average accuracy per initial state as an estimate of the mean.

Lower-bound estimation

A lower-bound is determined through a representative naive estimator. Here, we define this naïve estimator with the expected firing rate of each visible unit such that the estimator is unbiased by definition. The accuracy reached by this estimator can be precisely calculated as ⟨1 2 ⟨vi ⟩(1 − ⟨vi ⟩).⟩The choice of this unbiased estimator is especially important in the case of sparse activity traces as dealt with here. It is easy to see that a higher accuracy is reached by the (biased) estimator with accuracy 1 − ⟨v⟩. It is worth noting that in the context of down-sampling experiments, even though all models are assessed using the same test dataset, the act of down-sampling yields a similar but different mean activity ⟨vi⟩, resulting in a slightly different theoretical estimate of the accuracy of the unbiased random estimator.

Code and data availability

The RTRBM models described in this study have been implemented in Python 3.7. Packages used for model implementation, analysis and data visualization include Numpy (Harris et al., 2020), Scipy (Virtanen et al., 2020), Pytorch (Paszke et al., 2019), Scikit-learn (Pedregosa et al., 2011), matplotlib (Hunter, 2007), Pandas (Wes McKinney, 2010), Seaborn (Waskom et al., 2017), h5py (Collette, 2013), tqdm. Separate notebooks are available allowing for recreation of each individual figure presented in this paper. All code accompanying this paper is available at All model-generated simulated data is seeded and can easily be generated through the accompanying notebooks. All functional whole-brain datasets were obtained in the lab of G. Debregeas and are publicly available at

Declaration of interests

The authors declare no competing interests.


We would like to thank T. van de Plas and G. Tubiana for technical discussions. BE acknowledges funding from an NWO VIDI grant (016.Vidi.189.052).

An alignment procedure is used to be able to enable comparison between the learned temporal weight matrix Û and the original temporal matrix used to generate the data. (A) The link between hidden units and assemblies of visible units can often be clearly identified within the weight matrix, making it possible to define an ordering of the hidden units such that it is in line with the ordering of the assemblies. This leads to a shuffling of the rows in the Ŵ matrix. The sign of the mean weight between an assembly and its matched hidden unit is used to identify inverse relations. (B) The reordering of the hidden units leads to a reshuffling of both the rows and the columns of the Û matrix. The sign of an entry is switched when the two hidden units have an opposing sign to their matched assembly of visible units.