# Abstract

Observations of power laws in neural activity data have raised the intriguing notion that brains may operate in a critical state. One example of this critical state is “avalanche criticality,” which has been observed in various systems, including cultured neurons, zebrafish, rodent cortex, and human EEG. More recently, power laws were also observed in neural populations in the mouse under an activity coarse-graining procedure, and they were explained as a consequence of the neural activity being coupled to multiple dynamical latent variables. An intriguing possibility is that avalanche criticality emerges due to a similar mechanism. Here, we determine the conditions under which dynamical latent variables give rise to avalanche criticality. We find that a single, quasi-static latent variable can generate critical avalanches, but multiple latent variables lead to critical behavior in a broader parameter range. We identify two regimes of avalanches, both critical but differing in the amount of information carried about the latent variable. Our results suggest that avalanche criticality arises in neural systems, in which there is an emergent dynamical variable or shared inputs creating an effective latent dynamical variable and when this variable can be inferred from the population activity.

**eLife assessment**

This paper provides a simple example of a neural-like system that displays signatures of criticality, but not for any deep reason; it's just because a population of neurons are driven (independently!) by a slowly varying latent variable, something that could easily arise in neural systems. In this model, criticality does not imply optimal information transmission (one of its proposed functions). This is an **important** finding backed up by **compelling** evidence, and it should be influential in how people think about criticality in the brain.

# Introduction

The neural criticality hypothesis – the idea that neural systems operate close to a phase transition, perhaps for optimal information processing – is both ambitious and banal. Measurements from biological systems are limited in the range of spatial and temporal scales that can be sampled, not only because of the limitations of recording techniques but also due to the fundamentally non-stationary behavior of most, if not all, biological systems. These limitations make proving that an observation indicates critical behavior difficult. At the same time, the idea that brain networks are critical echoes the anthropic principle: tuned another way, a network becomes quiescent or epileptic and in either state, seems unlikely to support perception, thought, or flexible behavior. Further muddying the water, researchers have reported multiple kinds of criticality in neural networks, including through analysis of avalanches (Beggs and Plenz, 2003; Plenz et al., 2021; O’Byrne and Jerbi, 2022; Girardi-Schappo, 2021) and of coarse-grained activity (Meshulam et al., 2019), as well as of correlations (Dahmen et al., 2019). How these flavors of critical behavior relate to each other or any functional network mechanism is unknown.

The phenomenon that we will refer to as “avalanche criticality” appears remarkably widespread. It was first observed in cultured neurons (Beggs and Plenz, 2003) and later studied in zebrafish (Ponce-Alvarez et al., 2018), turtles (Shew et al., 2015), rodents (Ma et al., 2019), monkeys (Petermann et al., 2009), and even humans (Poil et al., 2008). The standard analysis, described later, requires extracting power-law exponents from fits to the distributions of avalanche size and of duration and assessing the relationship between exponents. There is debate over whether these observations reflect true power laws, but within the resolution achievable from experiments, neural avalanches exhibit power laws with exponent relationships predicted from theory developed in physical systems (Perkovic et al., 1995).

Avalanche criticality is not the only form of criticality observed in neural systems. Zipf’s law, in which the frequency of a network state is inversely proportional to its rank, appears in systems as diverse as fly motion estimation and the salamander retina (Mora and Bialek, 2010; Schwab et al., 2014; Aitchison et al., 2016). More recently, Meshulam et al. (2019) reported various statistics of population activity in the mouse hippocampus, including the eigenvalue spectrum of the covariance matrix and the activity variance. These were found to scale as populations were “coarse-grained” through a procedure, in which neural activities were iteratively combined based on similarity. Similar observations have been reported in spontaneous activity recorded across a wide range of brain areas in the mouse (Morales et al., 2023). Simple neural network models of such data explain neither Zipf’s law nor coarse-grained criticality (Meshulam et al., 2019).

Even though these three forms of criticality are observed through different analyses, they may originate from similar mechanisms. Numerous studies have reported latent structure in the activity of large populations of neurons (Mazor and Laurent, 2005; Ahrens et al., 2012; Mante et al., 2013; Pandarinath et al., 2018; Stringer et al., 2019; Nieh et al., 2021). Capturing this observation in a dynamical latent variable (DLV) model, in which neural populations are broadly and heterogeneously coupled to multiple dynamical variables, we previously reproduced scaling under coarse-graining analysis within experimental uncertainty (Morrell et al., 2021). Zipf’s law has been explained by a similar mechanism (Schwab et al., 2014; Aitchison et al., 2016; Humplik and Tkacik, 2017). A single quasi-static latent variable has been shown to produce avalanche power laws, but not the relationships expected between the critical exponents (Priesemann and Shriki, 2018), while a model including a global modulation of activity can generate avalanche criticality (Mariani et al., 2021), but has not demonstrated coarse-grained criticality (Morrell et al., 2021). It is not known under what conditions the more general DLV model generates avalanche criticality.

In this paper, we systematically investigate avalanche statistics in the DLV model. We show that a system coupled to one or many dynamical latent variables can generate avalanche criticality, and we examine the requirements for the number and timescale of variables for this criticality to occur. We find that avalanche criticality is observed over a wide range of parameters, some of which may be optimal for information representation. Our results suggest that latent dynamical structure in large-scale neural recordings may be responsible for the signatures of criticality observed across many systems.

# Results

## Critical exponent values and crackling noise

We begin by defining the metrics used to quantify avalanche statistics and briefly summarize experimental observations, which have been reviewed in detail elsewhere (Plenz et al., 2021; O’Byrne and Jerbi, 2022; Girardi-Schappo, 2021). Activity is recorded across a set of neurons and binned in time. Avalanches are then defined as contiguous time bins, in which at least one neuron in the population is active. The duration of an avalanche is the number of contiguous time bins and the size is the summed activity during the avalanche. The distributions of avalanche size and duration are fit to power laws (*P*(*S*) ~ *S*^{−*τ*} for size *S*, and *P*(*D*) ~ *D*^{−*α*} for duration *D*) using standard methods (Clauset et al., 2009).
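The avalanche definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' analysis code: the function name `extract_avalanches` and the choice to keep avalanches that touch the edges of the recording are assumptions made here.

```python
import numpy as np

def extract_avalanches(raster):
    """Return (sizes, durations) of avalanches in a binary raster.

    raster: array of shape (n_neurons, n_bins) with 0/1 entries.
    An avalanche is a run of contiguous bins with at least one active neuron.
    Assumption: runs touching the first or last bin are kept, not discarded.
    """
    activity = np.asarray(raster).sum(axis=0)        # population activity per bin
    active = activity > 0
    # Pad with False so runs at either edge are closed.
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)            # first active bin of each run
    ends = np.flatnonzero(changes == -1)             # first silent bin after each run
    durations = ends - starts                        # number of contiguous active bins
    csum = np.concatenate(([0], np.cumsum(activity)))
    sizes = csum[ends] - csum[starts]                # summed activity within each run
    return sizes, durations
```

For a two-neuron raster whose population activity is `[0, 1, 2, 0, 0, 1, 0]`, this yields two avalanches, with sizes 3 and 1 and durations 2 and 1.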

Power laws can be indicative of criticality, but they can also result from non-critical mechanisms (Touboul and Destexhe, 2017; Priesemann and Shriki, 2018). A more stringent test of criticality is the “crackling” relationship (Perkovic et al., 1995; Touboul and Destexhe, 2017), which involves fitting a third power-law relationship, ⟨*S*⟩(*D*) ~ *D*^{*γ*_{fit}}, and comparing *γ*_{fit} to the predicted exponent *γ*_{pred}, derived from the size and duration exponents, *τ* and *α*:

$$\gamma_{\mathrm{pred}} = \frac{\alpha - 1}{\tau - 1}.$$

Previous work demonstrating approximate power laws in size and duration distributions through the mechanism of a slowly changing latent variable did not generate crackling (Touboul and Destexhe, 2017; Priesemann and Shriki, 2018).

Measuring power-laws in empirical data is challenging: it generally requires setting a lower cutoff in the size and duration, and the power-law behavior only has limited range due to the finite size and duration of the recording itself. Nonetheless, there is some consensus (Shew et al., 2015; Fontenele et al., 2019; Ma et al., 2019) that even if *τ* and *α* vary over a wide range (1.5 to about 3) across recordings, the values of *γ*_{fit} and *γ*_{pred} stay in a relatively narrow range, from about 1.1 to 1.3.
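As a worked check of the crackling relationship, the sketch below computes the predicted exponent from the size and duration exponents. It assumes the standard form of the relationship, *γ*_{pred} = (*α* − 1)/(*τ* − 1); the function name `gamma_pred` is hypothetical.

```python
def gamma_pred(tau, alpha):
    """Predicted size-duration exponent from the crackling relationship.

    Assumes the standard form gamma_pred = (alpha - 1) / (tau - 1),
    with tau the size exponent and alpha the duration exponent.
    """
    if tau <= 1.0:
        raise ValueError("tau must exceed 1 for the relationship to be defined")
    return (alpha - 1.0) / (tau - 1.0)

# Critical branching process exponents (tau = 1.5, alpha = 2.0) give gamma = 2.
branching = gamma_pred(1.5, 2.0)
```

Under this form, exponents anywhere in the reported range of 1.5 to 3 can still give a *γ*_{pred} near 1.2, provided *α* − 1 stays about 20% larger than *τ* − 1.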

## Avalanche scaling in the Dynamical Latent Variable (DLV) model

We studied a population of neurons that are coupled to dynamical latent variables but not coupled to each other (Fig. 1A). We refer to this model as the Dynamical Latent Variable (DLV) model. The latent variables determine the inputs to the simulated population of neurons. We are agnostic as to the origin of these inputs: they may be externally driven from other brain areas, or they may arise from recurrent dynamics locally. We have previously shown that the DLV model with at least about five latent variables can produce power laws under the coarse-graining analysis (Morrell et al., 2021). In this paper, we examine avalanche criticality in the same model.

Specifically, we model the neurons as binary units (*s*_{i}) that are randomly (*J*_{iμ} ~ *N*(0, 1)) coupled to dynamical variables *h*_{μ}(*t*). The probability of any pattern {*s*_{i}}, given the current state of the latent variables, is

$$P(\{s_i\} \mid \{h_\mu(t)\}) = \frac{1}{Z} \exp\left(-\eta \sum_{i,\mu} J_{i\mu} h_\mu(t)\, s_i - \epsilon \sum_i s_i\right),$$

where the parameter *η* controls the scaling of the variables and *ϵ* controls the overall activity level. We modeled each latent variable as an Ornstein-Uhlenbeck process with the time scale *τ*_{F} (see *Methods*). Thus our model has four parameters: *η* (input scaling), *ϵ* (activity threshold), *τ*_{F} (dynamical timescale), and *N*_{F} (number of latent variables).

Distributions of avalanche size and avalanche duration within this model followed approximate power laws (Fig. 1C; see *Methods*). In the example shown (*N*_{F} = 5, *τ*_{F} = 10^{4}, *η* = 4 and *ϵ* = 12), we found exponents *τ* = 1.89 ± 0.02 (size) and *α* = 2.11 ± 0.02 (duration). Further, the average size of avalanches with fixed duration scaled as *S* ~ *D*^{*γ*}, with the fitted *γ*_{fit} = 1.24 ± 0.02, in agreement with the predicted value *γ*_{pred} = 1.24 ± 0.02. Thus, our model could generate avalanche scaling, at least for some parameter choices. In the following sections, we examine how avalanche scaling depends on model parameters (*N*_{F}, *τ*_{F}, *η* and *ϵ*; see Table 2). We first focus on two sets of simulations: one set with *N*_{F} = 1 latent variable, which does not generate scaling under coarse-graining (Morrell et al., 2021), and one set with *N*_{F} = 5 latent variables, which can generate such scaling for some values of parameters *τ*_{F}, *η*, and *ϵ* (Morrell et al., 2021).
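To make the model concrete, here is a minimal simulation sketch of conditionally independent binary neurons driven by Ornstein-Uhlenbeck latent variables. Several choices are assumptions made here, not taken from the text: the per-neuron firing probability 1/(1 + exp(*ϵ* + *η* Σ_{μ} *J*_{iμ}*h*_{μ})), the exact one-step discretization of the Ornstein-Uhlenbeck process, the function name `simulate_dlv`, and the default sizes, which are far smaller than the simulations described in the paper.

```python
import numpy as np

def simulate_dlv(n_neurons=128, n_latent=5, tau_f=1e4, eta=4.0, eps=12.0,
                 n_steps=20000, seed=0):
    """Simulate binary neurons coupled to OU latent variables (illustrative).

    Returns a uint8 raster of shape (n_neurons, n_steps).
    """
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((n_neurons, n_latent))    # couplings J_{i mu} ~ N(0, 1)
    decay = np.exp(-1.0 / tau_f)                      # one-step OU decay factor
    noise_scale = np.sqrt(1.0 - decay ** 2)           # keeps stationary variance at 1
    h = rng.standard_normal(n_latent)                 # start in the stationary state
    raster = np.zeros((n_neurons, n_steps), dtype=np.uint8)
    for t in range(n_steps):
        h = decay * h + noise_scale * rng.standard_normal(n_latent)
        field = eps + eta * (J @ h)                   # assumed energy cost of s_i = 1
        # P(s_i = 1) = 1 / (1 + exp(field)), written to avoid overflow.
        p_fire = np.exp(-np.logaddexp(0.0, field))
        raster[:, t] = rng.random(n_neurons) < p_fire # neurons fire independently
    return raster
```

Because the neurons are not coupled to each other, each time step only needs the current latent state, so the loop cost is linear in the number of steps and neurons.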

## Avalanche scaling depends on the number of latent variables

We analyzed avalanches from one- and five-variable simulations, each with fixed latent dynamical timescale (*τ*_{F} = 5 × 10^{3} time steps; see Table 2 for parameters). In the following sections, time is measured in simulation time steps; see *Methods* for converting time steps to seconds. We used established methods for measuring empirical power laws (Clauset et al., 2009). The minimum cutoffs for size (*S*_{min}) and duration (*D*_{min}) are indicated by vertical lines in Fig. 2. For the population coupled to a single latent variable, the avalanche size distribution was not well fit by a power law (Fig. 2A). With a sufficiently high minimum cut-off (*D*_{min}), the duration distribution was approximately power-law (Fig. 2B).
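For readers unfamiliar with the fitting step, the sketch below shows the approximate maximum-likelihood estimator for the exponent of a discrete power law above a given lower cutoff, as described by Clauset et al. (2009). It covers only the exponent estimate for a fixed cutoff. Selecting the cutoff and testing goodness of fit, which the full method also involves, are omitted, and the function name `powerlaw_exponent_mle` is hypothetical.

```python
import numpy as np

def powerlaw_exponent_mle(values, x_min):
    """Approximate MLE of a discrete power-law exponent for values >= x_min.

    Uses the estimator 1 + n / sum(log(x / (x_min - 1/2))), which treats
    integer data as rounded continuous data; it is most accurate when
    x_min is not too small. Returns (exponent, standard_error, n_tail).
    """
    x = np.asarray(values, dtype=float)
    tail = x[x >= x_min]                              # keep only the fitted range
    n = tail.size
    exponent = 1.0 + n / np.sum(np.log(tail / (x_min - 0.5)))
    std_err = (exponent - 1.0) / np.sqrt(n)           # leading-order standard error
    return exponent, std_err, n
```

Applied to avalanche sizes with cutoff *S*_{min}, the returned exponent plays the role of *τ*; applied to durations with cutoff *D*_{min}, it plays the role of *α*.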

We next assessed whether the simulation produced crackling. If so, the value *γ*_{fit} obtained by fitting would be similar to *γ*_{pred}. In many cases, such as the one-variable example shown in Fig. 2C, the full range of avalanche durations was not fit by a single power law. Therefore, we determined the largest range over which a power law was a good fit to the simulated observations. In this case, slightly over two decades of apparent scaling were observed starting from avalanches with minimum duration slightly less than 100 time steps (Fig. 2C), with a best-fit value of *γ*_{fit} ∈ [1.69, 1.74]. As we did not find a power-law in the size distribution, calculating *γ*_{pred} is meaningless. If we do it anyway, we obtain *γ*_{pred} = 0.83 ± 0.03 (yellow line in Fig. 2C), which clearly deviates from the fitted value of *γ*. Thus, for the single dynamical latent variable model (*τ*_{F} = 5000), power-law fits are poor, and there is no crackling.

The five-variable model produces a different picture. We now find avalanches whose size and duration distributions are much better fit by power-law models starting from very low minimum cutoffs (Fig. 2D-E, Fig. 2-Supp. Fig. 2). Average size scaled with duration, again over more than two decades, with *γ*_{fit} = 1.27 ± 0.03, which was in close agreement with *γ*_{pred} = 1.25 ± 0.02 (Fig. 2F). Holding other parameters constant, we thus found that scaling relationships and crackling arise in the multi-variable model but not the single-variable model.
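The fitted exponent *γ*_{fit} can be estimated by regressing the logarithm of the mean avalanche size on the logarithm of duration. The sketch below uses an unweighted least-squares fit over a user-chosen duration range. This is an illustrative simplification, not necessarily the procedure used for the figures, and the name `fit_gamma` and its arguments are hypothetical.

```python
import numpy as np

def fit_gamma(sizes, durations, d_min=1, d_max=None):
    """Estimate gamma in <S>(D) ~ D**gamma by log-log least squares.

    sizes, durations: one entry per avalanche.
    d_min, d_max: inclusive range of durations used in the fit.
    """
    sizes = np.asarray(sizes, dtype=float)
    durations = np.asarray(durations)
    unique_d = np.unique(durations)
    if d_max is None:
        d_max = unique_d.max()
    keep = unique_d[(unique_d >= d_min) & (unique_d <= d_max)]
    # Mean size of all avalanches sharing each duration.
    mean_size = np.array([sizes[durations == d].mean() for d in keep])
    slope, _intercept = np.polyfit(np.log(keep), np.log(mean_size), 1)
    return slope
```

Restricting `d_min` and `d_max` corresponds to choosing the range of durations over which a single power law is a good description.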

## Avalanche scaling depends on the time scale of latent variables

Based on simulations in the previous section, we surmised that the five-variable simulation generated scaling more readily due to creating an “effective” latent variable that had slower dynamics than any individual latent variable. We reasoned that at any moment in time, the latent variable state *h*_{μ}(*t*) is a vector in the latent space. Because coupling to the latent variables is random throughout the population, only the length and not the direction of this vector matters, and the timescale of changes in this length would be much slower than *τ*_{F}, the timescale of each of the components *h*_{μ}(*t*). We therefore speculated that increasing the timescale of dynamics of the latent variables should eventually lead to scaling and crackling in the single-variable model as well as the five-variable one. To examine the dependence of avalanche scaling on this timescale, we simulated one-variable and five-variable networks at fixed *η* and *ϵ* coupled to latent variables with the correlation time of their Ornstein-Uhlenbeck dynamics of *τ*_{F} ∈ [10^{3}, 10^{5}] time steps, spanning from a factor of 10 faster to a factor of 10 slower than the original *τ*_{F} in Fig. 1. Simulations were replicated five times at each combination of parameters by drawing new latent variable coupling values (*J*_{iμ}), as well as new latent variable dynamics and instances of neural firing. For simulations that passed the criteria to be fitted by power laws, we plot the fitted values of *τ*, *α*, *γ*_{fit} and *γ*_{fit} − *γ*_{pred} (Fig. 2G-J). Missing points are those for which distributions did not pass the power law fit criteria.

In the single-variable model, best-fit exponents changed abruptly for latent variable timescale around *τ*_{F} = 10^{4} (Fig. 2G, H), while in the five-variable model, exponents tended to increase gradually (Fig. 2G, H, red). The discontinuity in the single-variable case reflected a change in the lower cutoff values in the power-law fits: size and duration distributions generated with faster latent dynamics could be fit reasonably well to a power law by using a high value of the lower cutoff (Fig. 2-Supp. Fig. 3). For time scales greater than ~10^{4}, the minimum cutoffs dropped, and the single-variable model generated power-law distributed avalanches and crackling (Fig. 2J), similar to the five-variable model. In summary, in the DLV model, introducing multiple variables generated scaling at faster timescales. However, by slowing the timescale of the latent dynamics, the DLV model generated signatures of critical avalanche scaling for both multi- and single-variable simulations.

## Avalanche criticality, input scaling, and firing threshold

In the previous section, we found that a very slow single DLV model generated scaling. Thus, from now on, we simplify the model in order to characterize avalanche statistics across values of input scaling *η* and firing threshold *ϵ*. Specifically, we modeled a population of *N* = 128 neurons coupled to a single quasi-static latent variable. We simulated 10^{3} segments of 10^{4} steps each and drew a new value of the latent variable (*h* ~ *N*(0, 1)) for each segment. Ten replicates of the simulation were generated at each of the combinations of *η* and *ϵ* (see *Methods*).
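The quasi-static protocol described above can be sketched as follows. The firing probability 1/(1 + exp(*ϵ* + *η* *J*_{i} *h*)) is an assumed sign convention, the function name `simulate_quasistatic` and the default values of `eta` and `eps` are placeholders, and only the population activity per time bin is stored to keep memory use modest.

```python
import numpy as np

def simulate_quasistatic(n_neurons=128, eta=4.0, eps=8.0,
                         n_segments=1000, seg_len=10_000, seed=0):
    """Population activity for a single quasi-static latent variable.

    A new latent value h ~ N(0, 1) is drawn for each segment and held fixed
    for seg_len steps. Returns an int array of shape (n_segments, seg_len)
    holding the number of active neurons in each time bin.
    """
    rng = np.random.default_rng(seed)
    J = rng.standard_normal(n_neurons)                # couplings J_i ~ N(0, 1)
    activity = np.empty((n_segments, seg_len), dtype=np.int64)
    for k in range(n_segments):
        h = rng.standard_normal()                     # latent value for this segment
        # Assumed convention: P(s_i = 1 | h) = 1 / (1 + exp(eps + eta * J_i * h)).
        p_fire = np.exp(-np.logaddexp(0.0, eps + eta * J * h))
        spikes = rng.random((seg_len, n_neurons)) < p_fire
        activity[k] = spikes.sum(axis=1)
    return activity
```

Avalanches can then be extracted from each row separately, so that no avalanche spans two different values of the latent variable.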

Almost independent of *η* and *ϵ*, we found good-quality power-law fits and crackling. Fig. 3 shows the average (across *n* = 10 network realizations) of the exponents extracted from size (*τ*, Fig. 3A) and duration (*α*, Fig. 3C) distributions. At small firing threshold (*ϵ* = 2), we do not observe scaling because the system is always active, and all avalanches merge into one. At large firing threshold *ϵ* and low input scaling *η*, we do not observe scaling because activity is so sparse that all avalanches are small. At intermediate values of the parameters, the simulations generated plausible scaling relationships in size and duration. The difference between *γ*_{fit} and *γ*_{pred} was typically less than 0.1 (Fig. 4D-F), which was consistent with previously reported differences between fit and predicted exponents (Ma et al., 2019). Thus, there appears to be no need for fine-tuning to generate apparent scaling in this model, at least in the limit of (near) infinite observation time. Wherever *η* and *ϵ* generate avalanches, there are approximate power-law distributions and crackling.

To determine where avalanches occur, we derive the avalanche rate across values of the latent variable *h*. In the quasi-static model, the probability of an avalanche initiation is the probability of a transition from the quiet to an active state. Because all neurons are conditionally independent, this is *P*_{ava} = *P*_{silence}(1 − *P*_{silence}). Then the expected number of avalanches is obtained by integrating *P*_{ava} over *h* at each value of *η* and *ϵ*:

$$\langle P_{\mathrm{ava}} \rangle(\eta, \epsilon) = \int dh\, p(h)\, P_{\mathrm{silence}}(h; \eta, \epsilon) \left[1 - P_{\mathrm{silence}}(h; \eta, \epsilon)\right],$$

where *p*(*h*) is the standard normal distribution. This probability tracks the observed number of avalanches across simulations, Fig. 4A.
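The integral over the latent variable can be evaluated numerically on a grid. The sketch below assumes that each neuron fires independently with probability 1/(1 + exp(*ϵ* + *η* *J*_{i} *h*)), so that the probability of silence is a product over neurons. The function names and the grid settings are hypothetical choices made for illustration.

```python
import numpy as np

def p_silence(h, J, eta, eps):
    """Probability that no neuron fires, for each latent value in h.

    Assumed convention: P(s_i = 0 | h) = 1 / (1 + exp(-(eps + eta * J_i * h))).
    """
    field = eps + eta * np.outer(np.atleast_1d(h), J)     # shape (n_h, n_neurons)
    return np.exp(-np.logaddexp(0.0, -field).sum(axis=1)) # product over neurons

def expected_p_ava(J, eta, eps, h_max=6.0, n_grid=2001):
    """Average of P_silence * (1 - P_silence) over h ~ N(0, 1)."""
    h = np.linspace(-h_max, h_max, n_grid)
    p_h = np.exp(-0.5 * h ** 2) / np.sqrt(2.0 * np.pi)    # standard normal density
    ps = p_silence(h, J, eta, eps)
    integrand = p_h * ps * (1.0 - ps)
    return float(np.sum(integrand) * (h[1] - h[0]))       # Riemann sum on the grid
```

Since *P*_{silence}(1 − *P*_{silence}) never exceeds 1/4, the averaged value is bounded by 0.25, and it reaches that bound only when silence and activity are equally likely for essentially all values of *h*.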

To gain an intuition for the conditions under which avalanches occur, we show two slices of the avalanche probability, at fixed *η* (Fig. 4B) and at fixed *ϵ* (Fig. 4C). Black regions indicate where avalanches are likely to occur. If, for a given value of *ϵ* and *η*, there is no overlap between high avalanche probability regions and the distribution of *h*, then there will be no avalanches. For large *ϵ*, avalanches occur because neurons with large coupling to the latent variable (*η*|*J*_{i}| >> 1, recall *J*_{i} ~ *N*(0, 1)) are occasionally activated by a value of the latent variable *h* that is sufficient to exceed *ϵ* (Fig. 4B). Thus, the scaling parameter *η* controls the value of *h* for which avalanches occur most frequently (Fig. 4C). As *ϵ* decreases, avalanches occur for smaller and smaller *h* until avalanches primarily occur when *h* = 0.

To calculate the probability of avalanches, we must integrate over all values of *h*, but we can gain a qualitative understanding of which avalanche regime the system is in by examining the probability of avalanches at *h* = 0. At *h* = 0, the avalanche probability (see *Methods*) is

$$P_{\mathrm{ava}}(h = 0) = \left(1 + e^{-\epsilon}\right)^{-N} \left[1 - \left(1 + e^{-\epsilon}\right)^{-N}\right],$$

which is maximized at *ϵ*_{0} = −log(2^{1/*N*} − 1), independent of *J*_{i} and *η*. The dependence on *N* reflects that a larger threshold is required for larger networks: large networks (*N* → ∞) are unlikely to achieve complete network silence, therefore preventing avalanches from occurring. Similarly, small networks (*N* ~ 1) are unlikely to fire consecutively and thus are unlikely to avalanche.
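The location of the maximum can be checked numerically. The sketch below assumes that at *h* = 0 each neuron is silent with probability 1/(1 + e^{−*ϵ*}), so that *P*_{silence} = (1 + e^{−*ϵ*})^{−*N*}; the function names are hypothetical.

```python
import numpy as np

def eps0(n_neurons):
    """Threshold maximizing the avalanche probability at h = 0."""
    # -log(2**(1/N) - 1), written with expm1 for accuracy at large N.
    return -np.log(np.expm1(np.log(2.0) / n_neurons))

def p_ava_h0(eps, n_neurons):
    """P_silence * (1 - P_silence) at h = 0, under the assumed convention."""
    p_sil = np.exp(-n_neurons * np.log1p(np.exp(-eps)))  # (1 + e^{-eps})^{-N}
    return p_sil * (1.0 - p_sil)

# The maximum sits where P_silence = 1/2, so the peak value is 1/4.
peak_value = p_ava_h0(eps0(128), 128)
```

For *N* = 128 this gives *ϵ*_{0} ≈ 5.2, and *ϵ*_{0} grows roughly as log(*N*/log 2) for large networks.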

We plot *P*_{ava}(*ϵ*, *η*; *J*_{i}, *N*, *h* = 0) as a function of *ϵ* in Fig. 4B. The peak at *ϵ*_{0} divides the space into two regions. For *ϵ* < *ϵ*_{0}, a power-law is only observed in the large-size avalanches, which are rare (Fig. 4E, green). By contrast, when *ϵ* > *ϵ*_{0}, minimum size cutoffs are low (Fig. 4F, orange). Both regions, *ϵ* < *ϵ*_{0} and *ϵ* > *ϵ*_{0}, exhibit crackling noise scaling. If observation times are not sufficiently long (estimated in Fig. 4-Supp. Fig. 1), then scaling will not be observed in the *ϵ* < *ϵ*_{0} region, whose scaling relations consist of rare events. Insufficient observation times may explain experiments and simulations where avalanche scaling was not found.

## Inferring the latent variable

Our analysis of *P*_{ava}(*ϵ*, *η*, *h*) at *h* = 0 suggested that there are two types of avalanche regimes: one with high activity and high minimum cutoffs in the power law fit (Type 1), and the other with lower activity and size cutoffs (Type 2). Further, when *P*_{ava} drops to zero, avalanches disappear because the activity is too high or too low. We now examine how information about the value of the latent variables represented in the network activity relates to the activity type. To delineate these types, we calculated numerically *ϵ**(*η*), the value of *ϵ*, for which the probability of avalanches is maximized, and the contours of *P*_{ava} (Fig. 5A). Curves for *ϵ**(*η*) and *ϵ*_{0} and *P*_{ava} = 10^{−3} are shown in Fig. 5A and B.

We expect that the more cells fire, the more information they would convey, until the firing rate saturates, and inferring the value of the latent variable becomes impossible. Fig. 5B supports the prediction: generally, information is higher in regions with more activity (lower *ϵ*, higher *η*), but only up to a limit: as *ϵ* → 0, information decreases. This decrease begins approximately where the probability of avalanches drops to nearly zero (dashed black lines, Fig. 5B-E) because all of the activity merges into a few very large avalanches. In other words, the Type-1 avalanche region coincides with the highest information about the latent variable.

The critical brain hypothesis suggests that the brain operates in a critical state, and its functional role may be in optimizing information processing (Beggs, 2008; Chialvo, 2010). Under this hypothesis, we would expect the information conveyed by the network to be maximized in the regions where we observe avalanche criticality. However, we see that critical regions do not always have optimal information transmission. In Fig. 5, the region that displays crackling noise is that where avalanches exist (*P*_{ava} > 0.001), which corresponds to any *η* value and *ϵ* ≳ 3. This avalanche region encompasses both networks with high information transmission and networks with low information transmission. In summary, observing avalanche criticality in a system does not imply a high-information processing network state. However, the scaling can be seen at smaller cutoffs, and hence with shorter recordings, in the high-information state. This parallels the discussion by Schwab et al. (2014), who noticed that Zipf’s law always emerges in neural populations driven by quasi-stationary latent fields, but it emerges at smaller system sizes when the information about the latent variable is high.

# Discussion

Here we studied systems with distributed, random coupling to Dynamical Latent Variables (DLV) and we found that avalanche criticality is nearly always observed, with no fine-tuning required. Avalanche criticality was surprisingly robust to changes in input gain and firing rate threshold. Loss of avalanche criticality could occur if the latent process was not well-sampled, either because the simulation was not long enough or the dynamics of the latent variables were too fast. Finally, while information about the latent variables in the network activity was higher where avalanches were generated compared to when they were not, there was a range of information values across the critical avalanche regime. Thus, avalanche criticality alone was not a predictor of optimal information transmission.

## Explaining experimental exponents

A wide range of critical exponents have been found in *ex vivo* and *in vivo* recordings from various systems. For instance, the seminal work on avalanche statistics in cultured neuronal networks by Beggs and Plenz (2003) found size and duration exponents of 1.5 and 2.0 respectively, along with *γ* = 2, when time was discretized with a time bin equal to the average inter-event interval in the system. These values are predicted by a theoretical model of a critical branching process. By contrast, a survey of many *in vivo* and *ex vivo* recordings found power-law size distributions with exponents ranging from 1 to 3 depending on the system (Fontenele et al., 2019). Separately, Ma et al. (2019) reported recordings in freely moving rats with size exponents ranging from 1.5 to 2.7. In all of these recordings, when the crackling relationship held, the reported value of *γ* was near 1.2 (Fontenele et al., 2019; Ma et al., 2019).

Our DLV model, across the parameters we tested that produced exponents consistent with the scaling relationship, generated *τ* values that ranged from 1.9 to about 2.5. Across those simulations, we found values *γ* within a narrow band from 1.1 to 1.3 (see Fig. 2I,J and Fig. 3H). While the exponent values our model produces are inconsistent with a critical branching process (*γ* = 2), they match very closely the ranges of exponents reported by Fontenele et al. (2019).

One possible resolution to the discrepancy in exponents derives from how the system is subsampled in space or coarse-grained in time, both of which systematically change exponents *τ* and *α* (Beggs and Plenz, 2003; Shew et al., 2015). Were we to change the time bin, our modeling results would exhibit different exponent values. However, neither manipulations of the latent variable timescale (*τ*_{F} or *N*_{F}), nor of the overall activity level (*η*, *ϵ*) produced exponents close to 1.5 and 2.0, despite maintaining the crackling relationship across many different choices of parameters.

A second possibility is that different experiments study similar, but distinct biological phenomena. In other words, the underlying biology can differ between networks that were cultured *in vitro* and those that were not, whether they are *in vivo* or *ex vivo* (i.e., brain slices). This could happen if cultured networks develop connections between neurons such that they truly do produce dynamics that approximate a critical branching process, while brain networks that develop in a living brain have different structure and resulting dynamics and can be better understood as a system coupled to latent dynamical variables. This is especially true in sensory systems, where coupling to (latent) external stimuli in a way that the neural activity can be used to infer the stimuli is the reason for the networks’ existence (Schwab et al., 2014).

## Relationship to past modeling work

Our model is not the first to produce approximate power-law size and duration distributions for avalanches from a latent variable process (Touboul and Destexhe, 2017; Priesemann and Shriki, 2018). In particular, Priesemann and Shriki (2018) derived the conditions under which an inhomogeneous Poisson process could produce such approximate scaling. The basic idea is to generate a weighted sum of exponentially distributed event sizes, each of which is generated from a homogeneous Poisson process. How each process is weighted in this sum determines the approximate power-law exponent, allowing one to tune the system to obtain the critical values of 1.5 and 2. Interestingly, this model did not generate non-trivial scaling of size with duration (*S* ~ *D*^{*γ*}). Instead, they found *γ* = 1, not the predicted *γ* = 2. Our results differ significantly, in that *γ* was typically between 1.1 and 1.3 and it was nearly always close to the prediction from *α* and *τ*. We speculate that this is due to nonlinearity in the mapping from latent variable to spiking activity, as doubling the latent field *h* does not double the population activity, but doubling the rate of a homogeneous Poisson process does double the expected spike count. As biological networks are likely to have such nonlinearities in their responses to common inputs, this scenario may be more applicable to certain kinds of recordings.

## Summary

Latent variables – whether they are emergent from network dynamics (Clark et al., 2022; Sederberg and Nemenman, 2020) or derived from shared inputs – are ubiquitous in large-scale neural population recordings. This fact is reflected most directly in the relatively low-dimensional structure in large-scale population recordings (Stringer et al., 2019; Pandarinath et al., 2018; Nieh et al., 2021). We previously used a model based on this observation to examine signatures of neural criticality under a coarse-graining analysis and found that coarse-grained criticality is generated by systems driven by many latent variables (Morrell et al., 2021). Here we showed that the same model also generates avalanche criticality, and that when information about the latent variables can be inferred from the network, avalanche criticality is also observed. Crucially, finding signatures of avalanche criticality required long observation times, such that the latent variable was well-sampled. Previous studies showed that Zipf’s law appears generically in systems coupled to a latent variable that changes slowly relative to the sampling time, and that the Zipf’s behavior is easier to observe in the higher information regime (Schwab et al., 2014; Aitchison et al., 2016). However, this also suggests that observation of either scaling at modest data set sizes indeed points to some fine-tuning — namely to the increase of the information in the individual neurons (and, since neurons in these models are conditionally independent, also in the entire network) about the value of the latent variables. In other words, one would expect a sensory part of the brain, if adapted to the statistics of the external stimuli, to exhibit all of these critical signatures at relatively modest data set sizes. 
In monocular deprivation experiments, when the activity in the visual cortex is transiently not adapted to its inputs, scaling disappears, at least for recordings of a typical duration, and is restored as the system adapts to the new stimulus (Ma et al., 2019). We speculate that the observed recovery of criticality by Ma et al. (2019) could be driven by neurons adapting to the reduced-stimulus state, for instance, by adjusting *η* (input scaling) and *ϵ* (firing rate threshold). Taken together, these results suggest that critical behavior in neural systems, whether based on Zipf’s law, avalanches, or coarse-graining analysis, is expected whenever neural recordings exhibit some latent structure in population dynamics and this latent structure can be inferred from observations of the population activity.

# Methods and Materials

## Simulation of Dynamic Latent Variable (DLV) model

We study a model from Morrell et al. (2021), incorporating only latent variables (no place variables), and assuming that every cell is coupled to every latent variable with some randomly drawn coupling strength.

The probability of observing a certain population state {*s*_{i}} given latent variables {*h*_{μ}(*t*)} at time *t* is

$$P(\{s_i\} \mid \{h_\mu(t)\}) = \frac{1}{Z} e^{-H(\{s_i\})},$$

where *Z* is the normalization, and *H* is the “energy”:

$$H(\{s_i\}) = \eta \sum_{i} \sum_{\mu} J_{i\mu} h_\mu(t)\, s_i + \epsilon \sum_{i} s_i.$$

The latent variables {*h*_{μ}(*t*)} are Ornstein-Uhlenbeck processes with zero mean, unit variance, and time constant *τ*_{m}. Couplings *J*_{iμ} are drawn from the standard normal distribution.

The parameters *η*, *ϵ*, and *τ*_{m} are constants, and we simulate *N* = 1024 cells. For the infinite time constant simulation, we reset *h*_{n} ~ *N*(0, 1) (for each of *n* = 1..*N*_{n}) and simulate for 10000 time steps, then repeat for 1000 draws of *h*_{n}.

### Time step units

Most results were presented using arbitrary time units: all times (i.e., *τ*_{F} and avalanche duration *D*) are measured in units of an unspecified time step. Specifying a time bin converts the probability of firing into actual firing rates, in spikes per second, and this choice determines which part of the *η*-*ϵ* phase space is most relevant to a given experiment.

The time step is the temporal resolution at which activity is discretized, which varies from several to hundreds of milliseconds across different experimental studies (Beggs and Plenz, 2003; Fontenele et al., 2019; Ma et al., 2019). In physical units and assuming a bin size of 3 ms to 10 ms, our choice of *η* and *ϵ* in Fig. 2 would yield physiologically realistic firing rate ranges (Hengen et al., 2016), with high-firing neurons reaching average rates of 20 – 50 spikes/second and median firing-rate neurons around 1 – 2 spikes/second. The timescales of latent variables examined range from about 3 seconds to 3000 seconds, assuming 3-ms bins. Simulations were carried out for the same number of time steps (2 × 10^{6}), corresponding to approximately 1 to 2 “hours,” a reasonable duration for *in vivo* neural recordings. Note that at large values of *τ*_{F}, the latent variable space is not well sampled during this time period.
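The simulation described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the figures: the function name `simulate_dlv`, the default values of `eta`, `eps`, `tau`, and `n_latent`, and the sign convention (positive `eps` biasing cells toward silence) are all assumptions.

```python
import numpy as np

def simulate_dlv(n_cells=1024, n_latent=5, tau=1e4, eta=4.0, eps=12.0,
                 n_steps=20000, seed=0):
    """Simulate binary cells driven by Ornstein-Uhlenbeck latent variables.

    Returns the number of active cells in each time bin.
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((n_cells, n_latent))    # couplings ~ N(0, 1)

    # Exact one-step update of a zero-mean, unit-variance OU process
    # with time constant tau (measured in time steps).
    decay = np.exp(-1.0 / tau)
    noise_sd = np.sqrt(1.0 - decay**2)

    h = rng.standard_normal(n_latent)                # stationary initial state
    counts = np.empty(n_steps, dtype=np.int64)
    for t in range(n_steps):
        h = decay * h + noise_sd * rng.standard_normal(n_latent)
        drive = eta * (J @ h) - eps                  # assumed sign convention
        p_fire = 1.0 / (1.0 + np.exp(-drive))        # cells independent given h
        counts[t] = np.count_nonzero(rng.random(n_cells) < p_fire)
    return counts
```

With an assumed bin width `bin_s` in seconds, the mean firing rate per cell in spikes per second is `counts.mean() / n_cells / bin_s`, which is the conversion from arbitrary time steps to physical units discussed above.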

## Analysis of avalanche statistics

### Setting the threshold for observing avalanches

In our model, we count avalanches as periods of continuous activity (in any subset of neurons) that are bookended by time bins with no activity in the entire simulated neural network. For real neural populations of modest size, this method fails because there are no periods of quiescence. The typical solution is to set a threshold and to count avalanches only when the population activity exceeds that threshold, with the hope that results are relatively robust to that choice. In our model, this operation is equivalent to changing *ϵ*, which shifts the probability of firing up or down by a constant amount across all cells, independent of inputs. Our results in Fig. 3 show that *α* and *τ* decrease as the threshold for detection is increased (equivalent to large |*ϵ*|), but that the scaling relationship is maintained. The model predicts that *γ*_{pred} − *γ*_{fit} would initially increase slightly with the detection threshold before decreasing back to near zero.
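The avalanche definition above can be sketched as follows. The function name `find_avalanches` is hypothetical, and two choices are assumptions made for illustration: runs that touch either end of the recording are discarded, and with a nonzero `threshold` the size is still taken as the total activity in the supra-threshold bins.

```python
import numpy as np

def find_avalanches(counts, threshold=0):
    """Return (sizes, durations) of avalanches in a population-count series.

    An avalanche is a maximal run of bins with activity above `threshold`
    that is preceded and followed by a bin at or below it.
    """
    counts = np.asarray(counts)
    active = (counts > threshold).astype(np.int64)

    # +1 marks the first bin of a run, -1 marks the first bin after it.
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)              # exclusive end index

    # Drop runs that are not bookended inside the recording.
    keep = (starts > 0) & (stops < counts.size)
    starts, stops = starts[keep], stops[keep]

    cumulative = np.concatenate(([0], np.cumsum(counts)))
    sizes = cumulative[stops] - cumulative[starts]   # total activity per run
    durations = stops - starts                       # number of bins per run
    return sizes, durations
```

Raising `threshold` plays the role of the detection threshold used for real recordings, which in the model corresponds to changing *ϵ*.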

Following the algorithm laid out in Clauset et al. (2009), we fit power laws to the size and duration distributions from simulations generating avalanches. We use least-squares fitting to estimate *γ*_{fit}, the scaling exponent for size with duration, assessing the consistency of the fit across decades.

### Reading power laws from data

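We want, from each simulation, a quantification of the quality of scaling (how many decades, minimally) and an estimate of the scaling exponents (*τ* for the size distribution, *α* for the duration distribution). Following the steps outlined by Clauset et al. (2009), we use the maximum-likelihood estimator to determine the scaling exponent. This is the solution to the transcendental equation

$$\frac{\zeta'(\hat{\alpha}, x_{\min})}{\zeta(\hat{\alpha}, x_{\min})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i ,$$

where *ζ*(*α*,*x*_{min}) is the Hurwitz zeta function, the prime denotes differentiation with respect to *α*, and the sum runs over the *n* observations with *x*_{i} ≥ *x*_{min}. For values of *x*_{min} < 6, a numerical look-up table based on the built-in Hurwitz zeta function in the symbolic math toolbox was used (MATLAB R2019b). For *x*_{min} > 6 we use an approximation (Clauset et al., 2009),

$$\hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - \frac{1}{2}} \right]^{-1} .$$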

To determine *x*_{min}, we computed the maximum absolute difference between the empirical cumulative distribution function *S*(*x*) and the model’s cumulative distribution function *P*(*x*) (the Kolmogorov-Smirnov (KS) statistic, $\mathrm{KS} = \max_{x \geq x_{\min}} |S(x) - P(x)|$). The KS statistic was computed for power-law models with the fitted scaling parameter at each candidate cutoff *x*_{min}. The value of *x*_{min} that minimizes the KS statistic was chosen. Occasionally the KS statistic had two local minima (as in Figure 2-Supplemental Figure 1), indicating two different power laws. In these cases, the minimum size and duration cutoffs were the smallest values that were within 10% of the absolute minimum of the KS statistic. Note that the statistic is computed for each model only on the power-law portion of the CDF (i.e., *x*_{i} > *x*_{min}). We do not attempt to determine an upper cut-off value.
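The fitting steps described in this subsection can be sketched in Python. This is an illustration, not the MATLAB implementation used here: it maximizes the discrete power-law likelihood numerically instead of using a look-up table, the function names are hypothetical, and the search bounds on the exponent are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import zeta   # zeta(a, q) is the Hurwitz zeta function

def fit_exponent(x, xmin):
    """Maximum-likelihood exponent of a discrete power law on x >= xmin."""
    x = np.asarray(x)
    mean_log = np.mean(np.log(x[x >= xmin]))

    def neg_log_lik(a):
        # Per-observation negative log-likelihood of p(x) = x^-a / zeta(a, xmin)
        return np.log(zeta(a, xmin)) + a * mean_log

    # Search bounds (1.01, 6) are an assumption for this sketch.
    return minimize_scalar(neg_log_lik, bounds=(1.01, 6.0), method="bounded").x

def ks_statistic(x, xmin, alpha):
    """Max |S(x) - P(x)| over integers x >= xmin, using only the tail data."""
    x = np.asarray(x)
    tail = np.sort(x[x >= xmin])
    grid = np.arange(xmin, tail.max() + 1)
    empirical = np.searchsorted(tail, grid, side="right") / tail.size
    # Model CDF: P(X <= v) = 1 - zeta(alpha, v + 1) / zeta(alpha, xmin)
    model = 1.0 - zeta(alpha, grid + 1.0) / zeta(alpha, xmin)
    return float(np.max(np.abs(empirical - model)))

def fit_power_law(x, xmin_candidates):
    """Return (xmin, alpha, ks) for the cutoff with the smallest KS statistic."""
    best = None
    for xmin in xmin_candidates:
        alpha = fit_exponent(x, xmin)
        ks = ks_statistic(x, xmin, alpha)
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best
```

The sketch returns the global minimum of the KS statistic; the rule for two local minima described above (taking the smallest cutoff within 10% of the absolute minimum) would need to be applied on top of it.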

To assess the quality of the power-law fit, Clauset et al. (2009) compared the empirical observations to surrogate data generated from a semi-parametric power-law model. The semi-parametric model sets the value of the CDF equal to the empirical CDF values up to *x* = *x*_{min} and then according to the power-law model for *x* > *x*_{min}. If the KS statistic for the real data (relative to its fitted model) is within the distribution of the KS statistics for surrogate datasets relative to their respective fitted models, the power-law model was considered a reasonable fit.

Strict application of this methodology could give misleading results. Much of this is due to the loss of statistical power when the minimum cutoff is so high that the number of observations drops. For instance, in the simulations shown in Fig. 2, the one-variable duration distribution passed the Clauset et al. (2009) criterion, with a minimum KS statistic of 0.03 when the duration cutoff was 18 time steps. However, for the five-variable simulation in Fig. 2, a power law would be narrowly rejected for both size and duration, despite having much smaller KS statistics: for *τ*, the KS statistic was 0.0087 (simulation range: 0.0008 to 0.0082; number of avalanches observed: 58,787) and for *α* it was 0.0084 (simulation range: 0.0011 to 0.0075). Below we discuss this problem in more detail.

### Determining the range over which avalanche size scales with duration

For fitting *γ*, our aim was to find the longest sampled range over which we have apparent power-law scaling of size with duration. Because our sampled duration values have linear spacing, error estimates are skewed if a naive goodness-of-fit criterion is used. We devised the following algorithm. First, the simulation must have at least one avalanche of size 500. We fit *S* = *cD*^{γ} over one decade at a time. We chose as the lower duration cutoff the minimum duration for which the largest number of subsequent (longer-duration) fits produced consistent fit parameters (Figure 2-Supp. Fig. 3 and 4, top row). Next, with the minimum duration set, we gradually increased the maximum duration cutoff and determined whether there was a significant bias in the residual over the first decade of the fit. We selected the highest duration cutoff for which there was no bias. Finally, over this range, we re-fit the power-law relationship and extracted confidence intervals.
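The first step of this procedure, fitting *S* = *cD*^{γ} one decade at a time, can be sketched as below. The function names are hypothetical, and the sketch covers only the decade-by-decade least-squares fits; the consistency check on the lower cutoff and the residual-bias test on the upper cutoff are not implemented.

```python
import numpy as np

def mean_size_by_duration(sizes, durations):
    """Average avalanche size at each observed duration."""
    d_vals, inverse = np.unique(durations, return_inverse=True)
    totals = np.bincount(inverse, weights=sizes)
    n_obs = np.bincount(inverse)
    return d_vals, totals / n_obs

def fit_gamma(d_vals, mean_sizes, d_min, d_max):
    """Least-squares fit of log10 <S> against log10 D on d_min <= D <= d_max.

    Returns (gamma, c) for the relation <S> = c * D**gamma.
    """
    mask = (d_vals >= d_min) & (d_vals <= d_max)
    slope, intercept = np.polyfit(np.log10(d_vals[mask]),
                                  np.log10(mean_sizes[mask]), 1)
    return slope, 10.0 ** intercept

def decade_scan(d_vals, mean_sizes):
    """Fit gamma over successive one-decade windows [d, 10 d]."""
    d_top = d_vals.max()
    results = []
    for d in d_vals:
        if 10 * d > d_top:
            break                      # window would run past the sampled range
        gamma, _ = fit_gamma(d_vals, mean_sizes, d, 10 * d)
        results.append((d, gamma))
    return results
```

Scanning the list returned by `decade_scan` for the starting duration after which the fitted exponents stop changing is one way to approach the choice of lower cutoff described above.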

Our analysis focused on finding the apparent power-law relationship that held over the largest log-scale range. A common feature across simulation parameters (*τ*_{F}, *N*_{F}) was the existence of two distinct power-law regimes. This is apparent in Fig. 2I, which shows that when *N*_{F} = 1 at small *τ*_{F}, the best-fit *γ* (that showing the largest range with power-law-consistent scaling) is much larger (> 1.7), and then above *τ*_{F} ~ 3000, the best-fit *γ* drops to around 1.3.

### Statistical power of power-law tests

In several cases, we found examples of power-law fits that passed the rejection criteria commonly used to determine avalanche scaling relationships because of the limited number of observations. A key example is that of the single latent variable simulation shown in Fig. 2B, where we could not reject a power law for the duration distribution. Conversely, strict application of the surrogate criteria would reject a power law for distributions that were quantitatively much closer to a power law (i.e., lower KS statistic), but for which we had many more observations and thus a much stronger surrogate test (Fig. 2). This points to the difficulty of applying a single criterion to determining a power-law fit. In this work, we adhere to the criteria set forth in Clauset et al. (2009), with a modification to control for the unreasonably high statistical power of simulated data. Specifically, the number of avalanches used for fitting and for surrogate analysis was capped at 500,000, drawn randomly from the entire pool of avalanches.

Additionally, we found examples in which a short simulation was rejected, but increasing the simulation time by a factor of five yielded excellent power-law fits. We speculate that this arises due to insufficient sampling of the latent space. These observations raise an important biological point. Simulations provide the luxury of assuming the network is unchanging for as long as the simulator cares to keep drawing samples. In a biological network, this is not the case. Over the course of hours, the effective latent degrees of freedom could change drastically (e.g., due to circadian effects (Aton et al., 2009), changes in behavioral state (Fu et al., 2014), plasticity (Hooks and Chen, 2020), etc.), and the network itself (synaptic scaling, firing thresholds, etc.) could be plastic (Hengen et al., 2016). All of these factors can be modeled in our framework by determining appropriate cutoffs (in duration of recording, in time step sizes, for activity distributions) based on specific experimental timescales.

### Calculation of avalanche regimes

In the quasistatic model, we derive the dependence of the avalanche rate on *η*, *ϵ* and number of neurons *N*, finding that there are two distinct regimes in which avalanches occur. Each time bin is independent, conditioned on the value of *h*. For an avalanche to occur, the probability of silence in the population (i.e., all *s*_{i} = 0) must not be too close to 0 (or there are no breaks in activity) or too close to 1 (or there is no activity). At fixed *h*, the probability of silence is

An avalanche occurs when a silent time bin is followed by an active bin, which has probability *P*_{ava}(*ϵ*, *η*; *J*_{i}, *N*, *h*) = *P*_{silence}(1 − *P*_{silence}).
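A numerical sketch of this calculation is given below. It assumes a logistic firing probability with positive *ϵ* biasing cells toward silence, so that $P_{\mathrm{silence}} = \prod_{i=1}^{N} [1 + e^{\eta J_i h - \epsilon}]^{-1}$ for conditionally independent cells. Averaging *P*_{ava} over *h* ~ *N*(0, 1) on a grid is likewise an assumption about how an overall avalanche rate could be obtained, and the function names are hypothetical.

```python
import numpy as np

def p_silence(h, J, eta, eps):
    """Probability that every cell is silent in one bin at fixed h."""
    drive = eta * np.asarray(J) * h - eps
    # log P(s_i = 0) = -log(1 + exp(drive)); logaddexp keeps this stable.
    return float(np.exp(-np.sum(np.logaddexp(0.0, drive))))

def p_avalanche(h, J, eta, eps):
    """Probability that a silent bin is followed by an active bin at fixed h."""
    ps = p_silence(h, J, eta, eps)
    return ps * (1.0 - ps)

def mean_avalanche_rate(J, eta, eps, h_max=6.0, n_grid=2401):
    """Average of P_ava over h ~ N(0, 1), evaluated on a uniform grid."""
    h = np.linspace(-h_max, h_max, n_grid)
    weights = np.exp(-0.5 * h**2)
    weights /= weights.sum()                 # normalized Gaussian weights
    rates = np.array([p_avalanche(x, J, eta, eps) for x in h])
    return float(np.sum(weights * rates))
```

Because *P*_{ava} is largest when *P*_{silence} is near 1/2 and vanishes when it approaches 0 or 1, scanning `mean_avalanche_rate` over `eta` and `eps` shows where avalanches can occur at all.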

## Information calculation

### Maximum-likelihood decoding

For large populations coupled to a single latent variable, we estimated the information between population spiking activity and the latent variable as the information between the maximum-likelihood estimator *h** of the latent variable *h* and the latent variable itself. This approximation fails at extremes of network activity levels (low or high).

Specifically, we approximated the log-likelihood of *h** given *h*_{true} near *h** by a quadratic, −(*h** − *h*_{true})^{2}/(2*σ*^{2}(*h*_{true})), up to an additive constant. Thus we assume that *h** is normally distributed about *h*_{true} with variance *σ*^{2}(*h*_{true}). The variance is then derived from the curvature of the log-likelihood at the maximum. The information between two Gaussian variables, here *p*(*h**|*h*_{true}) = *N*(*h*_{true}, *σ*^{2}(*h*_{true})) and *p*(*h*) = *N*(0, 1), is

where the average is taken over *h*_{true} ~ *N*(0, 1).

Given a set of *T* observations of the neurons {*s*_{i}}, the likelihood is

Maximizing the log likelihood gives the following condition:

where the average is taken over observations *t*. The uncertainty in *h** is *σ*_{h}, which was calculated from the second derivative of the log likelihood:

This expression depends on the observations only through the maximum-likelihood estimate *h**. When *h** → *h*_{true}, then the variance is

To generate Figure 5, we evaluated Eqn. 10 using Eqn. 18.
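The decoding procedure in this section can be sketched numerically for a single latent variable. The expressions in the code are derived here under an assumed logistic firing probability, *p*_{i}(*h*) = 1/(1 + e^{−(*ηJ*_{i}*h* − *ϵ*)}), and are not necessarily identical to the numbered equations referenced above. Under that assumption the curvature of the log-likelihood gives *σ*^{2}(*h*) = [*Tη*^{2} Σ_{i} *J*_{i}^{2} *p*_{i}(1 − *p*_{i})]^{−1}, and the information is approximated by the Gaussian-channel form ½⟨log₂(1 + 1/*σ*^{2}(*h*))⟩. The function names and grid settings are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def firing_prob(h, J, eta, eps):
    """Assumed logistic firing probability of each cell at latent value h."""
    return 1.0 / (1.0 + np.exp(-(eta * J * h - eps)))

def decode_h(s_mean, J, eta, eps, T=1):
    """Maximum-likelihood estimate of h from time-averaged activity s_mean."""
    def neg_log_lik(h):
        drive = eta * J * h - eps
        # log L = T * sum_i [ s_mean_i * drive_i - log(1 + exp(drive_i)) ]
        return -T * np.sum(s_mean * drive - np.logaddexp(0.0, drive))
    # The search interval for h is an assumption of this sketch.
    return minimize_scalar(neg_log_lik, bounds=(-8.0, 8.0), method="bounded").x

def decoding_variance(h, J, eta, eps, T=1):
    """Inverse curvature of the log-likelihood evaluated at h."""
    p = firing_prob(h, J, eta, eps)
    return 1.0 / (T * eta**2 * np.sum(J**2 * p * (1.0 - p)))

def information_bits(J, eta, eps, T=1, h_max=6.0, n_grid=2001):
    """Gaussian-channel estimate of the information, averaged over h ~ N(0, 1)."""
    h = np.linspace(-h_max, h_max, n_grid)
    weights = np.exp(-0.5 * h**2)
    weights /= weights.sum()
    var = np.array([decoding_variance(x, J, eta, eps, T) for x in h])
    return float(0.5 * np.sum(weights * np.log2(1.0 + 1.0 / var)))
```

As noted at the start of this section, an approximation of this kind is expected to fail when network activity is extremely low or high, where *p*_{i}(1 − *p*_{i}) is close to zero for most cells and the likelihood is far from Gaussian.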

# Acknowledgements

IN was supported in part by the Simons Foundation Investigator program, the Simons-Emory Consortium on Motor Control, NSF grant BCS/1822677 and NIH grant 2R01NS084844. AS was supported in part by NIH grant 1RF1MH130413-01 and by startup funds from the University of Minnesota Medical School.

# References

- Brain-wide neuronal dynamics during motor adaptation in zebrafish. *Nature* **485**:471–477
- Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables. pp. 1–32. https://doi.org/10.1371/journal.pcbi.1005110
- Mechanisms of Sleep-Dependent Consolidation of Cortical Plasticity. pp. 454–466. https://doi.org/10.1016/j.neuron.2009.01.007
- The criticality hypothesis: how local cortical networks might optimize information processing. *Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences* **366**:329–343
- Neuronal Avalanches in Neocortical Circuits. pp. 11167–11177. https://doi.org/10.1523/JNEUROSCI.23-35-11167.2003
- Avalanches, Barkhausen Noise, and Plain Old Criticality. pp. 4528–4531. https://doi.org/10.1103/PhysRevLett.75.4528
- Emergent complex neural dynamics. *Nature Physics* **6**:744–750
- Dimension of Activity in Random Neural Networks. https://doi.org/10.48550/ARXIV.2207.12373
- Power-law distributions in empirical data. *SIAM Review* **51**:661–703
- Second type of criticality in the brain uncovers rich multiple-neuron dynamics. pp. 13051–13060. https://doi.org/10.1073/pnas.1818972116
- Criticality between Cortical States. https://doi.org/10.1103/PhysRevLett.122.208101
- A Cortical Circuit for Gain Control by Behavioral State. pp. 1139–1152. https://doi.org/10.1016/j.cell.2014.01.050
- Brain criticality beyond avalanches: open problems and how to approach them. https://doi.org/10.1088/2632-072X/ac2071
- Neuronal Firing Rate Homeostasis Is Inhibited by Sleep and Promoted by Wake. pp. 180–191. https://doi.org/10.1016/j.cell.2016.01.046
- Circuitry Underlying Experience-Dependent Plasticity in the Mouse Visual System. pp. 21–36. https://doi.org/10.1016/j.neuron.2020.01.031
- Probabilistic models for neural populations that naturally capture global coupling and criticality. pp. 1–26. https://doi.org/10.1371/journal.pcbi.1005763
- Cortical circuit dynamics are homeostatically tuned to criticality in vivo. *Neuron* **104**:655–664
- Context-dependent computation by recurrent dynamics in prefrontal cortex. *Nature* **503**:78–84
- Neuronal Avalanches Across the Rat Somatosensory Barrel Cortex and the Effect of Single Whisker Stimulation. *Frontiers in Systems Neuroscience*
- Transient Dynamics versus Fixed Points in Odor Representations by Locust Antennal Lobe Projection Neurons. *Neuron* **48**:661–673
- Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons. https://doi.org/10.1103/PhysRevLett.123.178103
- Are Biological Systems Poised at Criticality? *Journal of Statistical Physics* **12**. https://doi.org/10.1007/s10955-011-0229-4
- Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. *Proceedings of the National Academy of Sciences* **120**
- Latent Dynamical Variables Produce Signatures of Spatiotemporal Criticality in Large Biological Systems. https://doi.org/10.1103/PhysRevLett.126.118302
- Geometry of abstract learned knowledge in the hippocampus. pp. 80–84. https://doi.org/10.1038/s41586-021-03652-7
- How critical is brain criticality? pp. 820–837. https://doi.org/10.1016/j.tins.2022.08.007
- Inferring single-trial neural population dynamics using sequential auto-encoders. pp. 805–815. https://doi.org/10.1038/s41592-018-0109-9
- Spontaneous cortical activity in awake monkeys composed of neuronal avalanches. pp. 15921–15926. https://doi.org/10.1073/pnas.0904089106
- Self-Organized Criticality in the Brain. https://doi.org/10.3389/fphy.2021.639389
- Avalanche dynamics of human brain oscillations: Relation to critical branching processes and temporal correlations. *Human Brain Mapping* **7**:770–7. https://doi.org/10.1002/hbm.20590
- Whole-Brain Neuronal Activity Displays Crackling Noise Dynamics. pp. 1446–1459. https://doi.org/10.1016/j.neuron.2018.10.045
- Can a time varying external drive give rise to apparent criticality in neural systems? pp. 1–29. https://doi.org/10.1371/journal.pcbi.1006081
- Zipf’s Law and Criticality in Multivariate Data without Fine-Tuning. https://doi.org/10.1103/PhysRevLett.113.068102
- Randomly connected networks generate emergent selectivity and predict decoding properties of large populations of neurons. pp. 1–19. https://doi.org/10.1371/journal.pcbi.1007875
- Adaptation to sensory input tunes visual cortex to criticality. pp. 659–663. https://doi.org/10.1038/nphys3370
- Spontaneous behaviors drive multidimensional, brainwide activity. https://doi.org/10.1126/science.aav7893
- Power-law statistics and universal scaling in the absence of criticality. https://doi.org/10.1103/PhysRevE.95.012413