1 Introduction

Geometric analysis of neuronal population activity has revealed the fundamental structures of neural representations and brain dynamics (1–4). Dimensionality reduction methods, which identify collective or latent variables in neural populations, simplify our view of high-dimensional neural data (5). Their applications to optical and multi-electrode recordings have begun to reveal important mechanisms by which neural cell assemblies process sensory information (6, 7), make decisions (8, 9), maintain working memory (10) and generate motor behaviors (1, 11–13).

In the past decade, the number of neurons that can be simultaneously recorded in vivo has grown exponentially (11, 14–21). This increase spans various brain regions (6, 16, 22) and the entire mammalian brain (23, 24). As more neurons are recorded, the multidimensional neural activity space, with each axis representing a neuron’s activity level (Fig. 1A), becomes more complex. The changing size of observed cell assemblies raises a number of basic questions. How does this space’s geometry evolve, and what structures remain invariant, as the number of recorded neurons increases? A key measure, the effective dimension or participation ratio (denoted DPR, Fig. 1B), captures a major part of the variability in neural activity (25–29). How does DPR vary with the number of sampled neurons (Fig. 1A)? Two scenarios are possible: either DPR grows without bound as more neurons are sampled, or DPR saturates as the sample size increases. Which scenario fits the brain? Furthermore, even if two cell assemblies have the same DPR, they can have different shapes (Fig. 1C). How does the shape vary with the number of neurons sampled? Lastly, will we observe the same picture of the neural activity space when using different recording methods, such as two-photon microscopy, which records all neurons in a brain region, versus Neuropixels (16), which conducts a broad random sampling of neurons?

Figure 1: The relationship between the geometric properties of the neural activity space and the size of neural assemblies.

A. Illustration of how the dimensionality of neural activity (DPR) changes with the number of recorded neurons. B. The eigenvalues λi of the neural covariance matrix dictate the geometrical configuration of the neural activity space, with √λi being the distribution width along the i-th principal axis. C. Examples of two neural populations with identical dimensionality (DPR = 25/11 ≈ 2.27) but different spatial configurations, as revealed by the eigenvalue spectrum (green: {λi} = {7, 7, 1}, blue: {λi} = {9, 3, 3}).

Here, we aim to address these questions by analyzing brain-wide Ca2+ activity in larval zebrafish during hunting or spontaneous behavior (Fig. 2A) recorded by Fourier light-field microscopy (30). The small size of this vertebrate brain, together with the volumetric imaging method, enables us to capture a significant amount of neural activity across the entire brain simultaneously. To characterize the geometry of neural activity beyond its dimensionality DPR, we examine the eigenvalues or spectrum of neural covariance (31) (Fig. 1C). The covariance spectrum has been instrumental in offering mechanistic insights into neural circuit structure and function, such as the effective strength of local recurrent interactions and the depiction of network motifs (29, 31, 32). Intriguingly, we find that both the dimensionality and covariance spectrum remain invariant for cell assemblies that are randomly selected from various regions of the zebrafish brain. We also verify this observation in datasets recorded by different experimental methods, including light-sheet imaging of larval zebrafish (33), two-photon imaging of mouse visual cortex (23), and multi-area Neuropixels recording in the mouse (23). To explain the observed phenomenon, we model the covariance matrix of brain-wide activity by generalizing the Euclidean Random Matrix (ERM) (34) such that neurons correspond to points distributed in a d-dimensional functional or feature space, with pairwise correlation decaying with distance. The ERM theory, studied in theoretical physics (34, 35), provides extensive analytical tools for a deep understanding of the neural covariance matrix model, allowing us to unequivocally identify three crucial factors for the observed scale invariance.

Figure 2: Whole-brain calcium imaging of zebrafish neural activity and the phenomenon of its scale-invariant covariance eigenspectrum.

A. Rapid light-field Ca2+ imaging system for whole-brain neural activity in larval zebrafish. B. Inferred firing rate activity from the brain-wide calcium imaging. The ROIs are sorted by their weights in the first principal component (23). C. Procedure for calculating the covariance spectrum on the full and sampled neural activity matrices. D. Dimensionality as a function of the sampling fraction (circles: average across 8 random samplings, shown as dots). The curve is the predicted dimensionality using Eq. (5). E. Iteratively sampled covariance matrices. Neurons are sorted in each matrix to maximize values near the diagonal. F. The covariance spectra, i.e., eigenvalue vs. rank/N, for randomly sampled neuron subsets of different sizes (colors). The gray dots represent the sorted variances Cii of all neurons. G to I. Same as F but from three models of covariance (see details in Methods): (G) a Wishart random matrix calculated from a random activity matrix of the same size as the experimental data; (H) replacing the eigenvectors by a random orthogonal set; (I) covariance generated from a randomly connected recurrent network.

Building upon our theoretical results, we further explore the connection between the spatial arrangement of neurons and their locations in functional space, which allows us to distinguish among three sampling approaches: random sampling, anatomical sampling (akin to optical recording of all neurons within a specific brain region) and functional sampling (20). Our ERM theory makes distinct predictions regarding the scaling relationship between dimensionality and the size of the cell assembly, as well as the shape of the covariance eigenspectrum under the various sampling methods. Taken together, our results offer a new perspective for interpreting brain-wide activity and unambiguously reveal its organizing principles, with unexplored consequences for neural computation.

2 Results

2.1 Geometry of neural activity across random cell assemblies in zebrafish brain

We recorded brain-wide Ca2+ activity at a volume rate of 10 Hz in head-fixed larval zebrafish (Fig. 2A) during hunting attempts (Methods) and spontaneous behavior using Fourier light-field microscopy (30). Approximately 2000 ROIs (1977.3 ± 677.1, mean ± SD) with a diameter of 16.84 ± 8.51 µm were analyzed per fish based on voxel activity (Methods). These ROIs likely correspond to multiple nearby neurons with correlated activity. Henceforth, we refer to the ROIs as “neurons” for simplicity.

We first investigate the dimensionality of neural activity DPR (Fig. 1B) in a randomly chosen cell assembly in zebrafish, similar to a multi-area Neuropixels recording in a mammalian brain. We focus on how DPR changes at large sample size N. We find that if the mean squared covariance remains finite instead of vanishing with N, the dimensionality DPR (Fig. 1B) becomes sample-size independent and depends only on the variances Cii and the covariances Cij between neurons i and j:
DPR → E(Cii)² / E(Cij²), i ≠ j, as N → ∞   (1)
where E(…) denotes average across neurons (Methods and (29)). The finite mean squared covariance condition is supported by the observation that the neural activity covariance Cij is positively biased and widely distributed with a long tail (Fig. S2A). As predicted, the data dimensionality grows with sample size and reaches the maximum value specified by Eq. (1) (Fig. 2D).
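To make the saturation argument concrete, here is a minimal Python sketch (not from the paper; the white-noise activity matrix is a stand-in for real data) that computes DPR on nested random subsamples and the Eq. (1) limit:

```python
import numpy as np

def participation_ratio(C):
    """D_PR = Tr(C)^2 / Tr(C^2), computed from a covariance matrix C."""
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / (lam ** 2).sum()

def pr_limit(C):
    """Large-N limit of Eq. (1): E(C_ii)^2 / E(C_ij^2), i != j."""
    n = C.shape[0]
    off_diag = C[~np.eye(n, dtype=bool)]
    return np.mean(np.diag(C)) ** 2 / np.mean(off_diag ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 7200))    # placeholder activity (neurons x time)
C = np.cov(x)
for n in (128, 256, 512, 1024):          # D_PR of nested random subsamples
    idx = rng.choice(C.shape[0], size=n, replace=False)
    print(n, participation_ratio(C[np.ix_(idx, idx)]))
print("Eq. (1) limit:", pr_limit(C))
```

For this white-noise placeholder, E(Cij²) ≈ 1/T, so DPR keeps growing with the sample size; for data with a positively biased, long-tailed Cij distribution, DPR instead saturates near the Eq. (1) limit, as in Fig. 2D.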

Next, we investigate the shape of the neural activity space described by the eigenspectrum of the covariance matrix derived from the activity of N randomly selected neurons (Fig. 2C). When the eigenvalues are arranged in descending order and plotted against the normalized rank r/N, where r = 1, …, N (we refer to this as the rank plot), the curve shows an approximate power law spanning roughly an order of magnitude. Interestingly, as the size of the covariance matrices decreases (N decreases), the eigenspectrum curves nearly collapse over a wide range of eigenvalues. This pattern holds across diverse datasets and experimental techniques (Fig. 2F, Fig. S2E-L). The similarity of the covariance matrices of randomly sampled neural populations can be intuitively visualized (Fig. 2E) after properly sorting the neurons (Methods).

The scale invariance in the neural covariance matrix is non-trivial. The spectrum is not scale-invariant in a common covariance matrix model based on independent noise (Fig. 2G). It is also absent when replacing the neural covariance matrix eigenvectors with random ones, keeping the eigenvalues identical (Fig. 2H). A recurrent neural network with random connectivity (31) also fails to produce the scale-invariant covariance spectrum (Fig. 2I). Thus, a new model for the covariance matrix of neural activity is needed.

2.2 Modeling covariance by organizing neurons in functional space

Dimension reduction methods simplify and visualize complex neuron interactions by embedding them into a low-dimensional map, within which nearby neurons have similar activities. Inspired by these ideas, we use the Euclidean Random Matrix (ERM (34)) to model neural covariance. Imagine sprinkling neurons uniformly over a d-dimensional functional space of size L (Fig. 3A), where the distance between neurons i and j affects their correlation. Let x⃗i represent the functional coordinate of neuron i. The distance-correlation dependency is described by a kernel function f(‖x⃗i − x⃗j‖), with f(0) = 1, that decreases as distance increases, so that closer neurons have stronger correlations (Fig. 3A and Methods). To model the covariance, we extend the ERM by incorporating heterogeneity of neuron activity levels (shown as the size of the neuron in the functional space in Fig. 3A):

Cij = σi σj f(‖x⃗i − x⃗j‖)   (2)

The variance σi² of neural activity is drawn i.i.d. from a given distribution and is independent of the neurons’ positions.

Figure 3: ERM model of covariance and its eigenspectrum.

A. Schematic of the Euclidean Random Matrix (ERM) model, which reorganizes neurons (circles) from the anatomical space to the functional space (here d = 2, a two-dimensional box). The correlation between a pair of neurons decreases with their distance in the functional space according to a kernel function f(‖x⃗i − x⃗j‖). This correlation is then scaled by the neurons’ variances (circle size) to obtain the covariance Cij. B. An example ERM correlation matrix (i.e., when σi = 1). C. Spectrum (same as Fig. 2F) for the ERM correlation matrix in B. D. Visualization of the distribution of the same ERM eigenvalues as in C by plotting the probability density function (pdf).

This multidimensional functional space may represent attributes to which neurons are tuned, such as sensory features (e.g., visual orientation (36), auditory frequency) and movement characteristics (e.g., direction, speed (37, 38)). In sensory systems, it represents stimuli as neural activity patterns, with proximity indicating similarity in features. For motor control, it encodes movement parameters and trajectories. In the hippocampus, it represents the place field of a place cell, acting as a cognitive map of physical space (39–41).

We first explore the ERM with various forms of f(x) and find that fast-decaying functions such as Gaussian and exponential kernels produce neither eigenspectra similar to the data nor scale invariance under random sampling (Fig. S4A-H and Supp. Note). Thus, we turn to slow-decaying functions, including power laws, which produce spectra similar to the data (Fig. 3C,D; see also Fig. S5). We adopt a particular kernel function, f(x⃗) = ϵ^µ (ϵ² + ‖x⃗‖²)^(−µ/2) (Eq. (11), Methods), because of its closed-form Fourier transform and analytical properties. For large distance ‖x⃗‖ ≫ ϵ, it approximates a power law f(x⃗) ≈ (ϵ/‖x⃗‖)^µ and smoothly transitions at small distance to satisfy the correlation requirement f(0) = 1 (Fig. S7I, J).
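For concreteness, a minimal simulation of this ERM is sketched below (parameters follow the Fig. 3 setting; the kernel is the approximate power law above, and the O(N²) pairwise-distance computation is kept naive for clarity):

```python
import numpy as np

def erm_covariance(n=1024, d=2, L=10.0, mu=0.5, eps=0.03125, sigma=None, seed=0):
    """ERM covariance C_ij = sigma_i sigma_j f(|x_i - x_j|), with neurons at
    uniform random points in a d-dimensional box of size L and the
    approximate power-law kernel f(x) = eps^mu (eps^2 + x^2)^(-mu/2)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, L, size=(n, d))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    f = eps ** mu * (eps ** 2 + dist ** 2) ** (-mu / 2)  # f(0) = 1 on the diagonal
    sigma = np.ones(n) if sigma is None else np.asarray(sigma)
    return f * np.outer(sigma, sigma)

C = erm_covariance()
eigs = np.sort(np.linalg.eigvalsh(C))[::-1]
# Plot eigs against np.arange(1, len(eigs) + 1) / len(eigs) on log-log axes
# to reproduce a rank plot like Fig. 3C.
```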

2.3 Analytical theory on the conditions of scale invariance in ERM

To determine the conditions for scale invariance in the ERM, we analytically calculate the eigenspectrum of the covariance matrix C (Eq. (2)) for large N and L using the replica method (34). A key order parameter emerging from this calculation is the neuron density ρ := N/L^d. In the high-density regime ρϵ^d ≈ 1, the covariance spectrum can be approximated in closed form (Methods). For the slow-decaying kernel function defined above, the spectrum for large eigenvalues follows a power law (Supp. Note):
λ(r) ∝ ρ^(µ/d) (r/N)^(µ/d − 1),   equivalently   p(λ) ∝ ρ^(µ/(d−µ)) λ^(−(1 + d/(d−µ)))   (3)
where r is the rank of the eigenvalues in descending order and p(λ) is their probability density function. Eq. (3) intuitively explains the scale invariance over random sampling. Sampling in the ERM reduces the neuron density ρ. The eigenspectrum is ρ-independent whenever µ/d ≈ 0. This indicates two factors contributing to the scale invariance of the eigenspectrum. First, a small exponent µ in the kernel function means that pairwise correlations slowly decay with functional distance and can be significantly positive across various functional modules and throughout the brain. For a given µ, an increase in dimension d improves the scale invariance. The dimension d could represent the number of independent features or latent variables describing neural activity or cognitive states.

We verify our theoretical predictions by comparing sampled eigenspectra in finite-size simulated ERMs across different µ and d (Fig. 4A). We first consider the case of homogeneous neurons (σi = 1 in Eq. (2); the heterogeneous case is revisited later) in these simulations (Fig. 3C, D and Fig. 4A), making C’s entries correlation coefficients. To quantitatively assess the level of scale invariance, we introduce a collapse index (CI) motivated by Eq. (3). On the log-log rank plot, Eq. (3) shows that the spectrum shifts vertically with ρ. Thus, we define the CI as this average displacement (Fig. 4A upper right, Methods); a smaller CI means a more scale-invariant spectrum. Using the CI, Fig. 4A shows that scale invariance improves with slower correlation decay, as µ decreases, and with increasing functional dimension d. Conversely, with large µ and small d, the covariance eigenspectrum varies significantly with scale (Fig. 4A).

Figure 4: Three factors contributing to scale invariance.

A. Impact of µ and d (see text) on the scale invariance of the ERM spectrum (same plots as Fig. 3C), with σi = 1. The degree of scale invariance is quantified by the collapse index (CI), which essentially measures the area between different spectrum curves (upper right inset). For comparison, we fix the same coordinate range across panels; hence some plots are cropped. B. Top: sampled correlation matrix spectrum in an example animal (fish 1). Bottom: same as top but for the covariance matrix, which incorporates heterogeneous variances σi² (gray dots). C. The CI of the correlation matrix (filled squares) is found to be larger than that of the covariance matrix (open squares) across different datasets. f1 to f6: six light-field zebrafish datasets (10 Hz per volume, this paper); fl: light-sheet zebrafish data (2 Hz per volume, (33)); mn: mouse Neuropixels data (downsampled to 10 Hz per volume); mp: mouse two-photon data (3 Hz per volume, (23)).

Next, we consider the general case of unequal neural activity levels and check for differences between the correlation matrix (equivalent to setting σi = 1) and covariance matrix spectra. Using the collapse index (CI), we compare the scale invariance of the two spectra in experimental data. Intriguingly, the CI of the covariance matrix is consistently smaller (more scale-invariant) across all datasets (Fig. 4C, Fig. S6C, open vs. filled squares), indicating that the heterogeneity of neuronal activity variances significantly affects the eigenspectrum and the geometry of the neural activity space (42). By extending our spectrum calculation to the intermediate-density regime ρϵ^d ≪ 1 (Methods), we show that the ERM model can quantitatively explain the improved scale invariance of the covariance matrix compared to the correlation matrix (Fig. S6B).

Lastly, we examine factors that turn out to have minimal impact on the scale invariance of the covariance spectrum. First, the shape of the kernel function f(x) at small distances does not affect the distribution of large eigenvalues (Fig. S7, table S3, Fig. S9A). This supports our use of a specific f(x) to represent a class of slow-decaying kernels. Second, altering the spatial distribution of neurons in the functional space, whether using a Gaussian, uniform, or clustered distribution, does not affect large covariance eigenvalues, except possibly the leading ones (Fig. S9B, Methods). Third, different geometries of the functional space, such as a flat square, a sphere, or a hemisphere, result in eigenspectra similar to the original ERM model (Fig. S9C). These findings indicate that our theory for the covariance spectrum’s scale invariance is robust to various modeling details.

2.4 Connection among random sampling, functional sampling, and anatomical sampling

So far, we have focused on random sampling of neurons, but how does the neural activity space change with different sampling methods? To this end, we consider three methods (Fig. 5A1): random sampling (RSap); anatomical sampling (ASap), where neurons in a brain region are captured by optical imaging (6, 43, 44); and functional sampling (FSap), where neurons are selected based on activity similarity (20). The difference among sampling methods depends on how neurons are organized throughout the brain. If anatomically localized neurons also cluster functionally (Fig. 5A4), ASap ≈ FSap; if they are spread out in the functional space (Fig. 5A2), ASap ≈ RSap. Generally, the anatomical-functional relationship is in between and can be quantified using Canonical Correlation Analysis (CCA). This technique finds axes (CCA vectors u⃗ in the anatomical space and v⃗ in the functional space) such that the neurons’ projections along these axes have the maximum correlation, RCCA. The extreme scenarios described above correspond to RCCA = 1 and RCCA = 0.
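As an illustration, RCCA can be computed in a few lines of Python; this is a sketch assuming matrices anat (n × 3 anatomical coordinates) and func (n × d functional coordinates) are available, using scikit-learn’s CCA as one standard implementation:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def first_canonical_correlation(anat, func):
    """R_CCA: correlation of neuron projections along the leading pair of
    canonical axes in anatomical and functional space."""
    cca = CCA(n_components=1)
    a_scores, f_scores = cca.fit_transform(anat, func)
    return np.corrcoef(a_scores[:, 0], f_scores[:, 0])[0, 1]
```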

Figure 5: The relationship between the functional and anatomical space and theoretical predictions.

A. Three sampling methods (A1) and RCCA (see text). When RCCA ≈ 0 (A2), anatomical sampling (ASap) resembles random sampling (RSap), while when RCCA ≈ 1 (A4), ASap is similar to functional sampling (FSap). B. Distribution of neurons in the functional space inferred by MDS. Each neuron is color-coded by its projection along the first canonical direction u⃗ in the anatomical space (see text). Data based on fish 6; same for C to E. C. Similar to B, but plotting neurons in the anatomical space with color based on their projection along v⃗ in the functional space (see text). D. Dimensionality (DPR) across sampling methods: average DPR under RSap (circles), average and individual brain-region DPR under ASap (squares and dots), and DPR under FSap for the most correlated neuron cluster (triangles; Methods). Dashed and solid lines are theoretical predictions for DPR under RSap and FSap, respectively (Methods). E. The CI of correlation matrices under the three sampling methods in 6 animals (colors). **p<0.01; ***p<0.001; one-sided paired t tests: RSap vs. ASap, p = 0.0010; RSap vs. FSap, p = 0.0004; ASap vs. FSap, p = 0.0014.

To determine the anatomical-functional relationship in neural data, we infer the functional coordinates of each neuron by fitting the ERM using multidimensional scaling (MDS) (45) (Methods). For simplicity and better visualization, we use a low-dimensional functional space with d = 2. The fitted functional coordinates confirm the slow-decaying kernel function in the ERM, except at small distances (Fig. S12). The ERM with inferred coordinates also reproduces the experimental covariance matrix, including its cluster structures (Fig. S11) and its sampling eigenspectra (Fig. S10).
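A sketch of this inference step, assuming correlations follow the approximate power-law kernel so that it can be inverted for target distances (this illustrates the idea, not necessarily the authors’ exact pipeline; negative correlations are clipped for simplicity):

```python
import numpy as np
from sklearn.manifold import MDS

def fit_functional_coords(C, mu=0.5, eps=0.03125, d=2, seed=0):
    """Infer functional coordinates by MDS: invert R = f(x) to get target
    pairwise distances, then embed them in d dimensions."""
    v = np.sqrt(np.diag(C))
    R = np.clip(C / np.outer(v, v), 1e-6, 1.0)                # correlations in (0, 1]
    dist = eps * np.sqrt(np.maximum(R ** (-2.0 / mu) - 1.0, 0.0))  # solve R = f(x)
    mds = MDS(n_components=d, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(dist)
```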

Equipped with the functional and anatomical coordinates, we next use CCA to determine which scenario illustrated in Fig. 5A aligns better with the neural data. Fig. 5B,C shows a representative fish with a significant RCCA = 0.327 (p-value = 0.0042, Anderson-Darling test). Notably, the CCA vector u⃗ in the anatomical space aligns with the rostrocaudal axis. Coloring each neuron in the functional space by its projection along u⃗ shows a correspondence between functional clustering and anatomical coordinates (Fig. 5B). Similarly, coloring neurons in the anatomical space by their projection along v⃗ (Fig. 5C) reveals distinct localizations in regions like the forebrain and optic tectum. Across animals, functionally clustered neurons show anatomical segregation (33), with an average RCCA of 0.335 ± 0.054 (mean ± SD).

Next, we investigate the effects of the different sampling methods (Fig. 5A1) on the geometry of the neural activity space when there is a significant but moderate anatomical-functional correlation, as in the experimental data. Interestingly, dimensionality in data under anatomical sampling consistently falls between the random and functional sampling values (Fig. 5D). This phenomenon can be intuitively explained by the ERM theory. Recall that for large N, the key term in Eq. (1) is the mean squared covariance E(Cij²). For a fixed number of sampled neurons, this average squared covariance is maximized when neurons are selected closely in the functional space (FSap) and minimized when they are distributed randomly (RSap). Thus, RSap and FSap set the upper and lower bounds of dimensionality, with ASap expected to fall in between. This reasoning can be precisely formulated to obtain quantitative predictions of the bounds (Methods). We predict the ASap dimensionality at large N as

Here DPR is the dimensionality under RSap (Eq. (1)), and k represents the fraction of sampled neurons. RASap is the correlation between anatomical and functional coordinates along the direction in which the anatomical subregions are divided (Methods), and it is bounded by the canonical correlation: RASap ≤ RCCA. When RASap = 0, we recover the upper bound (Fig. 5D, dashed line). The lower bound is reached when RASap = RCCA = 1 (Fig. 5A4), where Eq. (4) shows a scaling relationship that depends on the sampling fraction k (Fig. 5D, solid line). This contrasts with the k-independent dimensionality of RSap in Eq. (1). Furthermore, if RASap and its upper bound RCCA are not close to 1 (precisely, RASap ≤ 0.84 for the ERM model in Fig. 5D), the ASap dimensionality aligns closer to the upper bound set by RSap. This prediction agrees well with our observations in data across animals (Fig. 5D, Fig. S20 and Fig. S21).

Beyond dimensionality, our theory predicts the difference in the covariance spectrum between sampling methods based on the neuronal density ρ in the functional space (Eq. (3)). This density ρ remains constant during FSap (Fig. 5A1) and decreases under RSap; the average density across anatomical regions ⟨ρ⟩ in ASap lies between those of FSap and RSap. Analogous to Eq. (4), this relationship in ρ orders the spectra: ASap’s spectrum lies between those of FSap and RSap (Methods). This further implies that the level of scale invariance under ASap should fall between that of RSap and FSap, which is confirmed by our experimental data (Fig. 5E).

3 Discussion

Impact of hunting behavior on scale invariance and functional space organization

How does task-related neural activity shape the covariance spectrum and brain-wide functional organization? We examine the hunting behavior in larval zebrafish, marked by eye convergence (both eyes move inward to focus on the central visual field) (46). We find that scale invariance of the eigenspectra persists when the hunting frames from the Ca2+ imaging data are removed (Fig. 4C, Fig. S15A,B, Methods). This is consistent with the scale-invariant spectrum found in other data sets during spontaneous behaviors (Fig. S10F, Fig. S2G,H), suggesting scale invariance is a general phenomenon.

Interestingly, in the inferred functional space, we observe reorganizations of neurons after removing hunting behavior (Fig. S15C, D). Neurons in one cluster disperse from their center of mass (Fig. S15D), decreasing the local neuronal density ρ (Methods and Fig. S15E). The neurons in this dispersed cluster have a consistent anatomical distribution from the midbrain to the hindbrain in 4 out of 5 fish (Fig. S17). During hunting, the cluster shows robust activations that are widespread in the anatomical space but localized in the functional space (Movie S1).

Our findings suggest that the functional space could be defined by latent variables that represent cognitive factors such as decision-making, memory, and attention. These variables set the space’s dimensions, with neural activity patterns reflecting cognitive state dynamics. Functionally related neurons – through sensory tuning, movement parameters, internal conditions, or cognitive factors – become closer in this space, leading to stronger activity correlations.

Criticality and power law

What drives brain dynamics with a slow-decaying distance-correlation function in functional space? Long-range connections and a slow decline in projection strength over distance (47) may cause extensive correlations, enhancing global activity patterns. This behavior is also reminiscent of phase transitions in statistical mechanics (48), where local interactions lead to expansive correlated behaviors. Studies suggest that critical brains optimize information processing (49, 50). The link between neural correlation structures and neuronal connectivity topology is an exciting area for future exploration.

In the high-density regime of the ERM model, the eigenvalue rank plot (Eq. (3)) follows a power law λ ∼ r^(−α), with α = 1 − µ/d < 1. The scale-invariant spectrum occurs when α is close to 1. Experimental data, however, align more closely with the model in the intermediate-density regime, where the power-law spectrum is an approximation and the decay is slower (for the ERM model, Fig. S3B,C; for the data, α = 0.47 ± 0.08, mean ± SD, n = 6 fish). Stringer et al. (6) found an α ≈ 1 decay in the stimulus-trial-averaged covariance spectrum of the mouse visual cortex, suggesting this decay optimizes the efficiency and smoothness of the visual code. Our study differs in that we recorded brain-wide activity during spontaneous or hunting behavior, calculating neural covariance from single-trial activity. Much of the neural activity was not driven by sensory stimuli and was unrelated to specific tasks, requiring a different interpretation of the neural covariance spectrum.

We draw inspiration from the renormalization group (RG) approach to navigate neural covariance across scales, which has also been explored in the recent literature. Following Kadanoff’s block spin transformation (48), Meshulam et al. (20) formed size-dependent neuron clusters and their covariance matrices by iteratively pairing the most correlated neurons and placing them adjacent on a lattice. The groups expanded until the largest reached the system size. The RG process, akin to spatial sampling in functional space (FSap), maintains constant neuron density ρ. Thus, for any kernel function, including the power law and exponential, the covariance eigenspectrum remains invariant across scales (Fig. S19A,B,D,E).

Bounded dimensionality under random sampling

The saturation of the dimensionality DPR at large sample sizes indicates a limit to neural assembly complexity, evidenced by the finite mean squared covariance. This is in contrast with neural dynamics models such as the balanced excitatory-inhibitory (E-I) neural network (51), where the mean squared covariance vanishes as N grows, resulting in an unbounded dimensionality (see Supp. Note). Our results suggest that the brain encodes experiences, sensations, and thoughts using a finite set of dimensions instead of an infinitely complex neural activity space.

We found that the relationship between dimensionality and the number of recorded neurons depends on the sampling method. For functional sampling, the dimensionality scales with the sampling fraction. This suggests that if anatomically sampled neurons are functionally clustered, as with cortical neurons forming functional maps, the increase in dimensionality with neuron number may seem unbounded. This offers new insights for interpreting large-scale neural activity data recorded under various techniques.

Computational benefits of a scale-invariant covariance spectrum

The scale invariance of neural activity across different neuron assembly sizes could support efficient multiscale information encoding and processing. It indicates that the neural code is robust and requires minimal adjustments despite changes in population size. One recent study shows that randomly sampled and coarse-grained macrovoxels can predict population neural activity (52), reinforcing that a random neuron subset may capture overall activity patterns. This enables downstream circuits to read out and process information through random projections (27).

Understanding how dimensionality and spectrum change with sample size also suggests the possibility of extrapolating from small samples to overcome experimental limitations. This is particularly feasible when µ/d → 0, where the dimensionality and spectrum under anatomical, random, and functional sampling coincide (Equations (3) and (4)). Developing extrapolation methods and exploring the benefits of scale-invariant neural code are promising future research directions.

4 Materials and Methods

4.1 Experimental methods

The handling and care of the zebrafish complied with the guidelines and regulations of the Animal Resources Center of the University of Science and Technology of China (USTC). All larval zebrafish (huc:h2b-GCaMP6f) were raised in E2 embryo medium (comprising 7.5 mM NaCl, 0.25 mM KCl, 0.5 mM MgSO4, 0.075 mM KH2PO4, 0.025 mM Na2HPO4, 0.5 mM CaCl2, and 0.35 mM NaHCO3; containing 0.5 mg/L methylene blue) at 28.5 °C and with a 14-h light and 10-h dark cycle.

To induce hunting behavior (composed of motor sequences like eye convergence and the J-turn) in larval zebrafish, we fed them a large amount of paramecia over a period of 4-5 days post-fertilization (dpf). The animals were then subjected to a 24-hour starvation period, after which they were transferred to a specialized experimental chamber. The experimental chamber was 20 mm in diameter and 1 mm in depth, and the head of each zebrafish was immobilized with 2% low-melting-point agarose. Careful removal of the agarose from the eyes and tail of the fish ensured that these body regions remained free to move during hunting behavior. Thus, characteristic behavioral features such as J-turns and eye convergence could be observed and analyzed. Subsequently, the zebrafish were kept in an incubator overnight. At 7 dpf, several paramecia were introduced in front of the previously immobilized animals, each of which was monitored by a stereomicroscope. Those displaying binocular convergence were selected for subsequent Ca2+ imaging experiments.

We developed a novel optomagnetic system that allows (1) precise control of the trajectory of a paramecium and (2) imaging of brain-wide Ca2+ activity during the hunting behavior of zebrafish. To control the movement of the paramecium, we treated these microorganisms with a suspension of ferroferric oxide (Fe3O4) for 30 minutes and selected those that responded to magnetic attraction. A magnetic paramecium was then placed in front of a selected larva, and its movement was controlled by changing the magnetic field generated by Helmholtz coils integrated into the imaging system. The real-time position of the paramecium, captured by an infrared camera, was identified by online image processing. The positional vector relative to a predetermined target position was calculated. The magnitude and direction of the current in the Helmholtz coils were adjusted accordingly, allowing precise control of the magnetic field and hence of the movement of the paramecium. Multiple target positions could be set to drive the paramecium back and forth between locations.

Table of notations.

The experimental setup consisted of head-fixed larval zebrafish undergoing two different types of behavior: induced hunting behavior with a moving paramecium in front of the fish (fish 1-5), and spontaneous behavior without any visual stimulus as a control (fish 6). Experiments were carried out at ambient temperature (ranging from 23°C to 25°C). The behavior of the zebrafish was monitored by a high-speed infrared camera (Basler acA2000-165umNIR, 0.66x) behind a 4F optical system and recorded at 50 Hz. Brain-wide Ca2+ imaging was achieved using the eXtended light-field microscope (XLFM; section 4.1.2). Light-field images were acquired at 10 Hz, using customized LabVIEW software (National Instruments, USA) or Solis software (Oxford Instruments, UK), with the help of a high-speed data acquisition card (PCIe-6321, National Instruments, USA) to synchronize fluorescence with behavioral imaging.

4.1.1 Behavior analysis

The background of each behavior video was removed using the clone stamp tool in Adobe Photoshop CS6. Individual images were then processed by an adaptive thresholding algorithm, and the fish head and yolk were selected manually to determine the head orientation. The entire body centerline, extending from head to tail, was divided into 20 segments. The amplitude of a bending segment was defined as the angle between the segment and the head orientation. To identify the paramecium in a noisy environment, we subtracted a background image, averaged over a time window of 100 s, from all frames. The major axis of the left or right eye was identified using DeepLabCut (53). The eye orientation was defined as the angle between the rostrocaudal axis and the major axis of an eye; the convergence angle was defined as the angle between the major axes of the left and right eyes. An eye-convergence event was defined as a period of time during which the angle between the major axes of the eyes stayed above 50 degrees (46).
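As a small illustration of the event definition (a sketch; the inward-positive sign convention for eye orientation is an assumption here):

```python
import numpy as np

def eye_convergence_frames(theta_left, theta_right, thresh_deg=50.0):
    """Flag frames belonging to an eye-convergence event: the convergence
    angle (between the two eyes' major axes) exceeds 50 degrees.
    theta_left / theta_right: per-frame eye orientations in degrees relative
    to the rostrocaudal axis, assumed positive when rotated inward."""
    convergence_angle = theta_left + theta_right
    return convergence_angle > thresh_deg
```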

4.1.2 Imaging data acquisition and processing

We used a fast eXtended light field microscope (XLFM, with a volume rate of 10 Hz) to record Ca2+ activity throughout the brain of head-fixed larval zebrafish. Fish were ordered by the dates of experiments. As previously described (30), we adopted the Richardson-Lucy deconvolution method to iteratively reconstruct 3D fluorescence stacks (600 × 600 × 250) from the acquired 2D images (2048 × 2048). This algorithm requires an experimentally measured point spread function (PSF) of the XLFM system. The entire recording for each fish is 15.3±4.3 min (mean±SD).

To perform image registration and segmentation, we first cropped and resized the original image stack to 400 × 308 × 210, which corresponded to the size of a standard zebrafish brain (zbb) atlas (54). This step aimed to reduce substantial memory requirements and computational costs in subsequent operations. Next, we picked a typical volume frame and aligned it with the zbb atlas using a basic 3D affine transformation. This transformed frame was used as a template. We aligned each volume with the template using rigid 3D intensity-based registration (55) and non-rigid pairwise registration (56) in the Computational Morphometry Toolkit (CMTK) (https://www.nitrc.org/projects/cmtk/). After voxel registration, we computed the pairwise correlation between nearby voxel intensities and performed the watershed algorithm on the correlation map to cluster and segment voxels into consistent ROIs across all volumes. We defined the diameter of each ROI using the maximum Feret diameter (the longest distance between any two voxels within a single ROI).

Finally, we adopted the “OASIS” deconvolution method to denoise and infer neural activity from the fluorescence time sequence (57). The deconvolved ΔF/F of each ROI was used to infer firing rates for subsequent analysis.
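A minimal sketch of this step using the open-source OASIS package (github.com/j-friedrich/OASIS); the input file name and penalty setting are placeholders:

```python
import numpy as np
from oasis.functions import deconvolve

dff = np.load("roi_dff.npy")                  # hypothetical ΔF/F trace of one ROI
c, s, b, g, lam = deconvolve(dff, penalty=1)  # c: denoised trace, s: deconvolved activity
```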

4.2 Other experimental datasets analyzed

Resources for additional experimental datasets

To validate our findings across different recording methods and animal models, we also analyzed three additional datasets. We include a brief description below for completeness; further details can be found in the respective references. The first dataset includes whole-brain light-sheet Ca2+ imaging of immobilized larval zebrafish in the presence of visual stimuli as well as in a spontaneous state (33). Each volume of the brain was scanned at 2.11 ± 0.21 planes per second, providing a near-simultaneous readout of neuronal Ca2+ signals. We analyzed fish 8 (69,207 neurons × 7,890 frames), fish 9 (79,704 neurons × 7,720 frames) and fish 11 (101,729 neurons × 8,528 frames), which are the first three fish with more than 7,200 frames. For simplicity, we labeled them l2, l3, and l1 (fl). The second dataset consists of Neuropixels recordings from approximately ten different brain areas in mice during spontaneous behavior (23). Data from the three mice, Krebs, Robbins, and Waksman, include firing rate matrices of 1,462 neurons × 39,053 frames, 2,296 neurons × 66,409 frames, and 2,688 neurons × 74,368 frames, respectively. The last dataset comprises two-photon Ca2+ imaging data (2-3 Hz) obtained from the visual cortex of mice during spontaneous behavior. While this dataset includes numerous animals, we focused on the first three that exhibited spontaneous behavior: spont_M150824_MP019_2016-04-05 (11,983 neurons × 21,055 frames), spont_M160825_MP027_2016-12-12 (11,624 neurons × 23,259 frames), and spont_M160907_MP028_2016-09-26 (9,392 neurons × 10,301 frames) (23).

4.3 Covariance matrix, eigenspectrum and sampling procedures

To begin, we multiplied the inferred firing rate of each neuron (see section 4.1.2) by a constant such that in the resulting activity trace xi, the mean of xi(t) over the nonzero time frames equaled one (20). Consistent with the literature (20), this step aimed to eliminate possible confounding factors in the raw activity traces, such as the heterogeneous expression level of the fluorescence protein across neurons and the nonlinear conversion of the electrical signal to Ca2+ concentration. Note that after this scaling, neurons could still have different activity levels, characterized by the variance of xi(t) over time, due to differences in the sparsity of activity (proportion of nonzero frames) and in the distribution of nonzero xi(t) values. The scale-invariant nature of the eigenspectrum (Fig. 2F) was unaffected by this imposed scaling of the mean activity, as the phenomenon of scale invariance persisted in the absence of such normalization.

The three models of covariance in Fig. 2G-I were constructed as follows. For the model in Fig. 2G, the covariance was computed from a random activity matrix: the entries of the matrix G (with dimensions N × T) were sampled from an i.i.d. Gaussian distribution with zero mean and standard deviation σ = 1. In Fig. 2H, we constructed the composite covariance matrix for fish 1 by maintaining the eigenvalues from the fish 1 data covariance matrix and replacing the eigenvectors U with a random orthonormal basis. Lastly, the covariance matrix in Fig. 2I was generated from a randomly connected recurrent network of linear rate neurons. The entries of the synaptic weight matrix were normally distributed, Jij ∼ 𝒩(0, g²/N), with a coupling strength g = 0.95 (31, 32).

For consistency, we used the same number of time frames T = 7,200 when comparing CI across all the datasets (Fig. 4B,C, Fig. 5D,E, Fig. S6C). For other cases, we analyzed the full length of the data (number of time frames: fish 1 - 7495, fish 2 - 9774, fish 3 - 13904, fish 4 - 7318, fish 5 - 7200, fish 6 - 9388). Next, the covariance matrix was calculated as Cij = (1/T) Σt (xi(t) − x̄i)(xj(t) − x̄j), where x̄i is the mean of xi(t) over time. Finally, to visualize covariance matrices on a common scale, we multiplied the matrix C by a constant such that the average of its diagonal entries equaled one, that is, Tr(C)/N = 1. This scaling did not alter the shape of the covariance eigenvalue distribution but set its mean to 1 (see also Eq. (8)).
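The three constructions can be sketched as follows (assuming a measured covariance matrix C_data is available; for the recurrent network we use the steady-state covariance C = (I − J)⁻¹(I − J)⁻ᵀ of a noise-driven linear rate network, a standard choice, though the precise formulation in (31, 32) may differ in details):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, g = 1024, 7200, 0.95
C_data = np.cov(rng.standard_normal((N, 4096)))    # placeholder; use measured covariance

# (G) Covariance of i.i.d. Gaussian activity: C = G G^T / T (Wishart).
G = rng.standard_normal((N, T))
C_wishart = G @ G.T / T

# (H) Composite covariance: keep the data eigenvalues, replace the
# eigenvectors with a random orthonormal basis U_r (QR of a Gaussian matrix).
lam = np.linalg.eigvalsh(C_data)
U_r, _ = np.linalg.qr(rng.standard_normal((N, N)) / np.sqrt(N))
C_composite = (U_r * lam) @ U_r.T                  # equals U_r diag(lam) U_r^T

# (I) Randomly connected linear rate network with J_ij ~ N(0, g^2 / N).
J = rng.standard_normal((N, N)) * g / np.sqrt(N)
M = np.linalg.inv(np.eye(N) - J)
C_rnn = M @ M.T
```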

To maintain consistency across datasets, we fixed the same initial number of neurons at N0 = 1,024. These N0 neurons were randomly chosen once for each zebrafish dataset and then used throughout the subsequent analyses. We adopted this setting for all analyses except in two particular instances: (1) for comparisons among the three sampling methods (RSap, ASap, and FSap), we specifically chose 1,024 neurons centered along the anterior-posterior axis, mainly from the midbrain to the anterior hindbrain regions (Fig. 5DE, Fig. S20); (2) when investigating the impact of hunting behavior on scale invariance, we included the entire neuronal population (section 4.11).

We used an iterative procedure to sample the covariance matrix C (calculated from data or from simulated ERMs). For RSap, in the first iteration, we randomly selected half of the neurons. The covariance matrix for these selected neurons is an N/2 × N/2 diagonal block of C. Similarly, the covariance matrix of the unselected neurons is another diagonal block of the same size. In the next iteration, we similarly created two new sampled blocks with half the number of neurons for each of the blocks we had. Repeating this process for n iterations resulted in 2^n blocks, each containing N := N0/2^n neurons. At each iteration, the eigenvalues of each block were calculated and, after being sorted in descending order, averaged across the blocks. Finally, the averaged eigenvalues were plotted against rank/N on a log-log scale.
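A compact sketch of this iterative random-sampling procedure:

```python
import numpy as np

def iterative_sampled_spectra(C, n_iter=3, seed=0):
    """Random sampling (RSap): repeatedly split neuron blocks in half and
    return, for each scale, the descending eigenvalues averaged over all
    same-size diagonal blocks of C."""
    rng = np.random.default_rng(seed)
    blocks, spectra = [np.arange(C.shape[0])], []
    for _ in range(n_iter + 1):
        eigs = [np.sort(np.linalg.eigvalsh(C[np.ix_(b, b)]))[::-1] for b in blocks]
        spectra.append(np.mean(eigs, axis=0))
        blocks = [half for b in blocks
                  for half in np.array_split(rng.permutation(b), 2)]
    return spectra   # plot each against rank/N on log-log axes (cf. Fig. 2F)
```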

In the case of ASap and FSap, the process of selecting neurons differed, although the remaining procedures followed the RSap protocol. In ASap, the selection of neurons was based on a spatial criterion: the half of the neurons closest to the anterior end of the anterior-posterior axis were grouped to create one diagonal block, with the remaining neurons forming a separate block. FSap, on the other hand, used the Renormalization Group (RG) framework (20) to define the blocks (details in section 4.12). In each iteration, the cluster of neurons within a block that showed the highest average correlation was identified and labeled as the most correlated cluster (see Fig. 5D, Figures S20 and S21).

In the ERM model, as part of implementing ASap, we generated anatomical and functional coordinates for neurons with specified CCA properties, as described in section 4.9. Mirroring the approach taken with our data, ASap segmented neurons into groups based on the first dimension of their anatomical coordinates, akin to the anterior-posterior axis. FSap employed the same RG procedures outlined earlier (section 4.12).

To determine the overall power-law coefficient α of the eigenspectra throughout sampling, we fitted a straight line in the log-log rank plot to the large eigenvalues, combining the original and three iterations of sampled covariance matrices (selecting the top 10% of eigenvalues for each matrix and excluding the four largest of each matrix). We averaged the estimated α over 10 repetitions of the entire sampling procedure. The R² of the power-law fit was computed in a similar way. To visualize the statistical structure of the original and sampled covariance matrices, the order of the neurons (i.e., of columns and rows) was determined by the following algorithm. We first construct a symmetric Toeplitz matrix 𝒯 with entries 𝒯i,j = t|i−j|. The vector t⃗ = (t0, t1, …, tN−1) is given by the sorted mean covariance of each neuron, calculated as follows: letting c̄i denote the mean of the i-th row of the data covariance matrix, we set t⃗ = D({c̄i}), where D(·) denotes a descending ordering operator, rearranging the elements of a vector such that t0 ≥ t1 ≥ … ≥ tN−1. The second step is to find a permutation matrix P such that ‖𝒯 − PCP^T‖F is minimized, where ‖·‖F denotes the Frobenius norm. This quadratic assignment problem is solved approximately by simulated annealing (see the sketch below). Note that after sampling, the smaller matrix will appear different from the larger one. We therefore performed the above reordering algorithm on every sampled matrix, so that matrices of different sizes become comparable in Fig. 2E.
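A sketch of this reordering step (simulated annealing over permutations; the cost is recomputed from scratch each step for clarity, whereas a practical implementation would update it incrementally):

```python
import numpy as np

def sort_neurons(C, n_steps=20000, T0=1.0, seed=0):
    """Approximately minimize ||T - P C P^T||_F over neuron orderings, where
    T is the Toeplitz target built from the sorted mean covariances."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    t = np.sort(C.mean(axis=1))[::-1]                   # t_0 >= t_1 >= ...
    T_target = t[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]
    order = np.argsort(-C.mean(axis=1))                 # greedy initialization
    cost = np.linalg.norm(T_target - C[np.ix_(order, order)])
    for step in range(n_steps):
        temp = max(T0 * (1.0 - step / n_steps), 1e-9)   # linear cooling schedule
        i, j = rng.integers(0, n, size=2)
        cand = order.copy()
        cand[i], cand[j] = cand[j], cand[i]             # propose a swap
        c_new = np.linalg.norm(T_target - C[np.ix_(cand, cand)])
        if c_new < cost or rng.random() < np.exp((cost - c_new) / temp):
            order, cost = cand, c_new                   # accept the swap
    return order
```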

The composite covariance matrix with substituted eigenvectors (Fig. 2H) was created as described in the following steps. First, we generated a random orthogonal matrix Ur (based on the Haar measure) for the new eigenvectors. This was achieved by a QR decomposition A = UrR of a random matrix A with i.i.d. entries Aij ∼ 𝒩(0, 1/N). The composite covariance matrix was then defined as Cr = Ur Λ Ur^T, where Λ is a diagonal matrix that contains the eigenvalues of C. Note that since all the eigenvalues are real and Ur is orthogonal, the resulting Cr is a real and symmetric matrix. By construction, Cr and C have the same eigenvalues, but their sampled eigenspectra can differ.

4.4 Dimensionality

In this section, we introduce the participation ratio (DPR) as a metric for the effective dimensionality of a system, based on (25–29, 58). DPR is defined as:
DPR = (Σi λi)² / Σi λi² = Tr(C)² / Tr(C²) = N E(Cii)² / (E(Cii²) + (N − 1) E(Cij²))   (5)
Here, λi are the eigenvalues of the covariance matrix C, representing variances of neural activity. Tr(·) denotes the trace of the matrix. The term E(Cij²), with i ≠ j, denotes the expected value of the squared elements that lie off the main diagonal of C. This represents the average squared covariance between the activities of distinct pairs of neurons.

With these definitions, we explore the asymptotic behavior of DPR as the number of neurons N approaches infinity:
DPR → E(Cii)² / E(Cij²), i ≠ j, as N → ∞
This limit highlights the relationship between the PR dimension and the average squared covariance among different pairs of neurons. It is worth mentioning that a similar theoretical finding was established by Dahmen et al. (29). The transition from DPR increasing with N to approaching its saturation point occurs when N is significantly larger than DPR.

4.5 ERM model

We consider the eigenvalue distribution or spectrum of the matrix C in the limit of N ≫ 1 and L ≫ 1. This spectrum can be analytically calculated in both high-density and intermediate-density scenarios using the replica method (34). The following sketches our approach; detailed derivations can be found in the Supp. Note. To calculate the probability density function of the eigenvalues (or eigendensity), we first compute the resolvent or Stieltjes transform g(z) = ⟨(1/N) Tr(zI − C)^(−1)⟩, where ⟨…⟩ is the average across the realizations of C (that is, of the random coordinates x⃗i and variances σi²). The relationship between the resolvent and the eigendensity is given by the Sokhotski-Plemelj formula:
p(λ) = −(1/π) lim η→0⁺ Im g(λ + iη)   (6)
where Im means imaginary part.

Here we follow the field-theoretic approach (34), which turns the problem of calculating the resolvent into a calculation of a partition function in statistical physics by using the replica method. In the limit N → ∞, L^d → ∞, with ρ finite, by performing a leading-order expansion of the canonical partition function at large z (Supp. Note), we find that the resolvent is given by
g(z) = (1/ρ) ∫ d^d k⃗/(2π)^d [z − ρ E(σ²) f̃(k⃗)]^(−1)   (7)
In the high-density regime, the probability density function (pdf) of the covariance eigenvalues can be approximated and expressed from Equations (6) and (7) using the Fourier transform of the kernel function:
p(λ) = (1/ρ) ∫ d^d k⃗/(2π)^d δ(λ − ρ E(σ²) f̃(k⃗)),   with f̃(k⃗) = ∫ f(x⃗) e^(−i k⃗·x⃗) d^d x⃗   (8)
where δ(x) is the Dirac delta function and E(σ²) is the expected value of the variances of neural activity. Intuitively, Eq. (8) means that the values λ/ρ are distributed with a density proportional to the area of the level sets of f̃ (i.e., its isosurfaces).

In section 2.3, we found that the covariance matrix consistently shows greater scale invariance than the correlation matrix across all datasets. This suggests that the variability in neuronal activity significantly influences the eigenspectrum. This finding, however, cannot be explained by the high-density theory, which predicts that the eigenspectrum of the covariance matrix is simply a rescaling of the correlation eigenspectrum by E(σ²), the expected value of the variances of neural activity. Without loss of generality, we can always standardize the fluctuation level of neural activity by setting E(σ²) = 1. This is equivalent to multiplying the covariance matrix C by a constant such that Tr(C)/N = 1, which in turn scales all the eigenvalues of C by the same factor. Consequently, the heterogeneity of σi² has no effect on the scale invariance of the eigenspectrum (see Eq. (8)). This theoretical prediction is indeed correct and is confirmed by direct numerical simulations quantifying the scale invariance using the CI (Fig. S6A).

Fortunately, the inconsistency between theory and experimental results can be resolved by focusing on the ERM in the intermediate-density regime ρϵ^d ≪ 1, where neurons are positioned at moderate distances from each other. As mentioned above, we set E(σ²) = 1 in our model and vary the diversity of activity fluctuations among neurons, represented by E(σ⁴). Consistent with the experimental observations, we find that the CI decreases with E(σ⁴) (see Fig. S6B). This agreement indicates that the neural data are better explained by the ERM in the intermediate-density regime.

To gain a deeper understanding of this behavior, we use the Gaussian variational method (34) to calculate the eigenspectrum. Unlike the high-density theory, where the eigendensity has an explicit expression, in the intermediate-density regime the resolvent g(z) no longer has an explicit expression and is given by the following equation

where ⟨…⟩σ computes the expectation value of the term within the brackets with respect to σ, namely ⟨…⟩σ ≡ ∫ (…) p(σ) dσ. Here and in the following, we write 𝒢 ≡ 𝒢(z) for brevity. The function 𝒢(z) is determined by a self-consistent equation,

We can solve Eq. (10) for 𝒢 numerically; an outline follows, and the details are explained in the Supp. Note. First, we substitute z ≡ λ + iη into Eq. (10) and write 𝒢 = Re𝒢 + iIm𝒢. Eq. (10) can thus be decomposed into its real and imaginary parts, a set of coupled nonlinear integral equations, each of which involves both Re𝒢 and Im𝒢. We solve these equations in the limit η → 0⁺ using a fixed-point iteration that alternates between updating Re𝒢 and Im𝒢 until convergence.

We find that the variational approximations exhibit excellent agreement with the numerical simulation for both large and intermediate ρ where the high-density theory starts to deviate significantly (for ρ = 256 and ρ = 10.24, ϵ = 0.03125, Fig. S3). Note that the departure of the leading eigenvalues in these plots is expected, since the power-law kernel function we use is not integrable (see section 4.6).

To elucidate the connection between the two different methods, we estimate the condition when the result of the high-density theory (Eq. (8)) matches that of the variational method (Equations (9) and (10)) (Supp. Note). The transition between these two density regimes can also be understood (see section 4.8.1 and Supp. Note).

Importantly, the scale invariance of the spectrum at µ/d → 0, previously derived using the high-density result (Eq. (3)), can be extended to the intermediate-density regime by proving the ρ-independence using the variational method (Supp. Note).

Finally, using the variational method and the integration limit estimated by simulation (see section 4.7.2), we show that the heterogeneity of the variance of neural activity, quantified by E(σ4), indeed improves the collapse of the eigenspectra for intermediate ρ (Supp. Note). Our theoretical results agree excellently with the ERM simulation (Fig. S6A, B).

4.6 Kernel function

Throughout the paper, we have mainly considered a particular approximate power-law kernel function inspired by the Student’s t distribution (section 2.2)
f(x⃗) = ϵ^µ (ϵ² + ‖x⃗‖²)^(−µ/2)   (11)
To understand how to choose ϵ and µ, see section 4.8.1. Variations of Eq. (11) near x =0 have also been explored; see a summary in table S3.

Modifications of the shape of f(x) near x = 0 used in Fig. S7, Fig. S8 and Fig. S9.

Flat: f(x) is constant for small x. Tangent: for small x, f(x) follows a tangent line of the exact power law, with the same first-order derivative at the crossover point; c is a constant. Tent: for small x, f(x) follows a straight line whose slope differs from the tangent case. Parabola: for small x, f(x) follows a quadratic function ax² + 1 with the same first-order derivative as the power law at the crossover. t pdf: mimics the smoothing treatment of the t distribution. All constant parameters are set such that f(0) = 1.

It is worth mentioning that a power law is not the only slowly decaying function that can produce a scale-invariant covariance spectrum (Fig. S5). We choose it for its analytical tractability in calculating the eigenspectrum. Importantly, we find numerically that the two factors contributing to scale invariance, namely slow spatial decay and a higher-dimensional functional space, generalize to other, non-power-law functions. An example is the stretched exponential function f(x) = exp(−x^η) with 0 < η < 1. When η is small and d is large, the covariance eigenspectra also display a similar collapse upon random sampling (Fig. S5).

This approximate power law has the advantage of having an analytical expression for its Fourier transform, which is crucial for the high-density theory (Eq. (8)):
f̃(k⃗) = (2π)^(d/2) 2^(1−µ/2) Γ(µ/2)^(−1) ϵ^d (ϵ‖k⃗‖)^((µ−d)/2) K_(d−µ)/2(ϵ‖k⃗‖)   (12)
Here Kα(x) is the modified Bessel function of the second kind, and Γ(x) is the Gamma function. We calculated the above formula analytically for d = 1, 2, 3 with the assistance of Mathematica and conjectured the case for general dimension d, which we confirmed numerically for d ≤ 10.

We want to explain two technical points relevant to the interpretation of our numerical results and the choice of f(x). Unlike the usual ERM setting, here we allow f(x) to be non-integrable over ℝ^d, which is crucial for admitting a power law. The non-integrability violates a condition in the classical convergence results for the ERM spectrum (59) as N → ∞. We believe that this is exactly the reason for the departure of the first few eigenvalues from our theoretical spectrum (e.g., in Fig. 3). Our hypothesis is also supported by ERM simulations with integrable f(x) (Fig. S4), where the numerical eigenspectrum matches our theoretical one closely, including the leading eigenvalues. For the ERM to be a legitimate model for covariance matrices, we also need to ensure that the resulting matrix C is positive semidefinite. By Bochner’s theorem (60), this is equivalent to the Fourier transform of the kernel function being nonnegative at all frequencies. For example, in 1D, a rectangle function does not meet this condition (its Fourier transform, sinc(x), takes negative values), but a tent function does (its Fourier transform is sinc²(x)). For the particular kernel function in Eq. (11), the condition can be verified directly from the analytical expression of its Fourier transform (Eq. (12)). The integral representation Kα(x) = ∫₀^∞ e^(−x cosh t) cosh(αt) dt shows that Kα(x) is positive for all x > 0. Likewise, the Gamma function Γ(x) > 0 for x > 0. Therefore, the Fourier transform in Eq. (12) is positive, and the resulting matrix C (of any size and for any configuration of the x⃗i) is guaranteed to be positive definite.

Building upon the theory outlined above, numerical simulations further validated the empirical robustness of our ERM model, as showcased in Fig. 3B-D and Fig. 4A. In Fig. 3B-D, the ERM was characterized by the parameters N = 1024, d = 2, L = 10, ρ = 10.24, µ = 0.5 and ϵ = 0.03125 for f(x⃗). To numerically compute the eigenvalue probability density function, we generated the ERM 100 times, each sampled using the method described in section 4.3. The probability density function (pdf) was computed by calculating the pdf of each ERM realization and averaging across the instances. The curves in Fig. 3D show the average over 100 ERM simulations; the shaded area (mostly smaller than the marker size) represents the SEM. For Fig. 4A, the columns from left to right correspond to µ = 0.5, 0.9, 1.3, and the rows from top to bottom correspond to d = 1, 2, 3. Other ERM simulation parameters: N = 4096, ρ = 256, L = (N/ρ)^(1/d), ϵ = 0.03125, with the kernel function of Eq. (11). It should be noted that for Fig. 4A, the presented data pertain to a single ERM realization.

4.7 Collapse index (CI)

Motivated by the high-density theory of the ERM (Eq. (3)), we quantify the extent of scale invariance using the CI, defined as the (normalized) area between two spectrum curves (Fig. 4A upper right):

CI = (log q1 − log q0)^(−1) ∫_{q0}^{q1} |∂ log λ(q)/∂ log ρ| d log q   (13)

We set q1 such that λ(q1) = 1, which is the mean of the eigenvalues of a normalized covariance matrix. The other integration limit, q0, is set to 0.01, such that λ(q0) marks the largest 1% of eigenvalues.

Here we provide numerical details on calculating CI for the ERM simulations and experimental data.

4.7.1 Calculating the collapse index for experimental datasets and the ERM model

To calculate the CI for a covariance matrix C0 of size N0, we first computed its eigenvalues and those of a sampled block Cs of size Ns = N0/2, denoted λ0(q) and λs(q) (averaged over 20 samplings for the ERM simulation and 2000 for experimental data). Next, we estimated log λ(q) using the eigenvalues of C0 and Cs at q = i/Ns, i = 1, 2, …, Ns. For the sampled Cs, log λs(q = i/Ns) is simply the log of its i-th largest eigenvalue. For the original C0, log λ0(q = i/Ns) was estimated by linear interpolation, on the log λ-log q scale, using the values of log λ0(q) at the nearest neighboring points q = i/N0 (which again are simply the logs of the sorted eigenvalues of C0). Finally, the integral (Eq. (13)) was computed using the trapezoidal rule, discretized at the points q = i/Ns, using the finite difference Δ log λ(q)/Δ log ρ, where Δ denotes the difference between the original C0 and the sampled Cs (for which Δ log ρ = log 2).
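One concrete reading of this procedure in Python (a sketch; both spectra are assumed sorted in descending order and normalized so that the mean eigenvalue is 1, and the density ρ is assumed to halve between them, so Δ log ρ = log 2):

```python
import numpy as np

def collapse_index(eig0, eig_s, q0=0.01):
    """CI between the full spectrum eig0 (size N0) and the sampled spectrum
    eig_s (size Ns = N0/2): average |log-displacement| per unit log(rho),
    integrated over q in [q0, q1] where lambda(q1) = 1 (cf. Eq. (13))."""
    N0, Ns = len(eig0), len(eig_s)
    q = np.arange(1, Ns + 1) / Ns
    # interpolate log lambda_0 at q = i/Ns on the log-log scale
    log_l0 = np.interp(np.log(q), np.log(np.arange(1, N0 + 1) / N0), np.log(eig0))
    log_ls = np.log(eig_s)
    mask = (q >= q0) & (log_ls >= 0.0)          # restrict to q0 <= q <= q1
    lq = np.log(q[mask])
    gap = np.abs(log_l0[mask] - log_ls[mask])   # vertical displacement between curves
    return np.trapz(gap, lq) / ((lq[-1] - lq[0]) * np.log(2.0))
```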

4.7.2 Estimating CI using the variational method

In the definition of the CI (Eq. (13)), calculating λ(q) and its derivative directly using the variational method is difficult, but we can make use of the implicit differentiation

dq/dλ = −p(λ), (14)

where q(λ) = ∫_λ^∞ p(λ′) dλ′ is the complementary cdf (the inverse function of λ(q) in section 4.7.1). Using this, the integral in the CI (Eq. (13)) can be rewritten as an integral over λ (Eq. (15)). Since dq/dλ < 0, we switch the order of the integration limits in the final expression of Eq. (15).

First, we explain how to compute the complementary cdf q(λ) numerically using the variational method. The key is to integrate the probability density function p(λ) from λ up to a finite λ(qs) rather than to infinity,

q(λ) = qs + ∫_λ^{λ(qs)} p(λ′) dλ′. (16)

The integration limit λ(qs) cannot be calculated directly using the variational method. We thus used the value λs(qs ≤ q0) from simulations of the ERM with a large N = 1024 as an approximation. Furthermore, we employed a smoothing technique to reduce the bias in the estimate of λs(qs) caused by the zigzag fluctuations of the leading (i.e., largest) eigenvalues of the spectrum. Specifically, we determined the nearest rank j < Nq0 and then smoothed log λs around rank j on the log-log scale, averaging over 100 ERM simulations.

Note that we can alternatively use the high-density theory (Supp. Note) to compute the integration limit λ(qs = 1/N) instead of resorting to simulations. However, since the true value deviates from the λh(qs = 1/N) derived from the high-density theory, this approach introduces a constant bias (Fig. S6) when computing the integral in Eq. (16). Therefore, we used the simulation value λs(qs ≤ q0) when producing Fig. S6A,B.

Next, we describe how each term within the integral of Eq. (15) was numerically estimated. First, we calculated Δq(λ) with a method similar to that described in section 4.7.1: briefly, we calculated q0(λ) for the original density and qs(λ) for the sampled density, and then used the finite difference Δq(λ) = q0(λ) − qs(λ). Second, the integrand was evaluated at k = 20 points λi, i = 0, 1, 2, …, k − 1, spanning the integration range. Finally, we performed a cubic spline interpolation of the integrand and obtained the theoretical CI by integrating Eq. (15). Fig. S6A,B shows a comparison between the theoretical CI and that obtained from numerical simulations of the ERM (section 4.7.1).

4.8 Fitting ERM to data

4.8.1 Estimating the ERM parameters

Our ERM model has four parameters: µ and ϵ dictate the kernel function f(x), whereas the box size L and the embedding dimension d determine the neuronal density ρ = N/L^d. In the following, we describe an approximate method to estimate these parameters from experimentally measured pairwise correlations. We proceed by deriving a relationship between the correlation probability density distribution h(R) and the pairwise distance probability density distribution g(u) in the functional space, from which the parameters of the ERM can be estimated.

Consider a distribution of neurons in the functional space with a coordinate distribution p(x). The pairwise distance density function g(u) is related to the spatial point density by the following formula:

g(u) = ∫∫ p(x) p(x′) δ(‖x − x′‖ − u) dx dx′. (17)

For ease of notation, we subsequently omit the region of integration, which is the same as in Eq. (17). In the case of a uniform distribution, p(x) = 1/L^d and Eq. (17) can be evaluated explicitly. For other spatial distributions, Eq. (17) cannot be explicitly evaluated. We therefore make an approximation by focusing on small pairwise distances (i.e., large correlations):

p(x′) ≈ p(x). (18)

By the change of variables u = x′ − x (with u = ‖u‖), Eq. (17) can be rewritten as

g(u) ≈ S_{d−1}(u) ∫ p²(x) dx, (20)

where S_{d−1}(u) is the surface area of the (d − 1)-sphere with radius u.

With the approximate power-law kernel function R = f(u) ≈ ϵ^µ u^{−µ}, i.e., u = ϵ R^{−1/µ}, the probability density function of the pairwise correlation h(R) is given by:

h(R) = g(u) |du/dR| ≈ (S_{d−1}(1) ϵ^d / µ) (∫ p²(x) dx) R^{−(1 + d/µ)}.

Taking the logarithm on both sides,

log h(R) = −(1 + d/µ) log R + log[(S_{d−1}(1) ϵ^d / µ) ∫ p²(x) dx]. (21)

Eq. (21) is the key formula for ERM parameter estimation. In the case of a uniform spatial distribution, ∫ p²(x) dx = 1/L^d, so the log R-independent term contains (ϵ/L)^d. For a given dimension d, we can therefore estimate µ and (ϵ/L)^d separately by fitting h(R) on the log-log scale using linear least squares. Lastly, we fit the distribution of σ² (the diagonal entries of the covariance matrix C) to a log-normal distribution by maximum likelihood.
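A minimal sketch of the least-squares fit of Eq. (21) is given below; the binning choices are ours, and the returned intercept still contains the known d- and µ-dependent constant, which must be removed to isolate (ϵ/L)^d:

```julia
# Sketch: estimate µ (and the intercept containing (ϵ/L)^d) from the
# pairwise-correlation histogram via Eq. (21), given the dimension d.
function fit_erm_params(R::Vector{Float64}, d::Int; nbins=50)
    Rp = R[R .> 0]                               # keep positive correlations
    edges = exp.(range(log(minimum(Rp)), log(maximum(Rp)); length=nbins + 1))
    centers = sqrt.(edges[1:end-1] .* edges[2:end])
    counts = [sum(edges[i] .<= Rp .< edges[i+1]) for i in 1:nbins]
    h = counts ./ (sum(counts) .* diff(edges))   # normalized density h(R)
    keep = h .> 0
    x, y = log.(centers[keep]), log.(h[keep])
    A = [x ones(length(x))]
    slope, intercept = A \ y                     # linear least squares
    mu = d / (-slope - 1)                        # slope = -(1 + d/µ)
    return mu, intercept
end
```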

There is a redundancy between the unit of length in the functional space and the kernel parameter: rescaling the coordinates x → x/δ (and hence L → L/δ) together with ϵ → ϵ/δ leaves all pairwise correlations unchanged. Thus, ϵ and L are a pair of redundant parameters: once ϵ is given, L is also determined. We set ϵ = 0.03125 throughout the article. In summary, for a given dimension d and ϵ, we can fit µ of f(x) (Eq. (11)), the distribution of σ² (section 2.2), and ρ (or equivalently L) by comparing the distribution of pairwise correlations in the experimental data and the ERM. Furthermore, knowing (ϵ/L)^d enables us to determine a fundamental dimensionless parameter

ρϵ^d = N (ϵ/L)^d,

which tells us whether the experimental data are better described by the high-density theory or by the Gaussian variational method (Supp. Note). Indeed, the fitted ρϵ^d ranges over 10^{−3} to 10^0 and is typically much smaller than 1, consistent with our earlier conclusion that the neural data are better described by an ERM model in the intermediate-density regime.

Notably, we found that a smaller embedding dimension, d ≤ 5, gave a better fit to the overall pairwise correlation distribution. The following is an empirical explanation. As d grows, to best fit the slope of log h(R) versus log R, µ must also grow. However, for very high dimensions d, the y-intercept would become very negative, or equivalently, the fitted correlations would become extremely small. This can be verified by examining the leading-order, log R-independent term in Eq. (21), which can be approximated as d log(ϵ/L). It becomes very negative for large d since ϵ ≪ L by construction. Throughout this article, we use d = 2 when fitting the experimental data with our ERM model.

The above calculation can be extended to cases where the coordinate distribution depends on additional parameters. To estimate the parameters of coordinate distributions that generate ERMs with a similar pairwise correlation distribution (Fig. S9), we fixed the integral value ∫ p²(x) dx. Consider, for example, a transformation of the uniform coordinate distribution to a normal distribution in ℝ². We imposed ∫ p²(x) dx = 1/(4πσp²) = 1/L². For the log-normal distribution, a similar calculation led to exp(σp²/4 − µp)/(2√π σp) = 1/L. The numerical values of these parameters are given in section 4.10. Note, however, that due to the approximation we used (Eq. (18)), our estimate of the ERM parameters becomes less accurate if the density function changes rapidly over a short distance in the functional space. More sophisticated methods, such as a grid search, may be needed to tackle such a scenario.

After determining the parameters of the ERM, we first examine the spectrum of the ERM with uniformly distributed random functional coordinates (Fig. S10M-R). Second, we use the inverse kernel function f^{−1} to translate the experimental pairwise correlations into pairwise distances for all neurons in the functional space (Fig. S11, Fig. S10G-L). The embedding coordinates in the functional space can then be solved through Multidimensional Scaling (MDS) by minimizing the Sammon error (section 4.8.3). The similarity between the spectra for uniformly distributed coordinates (Fig. S10M-R) and those for the embedding coordinates (Fig. S10G-L) is also consistent with the notion that the specific coordinate distribution in the functional space has little impact on the shape of the eigenspectrum (Fig. S9).

4.8.2 Nonnegativity of data covariance

To use the ERM to model the covariance matrix, the pairwise correlation is given by a nonnegative kernel function that monotonically decreases with the distance between neurons in the functional space. This nonnegativity raises a potential issue when the model is applied to experimental data, where a small fraction of pairwise correlations/covariances are in fact negative. We have verified that the spectrum of the data covariance matrix remains virtually unchanged when these negative covariances are replaced with zero (Fig. S18). This confirms that the ERM remains a good model when the neural dynamics is in a regime where pairwise covariances are mostly positive (50) (see also Fig. S2B-D).

4.8.3 Multidimensional Scaling (MDS)

With the estimated ERM parameters (µ in f(x) and the box size L for given ϵ and d; see section 4.8.1), we performed MDS to infer the neuronal coordinates in functional space. First, we computed the pairwise correlations Rij from the data covariances. Next, we calculated the pairwise distances, denoted d*ij, by applying the inverse kernel function to the absolute correlations, d*ij = f^{−1}(|Rij|). We used the absolute value |Rij| instead of Rij because a small percentage of the Rij are negative (Fig. S2A-D), where the distance would be undefined. This substitution by the absolute value serves as a simple workaround for the issue and is used only here, in the analysis to infer the neuronal coordinates by MDS. Finally, we estimated the embedding coordinates for each neuron with the SMACOF algorithm (Scaling by MAjorizing a COmplicated Function), which minimizes the Sammon error

E = (1/Σ_{i<j} d*ij) Σ_{i<j} (d*ij − dij)² / d*ij,

where dij is the pairwise distance between neurons i and j in the embedding space.

To reduce errors at large distances (i.e., small correlations Rij < f(L), where L is the estimated box size), we performed a soft cut-off of the computed distances at the scale of the box size.

During the optimization, we started from the embedding coordinates estimated by classical MDS (45), whose initial sum-of-squares distance error can be calculated directly, and terminated when the error or its gradient fell below 10^{−4}.
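A sketch of the initialization step is shown below: correlations are converted to distances through the inverse kernel, and classical MDS (Torgerson) provides the starting coordinates. The SMACOF refinement and the soft cut-off are omitted, and the power-law inverse is an approximation of f^{−1}:

```julia
# Sketch: correlations -> distances via the approximate inverse kernel,
# then classical MDS (Torgerson) as the starting point for SMACOF.
using LinearAlgebra

finv(R; eps=0.03125, mu=0.5) = eps * abs(R)^(-1 / mu)   # inverse of f(x) ≈ ϵ^µ x^{-µ}

function classical_mds(D::Matrix{Float64}, d::Int)
    N = size(D, 1)
    J = I - fill(1.0 / N, N, N)            # centering matrix
    B = -0.5 * J * (D .^ 2) * J            # double-centered squared distances
    E = eigen(Symmetric(B))
    idx = sortperm(E.values; rev=true)[1:d]
    # coordinates from the top-d eigenpairs (clamp tiny negatives to zero)
    return E.vectors[:, idx] .* sqrt.(max.(E.values[idx], 0.0))'   # N × d
end

# usage: D = [i == j ? 0.0 : finv(Rij[i, j]) for i in 1:N, j in 1:N]
```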

The fitted ERM with the embedding coordinates reproduced the experimental covariance matrix including the cluster structures (Fig. S11) and its sampling eigenspectra (Fig. S10).

4.9 Canonical-Correlation Analysis (CCA)

Here we briefly explain the CCA method (61) for completeness. The basis vectors u and v, in the functional and anatomical spaces, respectively, were found by maximizing the correlation corr(u·xi, v·yi). That is, among all possible choices of u and v, the projections of the neuron coordinates along these basis vectors are maximally correlated. Here xi and yi represent the coordinates of neuron i in the functional and anatomical spaces, respectively. The resulting maximum correlation is RCCA. To check the significance of the canonical correlation, we shuffled the functional space coordinates across neurons' identities and recalculated the canonical correlation with the anatomical coordinates, as shown in Fig. S13.

To study the effect of the functional-anatomical relation described by RCCA in the ERM model, we generated three-dimensional anatomical coordinates yi and two-dimensional functional coordinates xi for each neuron, which are jointly five-dimensional zero-mean multivariate Gaussian random variables. The coordinate components are mutually independent, except for the first dimension of the functional coordinates and the first dimension of the anatomical coordinates, which are assigned a correlation coefficient equal to RCCA. The variances of the coordinates are fixed constants for the numerics in Fig. S21. Under this construction, the first canonical correlation between the anatomical and functional coordinates equals RCCA, the first canonical direction in the anatomical space is (1, 0, 0)^T, and the first canonical direction in the functional space is (1, 0)^T.
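A minimal sketch of this construction, assuming unit variances for all coordinate components:

```julia
# Sketch: jointly Gaussian anatomical (3D) and functional (2D) coordinates
# whose first dimensions are correlated at R_CCA; unit variances assumed.
using LinearAlgebra

function sample_coords(N::Int, Rcca::Float64)
    # covariance of (y1, y2, y3, x1, x2): identity except cov(y1, x1) = Rcca
    S = Matrix{Float64}(I, 5, 5)
    S[1, 4] = S[4, 1] = Rcca
    A = cholesky(Symmetric(S)).L
    Z = A * randn(5, N)                  # 5 × N jointly Gaussian samples
    Y = Z[1:3, :]'                       # anatomical coordinates (N × 3)
    X = Z[4:5, :]'                       # functional coordinates (N × 2)
    return X, Y
end
```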

4.10 Extensions of ERM and factors not affecting the scale invariance

In Fig. S9 we considered five additional types of spatial density distributions (coordinate distributions) in the functional space and two additional functional space geometries. We examined points distributed according to the uniform distribution, the normal distribution N(µp, σp²) (per dimension), and the log-normal distribution Lognormal(µp, σp²) (per dimension). We used the method described in Methods section 4.8.1 to adjust the parameters of these coordinate distributions, based on the uniform distribution case, so that they all generate similar pairwise correlation distributions. The relationships between these parameters are described in Methods section 4.8.1. In Fig. S9B, we used the following parameters: d = 2; L = 10 for the uniform distribution; µp = 0, σp = 2.82 for the normal distribution; and µp = 2, σp = 0.39 for the log-normal distribution.

Second, we introduced multiple clusters of neurons in the functional space, with each cluster uniformly distributed in a box. We considered three arrangements: (1) two closely situated clusters (the distance between the two cluster centers being Lc = L), (2) two distantly situated clusters (distance between clusters Lc = 4L), and (3) three clusters arranged symmetrically in an equilateral triangle (distance between clusters Lc = L), with the box size of each cluster chosen to preserve the total volume of the functional space.

Finally, we examined the scenario in which the points are uniformly distributed on the surface of a sphere (4πl² = L², l being the radius of the sphere) or a hemisphere (2πl² = L²) embedded in ℝ³ (the pairwise distance being the Euclidean distance in ℝ³). Note that both cases have the same surface area as the 2D box.

4.11 Analyzing the effects of removing neural activity data during hunting

To identify and remove the time frames corresponding to putative hunting behaviors, the following procedure was used. A hunting interval was defined as the period from 10 frames (1 sec) before the onset of an eye convergence (see Methods section 4.1.1) to 10 frames after the offset of that eye convergence. These frames were excluded from the data before recalculating the covariance matrix (see Methods section 4.3) and, subsequently, the sampled eigenspectra (Fig. S15B, Fig. S16B,D,F,H). As a control for the removal of the hunting frames, an equal number of time frames not within the hunting intervals were randomly selected, removed, and analyzed (Fig. S15C, Fig. S16A,C,E,G). The numbers of hunting-interval frames and total recording frames for the five fish exhibiting hunting behaviors are as follows: fish 1: 268/7495, fish 2: 565/9774, fish 3: 2734/13904, fish 4: 843/7318, and fish 5: 1066/7200. Fish 6 (9388 time frames) was not exposed to a prey stimulus and was therefore excluded from this analysis.
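A sketch of the frame-exclusion step, assuming eye-convergence onset/offset frame indices are available (function and variable names are ours):

```julia
# Sketch: boolean mask excluding hunting intervals (onset - pad : offset + pad).
function hunting_mask(onsets, offsets, nframes; pad=10)
    keep = trues(nframes)
    for (on, off) in zip(onsets, offsets)
        keep[max(on - pad, 1):min(off + pad, nframes)] .= false
    end
    return keep
end

# usage, with A the neurons × frames activity matrix:
#   using Statistics
#   keep = hunting_mask(onsets, offsets, size(A, 2))
#   C = cov(A[:, keep]; dims=2)          # recomputed covariance
```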

To assess the impact of hunting-frame removal on the CI, we calculated the CI of the covariance matrix using all neurons recorded in each fish (without sampling down to 1024 neurons). For the control case, we repeated the random removal of non-hunting frames 10 times to generate 10 covariance matrices and computed their CIs. We used a one-sample t-test to determine the statistical significance of the difference between the control CIs and the CI obtained after removal of the hunting frames.

Using the ERM parameters fitted on the full data, we performed MDS on the control data and on the hunting-removed data to infer the functional coordinates. Note that the functional coordinates inferred by MDS are not unique: rotations and translations give equivalent solutions. For visualization purposes (not needed for the analysis), we therefore used the Umeyama algorithm to optimally align the functional coordinates of the control and hunting-removed data.

To identify distinct clusters within the functional coordinates, we fit Gaussian Mixture Models (GMMs) using the "GaussianMixtures" package in Julia. We chose the number of clusters K that gave the smallest Bayesian Information Criterion (BIC) score. After fitting the GMMs, each neuron i was assigned a list of probabilities pik, k = 1, 2, …, K, specifying the probability that the neuron belongs to cluster k. The mean and covariance parameters were estimated for each Gaussian cluster. For visualization (but not for analysis), a neuron was colored according to the cluster k* = argmax_{1≤k≤K} pik.

We used the following method to measure the size of a cluster and its fold change. For a 2D (recall d = 2 in our ERM) Gaussian-distributed cluster, consider an ellipse centered at its mean whose axes are aligned with the eigenvectors of its covariance matrix C (2 × 2). Let the eigenvalues of C be λ1, λ2. We set the lengths of the half-axes of the ellipse to c√λ1 and c√λ2, respectively, where c > 0 is a constant determined below. Note that the ellipse axes correspond to linear combinations of 2D Gaussian random variables that are independent, and the λi are the variances of these linear combinations. From this fact, it is straightforward to show that the probability that a sample from the Gaussian cluster lies within the above ellipse depends only on c, namely P = 1 − exp(−c²/2), and not on the shape of the cluster. The ellipse thus represents a region covering a fixed proportion of neurons for any cluster, and its area, πc²√(λ1λ2), can be used as a measure of the size of the Gaussian cluster. In Fig. S17, we plot these ellipses to help visualize the clusters and their changes. We chose c such that the ellipse covers 95% of the probability (that is, of the fraction of neurons belonging to the cluster), i.e., c = √(−2 ln 0.05) ≈ 2.45.
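In code, the size measure reduces to a few lines (a sketch under the 2D Gaussian assumption above):

```julia
# Sketch: area of the ellipse covering a fixed probability mass of a 2D
# Gaussian cluster; cov2 is the cluster's 2 × 2 covariance matrix.
using LinearAlgebra

function cluster_ellipse_area(cov2::AbstractMatrix; coverage=0.95)
    c2 = -2 * log(1 - coverage)          # from P = 1 − exp(−c²/2)
    lam = eigvals(Symmetric(Matrix(cov2)))
    return pi * c2 * sqrt(prod(lam))     # area = π c² √(λ₁ λ₂)
end
```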

In the control functional map, where we fit the GMMs, we directly calculated the size measure from the estimated covariance C of each Gaussian cluster. In the hunting-removed functional map, we needed to estimate the covariance C′ of the neurons belonging to a cluster k under the new coordinates (assuming that the new distribution can still be approximated by a Gaussian). We performed this estimation probabilistically to avoid the issue that, for highly overlapping clusters, the cluster membership of some neurons can be ambiguous. First, we estimated the center/mean of the new Gaussian distribution by the membership-weighted average

(x̄k, ȳk) = Σi pik (xi, yi) / Σi pik.

Here the summation runs over all N neurons in the functional space, pik is the membership probability defined above, and (xi, yi) are the coordinates of neuron i in the hunting-removed map. Similarly, we used weighted averages to estimate the entries of the covariance matrix, for example

C′xx = Σi pik (xi − x̄k)² / Σi pik.

We then calculated the size of cluster k on the new map as πc²√(λ′1λ′2), where λ′1, λ′2 are the eigenvalues of C′. Finally, we computed the fold change in size as the ratio of the new ellipse area to that in the control map, √(λ′1λ′2)/√(λ1λ2).
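A sketch of the probability-weighted estimates (our own illustration of the formulas above, with coords an N × 2 matrix of hunting-removed coordinates and p the membership probabilities of one cluster):

```julia
# Sketch: probability-weighted mean/covariance of a cluster on the new map,
# and the fold change in cluster size from the ellipse areas.
using LinearAlgebra

function weighted_cluster_cov(coords::Matrix{Float64}, p::Vector{Float64})
    w = p ./ sum(p)                       # normalized membership weights
    mu = vec(sum(coords .* w; dims=1))    # weighted mean
    X = coords .- mu'
    C = (X .* w)' * X                     # weighted 2 × 2 covariance
    return mu, Symmetric(C)
end

# fold change in size: ratio of ellipse areas = √(λ₁′λ₂′ / (λ₁λ₂))
fold_change(Cnew, Cold) = sqrt(prod(eigvals(Symmetric(Matrix(Cnew)))) /
                               prod(eigvals(Symmetric(Matrix(Cold)))))
```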

4.12 Renormalization-Group (RG) Approach

Here we briefly summarize the RG approach used in (20) and elucidate the adjustments required when applying it to the ERM. The method consists of two stages: (i) iterative agglomerative clustering of neurons, and (ii) computing the spectrum of a block of the original covariance matrix corresponding to a cluster of the desired size, based on the clustering from stage (i).

In each coarse-graining iteration, denoted k, Pearson's correlation coefficients are calculated for every pair of variables. This yields the maximally correlated pairs, designated a and b, which are combined according to the following equation:

x_i^{(k+1)} = Z_i^{(k+1)} (x_a^{(k)} + x_b^{(k)}),

where Z_i^{(k+1)} is used to normalize the average, ensuring that the mean nonzero activity of the new variable equals one.

When the RG approach is applied directly to the ERM, the process differs slightly. Instead of combining neural activities, we merge entries of the correlation matrix to traverse different scales. During the k-th iteration, for merged pairs (a, b) and (c, d), we compute the coarse-grained covariance as

C^{(k+1)}_{ij} = C^{(k)}_{ac} + C^{(k)}_{ad} + C^{(k)}_{bc} + C^{(k)}_{bd},

and the variance as

C^{(k+1)}_{ii} = C^{(k)}_{aa} + C^{(k)}_{bb} + 2C^{(k)}_{ab}.

Following these calculations, we normalize the coarse-grained covariance matrix so that all variances equal one. Note that these coarse-grained covariances are used only in stage (i) and not to calculate the spectrum.

In stage (ii), we calculate the eigenspectra of sub-covariance matrices across different cluster sizes, as described in (20). Let N0 = 2^n be the original number of neurons. To reduce it to size N = N0/2^k = 2^{n−k}, consider the coarse-grained neurons at step n − k of stage (i). Each coarse-grained neuron is a cluster of 2^{n−k} original neurons. We then calculate the spectrum of the block of the original covariance matrix corresponding to the neurons of each cluster (there are 2^k such blocks). Lastly, we average these 2^k spectra.

To illustrate the RG approach, consider a hypothetical scenario in which eight neurons, labeled 1, 2, …, 8, are subjected to a two-step clustering procedure. In the first step, neurons are grouped based on their maximally correlated pairs, resulting in four pairs: {1, 2}, {3, 4}, {5, 6}, and {7, 8}. Subsequently, the pairs are further grouped into two clusters based on the RG step described above. Specifically, if the correlation between the coarse-grained variables of the pairs {1, 2} and {3, 4} is the largest among all pairs of coarse-grained variables, the first cluster consists of neurons {1, 2, 3, 4} and the second of neurons {5, 6, 7, 8}. Next, take the cluster size N = 4 as an example. The eigenspectra of the covariance matrices of the four neurons within each cluster are computed, giving two eigenspectra, one per cluster. Note that the covariance matrices used to compute these eigenspectra never involve coarse-grained variables: they contain only the original neurons 1, 2, …, 8, grouped into clusters of increasing size. Finally, the two cluster eigenspectra are averaged.
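The stage (i) pairing-and-merging step can be sketched as follows (a toy implementation assuming the number of variables is a power of two; normalizing the variances of the returned matrix to one, as described above, recovers the correlation matrix used for the next pairing):

```julia
# Toy sketch of one RG coarse-graining step on a covariance matrix:
# greedily merge the most-correlated unused pair, then combine covariances.
using LinearAlgebra

function rg_step(C::Matrix{Float64}, clusters::Vector{Vector{Int}})
    n = size(C, 1)                               # assumed even (power of two)
    R = C ./ sqrt.(diag(C) * diag(C)')           # correlation matrix
    used = falses(n); pairs = Tuple{Int,Int}[]
    while sum(used) < n
        best = (-Inf, 0, 0)
        for i in 1:n, j in i+1:n
            if !used[i] && !used[j] && R[i, j] > best[1]
                best = (R[i, j], i, j)
            end
        end
        _, a, b = best
        used[a] = used[b] = true
        push!(pairs, (a, b))
    end
    m = length(pairs)
    Cnew = zeros(m, m)
    for (i, (a, b)) in enumerate(pairs), (j, (c, d)) in enumerate(pairs)
        # cov of summed variables: cov(xa + xb, xc + xd); i = j gives variance
        Cnew[i, j] = C[a, c] + C[a, d] + C[b, c] + C[b, d]
    end
    newclusters = [vcat(clusters[a], clusters[b]) for (a, b) in pairs]
    return Cnew, newclusters
end
```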

4.13 Spectrum of three types of sampling procedures in ERM model

In section 2.4 we considered three types of sampling procedures: random sampling (RSap), spatial sampling in the anatomical space (ASap, e.g., recording neurons in a brain region), and spatial sampling in the functional space (FSap). FSap, namely subdividing the functional space into smaller regions, is equivalent to the previously reported renormalization-group (RG) inspired process (62, 63). Here we consider the relationship between the spectra under the three types of sampling procedures.

We assume a uniform random distribution of neurons in a d-dimensional functional space [0, L]^d. Under RSap, the resulting neuronal density is reduced to ρR = kρ0, with k representing the sampling ratio (k = N/N0) and ρ0 the initial density. In contrast, FSap maintains the original density, ρF = ρ0. This constancy of the neuronal density under FSap ensures that the covariance eigenspectrum remains invariant across scales for any spatial correlation function, such as a power law or an exponential, as shown in Fig. S19A,B,D,E. In contrast, RSap reduces ρ and thus demands more stringent conditions to achieve a scale-invariant covariance spectrum (e.g., compare Fig. S19A and C).

Under ASap, the sampled neurons are not spread out evenly in the functional space, whereas our theoretical framework assumes a uniform distribution. To reconcile this discrepancy, we employ a uniform approximation of the neural distribution by introducing an effective density ρ′, defined as the spatial average of the density function weighted by itself:

ρ′ = N ∫ p²(x) dx,

where p(x) is the normalized density distribution of the N = kN0 sampled neurons (see Methods section 4.8.1). This adjustment allows our theoretical model to accommodate the non-uniform distributions encountered in anatomical spatial sampling.

Using the Cauchy-Schwarz inequality, we have

(∫ p²(x) dx)(∫ 1² dx) ≥ (∫ p(x) dx)² = 1,

thus ∫ p²(x) dx ≥ 1/L^d and ρ′ ≥ kρ0 = ρR.

On the other hand, since sampling within a uniformly distributed neuron population does not increase the density, we have ρ′ ≤ ρ0.

So we have ρ0 ≥ ρ′ ≥ kρ0, i.e., ρF ≥ ρA ≥ ρR. The spectrum under ASap should therefore lie between those under FSap and RSap.

4.14 Dimensions of three types of sampling procedures in ERM model

4.14.1 Scaling of Dimensions through Random Sampling

Let us revisit the definition of the participation ratio (PR) dimension in Eq. (5):

D_PR = (Σi λi)² / Σi λi² = (tr C)² / tr(C²).

During random sampling, the expected values E(σ²), E(σ⁴), and E(C²ij) remain constant. These constants allow for an estimate of the PR dimension across scales:

D_PR(N) ≈ N E(σ²)² / [E(σ⁴) + (N − 1) E(C²ij)], with N = kN0.

Here, k = N/N0 is the scaling factor (fraction) associated with sampling. The key question is how the dimensionality changes with k. Under random sampling, as k increases, the dimensionality quickly approaches the saturation point defined by Eq. (1), namely D_PR → E(σ²)²/E(C²ij).
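The saturation under random sampling is easy to check numerically (a sketch reusing build_erm from the earlier sketch; parameter values are illustrative):

```julia
# Sketch: D_PR = (tr C)² / tr(C²) under random subsampling of an ERM.
using LinearAlgebra, Random

d_pr(C) = tr(C)^2 / sum(abs2, C)        # (Σλᵢ)² / Σλᵢ²

C0 = build_erm(2048, 2, sqrt(2048 / 10.24), 0.03125, 0.5)   # fixed density ρ
for N in (256, 512, 1024, 2048)         # random sampling, k = N / N₀
    idx = randperm(2048)[1:N]
    println(N, "  ", d_pr(Matrix(C0[idx, idx])))   # approaches a plateau
end
```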

4.14.2 Scaling of Dimensions through Functional Sampling

In this section, we leverage the uniform ERM model to estimate dimensions within the context of functional sampling, specifically focusing on the estimation of squared pairwise covariance and dimensionality.

Adopting the approximation f(x) ≈ ϵ^µ x^{−µ} for the power-law kernel function allows us to express the expected value of the squared covariance as

E(C²ij) ≈ E(σ²)² E[f²(‖xi − xj‖)] ∝ (ϵ/L)^{2µ} (for 2µ < d). (31)

For a set subjected to functional sampling with a sampling fraction k, this procedure rescales the linear size of the functional space in the ERM model to k^{1/d} L. Consequently, the E(C²ij) for the sampled fraction k is given by:

E(C²ij)(k) ∝ (ϵ / (k^{1/d} L))^{2µ} = k^{−2µ/d} (ϵ/L)^{2µ}. (32)

Here we assume that E(σ²) and E(σ⁴) are constant across the sampling process. This model enables the estimation of the ratio µ/d, as detailed in Methods section 4.8.1.

In the large-N limit, we observe distinct behaviors in the evolution of the dimensionality, in both theory and data: it saturates under RSap (dashed line in Fig. 5D), approaching the limit defined in Eq. (1), whereas it follows a different, growing scaling relationship under FSap (solid line in Fig. 5D).
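Putting the pieces together, the two regimes follow from Eq. (5) combined with the scaling estimates above; this is our own consolidation under the stated assumptions (constant E(σ²) and E(σ⁴)), not an additional result:

```latex
% D_PR for N = k N_0 sampled neurons, from Eq. (5):
D_{\mathrm{PR}}(N) \approx
  \frac{N\,\mathbb{E}[\sigma^2]^2}
       {\mathbb{E}[\sigma^4] + (N-1)\,\mathbb{E}[C_{ij}^2]}
\;\Longrightarrow\;
\begin{cases}
D_{\mathrm{PR}} \to \dfrac{\mathbb{E}[\sigma^2]^2}{\mathbb{E}[C_{ij}^2]}, &
\text{RSap: } \mathbb{E}[C_{ij}^2] \text{ fixed (saturation)},\\[1ex]
D_{\mathrm{PR}} \propto N^{2\mu/d}, &
\text{FSap: } \mathbb{E}[C_{ij}^2] \propto k^{-2\mu/d} \text{ (growth)}.
\end{cases}
```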

4.14.3 Comparative Analysis of PR Dimension Across Sampling Techniques

This section examines the behavior of the Participation Ratio (PR) dimension under three sampling techniques: anatomical sampling, random sampling, and functional sampling. We show that the average PR dimension following anatomical sampling occupies a middle ground between the extremes presented by random and functional sampling.

The PR dimension, denoted DPR, reflects the impact of sampling and depends on the distribution of the functional coordinates, p(x). Defining the sampling fraction as k = 1/q, the mean DPR is

mean(DPR) = (1/q) Σ_{i=1}^{q} DPR(pi),

where the neuron set {1, 2, …, N} is segmented into q clusters, each comprising N/q neurons, and pi(x) denotes the probability distribution of the functional coordinates in cluster i, which emerges naturally from the sampling process.

The requirement that the mean probability density function across the sampled clusters equal the original set's probability density function leads to the condition:

(1/q) Σ_{i=1}^{q} pi(x) = p(x).

This condition is a direct consequence of the sampling process, ensuring that the aggregated probability density function of all sampled sets mirrors the overall density distribution of the neurons.

Applying the Lagrange multiplier method to optimize the mean DPR under this constraint, with L(p, λ) the Lagrangian and λ the Lagrange multiplier, we derive the optimality condition that the functional derivative of DPR with respect to each pi must be the same for all clusters, yielding identical pi. At the optimal mean DPR, each pi(x) is therefore equal, leading to pi(x) = p(x) (representative of random sampling). Hence, the mean DPR after random sampling sets the upper limit for the mean DPR after anatomical sampling.

Let us investigate the lower bound of the mean PR dimension with the ERM model. To minimize mean(DPR), a key requirement is the functional spatial proximity of neurons within the same cluster; in other words, the clusters should be distinctly separated in functional space. Consequently, achieving the minimum mean PR dimension necessitates a functional sampling strategy.

4.14.4 Simulating CCA and anatomical sampling

In this section, we estimate the dimensionality of an anatomically sampled neuron set. For simplicity, we assume that the functional coordinates of the neurons, Xi, and their anatomical coordinates, Yi, jointly follow a multivariate Gaussian distribution. We define anatomical sampling as sampling on Yi along an arbitrarily chosen direction, denoted Y^A. Subsequently, we perform sampling on Xi along the direction X^A, which is determined to have the highest correlation with Y^A according to Canonical Correlation Analysis (CCA). This process effectively mimics the scenario of functional sampling.

The key to calculating the PR dimension is computing the expected value E(C²ij). In the ERM model, the distribution of Cij can be estimated from the distribution of points in the functional space. This allows the PR dimension under anatomical sampling to be calculated by comparing the distribution of Xi after anatomical sampling with that after functional sampling.

To approximate E(C²ij), we need the probability distribution of the functional coordinates of the i-th neuron cluster after anatomical sampling. Here, X^A_i and Y^A_i denote the functional and anatomical coordinates of the i-th neuron cluster after anatomical sampling, respectively; Y^C represents the selected direction in anatomical space, and the cluster boundaries are given by the (ik)-th quantiles of Y^C, where k is the sampled fraction.

We consider only the projection of the functional coordinates onto the direction X^A, which exhibits the highest correlation with Y^A, denoted RASap. Specifically, when the anatomical direction is chosen as the first CCA direction, the correlation between X^A and Y^A reaches its maximum, such that RASap = RCCA. In this case, anatomical sampling results in the strongest reduction of the dimensionality. The relevant relationships and distributions are as follows.

The conditional probability distribution of X^A, given the sampled range of Y^A, is equivalent to the distribution of the sum of RASap Y^A and an independent component X0, where X0 ~ N(0, (1 − R²ASap)σ²_X).

The distribution of Y^A restricted to a quantile interval (a truncated Gaussian) is a non-elementary function, complicating direct analysis. To facilitate the approximation, we model it with a normal distribution of equivalent variance. Although calculating this variance directly presents challenges, assuming a uniform distribution of Y^A within the interval simplifies the task, and the variance can then be calculated straightforwardly. Consequently, we approximate the conditional distributions of Y^A and X^A accordingly.

Calculating the PR dimension directly from the distribution of X^A is difficult; thus, we approximate anatomical sampling with fraction k by functional sampling with an effective fraction kf. Using the equation for functional sampling (Eq. (32)), we then obtain the PR dimension following anatomical sampling.

Acknowledgements

We are grateful to Liqun Luo and Changsong Zhou for their helpful suggestions on our manuscript. QW thanks Hideaki Shimazaki for the suggestion that the functional space could be the feature space for sensory coding. QW thanks Jia Lou for improving the illustrations in Figures 1 and 3. YH was supported by ECS-26303921 from the Research Grants Council of Hong Kong. QW was supported by NSFC-32071008 from the National Natural Science Foundation of China and by the STI2030-Major Projects 2022ZD0211900.