Introduction

Animals are not deterministic input-output machines, but instead display highly flexible behavioral responses even under sensory-invariant conditions. A holistic understanding of the neuronal mechanisms underlying decision making requires explaining such variability in behavior at the single trial level. However, there are many putative sources of trial-to-trial neuronal variability. In particular, it is unclear whether such variability results from deterministic sources, such as internal states and long-timescale dynamics, or from stochasticity. Individual neurons exhibit non-negligible noise from a variety of sources, including electrical noise due to ion channel fluctuations, synaptic noise due to biochemical processes, jitter in the timing of action potentials, and “digitization” noise due to the all-or-nothing nature of the action potential (Faisal et al., 2008).

However, trial-to-trial variability in neuronal firing is generally not independent across neurons, but instead highly correlated over repeated stimulus presentations within certain groups of neurons (Kohn et al., 2015; Shadlen and Newsome, 1998; Zohary et al., 1994), suggesting that variability in neural circuits may not be dominated by the stochasticity of individual neurons. Such structured covariations are referred to as “noise correlations” (Cohen and Kohn, 2011) because they are unrelated to the (often experimenter-controlled) external variable that is simultaneously encoded within the neuronal population. While certain structures of noise correlations are known to limit the information encoding capacity of the neuronal population (Moreno-Bote et al., 2014; Bartolo et al., 2020; Rumyantsev et al., 2020), there is little consensus about both their origin (Kanitscheider et al., 2015; Zylberberg et al., 2016) and their behavioral impact (Zohary et al., 1994; Huang and Lisberger, 2009). In particular, there are opposing reports in different species and under different task conditions as to whether such inter-neuronal correlations improve (Cafaro and Rieke, 2010; Zylberberg et al., 2016; Valente et al., 2021) or interfere with (Cohen and Maunsell, 2009; Ruda et al., 2020; Kafashan et al., 2021) the neural computations underlying decision making. As such, a general framework capable of reconciling these widely varying reports and explaining the functional impact of noise correlations on neural computation and behavior is currently missing.

Further, few experimental studies have linked noise correlations in sensory regions to pre-motor dynamics and ultimately behavior (Valente et al., 2021). Instead, most reports have mainly focused on the scale and structure of such covariations across sensory neurons in anesthetized animals or without regard to the animal’s behavioral state. However, recent studies have shown that many species encode the animal’s own movements in a highly distributed fashion, including across early sensory regions (Kauvar et al., 2020; Musall et al., 2019; Stringer et al., 2019). These brain-wide, motor-related fluctuations must then drive at least some of the observed neuronal covariation across stimulus presentations. Thus, noise correlations are not merely stochastic “noise”, but instead may encode important variables that are stimulus-independent and perhaps related to the animal’s proprioception and context, such as long-timescale behavioral and internal state dynamics.

Such highly overlapping representations of sensory information and additional contextual variables may underlie the brain’s remarkable ability to flexibly control behavior and select the most appropriate action at any given moment based on the animal’s goals and internal state. As such, naturalistic decision making is a function of not just incoming sensory stimuli, but also internal variables such as the animal’s motivations and recent experience or learned behaviors. For example, foraging has been shown across species to be modulated by hunger (Filosa et al., 2016), long-timescale exploration versus exploitation states (Flavell et al., 2013; Marques et al., 2020), and other hidden internal variables (Sternson, 2020; Torigoe et al., 2021; Flavell et al., 2022). This evidence suggests that answering longstanding questions regarding the neuronal mechanisms underlying naturalistic decision making will require understanding the intersection of numerous neural circuits distributed throughout the vertebrate brain.

Thus, probing the structure of the neuronal mechanisms underlying behavioral variability requires high spatiotemporal resolution recording of brain-wide neuronal dynamics and behavior across many trials. While it is often necessary to average across trials to deal with both the inherently variable firing of single neurons and experimental measurement noise, such trial averaging inherently precludes any investigation of the link between neural and behavioral variability on the trial-by-trial level. Instead, methods that utilize the entire neuronal population dynamics have shown that pooling information across neurons can enable successful decoding of information that is not accessible from individual neurons. This is because simultaneous multi-neuron recordings can leverage statistical power across neurons to capture the most salient aspects of population dynamics within single trials. Such population-level approaches have revolutionized the fields of motor planning and decision making, providing highly robust brain-machine interfaces (Kao et al., 2015; Pandarinath et al., 2018) that combine information from at least a few recording sites and making it possible to predict decisions (Wei et al., 2019; Lin et al., 2020), reaction time (Afshar et al., 2011), and additional behavioral states (Kaufman et al., 2015; Marques et al., 2020) at the single trial level. Recently, optical imaging techniques (Weisenburger and Vaziri, 2018; Kim and Schnitzer, 2022; Machado et al., 2022) have dramatically increased the attainable number of simultaneously recorded neurons within increasingly larger brain volumes at high speed and resolution. Thus, large-scale neural recording provides a promising approach to both screening for potential sources of variability across the brain and identifying robust population-level dynamics at the level of single trials. However, due to the difficulty of obtaining whole-brain optical access in many vertebrates, previous studies have not been able to quantify brain-wide noise correlations at the single neuron level and link such noise modes to variable decisions at the single trial level.

In this study, we sought to identify the neural loci which drive or encode information related to behavioral variability across trials. To do so, we performed whole-brain recording at cellular resolution in larval zebrafish engaged in ethologically relevant visually-evoked behaviors to identify neuronal populations that were predictive of behavioral variability under sensory-invariant conditions. While it is well-known that larval zebrafish on average exhibit a target-directed prey capture response to small visual stimuli and a target-avoidance escape response to large visual stimuli, we demonstrated that these ethological, oppositely valenced, and highly divergent responses exhibited significant variability across trials of identical stimulus presentations. Turning first to visually-evoked neural populations, we found that despite trial-to-trial variability in the responses of single neurons, visual information was reliably encoded at the population level across trials of each stimulus. As such, the visual-encoding neuronal activity patterns did not account for the larvae’s variable behavioral responses across trials.

A key feature of our system, unlike previous studies, was that it allowed investigating the geometry of trial-to-trial variability of neuronal dynamics at the whole-brain level, by simultaneous observation of brain-wide neural variability during a visual decision-making behavior. Thus, we were able to identify whole-brain modes of neural variability which were related to the larva’s motor actions. In particular, we identified highly distributed neuronal populations whose pre-motor activity predicted the larvae’s responsiveness and turn direction on a single trial level, indicating that behavioral variability is not represented in particular anatomical neural loci. We found that this predictability exhibited two dominant timescales: a longer-timescale and pre-stimulus turn bias that was not time-locked to the stimulus presentations or motor timing; and a rapid increase in predictability and ramping activity about one to two seconds before movement initiation. Consistent with results from previous studies, we speculate that the former, longer-timescale bias is related to circuitry that has evolved to optimize exploration-versus-exploitation behavior during foraging, while the latter, short-timescale ramping activity likely drives the downstream motor circuitry to execute the actual selected action in each trial. Our data suggest that behavioral variability in response to repetitions of the same sensory stimulus may not be driven by a single brain region. Rather, it is more likely generated by a combination of factors encoded by neurons throughout the brain, including a time-varying and internal turn direction bias, in addition to the well-studied visuomotor transformations.

Results

Fourier Light Field Microscopy enables high-speed imaging of neuronal population dynamics

In order to investigate the whole-brain neural correlates of behavioral variability, we designed and built a Fourier light field microscope (fLFM) that was optimized for high-resolution, whole-brain recording of neuronal activity in the larval zebrafish. Conventional light field microscopy (cLFM) provides an elegant solution to volumetric snapshot imaging by placing a microlens array (MLA) in the image plane to encode both the position and angle of incidence of the light rays onto a camera sensor (Levoy et al., 2006). Combined with 3D deconvolution algorithms for computational reconstruction of the recorded volume (Agard, 1984; Broxton et al., 2013; Prevedel et al., 2014), LFM enables imaging of an entire volume without requiring any scanning. Due to its high speed and volumetric capabilities, cLFM has been widely applied to image the dynamics of in vivo biological systems, particularly neuronal dynamics (Prevedel et al., 2014; Aimon et al., 2019; Lin et al., 2020; Quicke et al., 2020; Yoon et al., 2020). Further, additional computational strategies have been devised to image deep into scattering tissue, such as the mouse brain (Nöbauer et al., 2017; Skocek et al., 2018; Nöbauer et al., 2023). However, a primary limitation of cLFM is a dramatic drop in resolution near the native image plane (NIP) because its spatio-angular sampling is uneven across depth and highly redundant near the NIP. This low resolution at the center of the sample can only be partially overcome using techniques such as wavefront coding (Cohen et al., 2014). In contrast, fLFM has been shown to provide high-resolution imaging across a two- to three-fold extended depth by processing the light field information through the Fourier domain (Llavador et al., 2016; Scrofani et al., 2017; Cong et al., 2017; Guo et al., 2019; Liu et al., 2020); i.e., placing the MLA conjugate to the back focal plane of the objective. In this configuration, the MLA segments the wavefront by transmitting the various spatial frequencies (which correspond to angular information) into images on different regions of the camera, providing a well-aliased sampling of the spatio-angular information without significant redundancy near the NIP.

We simultaneously optimized our design for simplicity and cost-effectiveness by using exclusively off-the-shelf elements, as opposed to previous implementations of fLFM which relied on custom optical components. Our design (Figure 1A, see Methods for details) enables imaging of an approximately 750 × 750 × 200 μm3 volume, corresponding to roughly the whole brain of a 7 days post fertilization (dpf) larval zebrafish. 3D information is captured in the raw images (Figure 1B) because each lenslet in the MLA segments a region of the Fourier plane and thus forms an image of the sample from a unique perspective. Experimentally, the point spread function (PSF) full width at half maximum (FWHM) of our system measured 3.3 ± 0.21 μm (95% confidence interval) laterally and 5.4 ± 0.4 μm axially (Figure 1C), consistent with theoretical estimates of 3.1 μm and 4.7 μm, respectively. This represents a substantial improvement in axial resolution compared to our previous cLFM, which had a PSF FWHM of 3.4 μm laterally and 11.3 μm axially (Prevedel et al., 2014). As such, our design enabled cellular-resolution imaging across the whole brain of the larval zebrafish.
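To make the resolution quantification concrete, below is a minimal sketch of the Gaussian-fit procedure used to extract a FWHM from a bead profile (as in Figure 1C). It assumes a one-dimensional intensity cross-section sampled at known positions in micrometers; the function names and sampling step are illustrative, not the exact analysis code.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, center, sigma, offset):
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2) + offset

def fit_fwhm(coords_um, profile):
    """Fit a Gaussian to a bead cross-section and return the FWHM in micrometers."""
    # Initial guesses: peak height, peak location, ~1 um width, baseline
    p0 = [profile.max() - profile.min(), coords_um[profile.argmax()], 1.0, profile.min()]
    popt, _ = curve_fit(gaussian, coords_um, profile, p0=p0)
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(popt[2])  # FWHM = 2*sqrt(2 ln 2) * sigma
```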

Fourier Light Field Microscopy (fLFM) provides a simple and cost-effective method for whole-brain imaging of larval zebrafish during behavior

A. Schematic of the fLFM system. The sample is illuminated with a 470 nm LED through a 20×/1.0-NA imaging objective (Obj). The fluorescence is sent to the imaging path via a dichroic mirror (DM). An Olympus-style f=180mm tube lens (TL) and f=180mm Fourier lens (FL) are used to conjugate the back focal plane of the objective onto a microlens array (MLA). An sCMOS sensor is positioned at the focal plane of the MLA to capture the raw fLFM images.

B. An example raw sensor image. Each lenslet in the 8×8 array forms an image of the sample from a slightly different perspective, allowing for reconstruction of the 3D volume.

C. Experimental measurement of the point spread function (PSF). Left: the x-y (top) and x-z (bottom) profiles of a reconstructed image of a 1 μm fluorescent bead. Right: corresponding cross sections of the PSF (points). A Gaussian profile (lines) was fit to these data to measure the full width at half maximum (FWHM), which was 3.3 μm laterally and 5.4 μm axially.

D. Schematic of our fLFM data processing pipeline (see Methods for detailed description).

E. A maximum intensity projection (MIP) of a conventional LFM (cLFM) larval zebrafish recording resulting in strong grid-like artifacts and low resolution near the native image plane (black arrows). A reconstructed volume of 750 × 750 × 200 μm3 is shown.

F. An MIP of a fLFM recording. fLFM exhibits higher axial resolution and does not contain artifacts at the native image plane. A reconstructed volume of 750 × 375 × 200 μm3 is shown.

G. A heatmap of extracted neuronal activity from an example one hour recording of 15,286 neurons. Neurons are sorted using the rastermap algorithm (Stringer et al., 2019), such that nearby neurons exhibit similar temporal activity patterns. Multiple distinct activity patterns are seen, including strong activity during two sets of drifting grating-induced optomotor response trials (black arrows).

H. Example neurons tuned to tail movements and visual stimuli. In black are six example neuron traces from the designated region in panel G, which exhibited correlations with the GCaMP-kernel-convolved tail vigor (top trace, see Methods for details). In red are six example neuron traces from the designated region in panel G, which exhibited correlations with the visual stimuli presented in the left visual field (denoted by the red lines at the bottom of the plot).

Next, we developed a custom pipeline (Figure 1D) for 3D reconstruction and neuronal segmentation to extract the dynamics of neurons across the field of view. This consisted of three main steps: denoising, 3D reconstruction, and neuronal segmentation. As one of the key limitations of LFM is the signal-to-noise ratio (SNR), owing to the distribution of emitted photons across the sensor array, we first denoised the raw sensor images. To do so, we trained a DeepInterpolation (Lecoq et al., 2021) deep neural network model, an approach which has been shown to increase the SNR of the resulting neuronal timeseries. Next, the full 3D volume was reconstructed from denoised sensor images using Richardson-Lucy deconvolution (Agard, 1984; Broxton et al., 2013; Prevedel et al., 2014) and an experimentally measured PSF. In addition to enabling high-resolution imaging across an extended depth of field, a key advantage of fLFM is that the reconstruction of the volume images can be performed approximately 100 times faster than in cLFM, due to fLFM’s shift-invariant point spread function. Finally, to identify neuronal regions of interest (ROIs) and their dynamics within the reconstructed volume, we utilized CNMF-E (Zhou et al., 2018), a constrained matrix factorization approach to extract in vivo calcium signals from one-photon imaging data. The CNMF-E algorithm was applied to each plane in parallel, after which the putative ROIs from each plane were collated and duplicate neurons across planes were merged.
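As an illustration of the reconstruction step, the following is a minimal sketch of Richardson-Lucy deconvolution in the spirit of the references above. It is deliberately simplified: the actual fLFM reconstruction maps sensor lenslet images to a 3D volume through a light-field forward model, whereas this sketch assumes a simple shift-invariant blur with a measured PSF; variable names and the iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(image, psf, n_iter=30, eps=1e-12):
    """Iterative Richardson-Lucy deconvolution with a measured, shift-invariant PSF."""
    image = image.astype(float)
    estimate = np.full_like(image, image.mean())
    psf_mirror = np.flip(psf)  # mirrored PSF implements the adjoint (correlation)
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = image / (blurred + eps)                          # data-consistency ratio
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")  # multiplicative update
    return estimate
```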

To validate our setup, we imaged whole-brain dynamics from head-immobilized larval zebrafish expressing nuclear-localized GCaMP6s (NL-GCaMP6s) pan-neuronally (Vladimirov et al., 2014) while monitoring the larva’s tail movements using a high-speed camera and presenting visual stimuli, which, as discussed below, consisted of single dots of 3-5 different sizes moving from the center toward the left or right visual field. Across our 1-2 hour recordings, we identified 16,524 ± 3,942 neurons per animal (mean ± 95% confidence interval; range: 6,425 to 35,928). Thus, the combination of fLFM and the improvement in our reconstruction pipeline enabled the detection of significantly higher neuron numbers than the ∼5,000 neurons reported in our previous cLFM realizations (Prevedel et al., 2014; Lin et al., 2020). The performance improvement over cLFM could also be visualized by inspecting the maximum intensity projection (MIP) of a zebrafish recording in each modality. While cLFM (Figure 1E) exhibits lower resolution and grid-like artifacts near the native image plane (black arrows) due to redundant sampling near the NIP, fLFM’s well-aliased sampling of spatio-angular information provides cellular-resolution imaging throughout the entire volume (Figure 1F).

Within these observed large-scale dynamics, we found a diversity of neuronal activity patterns (Figure 1G). For example, we identified neurons tuned to the vigor of the larva’s tail movements (see Methods for details) and those tuned to the presentation of visual stimuli as described below (Figure 1H), providing a proof of principle that fLFM enables whole-brain imaging of the relevant neuronal population activity encoding sensory inputs and behavior.

Larval zebrafish exhibit highly variable motor responses to visual stimuli

Whole-brain, cellular-resolution imaging at high speed enabled us to screen for regions across the brain which exhibited covariations related to variability in behavior and naturalistic decision making. We thus set out to investigate the neuronal basis of trial-to-trial variability in action selection during two different ethologically-relevant behaviors which are known to be generated by brain-wide sensorimotor circuitries (Chen et al., 2018). In the larval zebrafish, several visually-evoked behaviors such as prey capture (Borla et al., 2002; Patterson et al., 2013) and escape response (Temizer et al., 2015) have been widely studied. In particular, varying a single stimulus parameter, the size of a visual object, can elicit dramatically distinct average behavioral responses (Barker and Baier, 2015), ranging from target-directed responses (e.g., prey capture) to small stimuli, to target-avoidance behaviors (e.g., escape response) to large stimuli. These distinct behaviors can be evoked in the laboratory using simple visual stimuli and in a head-fixed imaging preparation (Bianco et al., 2011; Semmelhack et al., 2014; Temizer et al., 2015; Barker and Baier, 2015; Bianco and Engert, 2015; Dunn et al., 2016; Filosa et al., 2016; Förster et al., 2020; Oldfield et al., 2020). As such, the neural circuitry underlying each of these individual behaviors is well understood at the trial-averaged level (Portugues and Engert, 2009; Bollmann, 2019). Importantly, however, these behaviors are not reflexive: they involve the processing of information across multiple brain regions and are subject to neuromodulatory effects and longer-timescale changes of internal states (Filosa et al., 2016; Marques et al., 2020). As such, the larvae do not deterministically perform the same actions over multiple sensory-invariant trials. Even under highly optimized and invariant conditions, prey capture is observed in at most 30% of trials (Bianco et al., 2011), whereas escape response is more consistently observed in 80% of trials (Temizer et al., 2015). However, even though these ethologically divergent behaviors can be elicited by tuning a single parameter, the neuronal populations and underlying circuitry driving such variability in responsiveness and action selection in either paradigm have not been identified. We thus hypothesized that modulating the size of the visual object should reveal a regime of behavioral variability between these highly divergent, naturalistic behaviors, providing a useful means and context to study questions such as how behavioral strategies may switch across trials or how neural and behavioral variability changes as a function of the ambiguity of the presented stimulus.

Thus, we first sought to characterize the trial-to-trial variability of the larvae’s behavioral responses to repetitions of simple visual stimuli of various sizes. In a freely swimming context (Figure 2A), we displayed moving dots of various sizes ranging from 1 to 40 visual degrees (see Methods for details) on a screen positioned below the larvae. Analyzing these behavioral recordings for different dot sizes, we indeed found characteristic target-avoidance bouts similar to escape response (Figure 2B) and target-directed bouts similar to prey capture response (Figure 2C; see also Video 1). For each bout of behavior in which a larva was close to a visual object, we quantified whether it approached or avoided the target using the larva’s change in distance from the object as a metric (see Methods for details). Consistent with previous work (Barker and Baier, 2015), on average the larvae exhibited target-directed behavior to small stimuli (1-5°), followed by a crossover point at 7°, at which the larvae were preferentially neither target-directed nor target-avoiding, and then a target-avoidance behavioral regime for stimuli of 10° or larger, which elicited characteristic escape responses on average (Figure 2D). Moreover, across individual trials the larvae’s responses exhibited significant variability, yielding many trials in which the larvae either did not respond or responded in the opposite direction of the average response, as indicated by the large standard deviation across bouts for the same stimulus size (Figure 2D).
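For clarity, the per-bout approach/avoidance metric described above can be summarized in a few lines. This is a hedged sketch assuming 2D position arrays for the larva and the stimulus at the start and end of a bout; the exact bout segmentation and any normalization are described in the Methods.

```python
import numpy as np

def bout_delta_distance(larva_start, larva_end, stim_start, stim_end):
    """Change in larva-object distance over a bout; negative = target-directed."""
    d_start = np.linalg.norm(np.asarray(larva_start) - np.asarray(stim_start))
    d_end = np.linalg.norm(np.asarray(larva_end) - np.asarray(stim_end))
    return d_end - d_start
```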

Zebrafish exhibit highly variable motor responses to visual stimuli

A. Schematic of the freely-behaving experimental setup. A single larva is placed in a 90 mm petri dish with a shallow (approximately 3 mm) layer of water. The dish is placed on a screen which displays visual stimuli consisting of dots of various sizes drifting in a constant direction and speed. The behavior is monitored with a camera from above.

B. An example target-avoidance bout. A composite image taken of four frames from start (white arrow) to near the finish (black arrow) of a bout in which the larva avoided a large visual stimulus. See also Video 1.

C. An example target-directed bout. A composite image taken of three frames from start (white arrow) to finish (black arrow) of a bout in which the larva made a directed movement toward a small visual stimulus. See also Video 1.

D. The behavioral response of larvae to various size visual stimuli. For each size stimulus, all bouts of movement in which the larva was within 10 body lengths of a visual stimulus were considered. For each bout, the change in distance between the larva and the stimulus during the movement was monitored. Thus, target-directed bouts exhibited negative values, corresponding to a decrease in the distance to the stimulus, whereas target-avoidance bouts exhibited positive values. Stimuli less than 7° in size evoked target-directed responses on average, whereas stimuli greater than 10° evoked target-avoidance responses. Shown is the mean ± standard deviation of n=9 larvae.

E. Schematic of the head-fixed experimental setup. A larva is embedded in agarose in order to keep its head fixed during whole-brain imaging. The agarose around the tail and eyes is removed, such that the tail can be tracked (indicated by black dots along the tail) and visual stimuli can be presented. Visual stimuli of various sizes are presented moving from the center of the visual field to either the left or right.

F. The behavioral response of larvae to various size visual stimuli during head fixation. Visual stimuli are presented in either the left (pink) or right (green) visual field. The directedness of tail movements is monitored by computing the mean tail curvature during a bout of movement, with positive values indicating leftward motions. Similar to the freely-behaving experiments, visual stimuli of 7° or less evoked target-directed responses, whereas stimuli larger than 10° evoked target-avoidance. Shown is the mean ± standard deviation of n=10 larvae. Asterisks indicate stimuli with a significant difference between presentations in the left and right visual fields (p<0.05, paired t-test).

G. Example behavioral responses to various stimuli. For each of four stimulus sizes on the left and right visual fields, a histogram of the mean tail curvature during bouts is shown for an example larva. While stimulus-evoked behavioral responses are variable in all cases, they appear least variable for the predominantly target-avoidance bouts evoked by large 44° stimuli (rightmost column).

Next, we confirmed that these behaviors were preserved in a head-fixed imaging preparation, in which the larvae were embedded in agarose with their eyes and tail cut free (Figure 2E), such that their variable behavioral responses to visual stimuli could be tracked while performing whole-brain neuronal recordings using our fLFM system. To do so, we presented visual stimuli as dots of a certain size moving in either the left or right visual field of the animals (see Methods for details). Visual stimuli were created using a projector that projected onto the edge of the circular 35 mm petri dish in which the larvae were embedded. Utilizing the curvature of the tail to the left or the right as a metric for target-directed versus -avoidance behavior, we again identified the stimulus-size dependence of the larvae’s average behavioral response (Figure 2F) that recapitulated what was observed in freely behaving larvae. We also confirmed that the corresponding opposite tail responses were observed when the stimuli were presented on the other side of the visual field, demonstrating that even during head-fixation the larvae exhibit on average directed behaviors that depend on the stimulus size. However, at the single trial level, we observed behavioral variability to nearly all stimulus sizes (Figure 2G), even in the stimulus size regime known to optimally drive prey capture and escape response behaviors. The most widely variable responses were at 4° and 11°, corresponding to stimuli in the target-directed regime and near the crossover between target-directed and avoidance, respectively. This high degree of variability among responses to small and intermediate stimuli is consistent with previous literature estimates that even in optimized setups, the prey capture response is observed at rates less than 30% during head-fixation (Bianco et al., 2011) and any motor response at all to small visual stimuli is observed at rates up to 60% of trials in free behavior (Filosa et al., 2016). However, the neural mechanisms and circuitry underlying such trial-to-trial behavioral variability are not understood. In particular, it is unclear whether such variability within these ethologically relevant, yet divergent behaviors is driven by noise and inherent stochasticity in neuronal circuits, or represents modulation by time-varying internal states, such as motivation and hunger.

Trial-to-trial variability in visually-evoked neurons is largely orthogonal to visual decoding dimensions

Previous studies have reported that individual neurons tuned to specific stimuli often exhibit significant variability in their responses over multiple presentations of the same stimulus (Montijn et al., 2016; Rumyantsev et al., 2020; Shadlen and Newsome, 1998; Zohary et al., 1994; Zylberberg et al., 2016). Thus, one potential mechanism underlying our observed behavioral variability could be trial-to-trial deviations in the neural encoding of visual stimuli. Given that we had observed behavioral variability across the entire studied range of stimulus sizes, we proceeded to record whole-brain dynamics with the fLFM while presenting many repetitions of 3 to 6 different stimulus sizes spanning 1° to 40° (see Methods for details). Investigating the responses of individual neurons across the whole brain, we indeed found that even neurons highly tuned to a particular stimulus exhibited variability in their responses across multiple presentations of the same stimuli (Figure 3A). Given that downstream decision-making neuronal ensembles likely pool information from many visually tuned neurons, we asked whether an optimal population decoder could reliably extract information about the visual stimulus across trials. We therefore built logistic regression classifiers to decode which visual stimulus was presented from the whole-brain neuronal activity pattern during the stimulus presentation period of each trial. We found that despite the observed variability on the single neuron level, the stimuli could be reliably decoded from the visually-tuned neurons identified within the whole-brain population activity (Figure 3B) at the single trial level with an accuracy of 94 ± 2% (mean ± 95% confidence interval). This robust decodability suggests that the observed single neuron variability does not severely limit the information encoding capacity at the population level.
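A minimal sketch of this decoding analysis is given below, assuming a trials × neurons activity matrix `X` (stimulus-period activity) and per-trial stimulus labels `y`. The lasso (L1) penalty, 6-fold cross-validation, and confusion matrix mirror the analysis described here and in Figure 3B; the specific regularization strength is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

def decode_stimulus(X, y, n_splits=6):
    """Cross-validated stimulus decoding; returns a row-normalized confusion matrix."""
    classes = np.unique(y)
    cm = np.zeros((len(classes), len(classes)))
    for train, test in StratifiedKFold(n_splits=n_splits, shuffle=True).split(X, y):
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)  # lasso penalty
        clf.fit(X[train], y[train])
        cm += confusion_matrix(y[test], clf.predict(X[test]), labels=classes)
    return cm / cm.sum(axis=1, keepdims=True)
```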

Trial-to-trial variability in visually-evoked neurons is largely orthogonal to visual decoding dimensions

A. Individual visually tuned neurons exhibit trial-to-trial variability in their responses to stimuli. For each of the six visual stimuli (three object sizes on both the left and right visual fields), the two neurons exhibiting the highest correlation with the stimulus kernel (see Methods for details) are shown for an example larva. Each column represents a neuron, and each line represents its response to a given stimulus during a single trial.

B. Visual stimuli are reliably decodable from whole-brain dynamics on the single trial level. The visual stimuli were decoded using a logistic regression classifier with lasso regularization (see Methods for details). For each larva, a confusion matrix is computed for the test trials during 6-fold cross-validation. Shown is the average confusion matrix across n=8 larvae which were shown six visual stimuli.

C. Schematic of potential geometric relationships between sensory decoding and neural variability dimensions. In each plot, each dot represents the neural response during a single presentation of stimulus A or B. The decision boundary for an optimal classifier is denoted with a dashed line, and the optimal stimulus decoding direction is denoted by the vector $\vec{d}_{stim}$. The direction representing the maximal trial-to-trial variance is denoted by $\vec{n}_1$, and can be calculated by finding the first eigenvector of the noise covariance matrix (see Methods for details). These vectors can (i) be orthogonal, such that neuronal variability does not limit the stimulus decoding; (ii) show little relationship, for example in the case of uniform variability; or (iii) be aligned, such that variability likely limits the information encoding capacity along $\vec{d}_{stim}$.

D. The trial-to-trial noise correlation matrix appears multi-dimensional. Shown is the average noise correlation matrix across all stimulus types presented. The neurons are sorted using rastermap, which produces a one-dimensional embedding of the neurons, such that neurons which show similar correlation profiles are placed near one another. A number of neuronal populations exhibiting correlations across trials are apparent from the clusters of high correlations near the diagonal.

E. Trial-to-trial variability in the visually-evoked neurons is largely orthogonal to visual decoding dimensions. The fraction of trial-to-trial variance explained by each noise mode $\vec{n}_\alpha$ is plotted against the angle between $\vec{n}_\alpha$ and the optimal stimulus decoding direction $\vec{d}_{stim}$. Shown are the noise modes for n=8 larvae, colored by their rank order α based on the fraction of variance explained. The largest noise modes were approximately orthogonal (∼90°) to the stimulus decoding direction, whereas only a few of the smallest noise modes exhibited angles less than 90°.

F. Example projections of single trial neural activity along the stimulus decoding and noise dimensions. Each dot represents the average neural activity within a single trial projected along $\vec{d}_{stim}$ and $\vec{n}_1$ for an example larva. Each of the six visual stimuli, three object sizes presented on either the right (green) or left (pink) visual field, are robustly encoded along $\vec{d}_{stim}$ across trials; however, for all stimuli there is strong orthogonal variability along $\vec{n}_1$, the largest noise mode, representing 39.5% of the trial-to-trial variance.

G. Example neuron coefficients for $\vec{d}_{stim}$. Shown are the 366 neurons with the largest weights over a maximum intensity projection of the recorded volume. The visually-evoked neurons which encode stimulus information are concentrated within the optic tectum.

H. Example neuron coefficients for $\vec{n}_1$. Shown are the 828 neurons with the largest weights over a maximum intensity projection of the recorded volume. The visually-evoked neurons contributing to the largest noise mode highly overlap with the neurons contributing to $\vec{d}_{stim}$ in panel G.

I. The neural noise modes are highly correlated with tail movements. Shown are both the GCaMP kernel-convolved tail vigor and the neuronal projection onto the first noise mode for a representative larva. Over the full two-hour recording, the tail vigor and noise mode projection exhibit a significant correlation of r=0.58, p<10⁻⁶.

J. The largest neural noise modes reflect brain-wide motor encoding. Shown are the correlations between the first noise mode and the tail vigor (blue) or the first principal component (PC 1, orange) of whole-brain data (see Figure S3C). Dots show data from individual larvae, whereas the violin plots below show the null distribution for temporally shuffled data, in which the tail vigor or PC 1 are circularly permuted.

Given the wealth of literature debating whether trial-to-trial noise correlations could interfere with stimulus encoding (Moreno-Bote et al., 2014; Kanitscheider et al., 2015; Zylberberg et al., 2017; Bartolo et al., 2020; Rumyantsev et al., 2020; Valente et al., 2021; Kafashan et al., 2021) and potentially drive behavioral variability, we next asked how the observed neuronal variability was structured so as to prevent degradation of the visual encoding. We began by considering the dimensions that maximally explain trial-to-trial neuronal variability within the high-dimensional space of neuronal population dynamics, which are sometimes referred to as “noise modes” (Rumyantsev et al., 2020). Historically, the term “noise” is used to describe such trial-to-trial variability that is not related to the external stimulus; however, such noise could represent significant internal neuronal dynamics, not just measurement shot noise, for example. If the primary noise modes (represented by neural activity along the vectors $\vec{n}_\alpha$) are orthogonal to the optimal stimulus decoding direction $\vec{d}_{stim}$, then trial-to-trial variability will not hinder the visual stimulus decoding (Figure 3C(i)). In this case, the neuronal population may carry a multi-dimensional set of at least two variables while the visual stimulus encoding is preserved. Alternatively, trial-to-trial variability might not be confined to any particular set of dimensions, for example if it represented noise that is uniformly distributed in the neural state space (Figure 3C(ii)), in which case pooling or averaging across sufficient neurons may still recover adequate stimulus information. Finally, trial-to-trial noise modes may be aligned with the stimulus decoding dimension (Figure 3C(iii)), which would maximally degrade the visual stimulus decoding.

To assess which of these scenarios applied to the geometry of trial-to-trial neural variability in our data, we first utilized partial least squares (PLS) regression to identify a low-dimensional subspace of the whole-brain neural activity that optimally preserved information about the visual stimulus (Figure S3A, see Methods for details), which we refer to as the visually-evoked neuronal activity patterns. Importantly, this approach ensures that we are identifying trial-to-trial neural variation that is contained within the visually-evoked neurons, as opposed to variability that is completely unrelated to visual encoding. Within these visually-evoked neurons, PLS regression also identifies the optimal stimulus decoding direction $\vec{d}_{stim}$ (Figure S3B). Additionally, trial-to-trial variability of the visually-evoked neurons is summarized by their noise covariance matrix, which describes how the activity of neurons covaries across trials (Moreno-Bote et al., 2014; Kohn et al., 2015). The average noise correlation matrix across all stimuli (see Methods for details) appeared highly structured and multi-dimensional (Figure 3D), indicating the presence of multiple neuronal ensembles whose activity is strongly correlated across trials. We analyzed the structure of this trial-to-trial noise by finding the eigenvectors $\vec{n}_\alpha$ of the average neural noise covariance matrix across all stimuli, following (Rumyantsev et al., 2020). As such, these noise modes represented the dimensions of neuronal activity within the visually-evoked neurons that maximally covaried across trials. We found that the noise modes were strongly structured, with the trial-to-trial variance concentrated in relatively few dimensions; the single largest noise mode captured up to 50% of the variance across larvae (Figure 3E, y-axis), indicating that such activity is likely not merely “noise” but potentially physiologically relevant. Thus, the observed neuronal variability was not independent across neurons, but strongly correlated. To assess the impact these noise modes could have on the stimulus decoding, we calculated the angles between the noise modes and $\vec{d}_{stim}$. We discovered that the noise modes containing the majority of the variance were orthogonal to $\vec{d}_{stim}$, while only the smallest noise modes exhibited any overlap with the stimulus decoding direction (Figure 3E). This finding demonstrates that the structure underlying trial-to-trial neuronal variability in the zebrafish visual system lies in a regime closely resembling the orthogonal variability illustrated in Figure 3C(i), as also previously observed in a subregion of mouse primary visual cortex (Rumyantsev et al., 2020). Thus, while even a small degree of overlap between sensory and noise modes can in principle limit the information encoding capacity of the visually-evoked neurons (Moreno-Bote et al., 2014), this orthogonal structure minimizes such limits and is consistent with our earlier finding that the stimuli presented in this study are highly decodable from the neuronal population dynamics (Figure 3B). The orthogonal relationship between sensory and noise modes can be visualized by projecting the average neural activity within each trial onto the stimulus decoding direction and the noise modes (Figure 3F).
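The core of this noise-mode analysis can be sketched as follows, assuming `X` is the trials × neurons activity within the visually-evoked (PLS) subspace, `stim` holds the per-trial stimulus labels, and `d_stim` is the decoding direction obtained from PLS regression. The stimulus-averaged noise covariance and its eigendecomposition follow the approach described above; the code is an illustrative sketch rather than the exact analysis pipeline.

```python
import numpy as np

def noise_modes_and_angles(X, stim, d_stim):
    """Eigendecompose the stimulus-averaged noise covariance and compare to d_stim."""
    n_neurons = X.shape[1]
    cov = np.zeros((n_neurons, n_neurons))
    stimuli = np.unique(stim)
    for s in stimuli:
        resid = X[stim == s] - X[stim == s].mean(axis=0)  # remove the signal (trial mean)
        cov += np.cov(resid.T)
    cov /= len(stimuli)
    evals, evecs = np.linalg.eigh(cov)           # eigenvectors of the symmetric covariance
    order = np.argsort(evals)[::-1]              # sort noise modes by variance explained
    evals, evecs = evals[order], evecs[:, order]
    d_hat = d_stim / np.linalg.norm(d_stim)
    cosines = np.abs(evecs.T @ d_hat)
    angles = np.degrees(np.arccos(np.clip(cosines, 0.0, 1.0)))  # 90 deg = orthogonal
    return evals / evals.sum(), evecs, angles
```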

We next mapped the neural coefficient vectors defining the stimulus decoding direction and the noise modes onto their anatomical locations within the larva’s brain. As expected, the visually-evoked neurons were highly concentrated within the optic tectum, such that both the stimulus decoding direction (Figure 3G) and noise modes (Figure 3H) highly overlapped. The significance of each neuron’s contribution to $\vec{d}_{stim}$ and $\vec{n}_1$ was assessed by comparing its coefficient to a distribution of coefficients derived from trial-shuffled data (see Methods for details). Overall, 30 ± 6% of visually-evoked neurons that significantly contributed to the largest noise mode also exhibited significant contributions to the stimulus decoding direction (mean ± 95% CI across n=10 larvae). Therefore, while many visually-evoked neurons have response components along both $\vec{d}_{stim}$ and $\vec{n}_1$, the neuronal population dynamics are highly structured such that activity patterns encoding external visual stimuli are reliable and largely independent from any trial-to-trial noise modes.

Given that these noise modes represented highly-structured neuronal variability across trials, we next asked whether they were at all related to the larvae’s behavior. First, we asked whether any noise modes were correlated with the vigor of the larvae’s tail movements, defined as the absolute tail curvature time series convolved with the GCaMP response kernel (see Methods for details). We found that the largest noise modes were correlated with the larvae’s instantaneous tail vigor (Figure 3I). While we identified the noise modes within the visually-evoked neuronal population, these noise modes were also highly correlated with the largest principal component (PC) of the whole-brain dynamics (Figure 3J). This indicates that both the observed trial-to-trial neuronal variability in visual regions and elsewhere across the brain are dominated by behavior-related activity (Figure S3C). In fact, 35 ± 15% of neurons with significant coefficients in the largest visual noise mode also significantly contributed to the first brain-wide PC (mean ± 95% CI across n=10 larvae, see Methods for details). While the lower variance noise modes did not exhibit clear motor-related activity, we suspected they may be involved in additional internal computations that may modulate decision making across trials. Taken together, these results show that visual stimuli are faithfully represented on the population level, and that the observed behavioral variability is not driven by changes in the fidelity of sensory encoding across trials. Instead, behavioral variability may be explained by additional, orthogonal dimensions of neuronal activity either within the visually-evoked population or elsewhere in the brain that are stimulus-independent but predictive of the larvae’s behavior.
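As a sketch of the tail vigor regressor used here, assume `curvature` is the tail curvature timeseries sampled at `fs` Hz. The GCaMP response kernel is approximated below by a single exponential decay; the actual kernel and time constant are described in the Methods, so the value used here is only an assumption. Correlating this regressor with each neuron's activity, or with a noise-mode projection, identifies motor-related components as in Figure 3I.

```python
import numpy as np

def tail_vigor(curvature, fs, tau=1.8):
    """Absolute tail curvature convolved with an exponential calcium kernel."""
    t = np.arange(0.0, 5.0 * tau, 1.0 / fs)
    kernel = np.exp(-t / tau)   # assumed single-exponential GCaMP6s decay (tau in seconds)
    kernel /= kernel.sum()
    return np.convolve(np.abs(curvature), kernel)[: len(curvature)]  # causal convolution
```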

Pre-motor neuronal populations predictive of single-trial behavior

We hypothesized that there exists a pre-motor neuronal population that would be predictive of the larvae’s behavioral response in any given trial before movement initiation. As such, we aimed to identify any neural locus predictive of the larvae’s turn direction at various timepoints through each visual stimulus presentation. Thus, we looked beyond only the visually-evoked population and returned to the whole-brain activity patterns. In order to build predictive models from the neuronal dynamics despite varying reaction time across trials, we applied time-warping to the neuronal timeseries within each trial such that the stimulus onset and movement initiation timepoints were aligned across trials (Figure 4A, see Methods for details). We selected trials in which the larvae had a reaction time of at least one second from the stimulus onset (providing sufficient pre-motor neuronal data) and made its first tail movement with a defined minimum tail curvature to either the left or right, so as to separate clear left and right turns from simple forward locomotion. Further, we utilized trials from all visual object sizes to collect enough trials to reliably train models and thus identify populations predictive of decisions regardless of incoming sensory information. We found that there appeared to be neurons with differing time-warped activity patterns during left and right turn trials (Figure 4A, bottom). Additionally, we extracted trials in which the larva did not respond following the stimulus presentation, which were time-warped according to a randomly selected reaction time from the distribution in responsive trials. As such, each trial could be categorized according to the stimulus presented, the larva’s responsiveness (response/no response), and its turn direction (left/right) within responsive trials.
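A minimal sketch of this per-trial time-warping is given below. It linearly resamples a single neuron's trace so that stimulus onset and movement initiation land at fixed indices of a common template; the template grid sizes and argument names are illustrative assumptions.

```python
import numpy as np

def warp_trial(trace, t_stim, t_move, t_end, n_pre=20, n_stim=40, n_post=20):
    """Resample one trial so stimulus onset and movement land at fixed template indices."""
    src = [0, t_stim, t_move, t_end]                           # anchors in original samples
    dst = [0, n_pre, n_pre + n_stim, n_pre + n_stim + n_post]  # anchors in the template
    template = np.arange(dst[-1] + 1)
    src_times = np.interp(template, dst, src)  # fractional source index per template bin
    return np.interp(src_times, np.arange(len(trace)), trace)
```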

Pre-motor neuronal populations predictive of single-trial behavior.

A. Trials are time-warped to align stimulus and decision onsets before classifying the turn direction. Top: Time-warped tail curvatures for trials in which the fish performed leftward or rightward turns, on the left and right, respectively. Bottom: Trial-averaged and time-warped neuronal timeseries for 15,286 neurons during left and right turns. The neurons are sorted using the rastermap algorithm. A time window is swept across the stimulus and decision timepoints to train binary classification models to predict the turn direction from the associated neuronal dynamics.

B. Stimulus classification accuracy peaks after the onset of visual stimulation. The mean F score across n=7 larvae is used to assess the performance of 6-way multiclass classification of the presented visual stimulus as a function of warped time surrounding the stimulus onset (Stim.) and decision timepoint (Dec.). Shown is the mean ± 95% confidence interval of the F score for the best time window ending at the given timepoint (Data), compared to shuffled data in which the class labels are randomized. The black bar at the bottom indicates timepoints where the data have a significantly higher F score than the shuffled data (p<0.05, paired t-test).

C. Binary classification of responsiveness, whether or not the fish responds in a given trial, is significant throughout all time periods but accuracy peaks near movement initiation. As in panel B, except for binary classification of responsiveness. Nonresponsive trials are time-warped by randomly selecting a reaction time from the response trials and applying the same transformation.

D. (i) Turn direction classification accuracy is significantly higher than shuffled data across the entire time-warped interval, but peaks near movement initiation. As in panel B, except for binary classification of turn direction. (ii) Single trial classification of turn direction across larvae. The mean confusion matrix across n=7 larvae, which shows an accuracy of 77 ± 4% (mean ± 95% confidence interval).

E. Single trial trajectories are separated based on responsiveness and turn direction. Shown are neural activity trajectories during single trials in an example larva projected onto the brain-wide neural dimensions that optimally separated turn direction and responsiveness.

F. Consistent trial-averaged trajectories across larvae. As in panel E, except showing the trial-averaged responses for n=6 example larvae. For one example animal (bold), timepoints across the trial are indicated by a circle for trial start, a diamond for the decision timepoint, and an X for the trial end.

G. Real-time single-trial dynamics in an example larva. Along the turn direction neural projection, left and right trials are separated for many seconds before the decision timepoint, which is longer than the three second length of visual presentations. Activity along this dimension shows consistent ramping across trials approximately one second before movement.

H. Example turn direction neuronal ensemble. Shown are the coefficients for all neurons which showed significantly higher (one-tailed t-test, p<0.05) absolute coefficients in the real models compared to shuffled data in which the turn direction labels are randomly permuted.

I. Highly distributed encoding of turn direction across larvae. The percentage of significant turn direction neurons located within the four major brain regions (Tel – telencephalon, Tec – optic tectum, Cer – cerebellum, and Hind – hindbrain) is shown for n=10 larvae. There is no significant difference between the percentage of neurons across brain regions (p>0.05, paired t-test).

J. Example responsiveness neuronal ensemble. As in panel H, except for responsiveness. Shown are the coefficients for all neurons which showed significantly higher (one-tailed t-test, p<0.05) absolute coefficients in the real models compared to shuffled data in which the responsiveness labels are randomly permuted.

K. Highly distributed encoding of responsiveness across larvae. As in panel I, except for responsiveness. The percentage of significant responsiveness neurons located within the four major brain regions (Tel – telencephalon, Tec – optic tectum, Cer – cerebellum, and Hind – hindbrain) is shown for n=10 larvae. There is no significant difference between the percentage of neurons across brain regions (p>0.05, paired t-test).

We first asked whether the larvae’s responsiveness and turn direction could be accurately classified from the whole-brain neuronal dynamics within single trials, as well as how this predictability behaved over the pre-motor period. To identify the timepoints throughout the trial within which these could be accurately classified, we varied the time window of neuronal dynamics used to build a binary logistic regression classification model (Figure S4A). Neuronal activity across all neurons was integrated from the window start to the window end separately for each trial and then a model was trained on 80% of the trials. To assess the prediction accuracy at various timepoints throughout the trial, we utilized the mean F score across larvae on held-out trials, which measures the harmonic mean of the precision and recall. As expected, the multiclass stimulus prediction accuracy peaked shortly after the stimulus onset and remained high throughout the pre-motor and decision periods (Figure 4B). For both the responsiveness (Figure 4C) and the turn direction (Figure 4D(i)), the mean F score was consistent across the pre-stimulus and stimulus periods, before quickly ramping near the movement initiation timepoint. Interestingly, the classification performance was significantly higher than shuffled data at all timepoints (Figure 4C-D), including before the start of the stimulus presentation, with an average accuracy of 77.4 ± 4.4% (mean ± 95% CI across n=10 larvae) during the pre-motor period (Figure 4D(i)). These results suggest there are two dominant timescales which contribute to the larva’s behavioral response: a longer-timescale and relatively weak (but significant) activity pattern that does not appear aligned to stimulus or behavioral events, which we term the pre-stimulus turn bias; and a fast-timescale dramatic increase in predictability very near the decision, which we term the movement initiation dynamics.
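This sliding-window analysis can be sketched as below, assuming `trials` is a trials × time × neurons array of time-warped activity and `labels` holds the per-trial turn directions. For simplicity this sketch fixes the window length, integrates activity by averaging, and scores held-out predictions with a weighted F score; the actual analysis sweeps both window start and end (Figure S4A), so the grid here is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict

def predictability_timecourse(trials, labels, window=5):
    """F score of held-out turn predictions as the window sweeps through the trial."""
    scores = []
    for end in range(window, trials.shape[1] + 1):
        X = trials[:, end - window:end, :].mean(axis=1)  # integrate activity in the window
        pred = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=6)
        scores.append(f1_score(labels, pred, average="weighted"))
    return np.array(scores)
```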

Given that we observed the largest degree of behavioral variability in the responses to small-to-intermediate size stimuli (Figure 2G), we hypothesized that the pre-stimulus turn bias would be most influential (and thus most predictive of behavior) during such stimulus presentations, and less so during presentations of larger stimuli. Indeed, while the mean F scores for turn direction prediction were highly variable across larvae, on average the largest stimuli exhibited the lowest predictability from the pre-stimulus turn bias signal (Figure S4B, blue line). This likely reflects a more salient stimulus drive during large-stimulus trials, which in the natural environment could correspond to predators and thus a potential life-or-death decision for the larvae.

However, significant predictability was observed across all stimulus sizes during the movement initiation dynamics (Figure S4B, orange line). We thus argue that the pre-stimulus turn bias is subsequently combined with incoming stimulus information during action selection. Ultimately, the larva’s decision is then signaled by the dramatic increase in predictability during the movement initiation dynamics.

These single-trial predictions could be visualized by projecting the brain-wide neuronal activity during each trial onto the neural dimensions that optimally classified turn direction and responsiveness. Consistent with our classification performance, we found that the trajectories during single trials (Figure 4E) were highly separated according to the ultimate behavioral response. On average and across larvae, these trajectories remained localized in distinct quadrants throughout the entire trial (Figure 4F), including before and during the stimulus presentation period. This suggests that the observed behavioral variability is due in part to the initial internal state of the neural dynamics at the time of stimulus presentation and decision. Indeed, when we removed the time-warping and observed the dynamics during single trials in real time, left and right turn trials appeared separated for many seconds prior to the movement (longer than the 3 second visual trials). Ultimately, the decision to turn during the movement initiation dynamics appeared driven by a strong ramping activity approximately one second before the turn (Figure 4G). Thus, the identified neuronal dimensions exhibited reliable decoding within single trials, driven by an initial bias and ultimately resulting in the observed movement initiation dynamics.

Next, we asked which specific neuronal populations across the brain contributed to the accurate pre-motor classification of behaviors, and thus could be involved in driving the decision of when and how to act. We considered a neuron to be significantly contributing to the behavioral prediction if its distribution of absolute coefficients across 6-fold cross-validation was significantly higher (one-tailed t-test, p<0.05) when fit to real behavioral data as opposed to a null distribution created by randomly permuting the trial type labels. We found that both the ensemble of neurons which significantly contributed to the turn direction prediction (Figures 4H-I) and the responsiveness ensemble (Figures 4J-K) were highly distributed across the brain. Both of these distinct neuronal ensembles showed no significant difference in their distribution across the four major brain regions: the telencephalon (Tel), optic tectum (Tec), cerebellum (Cer), and hindbrain (Hind). Further, while this population included visually-evoked neurons within the optic tectum, we found that our previously identified noise modes were similarly predictive of single-trial turn direction (Figure S4C), whereas the optimal stimulus decoding direction $\vec{d}_{stim}$ (Figure 3G) was not able to predict single-trial turn direction (Figure S4D). Thus, our data suggest that the neural dynamics underlying single-trial action selection are the result of a widely-distributed circuit that contains subpopulations encoding internal time-varying biases related to both the larva’s responsiveness and turn direction, yet distinct from the sensory encoding circuitry.
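The permutation test for significant neurons can be sketched as follows. Here `fit_coefs(X, y)` is a hypothetical stand-in that refits the cross-validated classifier and returns a folds × neurons array of coefficients; the number of shuffles is an illustrative assumption.

```python
import numpy as np
from scipy.stats import ttest_ind

def significant_neurons(X, y, fit_coefs, n_shuffles=100, alpha=0.05):
    """Neurons whose real |coefficients| exceed a label-shuffled null (one-tailed)."""
    real = np.abs(fit_coefs(X, y))  # folds x neurons, from cross-validated fits
    null = np.concatenate(
        [np.abs(fit_coefs(X, np.random.permutation(y))) for _ in range(n_shuffles)],
        axis=0,
    )
    t, p = ttest_ind(real, null, axis=0, alternative="greater")
    return np.where(p < alpha)[0]
```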

What could be the origin of these trial-to-trial biases in the larvae’s behavior? Given that our results demonstrate that it is possible to predict a behavioral bias before the visual stimulus is shown, we hypothesized these results could reflect the influence of the zebrafish neuronal circuitry for spontaneous turns, which includes the hindbrain oscillator (or anterior rhombencephalic turning region, ARTR) known to bias sequential turns towards a given direction (Dunn et al., 2016). While our pre-motor neuronal population is not exclusively localized to the ARTR, we did identify many neurons in the cerebellum and hindbrain (Figures 4J-K) whose locations were consistent with these known populations. As such, we asked whether spontaneous turns could be predicted from this same turn direction neuronal ensemble (Figure 5A). To address this question, we selected spontaneous turns that occurred either during inter-trial intervals or within a 2-minute period at the beginning of each recording without any stimulus presentations, utilizing the same criteria to define left and right turns as before (see Methods for details). We then asked whether the spontaneous turn direction during periods without sensory stimulation could be predicted from the same turn direction neuronal ensemble and coefficients previously identified in Figure 4H-I. We found a significant correlation between the pre-motor turn direction predictions and the observed spontaneous tail curvature across larvae (Figure 5B, r=0.41, p<0.05), suggesting this same ensemble is activated prior to both spontaneous and visually-evoked turns. This ensemble had a pre-motor turn direction classification accuracy of 70.2 ± 6.0% (mean ± 95% CI across n=5 larvae, Figure 5C). Thus, these results demonstrate that a portion of the observed behavioral variability overlaps with the larva’s motor circuitry responsible for the generation of spontaneous behavior.

Spontaneous turns are predictable from the same pre-motor neuronal population

A. Schematic of the approach to predict spontaneous turns. Models are fit to predict left or right turn direction during visual trials (top), as in Figure 4. They are then tested on pre-motor periods one second before the spontaneous turn.

B. Spontaneous turns are predicted by the visually-evoked model. Shown is the relationship between the spontaneous tail curvature and the predicted pre-motor turn direction using the visually-evoked model. Each dot is a single spontaneous turn and each color represents a different larva. They exhibit a significant correlation of r=0.41, p<0.05.

C. Spontaneous turn classification accuracy. The mean cross-validated confusion matrix for spontaneous turn classification over n=5 larvae. Spontaneous turns are predicted with an accuracy of 70.2 ± 6.0% (mean ± 95% CI across n=5 larvae).

Our data highlight that the neural mechanisms involved in single-trial decision making are reflected in a highly distributed ensemble of neurons across the larval zebrafish brain. Further, behavioral variability, particularly in the larvae’s responses to small-to-intermediate size stimuli, can be partially explained by an internal, task-independent turn bias that is unrelated to the visual stimulation, whereas stimulus-driven responses to larger, predator-like stimuli exhibit comparatively lower behavioral variability. This highlights the importance of considering how internal biases and the circuits governing spontaneous movements interact with the well-studied sensorimotor circuits to govern how the brain generates behavior in a naturalistic context.

Discussion

In this study, we designed and built an optimized, high-resolution Fourier light field microscope (fLFM) to perform whole-brain imaging of the larval zebrafish in order to investigate the nature of trial-to-trial variability in neuronal dynamics and behavior at the whole-brain yet cellular level. To do so, we studied visually-evoked turns within two ethologically relevant behaviors with opposing valence, prey capture and escape response, each of which is on average driven by distinct pre-motor neuronal ensembles (Bollmann, 2019) depending on the size of the given sensory stimulus. Consistent with previous results (Barker and Baier, 2015), we found that the larvae’s behavioral responses were highly variable across presentations of the same stimulus, with behavioral variability peaking in response to stimuli of an intermediate size between those that optimally drive attraction or avoidance behaviors. Given that we lack a mechanistic understanding of which specific neuronal circuits drive the observed variance across trials, we utilized fLFM to screen whole-brain dynamics at cellular resolution and identify neuronal ensembles that contribute to behavioral variability at the single-trial level.

We first asked whether the observed behavioral variability could be explained by noise or trial-to-trial deviations in encoding of the visual stimuli. We found that despite the variable responses of individual neurons to visual stimuli across trials, the population response faithfully and reliably encoded the visual stimulus in each trial. Further, we discovered that the visually-evoked neuronal activity patterns were largely orthogonal to those dimensions that maximally covaried across stimulus repetitions, i.e. the noise modes, indicating that the source of behavioral variability was not unfaithful representation of the sensory inputs. Instead, we found that at least a third of the visually-evoked neurons which contributed to noise modes were related to motor behavior. We ultimately identified two brain-wide neuronal populations which could predict the larvae’s turn direction and responsiveness at the single-trial level. Surprisingly, a pre-stimulus bias component of these neuronal populations’ activity could predict the larvae’s trial-by-trial turn direction even before the onset of the stimulus presentation, with an average accuracy of 77%, while after stimulus onset, a sharp ramping of the activity of this neuronal population approximately one second before movement initiation allowed for an increased turn direction prediction accuracy of 90%. Taken together, our data show that the larva’s trial-by-trial decisions in response to any given stimulus are partially explained by pre-stimulus turn biases, and then ultimately driven by movement initiation dynamics which are broadcast within brain-wide neuronal ensembles.

In this context, the design of our behavioral paradigm has allowed us to gain insights into the nature of trial-to-trial neuronal and behavioral variability within and across two different ethologically relevant but highly divergent behaviors by tuning only a single parameter. This has allowed us to show that functionally different neuronal circuits have evolved to exhibit different levels of variability. Further, the differential interaction of such neuronal variability with the well-studied sensorimotor circuits may govern how the brain generates optimal behavior in different naturalistic contexts.

In addition, our study is, to our knowledge, the first to identify behavior-related noise correlations throughout an entire brain-wide sensorimotor decision-making circuit. The nearly orthogonal relationship between the stimulus decoding direction and the trial-to-trial noise correlations is highly consistent with previous studies, which found that noise correlations within single brain regions are organized so as to prevent additional information encoding channels within the same population from limiting sensory encoding capacity (Kafashan et al., 2021; Rumyantsev et al., 2020). However, even a small degree of overlap between noise and sensory dimensions may decrease the information encoding capacity of the population (Moreno-Bote et al., 2014). Why then might the brain favor a coding scheme in which populations mix information encoding with additional noise correlation structure?

Our data offer one possibility: that such additional correlations could encode internal or behavioral states that modulate sensorimotor transformations and behavioral output, enabling dynamic and flexible behavioral responses. Indeed, we found that a large fraction of the observed trial-to-trial neuronal variability was related to the larva’s behavior, including activity correlated with the instantaneous behavioral state of the animal as well as pre-stimulus turn biases that correlated with responsiveness and turn direction. We found that the largest trial-to-trial noise mode within the visually-evoked neurons was a subset of a brain-wide population correlated with the vigor of the larva’s tail movements. This is conceptually similar to recent evidence in rodents that a large fraction of spontaneous neural activity reflects spontaneous and uninstructed movements (Musall et al., 2019; Stringer et al., 2019).

Taken together, our study represents the first whole-brain confirmation that behaviorally relevant information is highly mixed throughout neuronal populations involved in processing sensory and other information, potentially enabling flexible and context-dependent behavior. Our observation of time-varying neuronal ensembles encoding a turn direction bias is reminiscent of the larval zebrafish’s spontaneous swimming circuitry, which involves a central pattern generator (the ARTR) that biases the larva to swim in chains of turns in alternating directions (Dunn et al., 2016); however, the neuronal populations we identified were not localized to the previously identified circuitry, and instead were distributed across the brain. While such spontaneous turn sequences are thought to underlie efficient exploration of local environments, our data could reflect a similar sensorimotor circuitry that injects variability as a learning mechanism to explore the space of possible sensorimotor consequences. Given the brain-wide distribution of the behavior-related ensembles within the variable visually-evoked behaviors studied here, we expect that they are not driven solely by a single central pattern generator as with the ARTR and spontaneous turns, but instead involve the interaction of such pattern generators with the larva’s internal state and recent experience. It has been shown that such hindbrain oscillatory activity integrates recent visual input to enable efficient phototaxis (Wolf et al., 2017), suggesting these pattern generators could act in concert with additional contextual populations. For example, it has been shown that hunger shifts decisions from avoidance to approach due to neuromodulation from the hypothalamic and serotonergic systems (Filosa et al., 2016). Further, our observed dynamics could also reflect the foraging state of the larva, which is known to be encoded by an oscillatory, neuromodulatory network distributed across the brain (Marques et al., 2020) and is a likely candidate to modulate a larva’s response to prey-sized stimuli. We envision that utilization of the wide array of genetically labelled lines that have been associated with these sensorimotor and neuromodulatory circuits (Semmelhack et al., 2014; Barker and Baier, 2015; Marques et al., 2020; Okamoto et al., 2021) could disentangle the specific contributions of the sensory, pre-motor, and internal state circuitry within the brain-wide populations observed here. Of particular interest will be how the activity of these populations is organized over longer timescales and modulated by visual input or internal states related to hunger and foraging, which are critical to balance feeding behaviors and predator avoidance within the developing larvae.

Acknowledgements

We thank Q. Lin and S. Otero Coronel for discussions and feedback on the manuscript, T. Nöbauer and Y. Zhang for discussions regarding optical designs, and the members of the Vaziri Lab for discussions regarding the experiments and data analysis. We also thank A. Kaczynska, S. Campbell, and J. Hudspeth for housing zebrafish, and the Rockefeller University High Performance Computing Cluster for compute access. This work was supported in part by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under award numbers 1RF1NS113251 and 1RF1NS110501 and the Kavli Foundation through the Kavli Neural Systems Institute.

Author contributions

J.M. contributed to the project conceptualization, designed and built the imaging system, performed experiments, analyzed data, and wrote the manuscript. A.V. conceived, led and supervised the project, designed the imaging system and biological experiments, guided data analysis, and wrote the manuscript.

Methods

fLFM setup

The sample is illuminated with an LED excitation light (470 nm, Thorlabs M470L4) via Köhler illumination through a standard GFP filter set (Thorlabs MDF-GFP) and a 20×/1.0-NA water immersion objective (Olympus XLUMPLFLN). In order to reduce the amount of excitation light reaching the eyes of the zebrafish, an aluminum mask is placed in the Köhler illumination path conjugate to the sample plane. The imaging path consists of an f=180 mm Olympus-style tube lens (Thorlabs TTL180-A), an f=180 mm achromatic doublet (Thorlabs AC508-180-A-ML) for the Fourier lens, and a 13×13 mm microlens array with 1.5 mm pitch and r=13.9 mm radius of curvature (OKO Tech APO-GB-P1500-R13.9). The microlens array is mounted inside a five-axis kinematic mount (Thorlabs) to allow precise placement of the array orthogonal to the optical axis. A 5.5-megapixel sCMOS camera (Andor Zyla) is then positioned at the focus of the microlens array to capture the fLFM images.

The theoretical lateral resolution of an fLFM is limited by the numerical aperture NA_ML of the microlenses and the diffraction-limited spot size at the sensor. For an emission wavelength λ, Fourier lens focal length f_FL, objective magnification M, and microlens diameter d_MLA, each microlens subtends an effective object-space numerical aperture NA_ML = M·d_MLA/(2·f_FL), so that the Abbe diffraction limit, converted to object space under the paraxial approximation, is given by δx = λ/(2·NA_ML) = λ·f_FL/(M·d_MLA). The theoretical axial resolution for a microlens array with a maximum radius d_max follows analogously from the largest numerical aperture sampled by the array (Guo et al., 2019).
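As a quick plausibility check, the reconstructed lateral-resolution expression can be evaluated in Python with the parameters of the setup above; note that the emission wavelength (~510 nm for GCaMP) is an assumed value that does not appear in the text.

```python
# Worked evaluation of the lateral-resolution expression
# delta_x = lambda * f_FL / (M * d_MLA). The emission wavelength is an
# assumed value (~510 nm for GCaMP); the other numbers appear in the setup text.
wavelength = 510e-9     # m, assumed GCaMP6s emission peak
f_FL = 180e-3           # m, Fourier lens focal length (AC508-180-A-ML)
M = 20                  # objective magnification (XLUMPLFLN 20x)
d_MLA = 1.5e-3          # m, microlens pitch

NA_ML = M * d_MLA / (2 * f_FL)        # effective per-microlens NA in object space
delta_x = wavelength / (2 * NA_ML)    # Abbe limit converted to object space
print(f"NA_ML = {NA_ML:.3f}, lateral resolution ~ {delta_x * 1e6:.1f} um")
# -> NA_ML = 0.083, delta_x ~ 3.1 um, consistent with the 2.4 um lateral voxel size
```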

Zebrafish experiments

For head-fixed zebrafish experiments, n=13 huc:h2b-gcamp6s larvae with pan-neuronal and nuclear-localized GCaMP6s (NL-GCaMP6s) expression were imaged 6-9 days post fertilization (dpf). The day before experiments were performed, we immobilized larvae by embedding them in 2.5% agarose approximately 10 mm from the edge of a 35 mm petri dish. We then removed the agarose around their eyes and tail to allow for visual stimulation and tail movement. Larvae were not fed after immobilization and were thus in a starved state on the day of the experiment.

The petri dish was covered in a rear projection film (Screen Solutions Int.) such that visual stimuli could be projected directly onto the dish. The projector (BenQ TH671ST) output was filtered with a 610 nm longpass filter (Schott RG610) such that only red light was displayed, removing crosstalk between the stimulation and GCaMP fluorescence. The tail was illuminated with an 850 nm LED and then monitored from below using a near infrared CMOS camera (Ximea MQ013RG-ON).

Visual stimulation was controlled by Stytra (Štih et al., 2019). To check for intact visual circuitry, drifting gratings with a period of 40° (visual angle) and speed of 40°/s to the left or right were shown three times every 30 minutes. Only larvae that displayed a robust optomotor response were further analyzed. The remaining trials consisted of drifting dots that began in the center of the visual field and moved to either the left or right at 30°/s. Dots were shown at maximum contrast against a dark background. Various diameters were shown from 0.3° to 44° visual angle, with 3 to 6 different sizes used in a recording. Each trial lasted 3 seconds and the inter-trial interval was randomly distributed with a mean of 9 seconds and a standard deviation of 3 seconds.

Data acquisition

The fLFM camera was triggered at 5 or 10 Hz using a microcontroller (Adafruit Grand Central). The first trigger was used to initiate the tail behavior camera. The tail was monitored in real-time using Stytra (Štih et al., 2019) at approximately 200 Hz. The recording duration lasted 1 to 2 hours.

Data processing

Raw fLFM images were denoised using a custom-trained DeepInterpolation (Lecoq et al., 2021) model. We trained a “unet_single_1024” model with 5 pre and post frames on a training set of n=20 example recordings and a validation set of n=5 recordings. We ran the training until convergence of the validation loss. The denoised fLFM images were then reconstructed using the Richardson-Lucy deconvolution algorithm. We performed the deconvolution with an experimental PSF found by measuring the profile of a 1 μm fluorescent bead through the axial depth in 2 μm increments. The Richardson-Lucy algorithm took about 1 second per iteration on a GPU (NVIDIA TITAN V), and we used 12 to 20 iterations per frame. This resulted in a reconstructed volume of approximately 760 × 360 × 280 μm³ with voxels of 2.4 μm laterally and 4 μm axially.
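For intuition about the reconstruction step, the following is a minimal two-dimensional Richardson-Lucy sketch in Python; the actual pipeline uses the measured three-dimensional PSF and GPU acceleration, so this only illustrates the multiplicative update rule on synthetic inputs.

```python
# Minimal 2D Richardson-Lucy deconvolution sketch (the actual pipeline uses a
# measured 3D PSF and GPU acceleration; this illustrates the update rule only).
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(image, psf, n_iter=15, eps=1e-12):
    psf_flipped = psf[::-1, ::-1]                  # adjoint of convolution with psf
    estimate = np.full_like(image, image.mean())   # flat initial estimate
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = image / (blurred + eps)            # data / current model
        estimate = estimate * fftconvolve(ratio, psf_flipped, mode="same")
    return estimate

# Synthetic example: blur a point-like object and recover it.
h = np.hanning(15)
psf = np.outer(h, h)
psf /= psf.sum()
truth = np.zeros((64, 64))
truth[32, 32] = 1.0
observed = fftconvolve(truth, psf, mode="same")
recovered = richardson_lucy(observed, psf, n_iter=20)
```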

Neuronal ROIs and timeseries were then extracted using the CNMF-E (Zhou et al., 2018) variant of the CaImAn software package (Giovannucci et al., 2019), which is optimized for processing of one-photon imaging data. Each plane was analyzed individually and then the results were merged. To determine neuronal ROIs, the SNR threshold was set at 2, the spatial consistency (rval) threshold was set at 0.8, and the CNN-based classifier threshold was set at 0.9. Neuronal timeseries were not deconvolved. Background fluorescence was factored out using the CNMF-E ring model. The planes were then collated by merging any units with a correlation greater than 0.7, lateral distance less than 10 μm, and axial distance less than 16 μm. Overlapping units were merged by summing each timeseries weighted by its SNR. Effects of photobleaching were removed by normalizing by a fit to the mean signal f(t) across all neurons over the recording with a bi-exponential model, f(t) = a1·e^(−t/τ1) + a2·e^(−t/τ2). Each neuronal timeseries was then z-scored before all following analyses.
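A bi-exponential correction of this kind can be written in a few lines with SciPy; the time constants and initial guesses below are illustrative placeholders, not values from our pipeline.

```python
# Sketch of the photobleaching correction: fit a bi-exponential to the mean
# fluorescence across neurons and divide it out. Initial guesses are arbitrary.
import numpy as np
from scipy.optimize import curve_fit

def biexp(t, a1, tau1, a2, tau2):
    return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2)

t = np.linspace(0, 3600, 36000)   # s, e.g. a 1 h recording sampled at 10 Hz
mean_f = biexp(t, 1.0, 300, 0.5, 3000) + 0.01 * np.random.randn(t.size)  # synthetic

p0 = (1.0, 100.0, 0.5, 1000.0)    # rough initial parameter guesses
params, _ = curve_fit(biexp, t, mean_f, p0=p0, maxfev=10000)
corrected = mean_f / biexp(t, *params)   # normalize out the slow bleaching trend
```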

Analysis of freely swimming behavior

For freely swimming experiments, n=9 huc:h2b-gcamp6s larvae (6-9 dpf) were individually placed in a 90 mm petri dish with a shallow (∼3 mm) layer of water. The dish was placed above a display (Apple iPad) with a 6 mm piece of clear acrylic in between. Black dots of sizes ranging from 1° to 30° were randomly shown at maximum contrast and moving across the dish in one direction at 30°/s. A camera (FLIR Grasshopper3 GS3-U3-41C6M-C) with a zoom lens (Navitar MVL7000) was mounted above to monitor the larvae’s behavior at >100 Hz.

Each larva’s movements were tracked using DeepLabCut (Mathis et al., 2018), which was trained to track the position of the eyes, the swim bladder, and four points along the spline of the tail. Anipose was used to build a pipeline to process large numbers of videos (Karashchuk et al., 2020). Movement bouts were extracted by applying a threshold to the velocity of the centroid of the tracked points. Only movement bouts in which the centroid was within approximately six body lengths of the visual stimulus were further analyzed. For each bout, we then calculated the mean change in distance to the visual stimulus, with positive values denoting the larvae moving further away from the visual stimulus.
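A minimal sketch of this bout-extraction logic is shown below; the speed threshold and array sizes are placeholder assumptions, not values from our tracking pipeline.

```python
# Sketch of movement-bout extraction: threshold the speed of the centroid of the
# tracked keypoints and pair rising/falling edges into bouts. Placeholders only.
import numpy as np

def extract_bouts(xy, fps, speed_thresh):
    """xy: array of shape (frames, keypoints, 2) with tracked coordinates."""
    centroid = xy.mean(axis=1)                                   # (frames, 2)
    speed = np.linalg.norm(np.diff(centroid, axis=0), axis=1) * fps
    active = speed > speed_thresh
    padded = np.concatenate(([False], active, [False]))          # guarantee edges pair up
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[::2], edges[1::2]                       # rising / falling edges
    return list(zip(starts, ends))

xy = np.random.rand(1000, 7, 2) * 90                 # placeholder tracking data (mm)
bouts = extract_bouts(xy, fps=100, speed_thresh=20.0)  # threshold in mm/s, assumed
```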

Analysis of head-fixed behavior

During head-fixed visual experiments, 20 points along the spline of the larva’s tail were tracked in real time using Stytra (Štih et al., 2019). The overall tail curvature was then quantified by summing the angles between each segment of the tail tracking. For comparison with neural data, this was convolved with the NL-GCaMP6s kernel to define the tail direction kernel, while the GCaMP-kernel-convolved absolute tail curvature defined the tail vigor kernel.

Bouts of movement were extracted by thresholding the absolute tail curvature with a value σ_active set to one standard deviation of the absolute tail curvature over time. To study the behavioral response to various stimuli, the mean tail curvature during active bouts was computed for each stimulus presentation period; trials in which there were no movement bouts were excluded. For turn direction analyses, trials with a mean tail curvature during bouts > σ_active were considered left turn trials, whereas trials with a mean tail curvature during bouts < −σ_active were considered right turn trials.
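The two steps above (calcium-kernel convolution and σ_active-based turn labeling) can be sketched as follows; the indicator decay constant is an assumed placeholder, not a measured NL-GCaMP6s kernel.

```python
# Sketch of head-fixed behavior quantification: sum segment angles to obtain the
# tail curvature, convolve with a calcium-indicator kernel, and label turns by
# thresholding at one standard deviation. The kernel decay time is assumed.
import numpy as np

fps = 200.0
tail_angles = np.random.randn(60000, 20) * 0.1   # placeholder: frames x segments (rad)
curvature = tail_angles.sum(axis=1)              # overall tail curvature

tau = 1.5                                        # s, assumed indicator decay constant
t = np.arange(0, 5 * tau, 1 / fps)
kernel = np.exp(-t / tau)
kernel /= kernel.sum()

tail_direction = np.convolve(curvature, kernel)[: curvature.size]      # signed
tail_vigor = np.convolve(np.abs(curvature), kernel)[: curvature.size]  # unsigned

sigma_active = np.abs(curvature).std()
is_bout = np.abs(curvature) > sigma_active       # active-movement frames
# A trial is a left turn if mean bout curvature > sigma_active,
# and a right turn if it is < -sigma_active.
```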

Visual stimulus decoding

For the decoding shown in Figure 3B, the stimulus-evoked neural response was taken as the average response of each neuron during a single stimulus presentation. For each larva, a logistic regression model with lasso regularization was trained on all trials using 6-fold cross-validation with 3 repetitions. The lasso regularization parameter, which induces sparsity, was swept and the model with the highest average cross-validated accuracy was used. The confusion matrix, computed on the held-out testing trials, was reported as the average over all rounds of cross-validation.
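A minimal scikit-learn sketch of this decoding scheme, with synthetic data standing in for the trial-averaged responses, is shown below; the regularization grid and array sizes are illustrative.

```python
# Sketch of the stimulus decoding analysis: lasso-regularized logistic regression
# with repeated 6-fold CV and a sweep over the regularization strength.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(180, 500))    # trials x neurons (trial-averaged responses)
y = rng.integers(0, 4, size=180)   # stimulus class per trial

cv = RepeatedStratifiedKFold(n_splits=6, n_repeats=3, random_state=0)
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": np.logspace(-2, 2, 9)},   # sweep of the lasso strength
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
print("best C:", search.best_params_["C"], "CV accuracy:", search.best_score_)
```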

Identification of stimulus decoding and noise dimensions

The comparison of the neural dimensions encoding visual stimuli versus trial-to-trial noise was modeled after (Rumyantsev et al., 2020). Partial least squares (PLS) regression was used to find a low-dimensional space that optimally predicted the visual stimuli, which we refer to as the visually-evoked neuronal activity patterns. To perform the regression, a visual stimulus kernel was constructed by summing the timeseries of each individual stimulus type, weighted by the stimulus size and negated for trials on the right visual field, thus providing a single response variable encoding the location, size, and timing of all stimulus presentations. This stimulus kernel was then convolved with the temporal response kernel of our calcium indicator (NL-GCaMP6s). The dimensionality d of the PLS regression was optimized using 6-fold cross-validation with 3 repeats, choosing the dimensionality with the lowest cross-validated mean squared error. The optimal stimulus decoding direction, w_stim, was identified by re-fitting the optimal PLS regression model with all timepoints.
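The dimensionality selection step can be sketched as follows; the candidate dimensionalities and array sizes are illustrative assumptions, not values from our analysis.

```python
# Sketch of the PLS step: regress whole-brain activity onto a stimulus kernel and
# choose the dimensionality by cross-validated MSE. All arrays are placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 300))      # timepoints x neurons
stim_kernel = rng.normal(size=2000)   # signed, size-weighted, GCaMP-convolved kernel

cv = RepeatedKFold(n_splits=6, n_repeats=3, random_state=0)
scores = {
    d: cross_val_score(PLSRegression(n_components=d), X, stim_kernel,
                       scoring="neg_mean_squared_error", cv=cv).mean()
    for d in (2, 4, 8, 16)
}
d_opt = max(scores, key=scores.get)   # highest neg-MSE = lowest cross-validated MSE
w_stim = PLSRegression(n_components=d_opt).fit(X, stim_kernel).coef_.ravel()
```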

For each stimulus type, the noise covariance matrix C was computed in the low-dimensional PLS space, given that direct estimation of the noise covariances across many thousands of neurons would likely be unreliable. A noise covariance matrix was calculated separately for each stimulus, and then averaged across all stimuli. The noise modes ν_α for α = 1 … d were identified by eigendecomposition of the mean noise covariance matrix across all stimuli, C ν_α = λ_α ν_α. The angle between the optimal stimulus decoding direction and each noise mode is thus given by θ_α = cos⁻¹(|w_stim · ν_α| / (‖w_stim‖ ‖ν_α‖)).
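In code, the eigendecomposition and angle computation reduce to a few NumPy calls; the arrays below are placeholders standing in for the trial projections and decoding direction.

```python
# Sketch of the noise-mode computation: eigendecompose the mean noise covariance
# and measure each mode's angle to the stimulus decoding direction.
import numpy as np

rng = np.random.default_rng(3)
d = 8
trial_proj = rng.normal(size=(40, d))   # trials x PLS dims, one stimulus type
C = np.cov(trial_proj, rowvar=False)    # per-stimulus covariance; average over stimuli

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]       # rank noise modes by decreasing variance
noise_modes = eigvecs[:, order]

w_stim = rng.normal(size=d)             # stimulus decoding direction (placeholder)
cosang = np.abs(noise_modes.T @ w_stim) / np.linalg.norm(w_stim)  # eigvecs are unit-norm
angles_deg = np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0)))
```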

The significance of each neuron’s contribution to a given neural population vector was evaluated by comparison to shuffled datasets. To create shuffled datasets, the ground truth stimulus presentation labels were randomly permuted. The same analysis as above was performed on n=10 shuffled datasets to identify a distribution of shuffled coefficients for the optimal stimulus decoding direction, noise modes, and largest principal component. A neuron was then considered to be significantly contributing to a given population vector if its absolute coefficient was greater than three standard deviations above the mean of that of the shuffled datasets.
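This shuffle criterion amounts to a simple elementwise comparison, sketched below with placeholder coefficients.

```python
# Sketch of the shuffle-based significance criterion: a neuron contributes
# significantly if its absolute coefficient exceeds the shuffled mean by 3 SD.
import numpy as np

rng = np.random.default_rng(4)
coef_real = rng.normal(size=800)                    # placeholder real coefficients
coef_shuf = rng.normal(scale=0.5, size=(10, 800))   # 10 label-permuted refits

mu = np.abs(coef_shuf).mean(axis=0)
sd = np.abs(coef_shuf).std(axis=0)
significant = np.abs(coef_real) > mu + 3 * sd       # boolean mask over neurons
```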

Classification of single trial turn direction, responsiveness, and stimulus

To predict decisions from neural activity despite variable reaction times, each trial was time-warped in order to align the stimulus onset and movement initiation timepoints, following (Lin et al., 2020). For no-response trials, a random reaction time was selected from the responsive trials and the corresponding time warping was applied. Logistic regression binary classification models with lasso regularization were fit to classify left versus right turn trials, or response versus no-response trials, from various time windows of the time-warped neural activity. Time-warped neuronal activity was averaged over windows spanning various start and end timepoints to determine the predictability at each time window, and the optimal predictability at any given timepoint was taken as that of the best time window ending at that timepoint. For each larva, 6-fold cross-validation with 3 repetitions was used to determine the cross-validated F score, accuracy, and confusion matrix. For stimulus classification, logistic regression was similarly applied for multi-class classification using the one-versus-rest scheme. Overall classification performance was quantified using the mean F score over larvae and was compared to shuffled data, in which the class labels had been randomly permuted.
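A minimal sketch of the piecewise-linear time warping is shown below; the reference indices and trial lengths are arbitrary assumptions for illustration.

```python
# Sketch of the per-trial time warping: linearly stretch each trial so that
# stimulus onset and movement initiation land on common reference indices,
# then average activity within candidate windows. Placeholders throughout.
import numpy as np

def warp_trial(trace, onset, move, onset_ref=10, move_ref=25, n_out=40):
    # Map [0, onset] -> [0, onset_ref], [onset, move] -> [onset_ref, move_ref],
    # and [move, end] -> [move_ref, n_out - 1], piecewise linearly.
    src = np.array([0, onset, move, trace.size - 1], dtype=float)
    dst = np.array([0, onset_ref, move_ref, n_out - 1], dtype=float)
    out_idx = np.arange(n_out)
    src_idx = np.interp(out_idx, dst, src)               # fractional source indices
    return np.interp(src_idx, np.arange(trace.size), trace)

trace = np.random.randn(60)             # one neuron, one trial
warped = warp_trial(trace, onset=15, move=32)
window_mean = warped[20:26].mean()      # feature for one candidate time window
```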

Visualizations of single trial trajectories were made by projecting the time-warped trials onto the weights of an example, randomly chosen logistic regression model.

Significant neurons contributing to the turn direction and responsiveness classification were identified by comparing each neuron’s absolute coefficient across the multiple cross-validation rounds to that from models fit to shuffled class labels. Neurons were deemed to be significantly contributing using a one-tailed t-test, p<0.05. Brain regions (telencephalon, optic tectum, cerebellum, and hindbrain) were manually annotated to quantify the brain-wide distribution of significant neurons.
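This per-neuron test reduces to a one-tailed two-sample t-test on absolute coefficients, sketched below with placeholder values.

```python
# Sketch of the coefficient significance test: compare a neuron's absolute
# coefficients across CV folds to those from label-shuffled fits (one-tailed).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(5)
coefs_real = np.abs(rng.normal(loc=0.3, size=18))   # |coef| over 6 folds x 3 repeats
coefs_shuf = np.abs(rng.normal(loc=0.0, size=18))   # same, with labels permuted

t, p_two = ttest_ind(coefs_real, coefs_shuf)
p_one = p_two / 2 if t > 0 else 1 - p_two / 2       # one-tailed: real > shuffled
significant = p_one < 0.05
```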

Supplementary Figure S3: related to Figure 3, trial-to-trial variability in visually-evoked neurons is largely orthogonal to visual decoding dimensions.

A. Schematic of trial-to-trial noise mode identification. First, dimensionality reduction is applied to the whole-brain dynamics (left) to identify the neuronal activity patterns that optimally preserve visual information using partial least squares regression. Within these visually-evoked neurons, each trial’s average activity during the visual presentation is computed (middle, each dot represents a single trial corresponding to the pink or green stimulus condition). Finally, the covariance across trials is decomposed to find the noise modes ν_α, which represent orthogonal directions within the visually-evoked neurons, ranked in order of decreasing variance.

B. w_stim robustly encodes the visual stimulus. Shown are example time traces of the stimulus kernel (blue; the size of the stimulus is encoded by the magnitude and the visual field in which it is presented by the sign) and the neural projection onto w_stim (orange), which exhibit a correlation of r=0.88, p<10^-6.

C. The largest brain-wide principal component (PC) encodes motor behavior. (i) Example time traces of the tail vigor (blue) and PC 1 (orange), which exhibit a correlation of r=0.51, p<10^-6. (ii) The coefficients of the neurons contributing most to the PC, which are distributed across the brain and are nearly all positively correlated with tail vigor.

Supplementary Figure S4: related to Figure 4, pre-motor neuronal populations predictive of single-trial behavior.

A. Classification of the visual stimulus as a function of the time window analyzed. The mean F score across n=7 larvae is used to assess the performance of multi-class classification of the visual stimulus. Top: a time window is swept across various start timepoints (y-axis) and end timepoints (x-axis) to determine the classification performance across various time intervals of the neuronal dynamics. Bottom: mean ± 95% confidence interval of the F score for the best time interval ending at the given timepoint (Data), compared to shuffled data in which the stimulus labels are randomized. The classification accuracy is significantly higher than for shuffled data (p<0.05, paired t-test) only after stimulus onset.

B. Turn direction classification as a function of stimulus size. On average before stimulus onset (blue line), the F score is highest for the smallest size stimuli, whereas the F score is relatively constant at the time of movement initiation (orange line, mean ± 95% CI across n=5 larvae).

C. As in A, except for classifying left or right turn direction from the projections onto the trial-to-trial noise modes ν_α, as opposed to the whole-brain dynamics as used in Figure 4B.

D. As in A, except for classifying left or right turn direction from the projection onto the optimal stimulus decoding dimension w_stim.

Video 1: related to Figure 2. Example target-directed and target-avoidance bouts in free behavior.

On the left, an example target-directed bout showing a 7 dpf larval zebrafish approaching a stimulus presented on a screen below the dish (see Figure 2A). On the right, a similar target-avoidance bout from the same larva. The videos are slowed down 10×. These videos correspond to the composite images shown in Figure 2B-C.