Introduction

Neural responses exhibit trial-to-trial variability, even in response to repeated presentations of the same stimulus [15]. Several hypotheses have been proposed for the source of this variability, including noise in synaptic transmission [6], fluctuations in balanced excitation and inhibition [7], noisy peripheral sensors [8], recurrent dynamics [9], neural adaptation [10] and synaptic plasticity [11]. To assess the impacts of variability on neural coding, researchers have developed a variety of statistical models that seek to capture both the stimulus dependence and stochasticity of neural spike responses [1217]. The simplest such model for spike train data is the Poisson process [18, 19], which assumes that spike count variability in disjoint time intervals is independent. Under this model, the variance of the binned spike counts is equal to the mean for any choice of time bin. However, a large literature has shown that this relationship does not hold precisely in many brain areas. Rather, spike count variance commonly exceeds the mean, a phenomenon known as “overdispersion” [1, 7, 2030].

To account for super-Poisson response variability in the visual pathway, Goris et al. [1] introduced the “modulated Poisson” model, which partitions neural variability into a stimulus-driven component and a modulatory gain component. Specifically, the model assumes that spike counts are Poisson distributed, with a mean determined by the product of two terms: a stimulus-related component that reflects the neuron’s stimulus tuning, and a stimulus-independent stochastic gain that varies randomly across trials (Figure 1A). Goris et al. [1] showed that this model provides an accurate account of spike count variability across the early visual pathway, with the degree of overdispersion linked to the strength of the modulatory component.

(A) Schematic of the Goris modulated Poisson model [1], which assumes that spike counts (y) are generated by a Poisson process whose rate is the product of two components: a stimulusdependent drive f (s) and a stimulus-independent gain g. The gain g is a scalar and is independently drawn across trials. (B, C) Two continuous-time extensions of the Goris model. (B) Constant gain model: assumes that the sampling interval of the gain equals the trial duration. Left: stimulus-induced rate, gain process, and instantaneous firing rate for three example trials. Middle: gain autocorrelation function. Right: Fano factor as a function of bin size (dashed line indicates the baseline Poisson model). (C) Independent gain model: assumes that the sampling interval of the gain is much shorter than the length of a single trial. Panels are organized as in (B). (D) Schematic of the proposed Continuous Modulated Poisson (CMP) model. Here, spike trains y(t) are generated by a Poisson process whose rate is the product of a time-varying stimulus drive fs(t) and a continuous-time stochastic gain process g(t). The gain process g(t) follows a Gaussian process (GP) with temporal correlations governed by an exponentiated power law kernel, and is independently drawn across trials. (E) Panels are organized as in (B), for the CMP model.

However, an important drawback of the modulated Poisson model is that it applies to binned spike counts but not to spike trains. It describes the variability of spike counts measured in discrete time bins, and thus cannot characterize variability on shorter timescales. Moreover, it does not account for how overdispersion depends on the size of the time bins used to count spikes.

To address this limitation, we begin by formulating a continuous-time version of the modulated Poisson model introduced by Goris et al. [1]. First, we replace the (scalar) stimulus component with a stimulus-dependent time-varying Poisson firing rate. Then, we specify a sample rate for the stochastic gain variable; we assume that the gain is constant within each discrete sample interval and is drawn independently for each interval. For example, we might assume that the sample interval for the modulatory gain is equal to the stimulus duration (Figure 1B). In this case, the gain is constant for each trial and independent across trials. This produces a gain “process” with a flat temporal auto-correlation, and a Fano factor (variance-to-mean ratio) that grows linearly with the size of the window used to count spikes (Figure 1B, right). Alternately, we could assume that the sample interval is much shorter than the length of a single trial (Figure 1C). In this case, the auto-correlation of the gain process is narrow, and the Fano factor curve remains flat for all bin sizes larger than the sample interval (Figure 1C, right).

These continuous interpretations of the Goris model rely on the assumption that the modulatory gain is constant during each sample interval and then jumps discontinuously between intervals, which seems unrealistic for biological systems. Moreover, such models have limited ability to capture complex relationships between time bin size and Fano factor, given that they can only produce curves that grow linearly and then saturate (Figure 1B,C).

To overcome these limitations, we introduce the continuous modulated Poisson (CMP) model, a flexible extension of the Goris model to continuous-time spike responses. The CMP model describes spike trains as arising from a Poisson process with a rate given by the product of two continuous-time processes: (1) a “stimulus-driven” component, which describes the firing rate fluctuations induced by the stimulus; and (2) a modulatory “gain” component, which captures stimulus-independent trial-to-trial variability using a stochastic process (Figure 1D). Both components are modeled as exponentiated Gaussian processes (GPs), providing a flexible framework for capturing smooth, continuous dynamics with tunable temporal structure [3133].

To more precisely capture the temporal correlations in the modulatory gain process, we introduce a novel covariance function that allows these correlations to decay more slowly than those described by the standard radial basis (or “Gaussian”) kernel. We refer to this as the Exponentiated Power Law (EPL) kernel, which models autocorrelation decay according to an exponentiated power law with exponent q ≤ 2 (Figure 1E, middle). This added flexibility enables the model to more accurately capture nonlinear relationships between bin size and Fano factor (Figure 1E, right) compared to previous approaches. Moreover, the CMP framework provides closed-form expressions for how the Fano factor scales with time bin size, offering a direct way to probe how variability accumulates and transforms across different temporal resolutions.

We applied the CMP model to spike train recordings from four visual areas (LGN, V1, V2, and MT) [1, 34] and uncovered novel insights into variability across the visual hierarchy. In particular, we observed a systematic increase in the magnitude of gain fluctuations with progression along the visual hierarchy, consistent with prior observations [1]. Furthermore, comparisons of the inferred EPL kernels across visual areas revealed that gain autocorrelation exhibits slower temporal decay at higher levels of the visual hierarchy. A comparison of the variance of the stimulus drive with the variance of the modulatory gain process across neurons further revealed that these two components tend to be negatively correlated in all visual areas. These findings support existing hypotheses while providing new insight into variability structure and demonstrate that the CMP model outperforms existing methods in capturing time-resolved overdispersion.

In summary, this study makes five main contributions. (1) We extend classical variability partitioning approaches by introducing a continuous-time model that captures how neural variability evolves across temporal scales, overcoming the limitations of bin-based analyses. (2) We introduce the Exponentiated Power Law (EPL) kernel, which models slow, heavy-tailed correlations in modulatory gain and links them to distinct timescales of variability. (3) We derive analytic expressions for the Fano factor as a function of time bin size, enabling variability to be partitioned across temporal scales within a single unified model. (4) We develop a scalable inference framework that jointly estimates stimulus-driven and modulatory components directly from spike train data. (5) By applying CMP to large-scale recordings across multiple stages of the visual hierarchy, we reveal systematic trends in gain dynamics and stimulus modulation, providing new insights into how variability is structured and propagated across cortical processing stages. Altogether, CMP offers a principled and flexible framework for partitioning neural variability in continuous time across stimulus conditions and brain areas.

Results

Continuous modulated Poisson (CMP) model

Our proposed model describes spike trains using a doubly stochastic point process model [35], in which the firing rate of a Poisson process is itself a stochastic process. Specifically, we model a spike train y(t) elicited in response to a stimulus s using a Poisson process:

where the firing rate, or “conditional intensity”, λ(t), is a random variable defined as the product of two terms:

with fs(t) representing the stimulus-dependent component that captures stimulus-locked fluctuations in firing rate, and g(t) denoting a stochastic gain process that accounts for stimulus independent fluctuations in neuronal excitability (Figure 1D-E).

We assume that the stimulus component fs(t) varies independently across stimuli and evolves smoothly in time. To express this smoothness, we place a Gaussian process (GP) prior over the log firing rate:

where is the standard radial basis function (RBF) covariance function, which describes the covariance of fluctuations in log firing rate at arbitrary points in time [31]. Specifically, the covariance between the log firing rate at any two time points t1 and t2 during stimulus presentation interval [0, T] is given by:

where and are hyperparameters controlling, respectively, the variance and temporal smoothness (length scale) of fluctuations in the log firing rate. To flexibly capture diverse tuning profiles, each stimulus condition is assigned its own set of stimulus-related hyperparameters.

The gain process g(t) is modeled analogously as an exponentiated Gaussian process:

where Kg is the covariance function that characterizes temporal correlations in the gain process. Unlike the stimulus component, the gain is a latent variable that varies independently across repeated presentations of the same stimulus, thereby inducing greater-than-Poisson variability in the spike counts. For a fixed stimulus s = s* presented N times, the model includes a shared stimulus component and an independent gain process gi(t) for each repeat i ∈ {1, …, N}. Also, in contrast to the stimulus drive, because the gain process reflects stimulus-independent variability, its hyperparameters are shared across all stimulus conditions. This formulation provides a unified and parsimonious framework for modeling trial-to-trial variability across stimuli within a single generative model.

To more accurately model the stochastic structure of real neural spike trains, we introduce a novel covariance function for the gain process, referred to as the Exponentiated Power Law (EPL) kernel:

where ρg denotes the marginal variance, g is the temporal length scale, and q is the power-law exponent controlling the rate of correlation decay over long timescales. We show in SI Appendix that this kernel defines a valid covariance function for 0 < q ≤ 2. Because the gain reflects stimulus-independent variability, its hyperparameters are shared across all stimulus conditions. This formulation provides a unified and parsimonious generative model that captures both smooth, stimulus-locked activity and structured trial-to-trial variability across stimuli.

Figure 2 illustrates example gain covariance functions under different settings of the hyperparameters g (length scale) and q (power-law exponent). When q = 2 (right column), the EPL kernel reduces to the standard RBF covariance function defined in equation 4. For q < 2, the correlations exhibit heavier tails than in the RBF case, decaying more slowly at long time lags. In the limiting case of g → ∞ or q → 0, the gain becomes effectively constant over each stimulus presentation interval [0, T], corresponding to the continuous-time Goris model shown in Figure 1B.

Illustration of different settings of the proposed Exponentiated Power Law (EPL) covariance function with marginal variance ρg = 0.1.

(A) EPL auto-covariance functions with different settings of length-scale (top row: g = 0.02, bottom row: g = 0.1) and power-law exponent (columns left to right: q = 0.5, q = 1, q = 2). Colored traces below each auto-covariance show three sample gain processes g(t), obtained by sampling from the corresponding GP and exponentiating. (B) Fano Factor curves showing how Fano Factor changes as a function of the bin size used to count spikes, for EPL covariances in the top and bottom rows in (A), respectively.

Partitioning variability with CMP

Under the CMP model, spike train variability can be partitioned into a Poisson component, which is equal to the mean, and a modulatory component, which controls the degree of overdispersion. The continuous-time formulation of the CMP model allows us to compute this partitioning for arbitrary time intervals, and thus to analytically characterize how overdispersion grows with time bin size. In the case of a non-fluctuating stimulus component , the mean spike count in a time bin of size Δ is given by:

where denotes the mean firing rate under the model. The spike count variance is then given by:

where

is the average of the exponentiated covariance function exp(Kg) over all pairs of times in the interval [0, Δ]. (Note: this formula holds for any covariance function, not just the EPL covariance function we use for Kg). From these two terms, the Fano factor as a function of bin size can be computed as the ratio of variance to mean:

This shows that the Fano Factor approaches 1 in the limit of small bin size Δ, low firing rate , or when κ(Δ) approaches 1, which occurs when the variance of the gain process ρg goes to zero. Note that κ(Δ) is a decreasing function of Δ, reflecting the decay of correlations in the gain process, and it remains constant only for a gain process with infinite lengthscale (for example, Figure 1B). Thus, the Fano factor of the CMP model grows at most linearly with time bin size. Corresponding expressions for the general case, where fs(t) is a fluctuating stimulus component, are provided in the Methods and Materials (see SI Appendix for derivations).

Model fitting

To fit the CMP model to data, we need to compute the probability over spike counts in finite-sized time bins, given that spike trains are generally measured at a finite sampling rate. Computing this probability involves integrating a conditionally Poisson spike count probability over all realizations of the gain process g(t):

where is the expected spike count in the bin [t, t + Δ] given the stimulus s and gain process g(t), and µ(g(t)) is the probabilistic measure of the gain process under the Gaussian process prior (equation 5). However, this (infinite-dimensional) integral cannot be computed in closed form, since it involves the product of Poisson and Gaussian distributions. Thus, we introduce an efficient variational inference procedure [36], which allows us to optimize the hyperparameters governing the GP covariance functions and infer the stimulus components fs(t) for each stimulus (see Methods and Materials). The fitting method also provides a log-Gaussian approximation to the posterior over the gain process g(t) for each trial, allowing us to visualize the most likely modulatory stochastic gain trajectory for each repeat of a given stimulus.

To validate our inference procedure, we applied it to a synthetic dataset generated from the CMP model, as shown in SI Appendix. Figure S1A illustrates the true and inferred stimulus-drive and gain processes for two trials of two different stimuli, while Figure S1B compares the true and inferred Gaussian process hyperparameters. These results demonstrate that the proposed inference framework accurately recovers ground-truth hyperparameters, tuning functions, and gain processes on a discretized time grid. Note that the model itself is defined in continuous time and can therefore be consistently applied to data measured at arbitrary temporal resolutions.

Continuous partitioning of variability in the macaque V1

We applied our method to partition response variability in five populations of neurons recorded from the superficial layers of the macaque primary visual cortex (data from [34]). These datasets were collected from three monkeys using drifting grating stimuli presented at S = 72 equally spaced orientations. Each grating was displayed for 1280 ms, followed by a 1280 ms blank period, resulting in a total trial duration of 2.56 s. Each grating orientation was repeated N = 50 times. In total, we analyzed spiking activity from 654 isolated units (147, 113, 113, 133, and 148 neurons across the five datasets).

Figure 3A shows the inferred time-domain activity of a representative neuron from dataset 1, using a temporal grid resolution of dt = 0.02 s, for three different drift orientations (00, 1050, and 2800). The rows (top to bottom) show the PSTH derived from the data, the estimated stimulus drive, and–for two different trials per orientation–the inferred gain processes, firing rates, and observed spike rasters. While the estimated stimulus drive closely matches the PSTH, the inferred gain processes capture trial-specific temporal fluctuations that are clearly visible in the spike rasters. Notably, when the neuron is strongly driven by the stimulus (middle column), the instantaneous per-trial firing rate is dominated by the stimulus drive. In contrast, when the neuron is weakly driven (left column), the stochastic gain process plays a larger role in shaping the per-trial firing rate. These observations indicate that the CMP model meaningfully partitions response variability across stimulus conditions. Figure 3B shows the estimated EPL gain covariance, corresponding to optimal hyperparameters ρg = 2.9, g = 0.93, and q = 0.95.

Performance comparisons of different models on neural population data recorded from macaque primary visual cortex under 72 drifting sinusoidal gratings (data from Graf et al. [34]).

(A) Top to bottom: Tuning curve of a sample neuron; for three selected orientations (00,1050,2800), the PSTH, inferred stimulus drive, inferred gain (trial 1), inferred firing rate (trial 1), observed spike raster (trial 1), inferred gain (trial 2), inferred firing rate (trial 2), observed spike raster (trial 2), and Fano factor vs. bin size curves for real data (black), CMP model (red), Goris-independent gain model (dark blue dashed), Goris-constant gain model (light blue), and Poisson-GP model (gray). (B) EPL gain covariance (top) and autocorrelation (bottom) under the CMP model for the neuron in panel A. (C) Average test log-likelihood improvement of each model relative to the baseline Poisson model, computed on held-out data across all 654 V1 neurons and 72 stimulus orientations.

The continuous-time formulation of the CMP model allows variability to be partitioned at arbitrary temporal resolutions, enabling analytical characterization of the relationship between overdispersion and bin size (see Methods and Materials for details). The bottom row of Figure 3A shows Fano factor curves derived from the CMP model (red), compared with curves computed from the data (black), the Goris model with independent gain (dark blue dashed), the Goris model with constant gain (light blue), and the Poisson-GP model (gray). The two variants of the Goris model are continuous-time extensions of the original modulated Poisson model [1]: the independent gain model assumes that the sampling interval of the gain process is much shorter than the trial duration, whereas the constant gain model assumes that the gain remains fixed throughout the entire trial. The Poisson-GP model corresponds to the CMP model without a gain component, meaning it includes only a stimulus drive (see Methods and Materials for details of all model variants). These results show that the CMP model closely matches the overdispersion observed in real data across orientations and bin sizes, outperforming all other methods. While both Goris model variants produce overdispersion [1], the independent gain model yields Fano factor curves that are largely flat, and the constant gain model produces curves that increase linearly with bin size, neither of which captures the true empirical relationship.

We next compared model performance using the average log-likelihood improvement relative to the baseline Poisson model on held-out data from all 654 V1 neurons across all 72 orientations (Figure 3C). The CMP model achieved the highest predictive performance across all neurons. In Figure S2, we further compared the CMP model to three variants with different gain kernels: CMP-RBF (where the EPL kernel in equation 6 is replaced by the standard RBF kernel [31]), CMP-Matérn 0, and CMP-Matérn 1 (with Matérn kernels of order 0 and 1, respectively). These results demonstrate that the EPL kernel provides the best fit to the gain structure in the data.

To evaluate the utility of sharing gain hyperparameters across stimuli, we compared the CMP model with a variant in which the gain hyperparameters g, g, q} are fit independently for each of the s = 1, …, S stimulus conditions, denoted by . Figure S3A shows the inferred activity under both models, and Figure S3B presents the corresponding log-likelihoods. The performance of the two models was nearly identical, suggesting that the CMP model can robustly capture trial-to-trial variability across stimuli using a single shared set of gain hyperparameters.

Finally, we assessed the consistency of CMP inference with respect to temporal discretization by comparing inference results across three sufficiently fine grid resolutions: dt = 0.02 s, dt = 0.05 s, and dt = 0.1 s. As shown in Figure S4A, the estimated gain hyperparameters remain stable across these resolutions, indicating that the inference procedure is robust to moderate changes in grid size once the temporal discretization is fine enough to approximate the underlying continuous-time model. Figure S4B shows corresponding test log-likelihoods across models and grid resolutions, confirming that CMP consistently outperforms alternative models over this range.

Continuous partitioning of variability along the visual hierarchy

Next, we evaluated model performance across different areas of the visual hierarchy by applying each method to neural responses recorded in the lateral geniculate nucleus (LGN), V1, V2, and MT (data from Goris et al. [1]). The recordings were obtained using drifting sinusoidal gratings of the preferred size and speed, varying either in S = 12 spatial frequency (ranging from 0 to 10 cycles/deg) or in S = 16 equally spaced drift direction. Each grating was presented for one second and repeated at least N = 5 times. We analyzed spiking activity from 48, 376, 167, and 127 neurons from LGN, V1, V2, and MT, respectively, using a grid resolution of dt = 0.02 s.

Figure 4A shows the average log-likelihood improvement on held-out data, relative to the baseline Poisson model, for each method across all four areas. The CMP model consistently outperformed all other models across the visual hierarchy. To examine performance at the single-neuron level, Figure 4B presents scatter plots comparing the log-likelihoods of the CMP model with those of the Poisson-GP model, the Goris independent gain model, and the Goris constant gain model. These results show that the CMP model systematically achieves better fits across individual neurons in all regions.

Performance comparisons of different models on neuronal population data recorded from different areas along the visual hierarchy—the lateral geniculate nucleus (LGN), V1, V2 and MT—under drifting sinusoidal gratings of preferred size and speed, varying in spatial frequency (12 frequencies from 0 to 10 cycles/deg) or in drift direction (16 directions) (data from Goris et al. [1]).

(A) Log-likelihood improvement of each model relative to the baseline Poisson model, averaged across neurons in each population. Rows from top to bottom: LGN, V1, V2, MT. The number of neurons per area is indicated in parentheses. (B) Log-likelihood comparison of the CMP model versus the Goris independent gain model (left), Goris constant gain model (middle), and Poisson-GP model (right) for individual neurons. Colors indicate cortical areas: LGN (yellow), V1 (green), V2 (blue), MT (purple). (C) Fano factor vs. bin size curves averaged across each population. Black: real data; red: CMP; dark blue dashed: Gorisindependent gain; light blue: Goris-constant gain; gray: Poisson-GP.

Next, we characterized trial-to-trial variability by analyzing the mean–variance relationship as a function of bin size. Figure 4C shows Fano factor curves, averaged over neurons in each population, for real data and the various models. Notably, the CMP model closely follows the overdispersion patterns observed in real data across visual areas and timescales, again outperforming alternative methods. In Figure S5, we extend this analysis by including three additional CMP variants: CMP-RBF, CMP-Matérn 0, and CMP-Matérn 1. These comparisons further confirm that the standard CMP model provides superior performance across all visual areas.

Finally, to examine the importance of including a GP prior on the stimulus drive, we compared the performance of the Goris model with and without this component. (Note that the original modulated Poisson model in Goris et al. [1] does not include a prior on the stimulus drive; thus, the version without the prior is more directly comparable to their formulation.) Figure S6 presents the log-likelihood comparisons from these analyses, which clearly show that incorporating a smoothness prior on the stimulus drive generally improves the model’s ability to fit held-out data. It is also noteworthy that even the Poisson-GP model, which lacks a stochastic gain component, outperforms both variants of the Goris model when the GP prior on the stimulus drive is removed (Figure S6A–C). These results highlight that the smoothness prior we place on the stimulus drive is also crucial for accurately capturing neural variability (Figure S6D).

Variations in gain and stimulus-drive along the visual hierarchy

Finally, we compared the distributions of the inferred CMP hyperparameters across the visual areas LGN, V1, V2, and MT, as shown in Figure 5. Panels A, B, and C show the distributions of the inferred gain power-law exponent q, gain length scale g, and gain variance ρg, respectively, for each neuronal population. Figure 5D presents the median gain covariance and its corresponding autocorrelation function for each area.

Comparison of inferred CMP hyperparameters across visual areas: the lateral geniculate nucleus (LGN), V1, V2, and MT (data from [1]).

Rows top to bottom: LGN (yellow), V1 (green), V2 (blue), and MT (purple). The number of neurons per area is indicated in parentheses. (A) Distribution of gain power-law exponent q across neurons. Triangles indicate the median. (B) Distribution of gain length scale g across neurons. (C) Distribution of gain variance ρg across neurons. (D) Median gain covariance across the neuronal population (top) and gain auto-correlation (bottom). (E) Distribution of the average stimulus-drive variance across neurons. (F) Scatter plots of versus ρg across neurons, with linear regression fits (black) and slopes annotated. ***p < 0.001.

The gain power-law exponent (Figure 5A) increases along the visual hierarchy: from LGN to V1 (p < 0.001, sign test) and from V1 to V2 (p < 0.001), although there is a slight decrease from V2 to MT (p < 0.05). The gain length scale (Figure 5B) also shows a progressive increase: from LGN to V1 (p < 0.05), from V1 to V2 (p < 0.05), and from V2 to MT (p < 0.001). Additionally, the median gain autocorrelation (Figure 5D) exhibits slower temporal decay and a broader central lobe in higher visual areas. Together, these results suggest that rapid temporal fluctuations in trial-to-trial variability diminish as information propagates along the visual hierarchy.

In contrast, the gain variance ρg (Figure 5C) increases systematically along the hierarchy: from LGN to V1 (p < 0.001), and from V1 to V2 (p < 0.001). This trend indicates that the magnitude of gain fluctuations becomes larger at later stages of the visual pathway, consistent with prior observations [1].

Next, we examined the distribution of stimulus-drive variance, averaged across stimuli, defined as: , as shown in Figure 5E. A larger value of generally indicates stronger stimulus encoding by the neuron. In contrast to the gain variance trend, the stimulus-drive variance tends to decrease along the visual pathway: from LGN to V1 (p < 0.001), and from V1 to V2 (p < 0.05). This suggests a reduction in the strength of stimulus-driven encoding as information moves through the visual hierarchy.

To further investigate the relationship between stimulus-drive and gain variability, Figure 5F shows scatter plots of versus ρg for individual neurons in each area. Each panel includes a linear regression line with the slope indicated. Across all visual areas, there is a statistically significant negative correlation between the two quantities (p < 0.001, t-test), indicating that neurons with weaker stimulus-driven components tend to have stronger gain variability, and vice versa. This finding aligns with recent reports of an anti-correlation between stimulus encoding strength and trial-to-trial, stimulus-independent variability in sensory neurons [33, 3739].

Discussion

Advances in high-density electrophysiological recording techniques [40, 41] now allow large-scale measurements of neuronal activity across multiple brain areas and experimental conditions. Analyses of sensory responses to repeated stimuli consistently reveal substantial trial-to-trial variability in neural spiking, arising from diverse sources such as synaptic transmission noise, fluctuations in excitation–inhibition balance, noisy peripheral inputs, recurrent dynamics, adaptation, and synaptic plasticity [611]. Capturing this variability is essential for understanding neural coding [1214]. Traditional Poisson spiking models [18, 19] provide a foundational framework but predict spike count variance equal to the mean. This assumption is often violated in many brain regions where neurons exhibit overdispersion, meaning variability that exceeds the Poisson prediction [1]. Although numerous models have been proposed to account for this [2329, 32, 33, 42, 43], few provide a detailed, time-resolved characterization of trial-to-trial variability.

Here, we introduced the Continuous Modulated Poisson (CMP) model, a principled probabilistic framework for modeling spiking variability in continuous time. The CMP model builds upon the modulated Poisson model [1] by describing the spike rate at each time point as the product of two independent processes: a stimulus-driven component and a stochastic gain process. Both components are modeled as exponentiated Gaussian processes (GPs), enabling flexible characterization of their smooth temporal dynamics. This model belongs to the class of log-Gaussian Cox processes [35], as it constitutes a doubly stochastic Poisson process. We allow the stimulus-driven components to have condition-specific hyperparameters, while the gain hyperparameters are shared across conditions, supporting efficient and interpretable joint inference.

To capture the long-range correlations observed in gain fluctuations, we introduced the Exponentiated Power Law (EPL) kernel, a new covariance function that generalizes the commonly used radial basis function (RBF) kernel. In our application, the EPL kernel consistently outperformed standard alternatives such as the RBF and Matérn kernels. In addition, we developed a scalable variational inference method that jointly estimates the latent stimulus-driven and stochastic gain components, along with their hyperparameters, directly from spiking data. We validated this approach using synthetic experiments. When applied to recordings from macaque V1 [34], the CMP model accurately partitioned response variability across time, trials, and stimulus conditions. It outperformed alternative approaches, including continuous-time extensions of the modulated Poisson model, in predicting held-out spiking activity and capturing the empirical Fano factor curves over time. Applying the CMP model to data from multiple stages of the visual hierarchy (LGN, V1, V2, and MT) [1], we again observed superior performance across areas and consistent predictions of overdispersion.

Analysis of the inferred CMP hyperparameters across visual areas revealed systematic trends. The temporal smoothness of gain fluctuations increased along the hierarchy, suggesting that trial-to-trial variability becomes more slowly varying in higher-order areas. In contrast, gain variance increased, consistent with an accumulation of shared noise or contextual modulation. Meanwhile, the variance of the stimulus-driven component decreased, consistent with increased selectivity and stabilization of tuning in downstream regions. A consistent negative correlation between the magnitudes of gain variance and stimulus-drive variance was observed across all areas, indicating that neurons with strong stimulus encoding tend to exhibit lower trial-to-trial variability, and vice versa. This observation aligns with previous findings of an anti-correlation between signal strength and variability in sensory neurons [33, 3739]. These results have broader implications for understanding sensory coding, revealing how the brain balances fidelity and flexibility by modulating trial-to-trial variability across space and time [44]. They also align with broader theories in systems neuroscience, such as efficient coding [45] and probabilistic inference [46], where internal variability may be structured rather than random.

Our method complements black-box models by providing interpretable estimates of both stimulus driven and modulatory influences on spiking variability. It is particularly well-suited for experimental settings with repeated stimuli and temporally structured variability, such as studies of sensory tuning or motor control [1, 34, 47]. The CMP model extends prior work by enabling continuous-time analysis of spike trains, jointly modeling stimulus-driven and stimulus-independent components across conditions, and allowing full inference over both latent processes and their hyperparameters. While we found that the EPL kernel performed best in our experiments, kernel selection may depend on the task-specific temporal structure of neural variability.

More broadly, the EPL kernel introduced here is generally applicable and may support future Gaussian process–based modeling efforts beyond neuroscience. While the EPL kernel provides a flexible and tractable way to capture a wide range of temporal correlation structures in the stochastic gain process, its generative interpretation depends on the parameter q. For q = 1, a straightforward stochastic generator is a simple low-pass filtered Gaussian white noise process, equivalent to an Ornstein–Uhlenbeck (OU) process [48, 49]. For q < 1 there is no known closed-form stochastic differential equation that exactly produces the EPL covariance, but such long-memory dynamics can be closely approximated by superpositions of OU processes with a spectrum of time constants or by fractional stochastic processes such as fractional Brownian motion [50]. Conversely, when q > 1, correlations decay more rapidly than exponentially, corresponding to short-memory dynamics with diminished low-frequency power; these can often be approximated by higher-order linear filters or by driving OU processes with temporally correlated noise. Together, these interpretations highlight how the parameter q shapes the temporal structure of trial-to-trial variability.

An important direction for future work is to move beyond descriptive modeling and explore the biological interpretability of inferred hyperparameters. For instance, differences in q or g across the visual hierarchy could reflect distinct circuit or cellular mechanisms, such as changes in synaptic integration timescales, recurrent dynamics, or bursting behavior. This perspective could help explain phenomena such as the disproportionately slow gain fluctuations we observe in the LGN (e.g., q ≈ 0.5), potentially linking them to specific biophysical or network-level processes. Establishing such connections would deepen our understanding of how structured variability arises and how it shapes sensory computation.

Mechanistic interpretations of structured variability may also benefit from parallels with other biological systems. For example, variability in bacterial flagellar motor switching and single-cell gene expression can be explained by a slow extrinsic process modulating a faster Poisson process, producing heavytailed dwell times or variance–mean scaling beyond Poisson expectations [51, 52]. This two-timescale structure closely parallels the CMP framework and suggests that future work incorporating explicit modulatory dynamics could offer deeper mechanistic insight into neural variability and its functional significance.

Beyond these conceptual extensions, there are also several methodological directions for future research. First, extensions to high-dimensional population activity [33] may allow joint modeling of shared variability across neurons. Second, adapting the model to alternative link functions [26] could broaden its applicability to different statistical structures. Third, developing inference algorithms compatible with calcium imaging data [39] would extend its utility to optical recordings. Fourth, integrating behavioral covariates [53] or state variables such as attention [54] or arousal [55] could improve inference of non-stationary trial structure. Finally, while our focus here was on neural spike trains, the general formulation of the CMP model could be applied to other domains involving overdispersed count data [5658].

To facilitate reproducibility and dissemination, we have released our implementation of the CMP model on GitHub [59]. In summary, the CMP framework provides a scalable, flexible, and interpretable approach for modeling neuronal spike variability, revealing new insights into the structure of trial-to-trial fluctuations and their organization across the visual hierarchy, while opening the door to deeper mechanistic interpretations of variability in neural systems.

CMP model inference

We start by recalling the general CMP forward model:

In our inference framework, we consider a setting with multiple repeats (n = 1, …, N) for a given stimulus condition, with data potentially collected across different stimulus conditions (s = 1, …, S). We perform inference on a uniform grid of time points (t = 1, …, T), with grid size set to dt seconds. Note that the model itself is defined in continuous time and can therefore be consistently applied to data measured at arbitrary temporal grid resolutions.

Let yt,s,n denote the spike count for the tth time bin, sth stimulus condition, and nth repeat. Further, let fs = [f1,sfT,s] and gs,n = [g1,s,ngT,s,n] denote the corresponding stimulus-driven and stochastic gain processes. Further, let and represent the log-transformed stimulus-driven and gain components, for notational brevity. Note that, following the CMP generative model, the latent variables and , along with the hyperparameters , are all unknown quantities that need to be estimated given the spiking observations y = y1:T,1:S,1:N.

Frequency domain transformation of stimulus-driven components

The radial basis function (RBF) kernel diagonalizes in the Fourier domain, a property previously exploited to improve computational efficiency [33, 60]. Motivated by this, we adopt a frequency-domain representation of the stimulus-driven components (f) in our inference procedure. Accordingly, we relate the time-domain components to their corresponding Fourier-domain components:

where (noting that typically MsT due to pruning of unnecessary Fourier coefficients that do not substantially contribute to explaining variability in [33, 60]) represents the standard real orthonormal inverse discrete Fourier transform matrix. Then, with the diagonalization of the RBF kernel [33, 60], the Fourier-transformed stimulus-driven components are distributed as independent Gaussian random variables with the following parameters:

where , with denoting the mth Fourier frequency [33, 60]. Thus, we use the frequency-domain stimulus-driven components w = w1:S in place of their timedomain counterparts f, reducing computational complexity by avoiding matrix inversions and enabling pruning (MsT). We do not apply frequency-domain inference to the gain components g, as the heavy tails of the EPL kernel do not support effective pruning.

Variational Inference

To jointly infer the latent variables w and g and hyperparameters θ, we introduce an efficient inference procedure based on Variational Inference (VI) [36], which aims to find an approximate distribution q (w, g) for the intractable true posterior p (w, g|y, θ). In VI, the approximate posterior is first introduced into the data log-likelihood by rewriting it as [32]:

where DKL(q||p) is the Kullback-Leibler divergence between the variational approximation and the true posterior. Since DKL(q||p) is non-negative, ℒ(q) provides a lower bound on the model evidence, and is therefore called the Evidence Lower Bound (ELBO). Maximizing the ELBO is equivalent to minimizing DKL(q||p), thus finding the variational approximation closest to the true posterior.

Variational distributions and the ELBO

We assume that the variational distributions q (w, g) factorizes into independent Gaussian distributions over stimuli s = 1, …, S and trials n = 1, …, N:

where and are full covariance matrices. Note that, by the property of linear transformations of Gaussian random variables [61], this in turn implies that the variational distributions over the time-domain stimulus-driven components are also Gaussian:

Accordingly, the ELBO can be derived as:

Joint inference of variational parameters and hyperparameters

In this section, we outline our proposed iterative scheme for the joint inference of variational parameters and hyperparameters, as illustrated in Algorithm 1. We alternate between optimizing the stimulus-drive hyperparameters for s = 1, …, S, and the gain hyperparameters θg = {ρg, g, q}, until convergence. Each optimization maximizes the ELBO in Equation 12, using Algorithm 2 (described in the next section) to infer the conditional variational parameters. We employed the built-in MATLAB function fmincon, a gradient-based solver for constrained nonlinear multivariable problems, to perform each maximization. To ensure stability, convergence, and valid covariance matrices, we constrained the hyperparameters to the following ranges: , and q ∈ [0.4, 2].

Inferring variational parameters given hyperparameters

Here, we outline the procedure for inferring the variational parameters by optimizing the ELBO for a given set of hyperparameters θ (i.e., assuming Kg and are known), as summarized in Algorithm 2. Note that the ELBO in Equation 12 is independent across stimulus conditions s = 1, …, S, when conditioned on θ. Therefore, we focus on estimating the variational parameters corresponding to the sth stimulus: νs, Ωs, µs,1:N, and Σs,1:N. We maximize the ELBO to find the optimal settings of these variational parameters, alternating between updating the stimulus-drive variational parameters (νs, Ωs) and the gain variational parameters (µs,1:N, Σs,1:N) until convergence. Each maximization step uses a Newton–Raphson procedure.

First, consider the estimation of the stimulus-drive variational parameters νs and Ωs, given the estimated gain variational parameters from the previous iteration. The gradient and Hessian of Equation 12 with respect to νs can be derived as:

Algorithm 1

Joint inference of variational parameters and hyperparameters

Algorithm 2

Estimation of variational parameters given hyperparameters

where is a diagonal matrix with the tth diagonal entry given by , and is a diagonal matrix with the mth diagonal entry equal to . Further, taking derivatives in Equation 12 with respect to Ωs yields:

and setting this derivative to zero results in:

Accordingly, we update the stimulus-drive variational mean νs using Newton–Raphson optimization with the gradient and Hessian in Equation 13, and subsequently update the stimulus-drive variational covariance Ωs using the Hessian as in Equation 14.

Next, we update the gain variational parameters independently for n = 1, …, N using the updated stimulus-drive variational parameters, following an equivalent Newton–Raphson procedure, with the gradient and Hessian relative to µs,n given by:

and the update rule of Σs,n given by:

Following the above derivations, we iterate between updating the stimulus-drive variational parameters and gain variational parameters until convergence. The combined iterative procedure for variational parameter inference is summarized in Algorithm 2.

Log-likelihood derivation of the CMP model

While we optimize the ELBO for parameter inference, accurately and fairly comparing models requires computing the log-likelihood on held-out data. To estimate this for the CMP model, we used the importance sampling procedure described below. Note that,

Observing that the integral in Equation 16 is intractable, we used importance sampling [62] to approximate it. We employed the variational distribution as the proposal distribution,

where . Defining:

and applying the log-sum-exp trick for numerical stability, we obtained:

where . We set I = 1000 and used the above formulation to estimate the loglikelihood of the spiking observations in the nth test trial of the sth stimulus, and finally derived the joint log-likelihood of all test data according to Equation 15.

Partitioning variability with CMP

Here we provide the general formula for partitioning variability into a Poisson component, equal to the mean, and a modulatory component, based on the CMP model in the setting where the stimulus drive fs(t) is a time-varying signal. The continuous-time formulation of the CMP model allows this partitioning to be computed for arbitrary time windows, enabling an analytic characterization of how overdispersion grows with the bin size used to count spikes. For a time bin of size Δ, the mean is given by:

and the variance is:

From these expressions, we can compute the Fano factor as a function of bin size, which is simply the ratio of variance to mean:

The details of the derivation are provided in the SI Appendix. To summarize statistics at a given bin size Δ over the entire observation window T, we use the following approach: we apply a sliding window of duration Δ, shift it systematically through the entire duration T, and average the computed statistics across all such windows.

Alternative models considered for performance comparison

We compared the performance of the CMP model with the alternative models described below.

Baseline Poisson

All log-likelihood figures reported in this paper are expressed as log-likelihood gains of each method relative to the baseline Poisson model. The baseline Poisson model assumed spike counts across trials are generated by a Poisson process:

and, to compensate for sparsity, we further assumed a Gamma prior on the firing rate λt,s Gamma (α, β), where α and β are hyperparameters. Since the Gamma distribution is conjugate to the Poisson distribution [63], the posterior distribution is also Gamma-distributed:

Thus, we found the MAP estimate of the firing rate by the mode of the posterior Gamma distribution, given by:

To fix the hyperparameters, we performed moment matching with the mean and variance . Note further that the mean and variance conditioned on λt,s are identical for this model, resulting in a Fano factor of 1.

Poisson-GP

This model includes only the stimulus-driven components, assuming that trial-to-trial variability is fully described by the stochastic nature of the Poisson process:

The stimulus drive itself is modeled by a Gaussian process, similar to the CMP model:

We inferred the hyperparameters and latent stimulus-driven components of this model using a simplified version of our proposed variational inference procedure. As in the baseline Poisson model, the mean and variance conditioned on the stimulus are identical, yielding a Fano factor of 1.

Goris-indep

This model extends the modulated Poisson framework of Goris et al. [1] to continuous time. The model in Goris et al. [1] considered spike counts within a single time bin. For spiking observations from the sth stimulus presentation with t = 1, …, T time bins, we assumed that the stimulus-independent gain is independent across time bins with shared variance , i.e.,

where gt,s,n Gamma(rs, ss), with shape parameter and scale parameter . Marginalizing over the gain variables yields a product of independent Negative Binomial distributions:

where . To ensure smoothness of the stimulus drive, we set , where denotes the firing rates estimated by the Poisson-GP model. For s = 1, …, S, we searched for the values of rs that minimize the negative log-likelihood of the spiking observations y1:T,s,1:N. Details of Equation 17 and the Fano factor for this model are provided in the SI Appendix.

Goris-const

This is another extension of the Goris model [1] to continuous time. Here, we assumed the gain is constant across time bins within a trial:

where gs,n Gamma(rs, ss) with shape parameter and scale parameter . Further, we assumed that the total spike count across all time bins in a trial, Ys,n = Σt yt,s,n, also shares the same gain variance , i.e, with . We found by fitting for Ys,n across training data n = 1, …, N as in [1], and set the stimulus encoding component , where represents the firing rates estimated by the Poisson-GP model. Details of the mean-variance relationship and log-likelihood derivations for this model are included in the SI appendix.

CMP-RBF

This model follows the same generative structure as the CMP model, except that the gain covariance is parameterized by the standard RBF kernel:

CMP-Matérn

This variant also follows the CMP framework, with the gain covariance parameterized by a Matérn kernel of order 0 in the CMP-Matérn-0 model:

and a Matérn kernel of order 1 in the CMP-Matérn-1 model:

We used the same variational inference procedure as the CMP model to infer the hyperparameters and latent variables of the CMP-RBF and CMP-Matérn models. The log-likelihood and mean-variance derivations for these models are analogous to those of the CMP model.

Datasets studied

In this paper, we applied the CMP model to two datasets recorded from anesthetized, paralyzed, adult macaque monkeys.

Multi-electrode array recordings from the primary visual cortex

Complete details about this dataset are available in [34]. In brief, extracellular recordings were obtained using a 10-by-10 fixed silicon microelectrode array (length 1mm, spaced 400 µm) implanted in the superficial layers of the primary visual cortex (area V1) of three adult male macaque monkeys. The animals were stimulated with drifting sinusoidal gratings at full contrast, with the orientation of the grating varied around the clock in steps of 5 degrees, yielding a total of 72 orientations. Each stimulus was presented for 1,280ms and followed by a blank screen of mean luminance for another 1,280ms, repeated for 50 trials per stimulus condition. We analyzed 654 isolated units from five datasets in this paper (147, 113, 113, 133, and 148 units in each dataset). For each unit and each stimulus, we randomly selected N = 45 trials for training (i.e., for hyperparameter and stimulus-drive inference) and used the remaining N = 5 trials to derive the test log-likelihood.

Single-electrode recordings along the visual hierarchy

Full details regarding these datasets are available in [1]. Briefly, extracellular recordings were made from 18 adult macaque monkeys of either sex, in different areas along the visual hierarchy: the lateral geniculate nucleus (LGN), V1, V2, and MT. The animals were presented with drifting sinusoidal gratings of the preferred size and speed, varying either in spatial frequency (12 spatial frequencies, ranging from 0 to 10 cycles/deg) or in drift direction (16 equally spaced directions). Each grating was presented for 1,000 ms and repeated at least five times. We analyzed the activity of 48, 376, 167, and 127 cells from each population, respectively. For each unit and each stimulus, we randomly selected N = 1 trial to derive the test log-likelihood and used the remaining trials for training the model.

Data availability

No new datasets were generated for this study. The analyses are based on previously published neural recordings that can be obtained from the original authors upon reasonable request. The modeling and analysis code for the Continuous Modulated Poisson (CMP) framework is available at https://github.com/Anuththara-Rupasinghe/CMP.

Acknowledgements

We are grateful to Arnulf Graf, Robbe Goris, Eero Simoncelli, and J. Anthony Movshon for providing the datasets used in this study. AR was supported by NIH T32 institutional training grant (T32MH065214). ASC was supported by NSF CAREER award 2340338. JWP was supported by grants from the Simons Collaboration on the Global Brain (SCGB AWD543027) the NIH BRAIN initiative (R01DA056404, and R01EB02694), the National Eye Institute (NEI) of the National Institutes of Health (NIH) (R01EY033064), and a U19 NIH-NINDS BRAIN Initiative Award (5U19NS123716).

Additional files

SI Appendix

Additional information

Funding

NIH (T32MH065214)

  • Anuththara Rupasinghe

National Science Foundation (NSF) (2340338)

  • Adam Charles

Simons Foundation (SF) (543027)

  • Jonathan W Pillow

HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (R01DA056404)

  • Jonathan W Pillow

HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (R01EB02694)

  • Jonathan W Pillow

HHS | NIH | National Eye Institute (NEI) (R01EY033064)

  • Jonathan W Pillow

HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (5U19NS123716)

  • Jonathan W Pillow