Abstract
Neurons exhibit substantial trial-to-trial variability in response to repeated stimuli, posing a major challenge for understanding the information content of neural spike trains. In visual cortex, responses show greater-than-Poisson variability, whose origins and structure remain unclear. To address this puzzle, we introduce a continuous, doubly stochastic model of spike train variability that partitions neural responses into a smooth stimulus-driven component and a time-varying stochastic gain process. We applied this model to spike trains from four visual areas (LGN, V1, V2, and MT) and found that the gain process is well described by an exponentiated power law, with increasing amplitude and slower decay at higher levels of the visual hierarchy. The model also provides analytical expressions for the Fano factor of binned spike counts as a function of timescale, linking observed variability to underlying modulatory dynamics. Together, these results establish a principled framework for characterizing neural variability across cortical processing stages.
Introduction
Neural responses exhibit trial-to-trial variability, even in response to repeated presentations of the same stimulus [1–5]. Several hypotheses have been proposed for the source of this variability, including noise in synaptic transmission [6], fluctuations in balanced excitation and inhibition [7], noisy peripheral sensors [8], recurrent dynamics [9], neural adaptation [10] and synaptic plasticity [11]. To assess the impacts of variability on neural coding, researchers have developed a variety of statistical models that seek to capture both the stimulus dependence and stochasticity of neural spike responses [12–17]. The simplest such model for spike train data is the Poisson process [18, 19], which assumes that spike count variability in disjoint time intervals is independent. Under this model, the variance of the binned spike counts is equal to the mean for any choice of time bin. However, a large literature has shown that this relationship does not hold precisely in many brain areas. Rather, spike count variance commonly exceeds the mean, a phenomenon known as “overdispersion” [1, 7, 20–30].
To account for super-Poisson response variability in the visual pathway, Goris et al. [1] introduced the “modulated Poisson” model, which partitions neural variability into a stimulus-driven component and a modulatory gain component. Specifically, the model assumes that spike counts are Poisson distributed, with a mean determined by the product of two terms: a stimulus-related component that reflects the neuron’s stimulus tuning, and a stimulus-independent stochastic gain that varies randomly across trials (Figure 1A). Goris et al. [1] showed that this model provides an accurate account of spike count variability across the early visual pathway, with the degree of overdispersion linked to the strength of the modulatory component.

(A) Schematic of the Goris modulated Poisson model [1], which assumes that spike counts (y) are generated by a Poisson process whose rate is the product of two components: a stimulus-dependent drive f(s) and a stimulus-independent gain g. The gain g is a scalar and is independently drawn across trials. (B, C) Two continuous-time extensions of the Goris model. (B) Constant gain model: assumes that the sampling interval of the gain equals the trial duration. Left: stimulus-induced rate, gain process, and instantaneous firing rate for three example trials. Middle: gain autocorrelation function. Right: Fano factor as a function of bin size (dashed line indicates the baseline Poisson model). (C) Independent gain model: assumes that the sampling interval of the gain is much shorter than the length of a single trial. Panels are organized as in (B). (D) Schematic of the proposed Continuous Modulated Poisson (CMP) model. Here, spike trains y(t) are generated by a Poisson process whose rate is the product of a time-varying stimulus drive fs(t) and a continuous-time stochastic gain process g(t). The gain process g(t) follows a Gaussian process (GP) with temporal correlations governed by an exponentiated power law kernel, and is independently drawn across trials. (E) Panels are organized as in (B), for the CMP model.
However, an important drawback of the modulated Poisson model is that it applies to binned spike counts but not to spike trains. It describes the variability of spike counts measured in discrete time bins, and thus cannot characterize variability on shorter timescales. Moreover, it does not account for how overdispersion depends on the size of the time bins used to count spikes.
To address this limitation, we begin by formulating a continuous-time version of the modulated Poisson model introduced by Goris et al. [1]. First, we replace the (scalar) stimulus component with a stimulus-dependent time-varying Poisson firing rate. Then, we specify a sample rate for the stochastic gain variable; we assume that the gain is constant within each discrete sample interval and is drawn independently for each interval. For example, we might assume that the sample interval for the modulatory gain is equal to the stimulus duration (Figure 1B). In this case, the gain is constant for each trial and independent across trials. This produces a gain “process” with a flat temporal auto-correlation, and a Fano factor (variance-to-mean ratio) that grows linearly with the size of the window used to count spikes (Figure 1B, right). Alternatively, we could assume that the sample interval is much shorter than the length of a single trial (Figure 1C). In this case, the auto-correlation of the gain process is narrow, and the Fano factor curve remains flat for all bin sizes larger than the sample interval (Figure 1C, right).
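These two limiting behaviors are easy to verify with a short Monte Carlo simulation. The sketch below uses illustrative parameter values and assumes a gamma-distributed, unit-mean gain for convenience; it estimates the Fano factor of binned spike counts under the constant-gain and independent-gain variants:

```python
import numpy as np

rng = np.random.default_rng(0)
f = 50.0        # constant stimulus-driven rate (spikes/s); illustrative value
var_g = 0.2     # variance of the unit-mean gain; illustrative value
n_trials = 40000

def fano_constant_gain(delta):
    # one gain per trial, held fixed for the whole trial
    g = rng.gamma(1 / var_g, var_g, size=n_trials)
    counts = rng.poisson(f * g * delta)
    return counts.var() / counts.mean()

def fano_independent_gain(delta, dt=0.01):
    # gain redrawn independently in every short interval of length dt
    g = rng.gamma(1 / var_g, var_g, size=(n_trials, int(round(delta / dt))))
    counts = rng.poisson(f * g * dt).sum(axis=1)
    return counts.var() / counts.mean()

for delta in [0.1, 0.5, 1.0]:
    # constant gain: FF grows linearly as 1 + f * delta * var_g
    # independent gain: FF saturates at 1 + f * dt * var_g
    print(delta, fano_constant_gain(delta), fano_independent_gain(delta))
```

The constant-gain variant yields a Fano factor that grows as 1 + f·Δ·Var[g], whereas the independent-gain variant saturates at 1 + f·dt·Var[g] for all bins larger than the sampling interval dt.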
These continuous interpretations of the Goris model rely on the assumption that the modulatory gain is constant during each sample interval and then jumps discontinuously between intervals, which seems unrealistic for biological systems. Moreover, such models have limited ability to capture complex relationships between time bin size and Fano factor, given that they can only produce curves that grow linearly and then saturate (Figure 1B,C).
To overcome these limitations, we introduce the continuous modulated Poisson (CMP) model, a flexible extension of the Goris model to continuous-time spike responses. The CMP model describes spike trains as arising from a Poisson process with a rate given by the product of two continuous-time processes: (1) a “stimulus-driven” component, which describes the firing rate fluctuations induced by the stimulus; and (2) a modulatory “gain” component, which captures stimulus-independent trial-to-trial variability using a stochastic process (Figure 1D). Both components are modeled as exponentiated Gaussian processes (GPs), providing a flexible framework for capturing smooth, continuous dynamics with tunable temporal structure [31–33].
To more precisely capture the temporal correlations in the modulatory gain process, we introduce a novel covariance function that allows these correlations to decay more slowly than those described by the standard radial basis (or “Gaussian”) kernel. We refer to this as the Exponentiated Power Law (EPL) kernel, which models autocorrelation decay according to an exponentiated power law with exponent q ≤ 2 (Figure 1E, middle). This added flexibility enables the model to more accurately capture nonlinear relationships between bin size and Fano factor (Figure 1E, right) compared to previous approaches. Moreover, the CMP framework provides closed-form expressions for how the Fano factor scales with time bin size, offering a direct way to probe how variability accumulates and transforms across different temporal resolutions.
We applied the CMP model to spike train recordings from four visual areas (LGN, V1, V2, and MT) [1, 34] and uncovered novel insights into variability across the visual hierarchy. In particular, we observed a systematic increase in the magnitude of gain fluctuations with progression along the visual hierarchy, consistent with prior observations [1]. Furthermore, comparisons of the inferred EPL kernels across visual areas revealed that gain autocorrelation exhibits slower temporal decay at higher levels of the visual hierarchy. A comparison of the variance of the stimulus drive with the variance of the modulatory gain process across neurons further revealed that these two components tend to be negatively correlated in all visual areas. These findings support existing hypotheses while providing new insight into variability structure and demonstrate that the CMP model outperforms existing methods in capturing time-resolved overdispersion.
In summary, this study makes five main contributions. (1) We extend classical variability partitioning approaches by introducing a continuous-time model that captures how neural variability evolves across temporal scales, overcoming the limitations of bin-based analyses. (2) We introduce the Exponentiated Power Law (EPL) kernel, which models slow, heavy-tailed correlations in modulatory gain and links them to distinct timescales of variability. (3) We derive analytic expressions for the Fano factor as a function of time bin size, enabling variability to be partitioned across temporal scales within a single unified model. (4) We develop a scalable inference framework that jointly estimates stimulus-driven and modulatory components directly from spike train data. (5) By applying CMP to large-scale recordings across multiple stages of the visual hierarchy, we reveal systematic trends in gain dynamics and stimulus modulation, providing new insights into how variability is structured and propagated across cortical processing stages. Altogether, CMP offers a principled and flexible framework for partitioning neural variability in continuous time across stimulus conditions and brain areas.
Results
Continuous modulated Poisson (CMP) model
Our proposed model describes spike trains using a doubly stochastic point process model [35], in which the firing rate of a Poisson process is itself a stochastic process. Specifically, we model a spike train y(t) elicited in response to a stimulus s using a Poisson process:

y(t) | λ(t) ∼ Poisson(λ(t)),    (1)

where the firing rate, or “conditional intensity”, λ(t), is a random variable defined as the product of two terms:

λ(t) = fs(t) g(t),    (2)

with fs(t) representing the stimulus-dependent component that captures stimulus-locked fluctuations in firing rate, and g(t) denoting a stochastic gain process that accounts for stimulus-independent fluctuations in neuronal excitability (Figure 1D-E).
We assume that the stimulus component fs(t) varies independently across stimuli and evolves smoothly in time. To express this smoothness, we place a Gaussian process (GP) prior over the log firing rate:

fs(t) = exp(hs(t)),  with  hs(t) ∼ GP(μs, Ks(t, t′)),    (3)

where Ks is a radial basis function (RBF) covariance:

Ks(t, t′) = ρs exp(−(t − t′)² / (2ℓs²)),    (4)

where ρs denotes the marginal variance and ℓs the temporal length scale of the stimulus component for stimulus s.
The gain process g(t) is modeled analogously as an exponentiated Gaussian process:

g(t) = exp(hg(t)),  with  hg(t) ∼ GP(μg, Kg(t, t′)),    (5)

where Kg is the covariance function that characterizes temporal correlations in the gain process. Unlike the stimulus component, the gain is a latent variable that varies independently across repeated presentations of the same stimulus, thereby inducing greater-than-Poisson variability in the spike counts. For a fixed stimulus s = s* presented N times, the model includes a single shared stimulus component fs*(t) together with N independent draws of the gain process, one per trial.
To more accurately model the stochastic structure of real neural spike trains, we introduce a novel covariance function for the gain process, referred to as the Exponentiated Power Law (EPL) kernel:

Kg(t, t′) = ρg exp(−(|t − t′| / ℓg)^q),    (6)

where ρg denotes the marginal variance, ℓg is the temporal length scale, and q is the power-law exponent controlling the rate of correlation decay over long timescales. We show in SI Appendix that this kernel defines a valid covariance function for 0 < q ≤ 2. Because the gain reflects stimulus-independent variability, its hyperparameters are shared across all stimulus conditions. This formulation provides a unified and parsimonious generative model that captures both smooth, stimulus-locked activity and structured trial-to-trial variability across stimuli.
Figure 2 illustrates example gain covariance functions under different settings of the hyperparameters ℓg (length scale) and q (power-law exponent). When q = 2 (right column), the EPL kernel reduces to the standard RBF covariance function defined in equation 4. For q < 2, the correlations exhibit heavier tails than in the RBF case, decaying more slowly at long time lags. In the limiting case of ℓg → ∞ or q → 0, the gain becomes effectively constant over each stimulus presentation interval [0, T], corresponding to the continuous-time Goris model shown in Figure 1B.
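For concreteness, here is a minimal NumPy sketch of the EPL kernel and of gain samples obtained by exponentiating GP draws, using illustrative hyperparameter values similar to those in Figure 2 (the function name epl_kernel is ours, not from the paper's code):

```python
import numpy as np

def epl_kernel(t, rho_g=0.1, ell_g=0.1, q=1.0):
    """Exponentiated Power Law covariance: rho_g * exp(-(|t - t'| / ell_g)^q)."""
    lags = np.abs(t[:, None] - t[None, :])
    return rho_g * np.exp(-((lags / ell_g) ** q))

t = np.linspace(0.0, 1.0, 200)
K = epl_kernel(t, rho_g=0.1, ell_g=0.1, q=0.5)   # heavy-tailed correlations

# draw three gain processes g(t) by exponentiating samples from GP(0, K);
# a small diagonal jitter keeps the Cholesky factorization numerically stable
rng = np.random.default_rng(1)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(t)))
g = np.exp(L @ rng.standard_normal((len(t), 3)))

# q = 2 recovers an RBF-shaped (squared-exponential) covariance
K_rbf = epl_kernel(t, rho_g=0.1, ell_g=0.1, q=2.0)
```

Sampling on a discrete grid like this is exactly how the colored gain traces in Figure 2A can be generated.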

Illustration of different settings of the proposed Exponentiated Power Law (EPL) covariance function with marginal variance ρg = 0.1.
(A) EPL auto-covariance functions with different settings of the length scale (top row: ℓg = 0.02; bottom row: ℓg = 0.1) and power-law exponent (columns, left to right: q = 0.5, q = 1, q = 2). Colored traces below each auto-covariance show three sample gain processes g(t), obtained by sampling from the corresponding GP and exponentiating. (B) Fano factor curves showing how the Fano factor varies with the bin size used to count spikes, for the EPL covariances in the top and bottom rows of (A), respectively.
Partitioning variability with CMP
Under the CMP model, spike train variability can be partitioned into a Poisson component, which is equal to the mean, and a modulatory component, which controls the degree of overdispersion. The continuous-time formulation of the CMP model allows us to compute this partitioning for arbitrary time intervals, and thus to analytically characterize how overdispersion grows with time bin size. In the case of a non-fluctuating stimulus component fs(t) = f, with the gain normalized to have unit mean, E[g(t)] = 1, the mean spike count in a bin of size Δ is given by:

E[yΔ] = f Δ,    (7)

where yΔ denotes the spike count in a window of length Δ. The corresponding variance is:

Var[yΔ] = f Δ + f²Δ² (φ(Δ) − 1),    (8)

where

φ(Δ) = (1/Δ²) ∫₀^Δ ∫₀^Δ exp(Kg(t, t′)) dt dt′    (9)

is the average of the exponentiated covariance function exp(Kg) over all pairs of times in the interval [0, Δ]. (Note: this formula holds for any covariance function, not just the EPL covariance function we use for Kg). From these two terms, the Fano factor as a function of bin size can be computed as the ratio of variance to mean:

F(Δ) = Var[yΔ] / E[yΔ] = 1 + f Δ (φ(Δ) − 1).    (10)

This shows that the Fano factor approaches 1 in the limit of small bin size Δ, low firing rate f, or vanishing gain covariance Kg → 0, in which case φ(Δ) → 1 and the model reduces to the baseline Poisson model.
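This relationship can be checked numerically. The sketch below uses illustrative hyperparameters and assumes the log-gain has mean −ρg/2 so that E[g(t)] = 1; it computes the grid average of exp(Kg) over all pairs of times in [0, Δ] (the quantity φ(Δ)) and compares the resulting analytic Fano factor against a Monte Carlo estimate from simulated spike counts:

```python
import numpy as np

rng = np.random.default_rng(2)
f, rho_g, ell_g, q = 30.0, 0.3, 0.1, 1.0   # illustrative values
delta, dt = 0.2, 0.005

t = np.arange(0, delta, dt) + dt / 2
lags = np.abs(t[:, None] - t[None, :])
K = rho_g * np.exp(-((lags / ell_g) ** q))   # EPL covariance on the grid

# analytic prediction: phi(delta) is the average of exp(K) over [0, delta]^2,
# and the Fano factor is FF(delta) = 1 + f * delta * (phi - 1)
phi = np.exp(K).mean()
ff_analytic = 1 + f * delta * (phi - 1)

# Monte Carlo check: draw log-gains with mean -rho_g/2 so that E[g] = 1,
# then draw Poisson counts with per-trial mean sum_i f * g(t_i) * dt
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(t)))
h = -rho_g / 2 + L @ rng.standard_normal((len(t), 50000))
counts = rng.poisson((f * np.exp(h) * dt).sum(axis=0))
ff_empirical = counts.var() / counts.mean()
print(ff_analytic, ff_empirical)
```

The two estimates agree closely, confirming that the overdispersion predicted by the closed-form expression matches the variability of counts simulated from the generative model.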
Model fitting
To fit the CMP model to data, we need to compute the probability over spike counts in finite-sized time bins, given that spike trains are generally measured at a finite sampling rate. Computing this probability involves integrating a conditionally Poisson spike count probability over all realizations of the gain process g(t):

p(y | fs) = ∫ [∏i Poisson(yi ; fs(ti) g(ti) Δt)] p(g) dg,    (11)

where yi denotes the spike count in the i-th time bin of width Δt, centered at time ti. Because this integral has no closed form, we developed a scalable variational inference method that jointly estimates the stimulus and gain components along with their hyperparameters (see Methods and Materials for details).
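While the paper's inference relies on a variational approximation (see Methods and Materials), the marginal likelihood being approximated can be illustrated with naive Monte Carlo: average the conditional Poisson likelihood over prior draws of the gain on a discrete time grid. The helper below is a hypothetical sketch under these assumptions (unit-mean log-normal gain, log-mean-exp for numerical stability), not the paper's inference algorithm:

```python
import numpy as np
from math import lgamma

def log_marginal_mc(y, f, K, dt, n_samples=2000, seed=0):
    """Naive Monte Carlo estimate of log p(y | f): average the conditional
    Poisson likelihood over gain draws g = exp(h), h ~ N(-diag(K)/2, K)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    L = np.linalg.cholesky(K + 1e-9 * np.eye(T))
    h = -np.diag(K)[:, None] / 2 + L @ rng.standard_normal((T, n_samples))
    mu = f[:, None] * np.exp(h) * dt              # (T, n_samples) Poisson means
    log_lik = (y[:, None] * np.log(mu) - mu).sum(axis=0) \
              - sum(lgamma(k + 1) for k in y)
    m = log_lik.max()                              # log-mean-exp over samples
    return m + np.log(np.exp(log_lik - m).mean())

# sanity check: with a negligible gain covariance, the estimate reduces to
# the plain Poisson log-likelihood with bin means f * dt
y = np.array([0, 1, 2, 1, 0])
f = np.full(5, 20.0)
val = log_marginal_mc(y, f, K=1e-12 * np.eye(5), dt=0.05)
```

Naive Monte Carlo of this kind scales poorly with the number of bins, which is why a variational approach is needed in practice.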
To validate our inference procedure, we applied it to a synthetic dataset generated from the CMP model, as shown in SI Appendix. Figure S1A illustrates the true and inferred stimulus-drive and gain processes for two trials of two different stimuli, while Figure S1B compares the true and inferred Gaussian process hyperparameters. These results demonstrate that the proposed inference framework accurately recovers ground-truth hyperparameters, tuning functions, and gain processes on a discretized time grid. Note that the model itself is defined in continuous time and can therefore be consistently applied to data measured at arbitrary temporal resolutions.
Continuous partitioning of variability in macaque V1
We applied our method to partition response variability in five populations of neurons recorded from the superficial layers of the macaque primary visual cortex (data from [34]). These datasets were collected from three monkeys using drifting grating stimuli presented at S = 72 equally spaced orientations. Each grating was displayed for 1280 ms, followed by a 1280 ms blank period, resulting in a total trial duration of 2.56 s. Each grating orientation was repeated N = 50 times. In total, we analyzed spiking activity from 654 isolated units (147, 113, 113, 133, and 148 neurons across the five datasets).
Figure 3A shows the inferred time-domain activity of a representative neuron from dataset 1, using a temporal grid resolution of dt = 0.02 s, for three different drift orientations (0°, 105°, and 280°). The rows (top to bottom) show the PSTH derived from the data, the estimated stimulus drive, and, for two different trials per orientation, the inferred gain processes, firing rates, and observed spike rasters. While the estimated stimulus drive closely matches the PSTH, the inferred gain processes capture trial-specific temporal fluctuations that are clearly visible in the spike rasters. Notably, when the neuron is strongly driven by the stimulus (middle column), the instantaneous per-trial firing rate is dominated by the stimulus drive. In contrast, when the neuron is weakly driven (left column), the stochastic gain process plays a larger role in shaping the per-trial firing rate. These observations indicate that the CMP model meaningfully partitions response variability across stimulus conditions. Figure 3B shows the estimated EPL gain covariance, corresponding to optimal hyperparameters ρg = 2.9, ℓg = 0.93, and q = 0.95.

Performance comparisons of different models on neural population data recorded from macaque primary visual cortex under 72 drifting sinusoidal gratings (data from Graf et al. [34]).
(A) Top to bottom: Tuning curve of a sample neuron; for three selected orientations (0°, 105°, 280°), the PSTH, inferred stimulus drive, inferred gain (trial 1), inferred firing rate (trial 1), observed spike raster (trial 1), inferred gain (trial 2), inferred firing rate (trial 2), observed spike raster (trial 2), and Fano factor vs. bin size curves for real data (black), CMP model (red), Goris-independent gain model (dark blue dashed), Goris-constant gain model (light blue), and Poisson-GP model (gray). (B) EPL gain covariance (top) and autocorrelation (bottom) under the CMP model for the neuron in panel A. (C) Average test log-likelihood improvement of each model relative to the baseline Poisson model, computed on held-out data across all 654 V1 neurons and 72 stimulus orientations.
The continuous-time formulation of the CMP model allows variability to be partitioned at arbitrary temporal resolutions, enabling analytical characterization of the relationship between overdispersion and bin size (see Methods and Materials for details). The bottom row of Figure 3A shows Fano factor curves derived from the CMP model (red), compared with curves computed from the data (black), the Goris model with independent gain (dark blue dashed), the Goris model with constant gain (light blue), and the Poisson-GP model (gray). The two variants of the Goris model are continuous-time extensions of the original modulated Poisson model [1]: the independent gain model assumes that the sampling interval of the gain process is much shorter than the trial duration, whereas the constant gain model assumes that the gain remains fixed throughout the entire trial. The Poisson-GP model corresponds to the CMP model without a gain component, meaning it includes only a stimulus drive (see Methods and Materials for details of all model variants). These results show that the CMP model closely matches the overdispersion observed in real data across orientations and bin sizes, outperforming all other methods. While both Goris model variants produce overdispersion [1], the independent gain model yields Fano factor curves that are largely flat, and the constant gain model produces curves that increase linearly with bin size, neither of which captures the true empirical relationship.
We next compared model performance using the average log-likelihood improvement relative to the baseline Poisson model on held-out data from all 654 V1 neurons across all 72 orientations (Figure 3C). The CMP model achieved the highest predictive performance across all neurons. In Figure S2, we further compared the CMP model to three variants with different gain kernels: CMP-RBF (where the EPL kernel in equation 6 is replaced by the standard RBF kernel [31]), CMP-Matérn 0, and CMP-Matérn 1 (with Matérn kernels of order 0 and 1, respectively). These results demonstrate that the EPL kernel provides the best fit to the gain structure in the data.
To evaluate the utility of sharing gain hyperparameters across stimuli, we compared the CMP model with a variant in which the gain hyperparameters {ρg, ℓg, q} are fit independently for each of the s = 1, …, S stimulus conditions.
Finally, we assessed the consistency of CMP inference with respect to temporal discretization by comparing inference results across three sufficiently fine grid resolutions: dt = 0.02 s, dt = 0.05 s, and dt = 0.1 s. As shown in Figure S4A, the estimated gain hyperparameters remain stable across these resolutions, indicating that the inference procedure is robust to moderate changes in grid size once the temporal discretization is fine enough to approximate the underlying continuous-time model. Figure S4B shows corresponding test log-likelihoods across models and grid resolutions, confirming that CMP consistently outperforms alternative models over this range.
Continuous partitioning of variability along the visual hierarchy
Next, we evaluated model performance across different areas of the visual hierarchy by applying each method to neural responses recorded in the lateral geniculate nucleus (LGN), V1, V2, and MT (data from Goris et al. [1]). The recordings were obtained using drifting sinusoidal gratings of the preferred size and speed, varying either across S = 12 spatial frequencies (ranging from 0 to 10 cycles/deg) or across S = 16 equally spaced drift directions. Each grating was presented for one second and repeated at least N = 5 times. We analyzed spiking activity from 48, 376, 167, and 127 neurons from LGN, V1, V2, and MT, respectively, using a grid resolution of dt = 0.02 s.
Figure 4A shows the average log-likelihood improvement on held-out data, relative to the baseline Poisson model, for each method across all four areas. The CMP model consistently outperformed all other models across the visual hierarchy. To examine performance at the single-neuron level, Figure 4B presents scatter plots comparing the log-likelihoods of the CMP model with those of the Poisson-GP model, the Goris independent gain model, and the Goris constant gain model. These results show that the CMP model systematically achieves better fits across individual neurons in all regions.

Performance comparisons of different models on neuronal population data recorded from different areas along the visual hierarchy—the lateral geniculate nucleus (LGN), V1, V2 and MT—under drifting sinusoidal gratings of preferred size and speed, varying in spatial frequency (12 frequencies from 0 to 10 cycles/deg) or in drift direction (16 directions) (data from Goris et al. [1]).
(A) Log-likelihood improvement of each model relative to the baseline Poisson model, averaged across neurons in each population. Rows from top to bottom: LGN, V1, V2, MT. The number of neurons per area is indicated in parentheses. (B) Log-likelihood comparison of the CMP model versus the Goris independent gain model (left), Goris constant gain model (middle), and Poisson-GP model (right) for individual neurons. Colors indicate cortical areas: LGN (yellow), V1 (green), V2 (blue), MT (purple). (C) Fano factor vs. bin size curves averaged across each population. Black: real data; red: CMP; dark blue dashed: Goris-independent gain; light blue: Goris-constant gain; gray: Poisson-GP.
Next, we characterized trial-to-trial variability by analyzing the mean–variance relationship as a function of bin size. Figure 4C shows Fano factor curves, averaged over neurons in each population, for real data and the various models. Notably, the CMP model closely follows the overdispersion patterns observed in real data across visual areas and timescales, again outperforming alternative methods. In Figure S5, we extend this analysis by including three additional CMP variants: CMP-RBF, CMP-Matérn 0, and CMP-Matérn 1. These comparisons further confirm that the standard CMP model provides superior performance across all visual areas.
Finally, to examine the importance of including a GP prior on the stimulus drive, we compared the performance of the Goris model with and without this component. (Note that the original modulated Poisson model in Goris et al. [1] does not include a prior on the stimulus drive; thus, the version without the prior is more directly comparable to their formulation.) Figure S6 presents the log-likelihood comparisons from these analyses, which clearly show that incorporating a smoothness prior on the stimulus drive generally improves the model’s ability to fit held-out data. It is also noteworthy that even the Poisson-GP model, which lacks a stochastic gain component, outperforms both variants of the Goris model when the GP prior on the stimulus drive is removed (Figure S6A–C). These results highlight that the smoothness prior we place on the stimulus drive is also crucial for accurately capturing neural variability (Figure S6D).
Variations in gain and stimulus-drive along the visual hierarchy
Finally, we compared the distributions of the inferred CMP hyperparameters across the visual areas LGN, V1, V2, and MT, as shown in Figure 5. Panels A, B, and C show the distributions of the inferred gain power-law exponent q, gain length scale ℓg, and gain variance ρg, respectively, for each neuronal population. Figure 5D presents the median gain covariance and its corresponding autocorrelation function for each area.

Comparison of inferred CMP hyperparameters across visual areas: the lateral geniculate nucleus (LGN), V1, V2, and MT (data from [1]).
Rows top to bottom: LGN (yellow), V1 (green), V2 (blue), and MT (purple). The number of neurons per area is indicated in parentheses. (A) Distribution of gain power-law exponent q across neurons. Triangles indicate the median. (B) Distribution of gain length scale ℓg across neurons. (C) Distribution of gain variance ρg across neurons. (D) Median gain covariance (top) and corresponding autocorrelation function (bottom) for each area. (E) Distribution of stimulus-drive variance, averaged across stimuli, across neurons. (F) Scatter plots of gain variance versus average stimulus-drive variance for individual neurons.
The gain power-law exponent (Figure 5A) increases along the visual hierarchy: from LGN to V1 (p < 0.001, sign test) and from V1 to V2 (p < 0.001), although there is a slight decrease from V2 to MT (p < 0.05). The gain length scale (Figure 5B) also shows a progressive increase: from LGN to V1 (p < 0.05), from V1 to V2 (p < 0.05), and from V2 to MT (p < 0.001). Additionally, the median gain autocorrelation (Figure 5D) exhibits slower temporal decay and a broader central lobe in higher visual areas. Together, these results suggest that rapid temporal fluctuations in trial-to-trial variability diminish as information propagates along the visual hierarchy.
In contrast, the gain variance ρg (Figure 5C) increases systematically along the hierarchy: from LGN to V1 (p < 0.001), and from V1 to V2 (p < 0.001). This trend indicates that the magnitude of gain fluctuations becomes larger at later stages of the visual pathway, consistent with prior observations [1].
Next, we examined the distribution of stimulus-drive variance, averaged across stimuli, defined as:

ρ̄f = (1/S) Σs ρs,

where ρs denotes the marginal variance of the stimulus component for stimulus s. The stimulus-drive variance tended to decrease at higher stages of the visual hierarchy (Figure 5E), consistent with increased selectivity and stabilization of tuning in downstream regions.
To further investigate the relationship between stimulus-drive and gain variability, Figure 5F shows scatter plots of the gain variance ρg against the average stimulus-drive variance ρ̄f for individual neurons. In all four visual areas, these two quantities were negatively correlated, indicating that neurons with stronger stimulus drive tend to exhibit weaker gain fluctuations, and vice versa.
Discussion
Advances in high-density electrophysiological recording techniques [40, 41] now allow large-scale measurements of neuronal activity across multiple brain areas and experimental conditions. Analyses of sensory responses to repeated stimuli consistently reveal substantial trial-to-trial variability in neural spiking, arising from diverse sources such as synaptic transmission noise, fluctuations in excitation–inhibition balance, noisy peripheral inputs, recurrent dynamics, adaptation, and synaptic plasticity [6–11]. Capturing this variability is essential for understanding neural coding [12–14]. Traditional Poisson spiking models [18, 19] provide a foundational framework but predict spike count variance equal to the mean. This assumption is often violated in many brain regions where neurons exhibit overdispersion, meaning variability that exceeds the Poisson prediction [1]. Although numerous models have been proposed to account for this [23–29, 32, 33, 42, 43], few provide a detailed, time-resolved characterization of trial-to-trial variability.
Here, we introduced the Continuous Modulated Poisson (CMP) model, a principled probabilistic framework for modeling spiking variability in continuous time. The CMP model builds upon the modulated Poisson model [1] by describing the spike rate at each time point as the product of two independent processes: a stimulus-driven component and a stochastic gain process. Both components are modeled as exponentiated Gaussian processes (GPs), enabling flexible characterization of their smooth temporal dynamics. This model belongs to the class of log-Gaussian Cox processes [35], as it constitutes a doubly stochastic Poisson process. We allow the stimulus-driven components to have condition-specific hyperparameters, while the gain hyperparameters are shared across conditions, supporting efficient and interpretable joint inference.
To capture the long-range correlations observed in gain fluctuations, we introduced the Exponentiated Power Law (EPL) kernel, a new covariance function that generalizes the commonly used radial basis function (RBF) kernel. In our application, the EPL kernel consistently outperformed standard alternatives such as the RBF and Matérn kernels. In addition, we developed a scalable variational inference method that jointly estimates the latent stimulus-driven and stochastic gain components, along with their hyperparameters, directly from spiking data. We validated this approach using synthetic experiments. When applied to recordings from macaque V1 [34], the CMP model accurately partitioned response variability across time, trials, and stimulus conditions. It outperformed alternative approaches, including continuous-time extensions of the modulated Poisson model, in predicting held-out spiking activity and capturing the empirical Fano factor curves over time. Applying the CMP model to data from multiple stages of the visual hierarchy (LGN, V1, V2, and MT) [1], we again observed superior performance across areas and consistent predictions of overdispersion.
Analysis of the inferred CMP hyperparameters across visual areas revealed systematic trends. The temporal smoothness of gain fluctuations increased along the hierarchy, suggesting that trial-to-trial variability becomes more slowly varying in higher-order areas. In contrast, gain variance increased, consistent with an accumulation of shared noise or contextual modulation. Meanwhile, the variance of the stimulus-driven component decreased, consistent with increased selectivity and stabilization of tuning in downstream regions. A consistent negative correlation between the magnitudes of gain variance and stimulus-drive variance was observed across all areas, indicating that neurons with strong stimulus encoding tend to exhibit lower trial-to-trial variability, and vice versa. This observation aligns with previous findings of an anti-correlation between signal strength and variability in sensory neurons [33, 37–39]. These results have broader implications for understanding sensory coding, revealing how the brain balances fidelity and flexibility by modulating trial-to-trial variability across space and time [44]. They also align with broader theories in systems neuroscience, such as efficient coding [45] and probabilistic inference [46], where internal variability may be structured rather than random.
Our method complements black-box models by providing interpretable estimates of both stimulus driven and modulatory influences on spiking variability. It is particularly well-suited for experimental settings with repeated stimuli and temporally structured variability, such as studies of sensory tuning or motor control [1, 34, 47]. The CMP model extends prior work by enabling continuous-time analysis of spike trains, jointly modeling stimulus-driven and stimulus-independent components across conditions, and allowing full inference over both latent processes and their hyperparameters. While we found that the EPL kernel performed best in our experiments, kernel selection may depend on the task-specific temporal structure of neural variability.
More broadly, the EPL kernel introduced here is generally applicable and may support future Gaussian process–based modeling efforts beyond neuroscience. While the EPL kernel provides a flexible and tractable way to capture a wide range of temporal correlation structures in the stochastic gain process, its generative interpretation depends on the parameter q. For q = 1, a straightforward stochastic generator is a simple low-pass filtered Gaussian white noise process, equivalent to an Ornstein–Uhlenbeck (OU) process [48, 49]. For q < 1 there is no known closed-form stochastic differential equation that exactly produces the EPL covariance, but such long-memory dynamics can be closely approximated by superpositions of OU processes with a spectrum of time constants or by fractional stochastic processes such as fractional Brownian motion [50]. Conversely, when q > 1, correlations decay more rapidly than exponentially, corresponding to short-memory dynamics with diminished low-frequency power; these can often be approximated by higher-order linear filters or by driving OU processes with temporally correlated noise. Together, these interpretations highlight how the parameter q shapes the temporal structure of trial-to-trial variability.
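To make the role of q concrete, the following minimal Python sketch constructs the EPL covariance k(τ) = σ² exp(−(|τ|/ℓ)^q) and draws a gain trajectory from the corresponding Gaussian process; the kernel is positive definite for 0 < q ≤ 2. The function names `epl_kernel` and `sample_gain` are ours, not part of the released MATLAB implementation.

```python
import numpy as np

def epl_kernel(tau, sigma2=1.0, ell=0.1, q=1.0):
    """EPL covariance k(tau) = sigma2 * exp(-(|tau|/ell)**q).
    q = 1 gives exponential (Ornstein-Uhlenbeck) decay; q = 2 gives the RBF kernel."""
    return sigma2 * np.exp(-(np.abs(tau) / ell) ** q)

def sample_gain(n=200, dt=0.005, sigma2=0.5, ell=0.1, q=1.0, seed=0):
    """Draw one zero-mean gain trajectory g ~ GP(0, K) on a uniform time grid."""
    t = np.arange(n) * dt
    K = epl_kernel(t[:, None] - t[None, :], sigma2, ell, q)
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n))  # small jitter for numerical stability
    return t, L @ np.random.default_rng(seed).standard_normal(n)
```

Evaluating the kernel at a fixed lag shows the heavier tail for q < 1: at lag 3ℓ, `epl_kernel(0.3, q=0.5)` exceeds `epl_kernel(0.3, q=1.0)` for the same length scale.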
An important direction for future work is to move beyond descriptive modeling and explore the biological interpretability of inferred hyperparameters. For instance, differences in q or ℓg across the visual hierarchy could reflect distinct circuit or cellular mechanisms, such as changes in synaptic integration timescales, recurrent dynamics, or bursting behavior. This perspective could help explain phenomena such as the disproportionately slow gain fluctuations we observe in the LGN (e.g., q ≈ 0.5), potentially linking them to specific biophysical or network-level processes. Establishing such connections would deepen our understanding of how structured variability arises and how it shapes sensory computation.
Mechanistic interpretations of structured variability may also benefit from parallels with other biological systems. For example, variability in bacterial flagellar motor switching and single-cell gene expression can be explained by a slow extrinsic process modulating a faster Poisson process, producing heavy-tailed dwell times or variance–mean scaling beyond Poisson expectations [51, 52]. This two-timescale structure closely parallels the CMP framework and suggests that future work incorporating explicit modulatory dynamics could offer deeper mechanistic insight into neural variability and its functional significance.
Beyond these conceptual extensions, there are also several methodological directions for future research. First, extensions to high-dimensional population activity [33] may allow joint modeling of shared variability across neurons. Second, adapting the model to alternative link functions [26] could broaden its applicability to different statistical structures. Third, developing inference algorithms compatible with calcium imaging data [39] would extend its utility to optical recordings. Fourth, integrating behavioral covariates [53] or state variables such as attention [54] or arousal [55] could improve inference of non-stationary trial structure. Finally, while our focus here was on neural spike trains, the general formulation of the CMP model could be applied to other domains involving overdispersed count data [56–58].
To facilitate reproducibility and dissemination, we have released our implementation of the CMP model on GitHub [59]. In summary, the CMP framework provides a scalable, flexible, and interpretable approach for modeling neuronal spike variability, revealing new insights into the structure of trial-to-trial fluctuations and their organization across the visual hierarchy, while opening the door to deeper mechanistic interpretations of variability in neural systems.
CMP model inference
We start by recalling the general CMP forward model:

In our inference framework, we consider a setting with multiple repeats (n = 1, …, N) for a given stimulus condition, with data potentially collected across different stimulus conditions (s = 1, …, S). We perform inference on a uniform grid of time points (t = 1, …, T), with grid size set to dt seconds. Note that the model itself is defined in continuous time and can therefore be consistently applied to data measured at arbitrary temporal grid resolutions.
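Because the model is defined in continuous time, spike times can be binned onto grids of different resolutions dt without loss of consistency; the total spike count is preserved under rebinning. A minimal sketch (the helper name `bin_spikes` is ours):

```python
import numpy as np

def bin_spikes(spike_times, t_max, dt):
    """Bin spike times (in seconds) onto a uniform grid of width dt over [0, t_max]."""
    edges = np.arange(0.0, t_max + dt / 2, dt)  # endpoint-inclusive edge grid
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts
```

Binning the same spike train at a fine and a coarse dt yields different count vectors with identical totals, so the model can be fit at whichever resolution is computationally convenient.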
Let yt,s,n denote the spike count for the tth time bin, sth stimulus condition, and nth repeat, and let fs = [f1,s … fT,s]⊤ and gs,n = [g1,s,n … gT,s,n]⊤ denote the corresponding stimulus-driven and stochastic gain processes. Further, let




Frequency domain transformation of stimulus-driven components
The radial basis function (RBF) kernel diagonalizes in the Fourier domain, a property previously exploited to improve computational efficiency [33, 60]. Motivated by this, we adopt a frequency-domain representation of the stimulus-driven components (f) in our inference procedure. Accordingly, we relate the time-domain components to their corresponding Fourier-domain components:

where 



where 

Variational Inference
To jointly infer the latent variables w and g and the hyperparameters θ, we introduce an efficient inference procedure based on Variational Inference (VI) [36], which aims to find an approximate distribution q(w, g) for the intractable true posterior p(w, g | y, θ). In VI, the approximate posterior is first introduced into the data log-likelihood by rewriting it as [32]:
log p(y | θ) = ℒ(q) + DKL(q(w, g) ∥ p(w, g | y, θ)),
where DKL(q||p) is the Kullback-Leibler divergence between the variational approximation and the true posterior. Since DKL(q||p) is non-negative, ℒ(q) provides a lower bound on the model evidence, and is therefore called the Evidence Lower Bound (ELBO). Maximizing the ELBO is equivalent to minimizing DKL(q||p), thus finding the variational approximation closest to the true posterior.
Variational distributions and the ELBO
We assume that the variational distribution q(w, g) factorizes into independent Gaussian distributions over stimuli s = 1, …, S and trials n = 1, …, N:

where 


Accordingly, the ELBO can be derived as:

Joint inference of variational parameters and hyperparameters
In this section, we outline our proposed iterative scheme for the joint inference of variational parameters and hyperparameters, as illustrated in Algorithm 1. We alternate between optimizing the stimulus-drive hyperparameters 

Inferring variational parameters given hyperparameters
Here, we outline the procedure for inferring the variational parameters by optimizing the ELBO for a given set of hyperparameters θ (i.e., assuming Kg and 
First, consider the estimation of the stimulus-drive variational parameters νs and Ωs, given the estimated gain variational parameters from the previous iteration. The gradient and Hessian of Equation 12 with respect to νs can be derived as:

Algorithm 1
Joint inference of variational parameters and hyperparameters

Algorithm 2
Estimation of variational parameters given hyperparameters


where 




and setting this derivative to zero results in:

Accordingly, we update the stimulus-drive variational mean νs using Newton–Raphson optimization with the gradient and Hessian in Equation 13, and subsequently update the stimulus-drive variational covariance Ωs using the Hessian as in Equation 14.
Next, we update the gain variational parameters independently for n = 1, …, N using the updated stimulus-drive variational parameters, following an equivalent Newton–Raphson procedure, with the gradient and Hessian with respect to µs,n given by:

and the update rule of Σs,n given by:

Following the above derivations, we iterate between updating the stimulus-drive variational parameters and gain variational parameters until convergence. The combined iterative procedure for variational parameter inference is summarized in Algorithm 2.
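The CMP-specific gradients and Hessians above are not reproduced here, but each inner update has the generic Newton–Raphson form x ← x − H⁻¹∇. As a hedged illustration of this structure, the sketch below performs the analogous MAP/Laplace fit for a Gaussian-process latent with Poisson observations; the toy model and all names are ours, not the CMP equations themselves.

```python
import numpy as np

def newton_map(y, K, n_iter=20):
    """MAP / Laplace fit of a latent f for y_t ~ Poisson(exp(f_t)), f ~ N(0, K),
    via undamped Newton-Raphson on the log posterior."""
    T = len(y)
    Kinv = np.linalg.inv(K + 1e-8 * np.eye(T))
    f = np.zeros(T)
    for _ in range(n_iter):
        lam = np.exp(f)
        grad = (y - lam) - Kinv @ f         # gradient of the log posterior
        H = -np.diag(lam) - Kinv            # Hessian (negative definite)
        f = f - np.linalg.solve(H, grad)    # Newton step
    Sigma = np.linalg.inv(np.diag(np.exp(f)) + Kinv)  # covariance ~ (-H)^{-1}
    return f, Sigma
```

With a weak prior (large diagonal K) and constant counts, the fitted rate exp(f) converges to roughly the empirical mean, and Sigma plays the role of the variational covariance updated from the Hessian.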
Log-likelihood derivation of the CMP model
While we optimize the ELBO for parameter inference, accurately and fairly comparing models requires computing the log-likelihood on held-out data. To estimate this for the CMP model, we used the importance sampling procedure described below. Note that,


Observing that the integral in Equation 16 is intractable, we used importance sampling [62] to approximate it. We employed the variational distribution 


where 

and applying the log-sum-exp trick for numerical stability, we obtained:

where 
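As a self-contained illustration of this estimator, consider a toy Gaussian model whose marginal likelihood is known in closed form, with the proposal set equal to the prior (so the importance weights reduce to the likelihood); the function name is ours.

```python
import numpy as np

def log_marginal_is(y, n_samples=200_000, seed=0):
    """Importance-sampling estimate of log p(y) for the toy model
    z ~ N(0, 1), y | z ~ N(z, 1), with proposal q(z) = N(0, 1).
    The exact marginal is N(y; 0, 2)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)                       # z_m ~ q
    log_w = -0.5 * np.log(2 * np.pi) - 0.5 * (y - z) ** 2    # log p(y | z_m)
    m = log_w.max()                                          # log-sum-exp trick
    return m + np.log(np.mean(np.exp(log_w - m)))
```

Subtracting the maximum log-weight before exponentiating is exactly the log-sum-exp stabilization used above; the estimate can be checked against the closed-form marginal log N(y; 0, 2).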
Partitioning variability with CMP
Here we provide the general formula for partitioning variability into a Poisson component, equal to the mean, and a modulatory component, based on the CMP model in the setting where the stimulus drive fs(t) is a time-varying signal. The continuous-time formulation of the CMP model allows this partitioning to be computed for arbitrary time windows, enabling an analytic characterization of how overdispersion grows with the bin size used to count spikes. For a time bin of size Δ, the mean is given by:

and the variance is:

From these expressions, we can compute the Fano factor as a function of bin size, which is simply the ratio of variance to mean:

The details of the derivation are provided in the SI Appendix. To summarize statistics at a given bin size Δ over the entire observation window T, we use the following approach: we apply a sliding window of duration Δ, shift it systematically through the entire duration T, and average the computed statistics across all such windows.
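The sliding-window summary can be sketched for empirical spike counts as follows (a minimal Python analogue of the procedure; the function name is ours, and the paper's analytic expressions are not reproduced):

```python
import numpy as np

def fano_vs_binsize(spikes, dt, deltas):
    """spikes: array (n_trials, T) of binned counts at resolution dt.
    For each window size Delta, slide a Delta-wide window over the full duration,
    compute the across-trial mean and variance of the windowed counts, and
    average the Fano factor (variance/mean) over window positions."""
    n_trials = spikes.shape[0]
    csum = np.cumsum(np.concatenate([np.zeros((n_trials, 1)), spikes], axis=1), axis=1)
    fanos = []
    for delta in deltas:
        w = int(round(delta / dt))
        counts = csum[:, w:] - csum[:, :-w]   # windowed counts at every start position
        mu = counts.mean(axis=0)
        var = counts.var(axis=0, ddof=1)
        ok = mu > 0                            # skip windows with no spikes
        fanos.append((var[ok] / mu[ok]).mean())
    return np.array(fanos)
```

On simulated Poisson counts this returns Fano factors near 1 at every Δ, while a trial-wise Gamma gain inflates the Fano factor as Δ grows, mirroring the growth of overdispersion with bin size described above.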
Alternative models considered for performance comparison
We compared the performance of the CMP model with the alternative models described below.
Baseline Poisson
All log-likelihood figures reported in this paper are expressed as log-likelihood gains of each method relative to the baseline Poisson model. The baseline Poisson model assumed spike counts across trials are generated by a Poisson process:

and, to compensate for sparsity, we further assumed a Gamma prior on the firing rate, λt,s ∼ Gamma(α, β), where α and β are hyperparameters. Since the Gamma distribution is conjugate to the Poisson distribution [63], the posterior distribution is also Gamma-distributed:

λt,s | yt,s,1:N ∼ Gamma(α + Σn yt,s,n, β + N).

Thus, we found the MAP estimate of the firing rate as the mode of the posterior Gamma distribution, given by:

λ̂t,s = (α + Σn yt,s,n − 1) / (β + N).
To fix the hyperparameters, we performed moment matching with the mean 

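A minimal sketch of this conjugate update, assuming the shape–rate parameterization of the Gamma distribution (the function name is ours):

```python
import numpy as np

def gamma_poisson_map(y, alpha, beta):
    """MAP firing-rate estimate for y_n ~ Poisson(lam), lam ~ Gamma(alpha, beta)
    (shape-rate). The posterior is Gamma(alpha + sum(y), beta + N), and the MAP
    estimate is its mode, (alpha + sum(y) - 1) / (beta + N), valid when
    alpha + sum(y) > 1."""
    y = np.asarray(y)
    return (alpha + y.sum() - 1.0) / (beta + y.size)
```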
Poisson-GP
This model includes only the stimulus-driven components, assuming that trial-to-trial variability is fully described by the stochastic nature of the Poisson process:

The stimulus drive itself is modeled by a Gaussian process, similar to the CMP model:

We inferred the hyperparameters and latent stimulus-driven components of this model using a simplified version of our proposed variational inference procedure. As in the baseline Poisson model, the mean and variance conditioned on the stimulus are identical, yielding a Fano factor of 1.
Goris-indep
This model extends the modulated Poisson framework of Goris et al. [1] to continuous time. The model in Goris et al. [1] considered spike counts within a single time bin. For spiking observations from the sth stimulus presentation with t = 1, …, T time bins, we assumed that the stimulus-independent gain is independent across time bins with shared variance 

where gt,s,n ∼ Gamma(rs, ss), with shape parameter 


where 


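A small simulation illustrates the variance structure implied by a unit-mean Gamma gain with variance σ²G: by the law of total variance, the count variance becomes μ + σ²G μ² rather than μ. This is a sketch with illustrative names and arbitrary parameter values, not the fitted model.

```python
import numpy as np

def modulated_counts(mu, sigma2_g, n, rng):
    """Counts y = Poisson(mu * g) with independent gains
    g ~ Gamma(shape=1/sigma2_g, scale=sigma2_g), so E[g] = 1, Var[g] = sigma2_g."""
    g = rng.gamma(1.0 / sigma2_g, sigma2_g, size=n)
    return rng.poisson(mu * g)

rng = np.random.default_rng(1)
y = modulated_counts(mu=5.0, sigma2_g=0.2, n=200_000, rng=rng)
# Law of total variance predicts Var[y] = mu + sigma2_g * mu**2 = 5 + 0.2 * 25 = 10.
```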
Goris-const
This is another extension of the Goris model [1] to continuous time. Here, we assumed the gain is constant across time bins within a trial:

where gs,n ∼ Gamma(rs, ss) with shape parameter 







CMP-RBF
This model follows the same generative structure as the CMP model, except that the gain covariance is parameterized by the standard RBF kernel:
kg(t, t′) = σg² exp(−(t − t′)² / (2ℓg²)).
CMP-Matérn
This variant also follows the CMP framework, with the gain covariance parameterized by a Matérn kernel of order 0 in the CMP-Matérn-0 model:
kg(t, t′) = σg² exp(−|t − t′| / ℓg),
and a Matérn kernel of order 1 in the CMP-Matérn-1 model:
kg(t, t′) = σg² (1 + √3 |t − t′| / ℓg) exp(−√3 |t − t′| / ℓg).
We used the same variational inference procedure as the CMP model to infer the hyperparameters and latent variables of the CMP-RBF and CMP-Matérn models. The log-likelihood and mean-variance derivations for these models are analogous to those of the CMP model.
Datasets studied
In this paper, we applied the CMP model to two datasets recorded from anesthetized, paralyzed, adult macaque monkeys.
Multi-electrode array recordings from the primary visual cortex
Complete details about this dataset are available in [34]. In brief, extracellular recordings were obtained using a fixed 10-by-10 silicon microelectrode array (electrode length 1 mm, 400 µm spacing) implanted in the superficial layers of the primary visual cortex (area V1) of three adult male macaque monkeys. The animals were stimulated with drifting sinusoidal gratings at full contrast, with the orientation of the grating varied around the clock in steps of 5 degrees, yielding a total of 72 orientations. Each stimulus was presented for 1,280 ms and followed by a blank screen of mean luminance for another 1,280 ms, repeated for 50 trials per stimulus condition. We analyzed 654 isolated units from five datasets (147, 113, 113, 133, and 148 units per dataset). For each unit and each stimulus, we randomly selected N = 45 trials for training (i.e., for hyperparameter and stimulus-drive inference) and used the remaining N = 5 trials to compute the test log-likelihood.
Single-electrode recordings along the visual hierarchy
Full details regarding these datasets are available in [1]. Briefly, extracellular recordings were made from 18 adult macaque monkeys of either sex, in different areas along the visual hierarchy: the lateral geniculate nucleus (LGN), V1, V2, and MT. The animals were presented with drifting sinusoidal gratings of the preferred size and speed, varying either in spatial frequency (12 spatial frequencies, ranging from 0 to 10 cycles/deg) or in drift direction (16 equally spaced directions). Each grating was presented for 1,000 ms and repeated at least five times. We analyzed the activity of 48, 376, 167, and 127 cells from the LGN, V1, V2, and MT, respectively. For each unit and each stimulus, we randomly selected N = 1 trial to compute the test log-likelihood and used the remaining trials for training the model.
Data availability
No new datasets were generated for this study. The analyses are based on previously published neural recordings that can be obtained from the original authors upon reasonable request. The modeling and analysis code for the Continuous Modulated Poisson (CMP) framework is available at https://github.com/Anuththara-Rupasinghe/CMP.
Acknowledgements
We are grateful to Arnulf Graf, Robbe Goris, Eero Simoncelli, and J. Anthony Movshon for providing the datasets used in this study. AR was supported by an NIH T32 institutional training grant (T32MH065214). ASC was supported by NSF CAREER award 2340338. JWP was supported by grants from the Simons Collaboration on the Global Brain (SCGB AWD543027), the NIH BRAIN Initiative (R01DA056404 and R01EB02694), the National Eye Institute (NEI) of the National Institutes of Health (NIH) (R01EY033064), and a U19 NIH-NINDS BRAIN Initiative Award (5U19NS123716).
Additional files
Additional information
Funding
NIH (T32MH065214)
Anuththara Rupasinghe
National Science Foundation (NSF) (2340338)
Adam Charles
Simons Foundation (SF) (543027)
Jonathan W Pillow
HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (R01DA056404)
Jonathan W Pillow
HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (R01EB02694)
Jonathan W Pillow
HHS | NIH | National Eye Institute (NEI) (R01EY033064)
Jonathan W Pillow
HHS | NIH | BRAIN Initiative (The BRAIN Initiative) (5U19NS123716)
Jonathan W Pillow
References
- [1] Partitioning neuronal variability. Nature Neuroscience 17:858–865.
- [2] Neuronal variability: non-stationary responses to identical visual stimuli. Brain Research 79:405–418. https://doi.org/10.1016/0006-8993(74)90438-7
- [3] Orientation specificity and response variability of cells in the striate cortex. Vision Research 13:1771–1779. https://doi.org/10.1016/0042-6989(73)90094-1
- [4] Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS Biology 2:e264.
- [5] Neural variability: friend or foe? Trends in Cognitive Sciences 19:322–328. https://doi.org/10.1016/j.tics.2015.04.005
- [6] An evaluation of causes for unreliability of synaptic transmission. Proceedings of the National Academy of Sciences 91:10380–10383. https://doi.org/10.1073/pnas.91.22.10380
- [7] The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding. Journal of Neuroscience 18:3870–3896. https://doi.org/10.1523/JNEUROSCI.18-10-03870.1998
- [8] The photovoltage of macaque cone photoreceptors: Adaptation, noise, and kinetics. Journal of Neuroscience 19:1203–1216. https://doi.org/10.1523/JNEUROSCI.19-04-01203.1999
- [9] Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science 274:1724–1726. https://doi.org/10.1126/science.274.5293.1724
- [10] Visual adaptation: Neural, psychological and computational aspects. Vision Research 47:3125–3131. https://doi.org/10.1016/j.visres.2007.08.023
- [11] Synaptic mechanisms for plasticity in neocortex. Annual Review of Neuroscience 32:33–55. https://doi.org/10.1146/annurev.neuro.051508.135516
- [12] Neuronal spike trains and stochastic point processes: I. The single spike train. Biophysical Journal 7:391–418. https://doi.org/10.1016/S0006-3495(67)86596-2
- [13] A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology 93:1074–1089. https://doi.org/10.1152/jn.00697.2004
- [14] Statistical models for neural encoding, decoding, and optimal stimulus design. In: Cisek P, Drew T, Kalaska JF (eds).
- [15] Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models. In: Advances in Neural Information Processing Systems.
- [16] Statistical models of spike trains. Stochastic Methods in Neuroscience 24:278–303.
- [17] Capturing the dynamical repertoire of single neurons with generalized linear models. Neural Computation 29:3260–3289. https://doi.org/10.1162/neco_a_01021
- [18] Introduction to Theoretical Neurobiology. Cambridge Studies in Mathematical Biology. Cambridge University Press.
- [19] Spikes: Exploring the Neural Code. MIT Press.
- [20] Comparison of receptive field properties of neurons in area 17 of normal and bilaterally amblyopic cats. Experimental Brain Research 99:399–410. https://doi.org/10.1007/BF00228976
- [21] Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society of London, Series B: Biological Sciences 264:1775–1783. https://doi.org/10.1098/rspb.1997.0246
- [22] Stochastic model of the overdispersion in the place cell discharge. Biosystems 58:27–32.
- [23] Fully Bayesian inference for neural models with negative-binomial spiking. In: Advances in Neural Information Processing Systems.
- [24] High-dimensional neural spike train analysis with generalized count linear dynamical systems. In: Advances in Neural Information Processing Systems.
- [25] Flexible models for spike count data with both over- and underdispersion. Journal of Computational Neuroscience 41:29–43. https://doi.org/10.1007/s10827-016-0603-y
- [26] Dethroning the Fano Factor: A Flexible, Model-Based Approach to Partitioning Neural Variability. Neural Computation 30:1012–1045. https://doi.org/10.1162/neco_a_01062
- [27] Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. Journal of Neuroscience Methods 105:25–37.
- [28] An efficient and flexible spike train model via empirical Bayes. IEEE Transactions on Signal Processing 69:3236–3251. https://doi.org/10.1109/TSP.2021.3076885
- [29] A doubly stochastic renewal framework for partitioning spiking variability. bioRxiv.
- [30] Additive continuous-time joint partitioning of neural variability. In: 2018 Conference on Cognitive Computational Neuroscience.
- [31] Gaussian Processes for Machine Learning. MIT Press.
- [32] Variational Latent Gaussian Process for Recovering Single-Trial Dynamics from Population Spike Trains. Neural Computation 29:1293–1316. https://doi.org/10.1162/NECO_a_00953
- [33] Identifying signal and noise structure in neural population activity with Gaussian process factor models. In: Advances in Neural Information Processing Systems.
- [34] Decoding the activity of neuronal populations in macaque primary visual cortex. Nature Neuroscience 14:239–245. https://doi.org/10.1038/nn.2733
- [35] Some statistical methods connected with series of events. Journal of the Royal Statistical Society, Series B (Methodological) 17:129–157. https://doi.org/10.1111/j.2517-6161.1955.tb00188.x
- [36] An introduction to variational methods for graphical models. Machine Learning 37:183–233. https://doi.org/10.1023/A:1007665907178
- [37] Fundamental bounds on the fidelity of sensory cortical coding. Nature 580:100–105. https://doi.org/10.1038/s41586-020-2130-2
- [38] Information-limiting correlations in large neural populations. Journal of Neuroscience 40:1668–1678. https://doi.org/10.1523/JNEUROSCI.2072-19.2019
- [39] Direct extraction of signal and noise correlations from two-photon calcium imaging of ensemble neuronal activity. eLife 10:e68046. https://doi.org/10.7554/eLife.68046
- [40] Fully integrated silicon probes for high-density recording of neural activity. Nature 551:232–236. https://doi.org/10.1038/nature24636
- [41] Multiplexed, high density electrophysiology with nanofabricated neural probes. PLOS ONE 6:1–11. https://doi.org/10.1371/journal.pone.0026204
- [42] A universal probabilistic spike count model reveals ongoing modulation of neural variability. In: Advances in Neural Information Processing Systems.
- [43] Modelling the neural code in large populations of correlated neurons. eLife 10:e64615. https://doi.org/10.7554/eLife.64615
- [44] Noise in the nervous system. Nature Reviews Neuroscience 9:292–303. https://doi.org/10.1038/nrn2258
- [45] Natural image statistics and neural representation. Annual Review of Neuroscience 24:1193–1216.
- [46] Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92:530–543. https://doi.org/10.1016/j.neuron.2016.09.038
- [47] Neural variability in premotor cortex provides a signature of motor preparation. Journal of Neuroscience 26:3697–3712.
- [48] Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences. Volume 13 of Springer Series in Synergetics. Springer-Verlag, Berlin.
- [49] On the theory of the Brownian motion. Physical Review 36:823–841. https://doi.org/10.1103/PhysRev.36.823
- [50] Fractional Brownian motions, fractional noises and applications. SIAM Review 10:422–437. http://www.jstor.org/stable/2027184
- [51] How white noise generates power-law switching in bacterial flagellar motors. Physical Review Letters 94:208101. https://doi.org/10.1103/PhysRevLett.94.208101
- [52] Protein concentration fluctuations in the high expression regime: Taylor's law and its mechanistic origin. Physical Review X 12:011051. https://doi.org/10.1103/PhysRevX.12.011051
- [53] Spontaneous behaviors drive multidimensional, brainwide activity. Science 364:eaav7893. https://doi.org/10.1126/science.aav7893
- [54] Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience 12:1594–1600. https://doi.org/10.1038/nn.2439
- [55] Waking state: Rapid variations modulate neural and behavioral responses. Neuron 87:1143–1161. https://doi.org/10.1016/j.neuron.2015.09.012
- [56] Negative Binomial Regression. Cambridge University Press.
- [57] Regression Analysis of Count Data. Econometric Society Monographs. Cambridge University Press.
- [58] Modelling Frequency and Count Data. Oxford University Press. https://doi.org/10.1093/oso/9780198523314.001.0001
- [59] Continuous partitioning of neuronal variability: MATLAB code. GitHub. https://github.com/Anuththara-Rupasinghe/CMP
- [60] Scalable Bayesian inference for high-dimensional neural receptive fields. bioRxiv. https://doi.org/10.1101/212217
- [61] The Multivariate Normal Distribution. Springer Science & Business Media.
- [62] Importance sampling: a review. WIREs Computational Statistics 2:54–60. https://doi.org/10.1002/wics.56
- [63] Conjugate Priors for Exponential Families. The Annals of Statistics 7:269–281. https://doi.org/10.1214/aos/1176344611
Article and author information
Author information
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.109719. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2026, Rupasinghe et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.