A dynamic generative model can extract interpretable oscillatory components from multichannel neurophysiological recordings
Abstract
Modern neurophysiological recordings are performed using multichannel sensor arrays that can record activity across hundreds to thousands of channels. Often, underlying lower-dimensional patterns of activity are responsible for the observed dynamics, but these representations are difficult to identify reliably using existing methods, which either summarize multivariate relationships post hoc from univariate analyses or rely on current blind source separation methods. While such methods can reveal appealing patterns of activity, determining the number of components to include, assessing their statistical significance, and interpreting them require extensive manual intervention and subjective judgment in practice. These difficulties with component selection and interpretation occur in large part because these methods lack a generative model for the underlying spatiotemporal dynamics. Here, we describe a novel component analysis method anchored by a generative model in which each source is described by a biophysically inspired state-space representation. The parameters governing this representation readily capture the oscillatory temporal dynamics of the components, so we refer to the method as oscillation component analysis. These parameters – the oscillatory properties, the component mixing weights at the sensors, and the number of oscillations – are all inferred in a data-driven fashion within a Bayesian framework employing an instance of the expectation–maximization algorithm. We analyze high-dimensional electroencephalography and magnetoencephalography recordings from human studies to illustrate the potential utility of this method for neuroscience data.
eLife assessment
This method article proposes a valuable oscillation component analysis (OCA) approach, in analogy to independent component analysis (ICA), in which source separation is achieved through biophysically inspired generative modeling of neural oscillations. The empirical evidence justifying the approach's advantage is solid. This work will be of interest to researchers in the fields of cognitive neuroscience, neural oscillation, and MEG/EEG.
https://doi.org/10.7554/eLife.97107.3.sa0

Introduction
Human neurophysiological recordings such as the scalp electroencephalogram (EEG), magnetoencephalogram (MEG), stereoelectroencephalogram (SEEG), and local field potentials consist of on the order of $10^{2}$ sensors that record mixtures of predominantly cortical network oscillations (Buzsáki, 2006; Buzsáki and Draguhn, 2004; Lopes da Silva, 2013; Wang, 2010; Helfrich and Knight, 2016). These network oscillations have distinct spatiotemporal signatures, based on the functional brain areas involved, their interconnections, and the electromagnetic mapping between the source currents and the sensors. However, the source-to-sensor mixing and the further superposition of measurement noise complicate the interpretation of the sensor-level data and its topography (Schaworonkow and Nikulin, 2022). Given the widespread and growing availability of high-density neural recording technologies (Stevenson and Kording, 2011), there is a pressing need for analysis tools that can recover underlying dynamic components from highly multivariate data.
This problem fits within a larger class of blind source separation (BSS) problems, for which there is a plethora of component analysis algorithms that attempt to extract underlying source activity as linear weighted combinations of the sensor activity. The decomposition weights are designed according to some predefined criterion on the extracted time series, depending on the application. Independent component analysis (ICA) (Hyvärinen and Oja, 2000; Jung et al., 2000; Cardoso, 1999; Hyvärinen, 2013) has been particularly popular within neuroscience for a number of reasons. It makes no assumption about the original sources except statistical independence, that is, no or minimal mutual information between the components (Bell and Sejnowski, 1995), and in principle requires little to no user intervention to run. However, one drawback of the method is that it assumes that the samples of each component time trace are independent and identically distributed, which is not generally true in physical or physiological applications. A related major drawback is that ICA relies on the cumulative histograms of sensor recordings: if these histograms are Gaussian, ICA is in principle unable to separate the sources (Brookes et al., 2011). This includes, for example, physiological signals such as EEG, MEG, and SEEG that are at least approximately Gaussian distributed due to the central limit theorem (Muirhead, 1982), since these signals reflect the linear combination of many sources of activity. Finally, given the lack of assumptions on the structure of the underlying signals, there is no guarantee that ICA will extract relevant oscillatory components, or any other highly structured dynamic pattern for that matter.
Given the prominence and ubiquity of oscillations in neurophysiological data, applications of component analysis methods in neuroscience have taken a different approach to emphasize oscillatory dynamics, employing canonical correlation analysis (Robinson et al., 2017), generalized eigenvalue decomposition (Parra and Sajda, 2003), joint decorrelation (de Cheveigné and Parra, 2014), or similar methods to identify a set of spatial weights that maximize the signal-to-noise ratio in the extracted component within a narrow band around a given frequency of interest (Nikulin et al., 2011; de Cheveigné and Arzounian, 2015; Cohen, 2017; Cohen, 2018). The extracted component then inherits the intrinsic oscillatory dynamics around that frequency without the need for a predesigned narrowband filter (Yeung et al., 2007). These techniques do acknowledge the inherent temporal dynamics of the oscillatory sources, but do so via nonparametric sample correlation or cross-spectrum matrix estimates that are sensitive to noise or artifacts and that require substantial amounts of data to achieve consistency.
Another problem common to all of these component decomposition methods is that they do not directly estimate the source-to-sensor mixing matrix. Instead, they estimate spatial filters that extract the independent components, and these filters are not directly interpretable. The source-to-sensor mixing matrix can only be estimated by solving an inverse problem that requires knowledge of the noise covariance matrix (Haufe et al., 2014), which adds another complication and a source of potential error. Finally, the properties of the extracted component time courses are not known a priori and must be assessed after the fact, typically using nonparametric tests as well as visual inspection.
These difficulties with component selection and interpretation occur in large part because existing BSS methods lack a generative model for the underlying spatiotemporal dynamics. With a probabilistic generative model, it is possible to specify a soft constraint on the dynamic properties of the underlying components, while maintaining other desirable properties such as independence between components. Component selection can be handled automatically within a statistical framework under the model, and interpretation is straightforward in principle if the components can be described by a small number of parameters. Here, we propose a novel component analysis method that uses a state-space model (Wiener, 1966; Harvey, 1990; Harvey and Trimbur, 2003) to efficiently represent oscillatory dynamics in terms of latent analytic signals (Bracewell, 2000) consisting of both the real and imaginary components of the oscillation. The observed data are then represented as a superposition of these latent oscillations, each weighted by a multichannel mixing matrix that describes the spatial signature of the oscillation. We estimate the parameters of the model and the mixing matrices using generalized expectation maximization (GEM) and employ empirical Bayes model selection to objectively determine the number of components. In these ways, we address the major shortcomings described above for many component analysis methods. We refer to our novel method as oscillation component analysis (OCA), akin to ICA. In what follows, we describe the model formulation in detail and demonstrate the performance of the method on simulated and experimental data sets, including high-density EEG during propofol anesthesia and sleep, as well as resting-state MEG from the Human Connectome Project.
Theory
State-space oscillator model
Oscillatory time series can be described using the following state-space representation (Wiener, 1966; Harvey, 1990; Harvey and Trimbur, 2003):

$$\mathbf{x}_{n} = \mathcal{R}\,\mathbf{x}_{n-1} + \mathbf{w}_{n}, \qquad \mathbf{w}_{n} \sim \mathcal{N}\left(\mathbf{0},\, \sigma^{2}\mathbf{I}_{2}\right),$$
where the oscillation state, $\mathbf{x}_{n}=[x_{n,1},\, x_{n,2}]^{\top}$, is a two-dimensional state vector. This stochastic difference equation summarizes oscillatory dynamics as a random rotation in the two-dimensional state space through a deterministic rotation matrix, $\mathcal{R}$, explicitly parameterized by the oscillation frequency, $f$, the sampling rate, $f_{s}$, and the damping factor, $a$, as

$$\mathcal{R} = a\begin{bmatrix}\cos(2\pi f/f_{s}) & -\sin(2\pi f/f_{s})\\ \sin(2\pi f/f_{s}) & \cos(2\pi f/f_{s})\end{bmatrix},$$

and a stochastic driving noise, $\mathbf{w}_{n}$, assumed to be stationary and Gaussian with variance $\sigma^{2}$. The two elements of this state vector trace out time series that maintain an approximate $\pi/2$ radian phase difference, and therefore are closely related to the real and imaginary parts of an analytic signal (Bracewell, 2000) in the complex plane (see Appendix 2, section ‘Oscillation states and analytic signals’). The oscillation states are henceforth called analytic signals, with a minor abuse of notation. An arbitrary fixed projection (e.g., on the real line) of the state vector realizations generates the observed noisy oscillation time series at the sensor. Multiple oscillations can be readily incorporated in this state-space model by simply considering their linear combination. Recently, several investigators (Matsuda and Komaki, 2017a; Beck et al., 2018; Beck et al., 2022) utilized this state-space representation to extract underlying oscillatory time courses from single-channel EEG time traces.
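As a concrete illustration, the oscillator above can be simulated in a few lines. This is a sketch under assumed parameter values ($f=10$ Hz, $f_s=100$ Hz, $a=0.99$, $\sigma^2=1$), not code from the OCA implementation:

```python
import numpy as np

def simulate_oscillation(f, fs, a, sigma2, n_samples, seed=0):
    """Simulate the state-space oscillator x_n = a*R(2*pi*f/fs) x_{n-1} + w_n,
    with w_n ~ N(0, sigma2 * I_2)."""
    rng = np.random.default_rng(seed)
    theta = 2 * np.pi * f / fs
    R = a * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    x = np.zeros((n_samples, 2))
    for n in range(1, n_samples):
        x[n] = R @ x[n - 1] + rng.normal(0.0, np.sqrt(sigma2), size=2)
    return x

# illustrative parameters: a 10 Hz oscillation sampled at 100 Hz
x = simulate_oscillation(f=10.0, fs=100.0, a=0.99, sigma2=1.0, n_samples=1000)
y = x[:, 0]  # projection on the real line: the observed noisy oscillation
```

The two columns of `x` play the role of the real and imaginary parts of the analytic signal; any fixed linear projection of them yields an observed oscillation with a particular phase.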
Generalization to multichannel data
In order to represent multichannel neural recordings, we employ this oscillatory state-space representation within a BSS model (Parra, 1998). In the proposed generative model, a sensor array with $L$ sensors records neurophysiological signals produced by the superposition of $M$ distinct oscillations supported by underlying brain networks or circuits, which we refer to as oscillation sources.
Each oscillation source is governed by the above-mentioned state-space representation, resulting in a structured $2M$-dimensional state space. The structure of the underlying brain networks or circuits governs how each oscillation source is observed at the sensor array, via an $L\times 2$ spatial distribution matrix, ${\mathbf{c}}_{\ast ,m}=[{\mathbf{c}}_{1,m},{\mathbf{c}}_{2,m},\cdots ,{\mathbf{c}}_{L,m}]$. In other words, the $l$th electrode in the sensor array observes a specific projection of the underlying $m$th analytic oscillation, given by ${\mathbf{c}}_{l,m}=[{c}_{l,m,1},{c}_{l,m,2}]$, which encodes the amplitude and phase of the $m$th oscillation time course at that electrode. This probabilistic generative model for multichannel recordings is motivated by a potential biophysical mechanism of electromagnetic traveling wave generation in brain parenchyma (Galinsky and Frank, 2020; see Appendix 1, section ‘Mechanistic origin’).
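The forward model described here (each sensor observing an amplitude- and phase-specific projection of each oscillation) can be sketched as follows; the array shapes and mixing values are illustrative assumptions, not the paper's estimates:

```python
import numpy as np

def mix_to_sensors(X, C, noise_chol, seed=0):
    """Superpose M two-dimensional oscillation states at L sensors:
    y_n = sum_m c_{*,m} x_n^(m) + noise. C is an (L, M, 2) array whose rows
    c_{l,m} encode per-sensor amplitude and phase (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]          # X has shape (n_samples, M, 2)
    L = C.shape[0]
    # sum over oscillations m and the two state dimensions k
    Y = np.einsum('lmk,nmk->nl', C, X)
    # spatially correlated sensor noise via a Cholesky factor of its covariance
    Y += rng.standard_normal((n_samples, L)) @ noise_chol.T
    return Y
```

With `noise_chol` set to the Cholesky factor of a sensor noise covariance $\mathbf{R}$, the added noise is drawn from $\mathcal{N}(\mathbf{0},\mathbf{R})$; a zero matrix gives noiseless mixing.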
An illustration
In Figure 1A, middle panel, we depict an oscillation state as an analytic signal, $\mathbf{x}_{n}$ (black arrows), in the 2D state space, rotating around the origin according to the state-space model in the left panel: the red dashed line traces the oscillation states over time. The actual amount of rotation between every pair of consecutive time points is a random variable centered around $2\pi f/{f}_{s}=0.22\pi$, with the spread determined by the damping parameter, $a=0.99$, and the process noise covariance. The right panel shows two noisy measurements, reflecting two different projections: one on the real axis (blue traces), another on the line making a $\pi/4$ radian angle with the real axis (orange traces). Because of this angle between the lines of projection, the two measurements maintain an approximately $\pi/4$ radian phase difference throughout the time course of the oscillation.
We note that several earlier works (Matsuda and Komaki, 2017b; Quinn et al., 2021) have developed similar multivariate oscillator models, albeit from the perspective of a spatiospectral eigendecomposition of the companion form of a multivariate autoregressive (MVAR) parameter matrix (Neumaier and Schneider, 2001). As described earlier, we employ this state-space form here in the context of a BSS problem.
Next, we briefly describe how one can infer the hidden oscillator states from an observed multivariate time-series dataset given the oscillation state-space model parameters, and how one can adjust those parameters to the dataset. We defer the mathematical derivations to Appendix 3 to maintain the lucidity of the presentation.
Learning algorithm
Priors
We assume simple prior distributions on the measurement noise and sensor-level mixing coefficients to facilitate stable and unique recovery of the oscillation components. To obtain an $M$-oscillator analysis of $L$-channel data, we consider a Gaussian prior on the spatial mixing components, $\mathbf{c}_{l,m}$, with precision $\alpha$, and an inverse-Wishart prior on the sensor noise covariance matrix, $\mathbf{R}$, with scale, $\mathbf{\Psi}$, and degrees of freedom, $\nu$:

$$\mathbf{c}_{l,m} \sim \mathcal{N}\left(\mathbf{0},\, \alpha^{-1}\mathbf{I}_{2}\right), \qquad \mathbf{R} \sim \mathcal{W}^{-1}\left(\mathbf{\Psi},\, \nu\right).$$
We treat the oscillation parameters, $({f}^{(m)},\text{}{a}^{(m)},\text{}{({\sigma}^{2})}^{(m)}):={\mathit{\theta}}^{(m)}$, and distributional parameters of the assumed priors, $\alpha$, $\mathbf{\Psi},\text{}\nu$, as hyperparameters. Figure 1B shows the probabilistic graphical model portraying oscillations as a dynamical system evolving in time, with the priors as parent nodes.
Variational Bayes inference
Unfortunately, given the priors and the interplay between oscillation source time courses and sensor-level mixing patterns, exact computation of the log-likelihood, and thus the exact posterior, is intractable. We therefore employ variational Bayes (VB) inference (Quinn and Šmídl, 2006), a computationally efficient inference technique to obtain a closed-form approximation to the exact Bayes posterior. Originally introduced by Hinton and van Camp, 1993, VB inference simplifies the inference problem through a restrictive parameterization that reduces the search space for distributions, splitting up the problem into multiple partial optimization steps that are potentially easier to solve (Attias, 1999). The name comes from the negative variational free energy, also known as the evidence lower bound (Neal and Hinton, 1998), used to assess the quality of the aforementioned approximation and as a surrogate for the log-likelihood (see Appendix 3, section ‘Negative variational free energy’).
In particular, given the number of oscillatory components $M$ and the corresponding hyperparameters $\mathit{\theta}=[{\mathit{\theta}}^{(1)},{\mathit{\theta}}^{(2)},\cdots ,{\mathit{\theta}}^{(M)}]$ and $\alpha$, $\mathbf{\Psi}$, $\nu$, we use the following VB decoupling (Attias, 1999) of the posterior distribution of the mixing matrix $\mathbf{C}$, oscillation states $\mathbf{x}_{t}$, and noise covariance matrix $\mathbf{R}$:

$$q\left(\mathbf{C}, \{\mathbf{x}_{t}\}, \mathbf{R} \mid M\right) = q\left(\mathbf{C}\mid M\right)\, q\left(\{\mathbf{x}_{t}\}\mid M\right)\, q\left(\mathbf{R}\mid M\right).$$
This particular choice allows the approximate posteriors of these quantities to be represented in closed form by multivariate Gaussian, multivariate Gaussian, and inverse-Wishart distributions, respectively (see Appendix 3, section ‘Variational Bayes inference’). Figure 1C shows the graphical representation of the posterior distribution after the VB decoupling. This decoupling allows us to perform an iterative posterior inference procedure in which we cyclically update the posteriors $q(\mathbf{x}_{t}\mid M)$, $q(\mathbf{C}\mid M)$, and $q(\mathbf{R}\mid M)$, each using the latest sufficient statistics from the other two distributions (see Appendix 3, section ‘Variational Bayes inference’ for more details).
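To make the cyclic update structure concrete, the following toy sketch runs mean-field coordinate ascent on a deliberately simplified scalar model, $y_n = c\,x_n + \text{noise}$, alternating between $q(x)$ and $q(c)$ in the spirit of the updates above. This is an analogy only: the noise variance is held fixed, and this is not the paper's algorithm.

```python
import numpy as np

def toy_cavi(y, r=0.1, alpha=1.0, n_iter=50):
    """Mean-field VB for the toy model y_n = c * x_n + noise (variance r),
    with priors x_n ~ N(0, 1) and c ~ N(0, 1/alpha). Each update uses the
    other factor's current sufficient statistics."""
    Ec, Vc = 1.0, 1.0                          # initialize q(c) moments
    for _ in range(n_iter):
        # update q(x_n): Gaussian with precision 1 + E[c^2]/r
        px = 1.0 + (Ec**2 + Vc) / r
        Ex, Vx = (Ec / r) * y / px, 1.0 / px
        # update q(c): Gaussian with precision alpha + sum E[x_n^2]/r
        pc = alpha + np.sum(Ex**2 + Vx) / r
        Ec, Vc = np.sum(y * Ex) / (r * pc), 1.0 / pc
    return Ec, Ex

# synthetic data from the toy model with true c = 2 (hypothetical values)
rng = np.random.default_rng(0)
y_obs = 2.0 * rng.standard_normal(200) + np.sqrt(0.1) * rng.standard_normal(200)
Ec_hat, Ex_hat = toy_cavi(y_obs)
```

As in the full model, a sign/scale ambiguity remains between the "mixing" factor $c$ and the "states" $x_n$; only their product is pinned down by the data.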
Generalized EM to update hyperparameters
Since the parameters of the state-space model and of the assumed prior distributions are not known a priori, we obtain their point estimates using an instance of the GEM algorithm, which utilizes the aforementioned approximate inference in the E-step (Dempster et al., 1977; Neal and Hinton, 1998). We start the learning algorithm by initializing $\mathit{\theta}$, $\overline{\mathbf{\Lambda}}$, $\overline{\mathbf{C}}$, and $\alpha$ to appropriate values (see Appendix 3, section ‘Initialization of parameters and hyperparameters’ for details). We then cyclically update the posteriors $q(\mathbf{x}_{t}\mid M)$, $q(\mathbf{C}\mid M)$, and $q(\mathbf{R}\mid M)$ until these iterations stop changing the negative variational free energy. We then update the hyperparameters $\mathit{\theta}$ and $\alpha$ from the current inference and proceed to the next inference with the updated hyperparameters. These update rules form an outer loop that refines the hyperparameters, within which an inner update step iteratively improves the approximate posterior distribution of the model parameters. We defer the update rules to Appendix 3, section ‘Generalized EM algorithm’. The algorithm terminates when the hyperparameter adjustments no longer increase the free energy of the model; that is, we resort to parametric empirical Bayes estimation (see Appendix 3, section ‘Empirical Bayes inference and model selection’).
Selecting optimal number of oscillations
The learning algorithm assumes that the number of oscillation sources, $M$, is known a priori, which is rarely the case for experimentally recorded data. In theory, our probabilistic treatment could assign a log-likelihood, for a given dataset, to every model structure (i.e., every number of oscillation sources), which could be used as a goodness-of-fit score. Unfortunately, the log-likelihood is intractable to evaluate; however, the negative variational free energy can be used as a surrogate.
We formalize this idea by considering that the model structure, $M$, is drawn from a discrete uniform distribution over a finite contiguous set, $\{1, \ldots, M_{max}\}$:

$$p(M) = 1/M_{max}, \qquad M \in \{1, 2, \ldots, M_{max}\},$$
with $M_{max}$ being the maximal number of oscillation sources. This allows us to compute the posterior over the model structure, $q(M)$, within the same variational approximation framework. Using the negative variational free energy expression of the oscillator model, it is easy to show that $q(M)$ is proportional to the exponential of the negative variational free energy of the $M$-oscillator model (see Appendix 3, section ‘Model structure posterior’). Once the model posteriors are obtained, one can simply select the model with the maximum negative variational free energy (empirical Bayes model selection; see Appendix 3, section ‘Empirical Bayes inference and model selection’), or locate the knee or jump point that exhibits a significant local change in $q(M)$ (for such an example, see the DiffBic function; Zhao et al., 2008).
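Since $q(M)$ is proportional to the exponential of the negative variational free energy under a uniform prior, the model-structure posterior reduces to a softmax over the candidate free energies. A minimal sketch, with hypothetical free-energy values:

```python
import numpy as np

def model_posterior(free_energies):
    """Posterior q(M) over candidate model sizes, proportional to exp(F_M)
    under a uniform prior on M (free-energy values are hypothetical)."""
    F = np.asarray(free_energies, dtype=float)
    logq = F - F.max()            # log-sum-exp stabilization before exponentiating
    q = np.exp(logq)
    return q / q.sum()

# hypothetical negative variational free energies for M = 2, 3, 4, 5, 6
F = [-1520.3, -1488.9, -1475.1, -1476.0, -1481.7]
q = model_posterior(F)
M_best = 2 + int(np.argmax(q))    # empirical Bayes selection: M = 4 here
```

Subtracting the maximum before exponentiating avoids overflow when free energies are large in magnitude, without changing the normalized posterior.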
Oscillation component analysis
A preliminary version of this algorithm has been presented previously (Das and Purdon, 2022). In order to analyze experimental data, we devised a standardized pipeline, shown in Figure 1D, closely following the MNE ICA pipeline (Gramfort et al., 2014). We refer to this pipeline as OCA. The pipeline, implemented as the Python class OCA, accepts the sensor recordings either as continuous raw data or as a set of discrete epochs, each with an initial sensor noise covariance matrix. The hyperparameters for the measurement noise covariance prior are derived from the supplied initial sensor noise covariance matrix and are not updated in this pipeline. The sensor data is standardized (i.e., prewhitened) against the sensor noise covariance; this step also applies any active signal-space projectors to the data, such as the average reference (Uusitalo and Ilmoniemi, 1997). The prewhitened data is then decomposed using PCA. The first n_components are then passed to the parameter learning iterations for the hyperparameters $\mathit{\theta}$, $\alpha$, followed by the joint inference of the mixing matrix, oscillation time courses, and residual noise covariance. The optimal number of oscillations is then selected as described earlier. The final noise covariance is computed from the remaining PCA components that are not supplied to oscillation fitting and from the OCA residuals. Once the oscillation hyperparameters and the mixing matrix have been estimated, the oscillation time courses can be estimated from any continuous recording or discrete epochs. Finally, the pipeline can also reconstruct a multichannel signal from any arbitrary subset of oscillation components. We emphasize that the main reason for using PCA here is to rotate the data, so that (1) rank-deficient data (e.g., average-referenced EEG data) can be handled conveniently by discarding the zero-variance components, and (2) the variance of the multichannel data can be uniformly distributed across the remaining PCs.
For these reasons, we retain components explaining 99.9% variance of the multichannel recordings, that is, almost all components with nonzero variance. The PCA step essentially performs rotation of the data in the spatial dimension, leaving the temporal relationships untouched.
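The PCA rotation step might be sketched as below; this is an illustrative numpy implementation of the variance-threshold rule, not the pipeline's actual code:

```python
import numpy as np

def pca_rotate(data, var_threshold=0.999):
    """Spatially rotate sensor data (channels x samples) into principal
    components, retaining those that explain `var_threshold` of the total
    variance; zero-variance directions (e.g., introduced by average
    referencing) are dropped. Time axis is left untouched."""
    centered = data - data.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    n_keep = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    rotated = U[:, :n_keep].T @ centered   # pure spatial rotation of the data
    return rotated, U[:, :n_keep]
```

Because the rotation acts only along the channel dimension, the temporal relationships that the oscillator model exploits are preserved exactly.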
Results
We now apply OCA to a simulation example and several experimentally recorded datasets. We demonstrate the versatility of OCA using two EEG datasets, one under propofol-induced anesthesia and another during sleep, and one resting-state MEG dataset. The simulation example shows the utility of this time-domain analysis over frequency-domain analysis in the context of multichannel recordings. In the real data applications, we showcase the implications of OCA as a mechanistically grounded tool, suggesting various downstream analyses involving the oscillation time courses and their sensor distributions.
Simulation example
We first apply the proposed oscillation decomposition method to a synthetic dataset to illustrate how our time-domain approach differs from the traditional frequency-domain approach. We generated a synthetic EEG dataset using a 64-channel montage, forward model, and noise covariance matrix from a sample dataset distributed with the MNE software package (Gramfort et al., 2014). To select the active current sources, four regions of 5 mm$^{2}$ area were centered on chosen labels in the DKT atlas (Desikan et al., 2006): ‘transversetemporal-lh’, ‘precentral-rh’, ‘inferiorparietal-rh’, and ‘caudalmiddlefrontal-lh’. Three time courses were separately generated at a sampling frequency of 100 Hz according to univariate AR(2) dynamics tuned to generate stable oscillations at 1.6 Hz (slow/delta), 10 Hz (alpha), and 12 Hz (alpha). The first two regions were simulated with the same slow/delta time course, such that the activity in the ‘precentral-rh’ area (Figure 2A, orange) lags that of the ‘transversetemporal-lh’ area (Figure 2A, blue) by 10 ms. The last two regions, ‘inferiorparietal-rh’ and ‘caudalmiddlefrontal-lh’, were simulated with two independent alpha time courses. We projected this source activity to the EEG sensors via a lead-field matrix and added spatially colored noise, generated with the covariance structure in Figure 2D, to the sensor data. Two 20 s epochs were chosen for OCA, and model selection was performed over (2, 3, 4, 5, 6, 8, 10) oscillations. Figure 2B presents the two most common frequency-domain visualizations of multichannel data: the top panel shows the power distribution over the scalp within different canonical EEG frequency bands, while the channel-wise power spectral density (PSD) is shown in the bottom panel.
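AR(2) oscillation generators of the kind used here can be sketched as follows; the pole radius r = 0.98 is an assumed value, as the paper does not list its exact AR(2) coefficients:

```python
import numpy as np

def ar2_oscillation(f, fs, r, n_samples, seed=0):
    """AR(2) process with complex-conjugate poles at radius r and angle
    2*pi*f/fs, giving a stable oscillation with spectral peak near f Hz
    (illustrative parameterization)."""
    rng = np.random.default_rng(seed)
    theta = 2 * np.pi * f / fs
    a1, a2 = 2 * r * np.cos(theta), -r**2   # y_n = a1*y_{n-1} + a2*y_{n-2} + e_n
    y = np.zeros(n_samples)
    for n in range(2, n_samples):
        y[n] = a1 * y[n - 1] + a2 * y[n - 2] + rng.standard_normal()
    return y

slow = ar2_oscillation(f=1.6, fs=100.0, r=0.98, n_samples=2000)
alpha = ar2_oscillation(f=10.0, fs=100.0, r=0.98, n_samples=2000, seed=1)
```

Keeping the pole radius just inside the unit circle (here 0.98) yields a narrowband yet stochastic oscillation rather than a pure sinusoid.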
Figure 2C demonstrates how the decomposition obtained by OCA provides an accurate delineation of the different oscillation time courses and an effective characterization of their spatial distributions in the sensor-level mixing maps. Unlike the frequency-domain techniques, the residual time series that follows OCA extraction captures the temporally unstructured part of the multichannel recording: the covariance matrix estimated from these residuals is strikingly close to the true sensor noise covariance matrix that was used to corrupt the recordings (compare Figure 2E with Figure 2D). OCA also provides a measure, $q(M)$, for model structure selection that objectively identifies the existence of three oscillation time courses with different temporal dynamics (see Figure 2F).
How does OCA compare to conventional approaches?
We performed additional analyses to compare OCA with a Fourier-based frequency-domain method and ICA on a different realization of a similar synthetic EEG dataset (see Figure 3A). The frequency-domain approach applied here is based on the multitaper method: given a bandwidth parameter of 2 Hz, the power spectral density is computed for individual channels. The oscillatory activity can be identified visually as the peaks in the power spectrum plot in Figure 3B, top panel, and its scalp distribution can be obtained by averaging the power within given frequency bands around the peaks in each channel (as shown in Figure 3B, bottom panels). However, these visualizations do very little to identify the underlying oscillatory sources and pose additional questions. For example, the left topographical plot hints at two possible sources, but provides no evidence as to whether they are separated spatially, temporally, or both. Similarly, the right two topographical plots are almost identical and show no separability between the oscillatory sources that generate the two distinct peaks in the spectrum. For ICA, we used the ‘extended infomax’ (Lee et al., 1999) implementation provided by MNE-Python 1.2 (Gramfort et al., 2014) to see if ICA can distinguish the three statistically independent sources in these data. The leading four ICA components are shown in Figure 3C. The ICA-identified components mix the underlying independent signals, which is not surprising since all components follow a Gaussian distribution. Meanwhile, OCA recovers three distinct oscillatory components consistent with the ground truth. OCA is able to do so because it explicitly models the temporal dynamics of the components.
We repeated similar analyses using real human EEG recording and found a similar result where ICA produced components that mixed slow and alpha band signals, whereas OCA identified distinct oscillatory components (see Appendix 4, section ‘Comparison of OCA to traditional approaches in experimental EEG data’).
EEG recording during propofolinduced unconsciousness
Next, we demonstrate the utility of OCA on experimental data using EEG recordings from a healthy volunteer undergoing propofol-induced unconsciousness, previously described in Purdon et al., 2013. For induction of unconsciousness, the volunteer underwent a computer-controlled infusion of propofol to achieve monotonically increasing levels of effect-site concentration in steps of 1 µg mL$^{-1}$. Each target effect-site concentration level was maintained for 14 min. The EEG was recorded using a 64-channel BrainVision MRI Plus system (Brain Products) with a sampling rate of 5000 Hz and a bandwidth of 0.016–1000 Hz. The volunteer was instructed to keep their eyes closed throughout the study. Here, we investigated EEG epochs during maintenance of effect-site concentrations of 0 µg mL$^{-1}$ (i.e., baseline, eyes closed), 2 µg mL$^{-1}$, and 4 µg mL$^{-1}$ for OCA. We selected 10 clean 3.5 s epochs at each of these target effect-site concentrations (see ‘Materials and methods’).
We fitted OCA models with 20, 25, 30, 35, 45, 50, 55, and 60 oscillation components and selected the model with the highest negative variational free energy as the best OCA model. Following this empirical Bayes criterion, we selected OCA models with 30, 30, and 50 components for the three conditions, respectively. When the center frequencies were grouped within the canonical frequency bands (Buzsáki, 2006), we found 17, 17, and 29 slow/delta (0.1–4 Hz) oscillation components and 17, 9, and 15 alpha (8–13 Hz) oscillation components for the three conditions, respectively. Figure 4A–C shows PSDs of the reconstructed EEG activity within each band, defined as the cumulative projection of these grouped oscillation components to the sensors. Since each of these oscillations has a different sensor distribution, the increasing number of oscillation components can be associated with fragmentation of the neuronal networks that support these oscillations.
We also quantified the coherency of the reconstructed EEG activity within these commonly used frequency bands as the ratio of the power of the strongest component to the total power of the oscillation components within the band. For the data analyzed here, slow + delta band coherency decreases gradually while the alpha band coherency increases with increasing propofol effect-site concentration. To investigate this observation further, we visualized the three dominant alpha components from each condition, in order of their strengths, in Figure 4D–F. In each column, the left subpanels consist of two topographic plots: the left one showing the distribution of the strength (scaled between 0 and 1) and the right one showing the relative phase of the estimated analytic oscillation mixing, $\mathbf{c}_{i,j}$. The right subpanels display examples of the extracted analytic oscillations, that is, the estimated real and imaginary components of 1 s long oscillation segments. The rightmost black bars show the coherency measure within the alpha band. Clearly, at the 4 µg mL$^{-1}$ effect-site concentration, the amplitude of the leading alpha oscillation component is much larger than the others, which aligns with our observation of the coherency measure. This finding is consistent with previously reported analyses showing a rise in slow + delta oscillation power, rapid fragmentation of a slow + delta oscillation network, and the emergence of a global alpha network during propofol-induced anesthesia (Lewis et al., 2012; Murphy et al., 2011). Further, as the propofol effect-site concentration increases, the temporal dynamics increasingly resemble a pure sinusoid, as evident from the PSD of the alpha band reconstruction, which becomes more concentrated around the center frequency.
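The coherency measure defined above is a simple power ratio; a minimal sketch with hypothetical component powers:

```python
import numpy as np

def band_coherency(component_powers):
    """Band coherency as defined in the text: power of the strongest
    oscillation component divided by the total power of all components
    in the band. The power values below are hypothetical."""
    p = np.asarray(component_powers, dtype=float)
    return float(p.max() / p.sum())

# hypothetical alpha-band component powers: distributed vs. one dominant component
coh_distributed = band_coherency([4.0, 3.5, 2.5])   # -> 0.4
coh_dominant = band_coherency([8.0, 1.2, 0.8])      # -> 0.8
```

A value near 1 indicates that a single component dominates the band (a coherent network), while a value near 1/number-of-components indicates power spread evenly across components (a fragmented network).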
Lastly, the topographic plots in Figure 4D–F illustrate how alpha activity shifts from posterior electrodes to frontal electrodes with increasing propofol concentration (Cimenser et al., 2011; Purdon et al., 2013).
EEG recordings under different sleep stages
The HD-EEG data shown in Figure 5 are from a young adult subject (age 28, female) undergoing a sleep study, recorded using the ANT Neuro eego mylab system. After preprocessing (see ‘Materials and methods’), we visually selected 10 clean 5 s segments of data that exhibited strong alpha oscillations, each from relaxed wakefulness (eyes closed) and rapid eye movement (REM) sleep, to apply OCA.
We fitted OCA models with 20, 25, 30, 35, 45, 50, 55, and 60 oscillation components and selected the model structure with the highest negative variational free energy as the optimal OCA model. The empirical Bayes criterion selected 55 and 50 OCA components for relaxed wakefulness (eyes closed) and REM sleep, respectively. Figure 5 follows a format similar to Figure 4: panels A and B show the power spectral density of the recorded EEG and of the reconstructed sensor data from the OCA components grouped into slow, theta, and alpha bands, while panels C and D visualize three of the extracted oscillation components within the alpha band, ordered according to their strengths. The topographical maps show the scalp distribution of these components, while the blue and red traces show the 1 s long oscillation time-series pair corresponding to each component.
Activity within the alpha band (8–13 Hz) could be summarized by a few oscillation components: for example, only seven and three OCA components had their center frequency within the alpha band during the relaxed wakeful (eyes closed) condition and REM sleep, respectively. These alpha oscillation components appeared to be distinct in terms of their temporal dynamics: relaxed wakeful alpha components are more regular (i.e., closer to sinusoidal activity) than REM alpha. However, REM alpha coherency (0.4307) is moderately higher than that of relaxed wakeful alpha (0.3195). The spatial distribution of the REM alpha components appeared to be confined primarily to the posterior channels, while the awake alpha is more distributed. To quantify the degree of similarity or difference between the spatial distributions, we calculated the principal angle between the spatial mixing maps of the awake eyes closed and REM alpha components. The principal angle measures the similarity between two subspaces, where an angle of 0° indicates that one subspace is a subset of the other and an angle of 90° indicates that at least one vector in a subspace is orthogonal to the other (Bjorck and Golub, 1973). The spatial mixing maps for awake eyes closed alpha and REM sleep had a principal angle of 44.31°, suggesting that the subspaces, and thus the underlying cortical generators, were substantially different.
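The principal-angle comparison can be reproduced with standard linear algebra routines. A sketch using SciPy's `subspace_angles`, assuming the spatial mixing maps are stored as channels-by-components matrices (the wrapper function name is ours):

```python
import numpy as np
from scipy.linalg import subspace_angles

def largest_principal_angle_deg(M1, M2):
    """Largest principal angle (in degrees) between the column spans
    of two spatial mixing maps, each of shape (n_channels, n_comps)."""
    return np.rad2deg(subspace_angles(M1, M2)).max()
```

An angle near 0° means one subspace is (nearly) contained in the other; an angle near 90° means the subspaces contain mutually orthogonal directions.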
Resting-state MEG recording
Finally, we demonstrate the utility of OCA on an MEG dataset from the Human Connectome Project. The MEG data used here is the resting-state recording from subject 104 012_MEG, session 3.
We fit OCA on 34 clean 5 s epochs and, from among the OCA models with 20, 25, 30, 35, 40, 45, 50, 55 components, the empirical Bayes criterion selected the model with 30 components. Of these, 13 were within the slow + delta band and 10 were alpha oscillations. For this example, we sought to highlight a different downstream analysis of the extracted oscillation components. We chose the three leading slow oscillation components and the three leading alpha oscillation components from the set of identified oscillation components.
For each oscillation, OCA extracts a pair of time traces, that is, one real and one imaginary time trace (see Appendix 2, section ‘Oscillation states and analytic signals’). These ‘real’ and ‘imaginary’ traces can be used to compute the instantaneous phase and instantaneous amplitude of individual oscillation components at every timepoint, without having to rely on the Hilbert transform (Wodeyar et al., 2021).
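Because each component comes with a real/imaginary state pair, the instantaneous quantities follow directly from elementary trigonometry. A minimal sketch (the function name and input layout are hypothetical):

```python
import numpy as np

def inst_amp_phase(x_real, x_imag):
    """Instantaneous amplitude and phase from an oscillation's
    real/imaginary state trace pair; no Hilbert transform needed."""
    amplitude = np.hypot(x_real, x_imag)   # distance from the origin
    phase = np.arctan2(x_imag, x_real)     # angle from the +real axis
    return amplitude, phase
```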
Further, we pick any one of the slow oscillation components and one of the alpha oscillation components, and consider the ordered pair of slow component phase and alpha component amplitude at a single timepoint as a sample drawn from the joint distribution of the ‘phase–amplitude’ of these two components. These samples can then be used to quantify cross-frequency phase–amplitude coupling between the components via either nonparametric (Cohen, 2008; Cheng et al., 2018; MartínezCancino et al., 2019) or parametric modeling (Soulat et al., 2022; Perley and Coleman, 2022).
Figure 6 demonstrates one possible way of investigating the coupling between the slow oscillation component phase and the alpha oscillation component amplitude: the conditional mean of the alpha amplitude given the slow/delta phase is elevated at specific phases for all three slow components paired with OSC012, but essentially flat for the other alpha oscillation components. This example illustrates how the OCA algorithm can be used to identify specific oscillatory components that exhibit phase–amplitude coupling.
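One simple nonparametric version of this analysis is to bin the slow-component phase and average the alpha-component amplitude within each bin. A sketch under the assumption that the phase and amplitude traces have already been extracted as NumPy arrays:

```python
import numpy as np

def amplitude_by_phase(slow_phase, alpha_amp, n_bins=18):
    """Nonparametric phase-amplitude coupling summary: mean alpha
    amplitude conditioned on the binned slow-oscillation phase."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.clip(np.digitize(slow_phase, edges) - 1, 0, n_bins - 1)
    return np.array([alpha_amp[idx == b].mean() if np.any(idx == b)
                     else np.nan for b in range(n_bins)])
```

A flat profile indicates no coupling; a profile peaked at particular phase bins reproduces the kind of conditional-mean modulation shown in Figure 6.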
Discussion
OCA is a novel approach to the multichannel component decomposition problem that can identify a number of independent spatiotemporal components, but does so in the context of underlying temporal dynamics specified by a state-space model. In the state-space representation used here, the dynamics of an elemental oscillation are parameterized by a central frequency, $f$, a damping parameter, $a$, and the second-order statistics of the stochastic driving noise, $\sigma^{2}$. Meanwhile, an associated spatial mixing pattern at the sensor level quantifies the contribution of the elemental oscillation to each observed channel.
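To make the parameterization concrete, the generative dynamics of a single elemental oscillation can be simulated as a damped rotation of a 2D state driven by white noise. This is a sketch of the per-oscillator dynamics only (no observation equation is modeled, and names are our own):

```python
import numpy as np

def simulate_oscillation(f, a, sigma2, fs, n_samples, seed=0):
    """Simulate one elemental oscillation: a 2-D state undergoing a
    damped rotation (angle 2*pi*f/fs, damping a) driven by white
    state noise of variance sigma2."""
    rng = np.random.default_rng(seed)
    w = 2 * np.pi * f / fs
    R = a * np.array([[np.cos(w), -np.sin(w)],
                      [np.sin(w),  np.cos(w)]])   # damped rotation
    x = np.zeros((n_samples, 2))
    for n in range(1, n_samples):
        x[n] = R @ x[n - 1] + rng.normal(0.0, np.sqrt(sigma2), 2)
    return x  # columns: 'real' and 'imaginary' state coordinates
```

Mixing such states into sensors amounts to multiplying by the spatial mixing matrix and adding observation noise.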
The OCA learning algorithm uses an instance of the GEM algorithm to iteratively match the oscillation state-space parameters to the second-order statistics of the M/EEG data. The goodness-of-fit for this data-driven matching procedure is defined within a Bayesian framework, as is the inference of the oscillation time courses, their sensor-level mixing patterns, and the measurement noise covariance matrix. The same goodness-of-fit metric is further used to determine the number of oscillations present in the multichannel data via empirical Bayes model selection. Once the number of oscillations, the oscillation state-space parameters, the sensor-level mixing patterns, and the measurement noise covariance matrix are estimated, the elemental oscillatory activities can be extracted as a pair of time courses, that is, the ‘real’ and ‘imaginary’ parts of the analytic signal, from any given observation.
The oscillator state-space parameters discovered by OCA are akin to the ad hoc parameters, for example, peak frequency, full-width-half-maximum bandwidth, and peak oscillation power, encountered in Fourier/wavelet-based frequency-domain nonparametric time-series analysis (Donoghue et al., 2020). However, OCA circumvents a major drawback of frequency-domain methods when searching for neural oscillations in multichannel recordings. Generally, the source-to-sensor mixing complicates the interpretation of the topographies generated by signal processing tools that treat each channel individually (Schaworonkow and Nikulin, 2022). In a nonparametric frequency-domain approach, the only way to discover this spatially correlated structure across sensors is to perform eigenvalue analysis on the cross-spectral density on a frequency-by-frequency basis, also known as global coherence analysis (Cimenser et al., 2011; Weiner et al., 2023; Mitra and Bokil, 2007). If the eigenvalue distribution exhibits sufficient skewness, the associated frequency is deemed coherent and the leading eigenvectors are identified as the ‘principal’ sensor networks at that frequency; in other words, a proxy for a strong oscillatory component. As the number of sensors increases, this analysis becomes increasingly intractable: in a sensor array with $\sim 10^{2}$ sensors, the cross-spectral density matrix has $\sim 10^{4}$ entries. Alternatively, the state-space representation used here in OCA provides a convenient and compact way to perform multichannel time-domain modeling and analysis (Nise, 2011). The OCA modeling approach effectively decouples the oscillatory dynamics and the spatial mixing pattern by estimating one set of parameters for each discovered oscillation and the associated spatial mixing matrix, thereby avoiding the need for cross-spectral density matrices and frequency parameters for individual channels altogether (Gunasekaran et al., 2023).
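For comparison, the global coherence computation that OCA sidesteps amounts to a frequency-by-frequency eigenanalysis. A sketch for a single frequency, assuming the cross-spectral density matrix has already been estimated:

```python
import numpy as np

def global_coherence(csd):
    """Global coherence at one frequency: the largest eigenvalue of
    the (Hermitian) cross-spectral density matrix divided by the sum
    of its eigenvalues. Must be repeated at every frequency of
    interest, over a matrix with n_channels**2 entries."""
    evals = np.linalg.eigvalsh(csd)  # real, ascending for Hermitian
    return evals[-1] / evals.sum()
```

A value near 1 (a rank-one cross-spectral density) marks a frequency dominated by a single coherent sensor network; a value near 1/n_channels marks incoherent activity.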
Another major advantage of OCA over global coherence analysis is that OCA automatically identifies the dominant coherent oscillations across the channels based on its probabilistic generative model and Bayesian learning approach.
The iterative parameter estimation procedure of OCA is clearly more computationally burdensome than conventional frequency-domain nonparametric methods. However, once the OCA parameters are estimated, OCA can decompose any given multichannel recording segment into the oscillation time-series pairs. In that sense, OCA parameter learning can be viewed as iterative estimation of a pair of optimal spatiotemporal filters (Anderson and Moore, 2005) for each oscillation, parameterized by the estimated oscillation state-space parameters and spatial maps. These optimal spatiotemporal filters are then applied to the multichannel data to extract the oscillation time courses. As a result, the extracted narrowband activity is endogenous to the data, rather than being imposed by an arbitrary narrowband filter (Yeung et al., 2007).
This behavior of OCA is similar to oscillatory component extraction methods based on data-driven spatial filtering, where the extracted component inherits the intrinsic oscillatory dynamics around the specified frequency (Nikulin et al., 2011; de Cheveigné and Arzounian, 2015; Cohen, 2017). In fact, OCA employs the same philosophy as spatiotemporal source separation (Cohen, 2018), where the extracted narrowband signal and its time-shifted versions are again projected onto a temporal coordinate space to further enhance the power within the given frequency band. However, spatiotemporal source separation establishes the temporal constraint via nonparametric sample correlation estimates that are sensitive to noise or artifacts and that require substantial amounts of data for estimation. Another important distinction is that spatiotemporal source separation determines the sensor weights first and then obtains the temporal filter kernel from the projection of the multichannel data. In contrast, OCA updates the sensor weights and the parameters of the temporal filter iteratively within the Bayesian formulation of the state-space framework, thus estimating the spatial and temporal components jointly from the data.
The class of ICA-based methods, on the other hand, assumes that the component time courses share no mutual information, that is, are statistically independent. Without any explicit assumption on the temporal dynamics of the generative process, the properties of the identified components may be difficult to interpret and may also be unreliable. In fact, the identified ICA components typically require subjective visual inspection by experts for their possible interpretation. A number of investigators have recognized these limitations of ICA, in particular the assumption of temporal independence, and have proposed generalizations of the ICA framework that incorporate autoregressive modeling of source dynamics (Pearlmutter and Parra, 1996; Parra, 1998). Despite demonstrating improved source separation performance on naturalistic signals like music (Pearlmutter and Parra, 1996), adoption of these methods for the analysis of neurophysiological data has been slow. This may be due in part to the lack of interpretability of the higher-order autoregressive model structures. Alternatively, Brookes et al., 2011 use a combination of Hilbert envelope computation within predefined frequency bands and temporal downsampling to identify meaningful temporally independent time signals, but stop short of trying to disambiguate different oscillatory sources. Thus, they only assess the connectivity pattern within predefined bands, that is, how different areas of the brain are coordinated through modulation of the oscillations within those predefined bands. The spatial maps recovered from anatomically projected resting-state MEG data by this method resemble the spatial patterns of fMRI resting-state networks. OCA is closer to the first variant of ICA, but describes the identified component sources using a 2D state-space vector that efficiently represents oscillatory dynamics (Matsuda and Komaki, 2017a) in a manner that is easy to interpret.
Matsuda and Komaki, 2017b described a similar state-space model for multivariate time series, but did not apply the model to neurophysiological data, perhaps due to its high dimensionality. The OCA algorithm differs by way of its learning and model selection algorithms, which allow it to select the number of components in a statistically principled and data-driven manner. Overall, OCA delivers the best of the previously mentioned eigenanalysis-based source separation methods and ICA: a decomposition method that can identify independent components, but where each component represents an underlying oscillatory dynamical system. The generative link between each component and the underlying dynamical system makes interpretation of the components straightforward. Here, we focus specifically on analyzing neurophysiological data that are exemplified by highly structured oscillatory dynamics. However, the methods we describe apply to more general, arbitrary dynamics that can be approximated by linear state-space models, and can be equally useful so long as the state-space model is interpretable.
There is another important distinction between OCA and the aforementioned BSS techniques. These methods directly estimate the sensor weights of a weighted linear combination of sensor recordings that produces the individual temporal components (Haufe et al., 2014). In other words, these BSS methods provide a backward decoding algorithm. The linear weights are not directly interpretable since they do not specify how the individual temporal components map to the observed data. That uniquely interpretable information is provided by a forward mapping, which must be estimated in a separate step. When the number of linear components matches the number of sensors, this forward mapping can be calculated via simple matrix inversion. However, when there is a reduced set of components, the forward mapping can only be obtained by solving an inverse problem that requires knowledge of the source and sensor sample covariance matrices. In contrast, the OCA model involves only the (forward) spatial distribution matrix, and our algorithm estimates it directly. In fact, the backward extraction model of OCA involves nonzero weights for time-shifted signals and is never explicitly computed. In that sense, OCA avoids an extraneous transformation step that could inadvertently introduce errors into the spatial mixing pattern estimates.
Another popular time-domain approach for oscillatory signal discovery is the multichannel extension of empirical mode decomposition. Empirical mode decomposition models the time-series data as a linear combination of intrinsic oscillations called intrinsic mode functions (IMFs). Identification of IMFs depends critically on finding the local mean of the local extrema, which is not well-defined in the context of multichannel recordings. Rehman and Mandic, 2010 chose to take real-valued projections of the $n$-channel data, with the projections taken along direction vectors uniformly sampled on a unit spherical surface in an $n$-dimensional coordinate space. The local extrema of each of the projected signals are interpolated to obtain a set of multivariate signal envelopes. Clearly, this random sampling in a high-dimensional space is computationally demanding, and the fully nonparametric formulation requires a substantial amount of data, making the procedure sensitive to noise. In contrast, OCA employs a parametric model that represents oscillatory dynamics in a manner that is statistically efficient and resilient to noise.
Conclusion
In summary, starting from a simple probabilistic generative model of neural oscillations, OCA provides a novel data-driven approach for analyzing multichannel synchronization within underlying oscillatory modes that are easier to interpret than conventional frequency-wise, cross-channel coherence. The overall approach adds significantly to existing methodologies for spatiotemporal decomposition by adding a formal representation of dynamics to the underlying generative model. The application of OCA to simulated and real M/EEG data demonstrates its capabilities as a principled dimensionality reduction tool that simultaneously provides a parametric description of the underlying oscillations and their activation pattern over the sensor array.
Materials and methods
M/EEG preprocessing
All preprocessing was performed using MNE-Python 1.2 (Gramfort et al., 2014) and Eelbrain 0.37 (Brodbeck et al., 2023), with default settings of the respective functions.
EEG recording during propofol-induced unconsciousness
The EEG was bandpass filtered between 0.1 Hz and 40 Hz, followed by downsampling to 100 Hz and average referencing prior to performing OCA. The 0.1 Hz high-pass filtering was done to remove slow drifts in the EEG recordings, which can adversely affect the OCA fitting procedure. All selected epochs had peak-to-peak signal amplitude less than 1000 µV. The prior on the noise covariance was estimated from the same EEG recording, after high-pass filtering above 30 Hz.
Sleep EEG recording
The EEG data was first downsampled to 100 Hz after bandpass filtering between 1 and 40 Hz. Flat and noisy channels were first identified by visual inspection. We then computed the neighborhood correlation for the remaining channels and marked channels with median correlation $\rho < 0.4$ as bad channels. These channels were dropped from the subsequent analysis.
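The neighborhood-correlation criterion can be sketched as follows, assuming a precomputed channel-neighborhood structure (the function name and the dictionary input format are hypothetical; the actual analysis used MNE-Python/Eelbrain utilities):

```python
import numpy as np

def bad_channels_by_neighbor_corr(data, neighbors, rho_thresh=0.4):
    """Flag channels whose median correlation with their spatial
    neighbors falls below rho_thresh (the criterion used here was
    rho < 0.4). data: (n_channels, n_times); neighbors: dict
    mapping channel index -> list of neighbor channel indices."""
    bads = []
    for ch, nbrs in neighbors.items():
        rhos = [np.corrcoef(data[ch], data[nb])[0, 1] for nb in nbrs]
        if np.median(rhos) < rho_thresh:
            bads.append(ch)
    return bads
```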
MEG recording from the Human Connectome Project
We divided the entire resting-state MEG recording from subject 104 012_MEG, session 3 into 5 s epochs, and selected 34 epochs with peak-to-peak amplitude less than 4000 fT for OCA. Prior to fitting OCA, we used signal-space projection (Uusitalo and Ilmoniemi, 1997) to remove heartbeat and eye movement artifacts from the MEG signals. The repaired MEG recordings were then downsampled to 100 Hz after applying an appropriate anti-aliasing filter.
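Signal-space projection itself is conceptually simple: estimate the artifact's dominant spatial pattern and project its span out of the data. A minimal NumPy sketch (not the MNE-Python implementation used here; shapes and names are assumptions):

```python
import numpy as np

def ssp_project(data, artifact_epochs, n_proj=1):
    """Minimal signal-space projection sketch: estimate the dominant
    spatial pattern(s) of an artifact from artifact-locked epochs and
    project that subspace out of the continuous data
    (cf. Uusitalo and Ilmoniemi, 1997).
    data: (n_ch, n_times); artifact_epochs: (n_epochs, n_ch, n_times).
    """
    evoked = artifact_epochs.mean(axis=0)        # artifact-locked average
    U, _, _ = np.linalg.svd(evoked, full_matrices=False)
    Uk = U[:, :n_proj]                           # artifact topographies
    P = np.eye(data.shape[0]) - Uk @ Uk.T        # orthogonal projector
    return P @ data
```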
Code availability
An implementation of the state inference and parameter learning algorithms described in this article is available as part of the purdonlabmeeg Python library on GitHub at https://github.com/proloyd/purdonlabmeeg (copy archived at Das, 2024) under the MIT license. The simulation script used to generate Figure 2, included in the repository, demonstrates how to run OCA on preprocessed EEG/MEG data and visualize the results. The code for analyzing the experimental EEG and MEG recordings follows the simulation script exactly, that is, it is essentially a single function call with the experimental data (preprocessed according to the previous section) as an argument, with the results visualized as in the simulation script. It was therefore not included as a separate repository, but can be made available upon request.
Appendix 1
Mechanistic origin
Here, we detail how the multichannel multioscillator statespace model can be understood in terms of a biophysical model of brain wave generation.
Charge continuity and electromagnetic brain waves
Maxwell’s equations provide the following charge-continuity equation that governs the spontaneous spatiotemporal evolution of the potential, $\mathrm{\Phi}$, in brain tissue:
This differential equation is satisfied by traveling waves of the form $\mathrm{\Phi}\sim \exp\left[j\left(\mathbf{k}\cdot \mathbf{r}+\mathrm{\Omega}t\right)\right]$, where $\mathbf{r}$ and $t$ denote the spatial coordinate and time, respectively, with the wave number, $\mathbf{k}$, and the complex frequency, $\mathrm{\Omega}$, satisfying the following dispersion relation: $D(\mathrm{\Omega},\mathbf{k})=j\mathrm{\Omega}k^{2}-\mathrm{\Sigma}_{ij}k_{i}k_{j}-j\,\mathrm{\partial}_{i}\mathrm{\Sigma}_{ij}k_{j}$, in tensor notation. Separating the real and imaginary parts of the complex frequency, $\mathrm{\Omega}=\omega +j\gamma$, we arrive at the following expressions for the decaying and oscillatory frequencies:
Clearly, wave solutions $\mathrm{\Phi}$ satisfying $\gamma /\omega \ll 1$, that is, with oscillatory frequency much larger than decaying frequency, can give rise to electromagnetic oscillations in the brain parenchyma, governed by brain composition and architecture. A detailed treatment of such a model, considering the anisotropy and inhomogeneity of brain tissue, can be found in Galinsky and Frank, 2020.
Traveling wave solution to the statespace model
Here, we consider one such oscillatory electric potential recorded at $L$ sensors placed at coordinates $\mathbf{r}_{1},\mathbf{r}_{2},\cdots ,\mathbf{r}_{L}$ (with respect to a reference electrode placed at infinity). The potentials sampled (sampling frequency, ${f}_{s}=1/\mathrm{\Delta}t$) at those electrodes at time index $n$ are given as
with $A_{l}$ representing the amplitude observed at the $l$th electrode. The separability of the spatial part ${\varphi}^{spatial}(l)={A}_{l}\exp\left[j\left(\mathbf{k}\cdot {\mathbf{r}}_{l}\right)\right]$ and the temporal part ${\mathrm{\Phi}}^{temporal}(n)=\exp\left[j\left(\mathrm{\Omega}\,n\mathrm{\Delta}t\right)\right]$ of the general solution is crucial for the following development, since only the temporal part, ${\mathrm{\Phi}}^{temporal}(n+1)$, evolves in time,
while the spatial part ${\varphi}^{spatial}(l)$ remains constant. Adopting the following real-valued matrix-vector notation consisting of the real and imaginary parts of the temporal and spatial components, ${\mathrm{\Phi}}^{temporal}(n)={x}_{n,1}+j{x}_{n,2}$ and ${\mathrm{\Phi}}^{spatial}(l)={m}_{l,1}+j{m}_{l,2}$, we arrive at:
where ${y}_{n+1}^{(l)}$ is the recorded potential at electrode $l$ at time index $(n+1)$. Essentially, the complex-valued oscillatory function, ${\mathrm{\Phi}}^{temporal}(n)$, is an analytic signal whose real and imaginary parts constitute the ordered pair, ${\mathbf{x}}_{n}=[{x}_{n,1},{x}_{n,2}]$, which we call the oscillation state. Equation A1.5 describes a circular motion of the oscillation state ${\mathbf{x}}_{n}$ in a 2D state space with frequency $\omega$ at every time step through the damped rotation matrix $\exp\left[-\gamma \mathrm{\Delta}t\right]\mathcal{R}(\omega \mathrm{\Delta}t)$, while a projection, dictated by the electrode location (Equation A1.6), is recorded at the EEG electrodes as an oscillation (see Figure 1a). This deterministic model forms the basis for the probabilistic linear Gaussian state-space model in Equation 1, which introduces process and observation noise, ${\upsilon}_{n}$ and ${\epsilon}_{n}$, respectively, with redefined spatial components ${c}_{l,\ast}={A}_{l}{m}_{l,\ast}$ and the decay term $\exp\left[-\gamma \mathrm{\Delta}t\right]$ as a damping parameter, $a$. We also represent the frequency of oscillation, $f$, as $f=\omega /2\pi$ by replacing $\omega \mathrm{\Delta}t=2\pi f/{f}_{s}$.
Appendix 2
Oscillation states and analytic signals
Here, we consider real-valued observations, ${y}_{n}\in \mathbb{R}$, admitting a state-space representation in the complex plane, that is, generated from complex-valued states, ${z}_{n}\in \mathbb{C}$:
where ${\overline{z}}_{n}$ denotes the complex conjugate of ${z}_{n}$, and $q_{n}$ and $r_{n}$ are complex-valued and real-valued random variables, respectively. In steady state, that is, as $t\to \mathrm{\infty}$, the states, $z_{n}$, admit the Fourier representation given by
It is clear that if $q_{n}$ is an analytic signal, that is, $Q({e}^{j\omega})$ has no negative frequency component, the same will be true for $Z({e}^{j\omega})$.
Further, since $q_{n}$ is an analytic signal, there exists a real-valued signal $s_{n}$ such that ${q}_{n}={s}_{n}+j(h\ast s{)}_{n}$, where the discrete-time Fourier transform of $h_{n}$ is given by
Considering $s_{n}$ to be white noise with variance ${\sigma}^{2}$, it is easy to verify that $(h\ast s{)}_{n}$ is also white noise with variance ${\sigma}^{2}$ and uncorrelated with $s_{n}$. This makes $q_{n}$ an ‘analytic’ white noise, that is, the power spectral density of $q_{n}$ is flat for $0<\omega <\pi$ and zero for $-\pi <\omega <0$. We can then derive the following state-space representation, with real-valued but 2D states as in Equation 1, but with the additional constraint that ${\upsilon}_{n,1}$ and ${\upsilon}_{n,2}$ are related by ${\upsilon}_{n,2}=(h\ast {\upsilon}_{1}{)}_{n}$. Since such a constraint is not easy to enforce during inference or model learning from experimental data, we relaxed the constraint on the state-noise covariance to the one stated in Equation 1.
Extraction of instantaneous amplitude and phase
However, we can still compute the instantaneous amplitude and phase of the signal from this state-space representation, based on the following observation. The oscillation states assume their successive values in a way that traces limit-cycle-like orbits around the origin (see Figure 1a). This allows us to define the instantaneous phase as the angle the oscillation state makes with a fixed reference direction at any given timepoint (Rosenblum et al., 1997, section A.1 and 2). For simplicity, we take the positive direction of the real line as the reference. The amplitude is simply given by the distance from the origin.
In this sense, the oscillator state-space representation provides a generalization of the phasor concept, similar to analytic signals (i.e., allowing for time-varying amplitude, phase, and frequency, in contrast to the invariant amplitude, phase, and frequency of a phasor), but with a relatively relaxed constraint on the relation between the real and imaginary parts.
Appendix 3
Model parameter estimation and model selection
Negative variational free energy
For an $L$-channel M/EEG recording ${\mathbf{y}}_{t},\;t=1,2,\cdots ,T$, the $M$-oscillator probabilistic state-space model admits the following distribution:
Since $\mathbf{F}$ and $\mathbf{Q}$ are parameterized using the hyperparameters $\mathit{\theta}$, we reparameterize the left-hand side of Equation A3.1 as $p\left(\left\{{\mathbf{x}}_{t},{\mathbf{y}}_{t}\right\}\mid \mathbf{C},\mathbf{R},\mathit{\theta},M\right)$. To simplify the exposition, we define $\mathbf{X}=\left[{\mathbf{x}}_{1},{\mathbf{x}}_{2},\cdots ,{\mathbf{x}}_{N}\right]$ and use $\mathbf{X}$ and $\left\{{\mathbf{x}}_{t}\right\}$ interchangeably (similarly for $\mathbf{Y}$). We also make extensive use of the $\mathit{\text{vec}}(\circ )$ notation and associated terminology introduced in Muirhead, 1982.
We note that computation of ensemble likelihood of the presented model requires marginalizing over $\mathbf{X},\mathbf{C},\mathbf{R}$, and $M$ given the priors in Equations 4 and 6:
However, this involves an intractable integration that cannot be performed analytically. We avoid this intractability by using a lower bound to the ensemble likelihood as a surrogate for the same. Specifically, we invoke Neal–Hinton representation theorem (Neal and Hinton, 1998) to obtain a lower bound on the ensemble loglikelihood as
which holds for any arbitrary conditional distribution $q$ and can be computed analytically if $q$ is chosen carefully. This lower bound is known as the negative variational free energy (Quinn and Šmídl, 2006) and attains the actual ensemble log-likelihood only when $q$ is the exact Bayes posterior distribution (Attias, 1999). In the last line, we use $\langle \circ \rangle_{q\left(\mathbf{X},\mathbf{C},\mathbf{R}\mid M\right)}$ to denote the average of the expression within $\langle\,\cdot\,\rangle$ with respect to the model posterior $q\left(\mathbf{X},\mathbf{C},\mathbf{R}\mid M\right)$.
Since the negative variational free energy is a lower bound on the ensemble log-likelihood, maximizing the negative variational free energy will tend to maximize the ensemble log-likelihood. Even though global maximization of the ensemble log-likelihood is not guaranteed, the ensemble log-likelihood at the maximizer of the negative variational free energy is guaranteed to be at least as large as the negative variational free energy itself. On the other hand, the negative variational free energy can be shown to equal, up to a constant, the negative Kullback–Leibler divergence between the approximate and exact posteriors (Quinn and Šmídl, 2006), so the maximization also yields a closer approximation to the exact posterior. In the next section, we work with the intuition that maximization of the negative variational free energy leads to a better approximation of the exact posterior and an increase in the ensemble log-likelihood.
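The two properties invoked here follow from the standard variational-inference decomposition of the marginal log-likelihood, which in the present notation reads:

```latex
\log p(\mathbf{Y}\mid M)
 = \underbrace{\big\langle \log p(\mathbf{X},\mathbf{C},\mathbf{R},\mathbf{Y}\mid M)
   - \log q(\mathbf{X},\mathbf{C},\mathbf{R}\mid M)\big\rangle_{q}}_{\text{negative variational free energy}}
 + \mathrm{KL}\!\left(q(\mathbf{X},\mathbf{C},\mathbf{R}\mid M)\,\big\|\,
   p(\mathbf{X},\mathbf{C},\mathbf{R}\mid \mathbf{Y},M)\right).
```

Since the KL divergence is nonnegative, the negative variational free energy lower-bounds the ensemble log-likelihood; and since $\log p(\mathbf{Y}\mid M)$ does not depend on $q$, maximizing the bound over $q$ is equivalent to minimizing the KL divergence to the exact posterior.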
Variational Bayes inference
However, the fact that the ensemble log-likelihood cannot be computed analytically makes computation of the exact Bayes posterior distribution intractable. We thus employ an efficient VB inference procedure that approximates the exact Bayes posterior with a distribution of the form:
that is, one that factorizes over $\left\{{\mathbf{x}}_{t}\right\},\mathbf{C},\mathbf{R}$, conditional on the model structure, $M$. This particular choice of functional constraint leads to tractable conditional distributions when the trailing term of Equation A3.2 is maximized with respect to the functional form:
Here, we omit the straightforward derivation of the functional forms for brevity of presentation; curious readers are encouraged to expand the trailing term of Equation A3.2 and follow Attias, 1999, section 2.4 as an exercise to obtain the optimal functional forms.
Once the abovementioned optimal functional forms of the approximate posterior distribution have been identified, the next task is to find the optimal quantities that parameterize those functional forms. We show that the parameters of any one of these distributions can be expressed in terms of the M/EEG data and the moments of the other two distributions. Generally speaking, the expressions can be found by collecting the relevant terms in the expansion of the trailing term of Equation A3.2, followed by simple algebraic manipulation. For example, the parameters of $q(\mathbf{R}\mid M)$ are given by
The expressions for the parameters of the Gaussian distributions are more involved if one attempts to collect terms and complete squares. Instead, we use the following facts about Gaussian log-posteriors: (1) since the mode and mean of a multivariate Gaussian distribution coincide, the mean of a Gaussian distribution can be found by maximizing the approximate log-posterior, and (2) the negative Hessian of the log-posterior is the inverse of the covariance matrix (Fahrmeir, 1992). It is worth mentioning here that the log-posterior maximization can be carried out in closed form by solving a high-dimensional system of linear equations (the first-order optimality criterion; Boyd and Vandenberghe, 2004), which involves inverting the Hessian matrix.
In the case of the latent oscillator states, $q(\mathbf{X}\mid M)$, the expressions for the inverse covariance matrix and the mean vector are as follows:
We exploit the block-tridiagonal structure of the Hessian matrix to realize a fast and stable inverse (Asif and Moura, 2000; Jain et al., 2007). This facilitates fast computation of the mean and covariance matrix of the approximate log-posterior.
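The structured solve exploited here can be sketched with a block Thomas recursion, which solves a block-tridiagonal system in time linear in the number of time points (a generic sketch, not the archived implementation):

```python
import numpy as np

def block_thomas(D, L, U_, b):
    """Solve a block-tridiagonal system by block forward elimination
    and back substitution. D: list of T (k, k) diagonal blocks;
    L, U_: lists of T-1 sub- and super-diagonal blocks;
    b: right-hand side of length T*k."""
    T, k = len(D), D[0].shape[0]
    Dp, bp = [D[0].copy()], [b[:k].copy()]
    for i in range(1, T):                      # forward elimination
        W = L[i - 1] @ np.linalg.inv(Dp[i - 1])
        Dp.append(D[i] - W @ U_[i - 1])
        bp.append(b[i * k:(i + 1) * k] - W @ bp[i - 1])
    x = np.empty(T * k)                        # back substitution
    x[(T - 1) * k:] = np.linalg.solve(Dp[-1], bp[-1])
    for i in range(T - 2, -1, -1):
        rhs = bp[i] - U_[i] @ x[(i + 1) * k:(i + 2) * k]
        x[i * k:(i + 1) * k] = np.linalg.solve(Dp[i], rhs)
    return x
```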
Before tackling the case of the mixing matrix, $\mathbf{C}$, we note that each latent oscillator state is a pair of two coordinates, and these two oscillator coordinates correspond to two consecutive columns of the mixing matrix; that is, the contribution of any given latent oscillator state to the observation can be viewed as the inner product ${\mathbf{y}}_{t}^{(l)}={\mathbf{c}}_{l,m}{\mathbf{x}}_{t}^{(m)}$, so that the column pairs $\left({\mathbf{C}}_{:,2l-1:2l}\right)$ and latent oscillator states $\left({\mathbf{X}}_{2l-1:2l,:}\right)$ are unique only up to a scaling and a rotation. We resolve this ambiguity by simply fixing the first row of each mixing matrix column pair to $\left[\begin{array}{cc}1& 0\end{array}\right]$ for all the oscillators. With this modification, the parameters of $q(\mathbf{C}\mid M)$ are given as
where
Using the properties of Kronecker products (Muirhead, 1982), the inversion of the negative Hessian matrix and the solution of the system of linear equations can be carried out in a fast and efficient manner.
Evidently, this interconnectedness of the distributional parameters of the marginals of $\left\{{\mathbf{x}}_{t}\right\},\mathbf{C},\mathbf{R}$, given the model structure $M$, is a testament to the interdependence of these three variables. Although the joint posterior distribution is assumed to factorize over the marginals, that is, $\left\{{\mathbf{x}}_{t}\right\},\mathbf{C},\mathbf{R}$ given the model structure $M$ and observations $\mathbf{Y}$ are assumed to be independent, their interdependence is captured by the interrelated distributional parameters. In fact, the interrelated parameters pass information among themselves using the sufficient statistics (SS) of these distributions, which are required to compute the averages in the update equations. The VB approach to posterior inference therefore leads to an iterative scheme for each model structure, $M$, once the hyperparameters $\left(\mathit{\theta},\alpha \right)$ are specified: we cyclically update the marginal posterior parameters of $\left\{{\mathbf{x}}_{t}\right\},\mathbf{C},\mathbf{R}$ according to Equations A3.8, A3.9, and A3.7, respectively. Each time a distribution (i.e., its parameters) is updated, the SS associated with that distribution are also recomputed and passed to the next update. The iterations are repeated until a stopping criterion is met: this could be a fixed number of iterations, or stabilization of the objective, that is, the variational free energy. The marginal posterior parameters obtained at the last iteration are the optimal distributional parameters providing the VB inference.
Generalized EM algorithm
The VB inference of the form presented in Equation A3.3 still requires the hyperparameters $\mathit{\theta}$ and $\alpha$ to be specified. Short of the values that maximize the ensemble likelihood, the hyperparameters that maximize the negative variational free energy are the next best choice. Since direct maximization w.r.t. the hyperparameters is impractical, we use a generalized version of the expectation maximization (EM) algorithm that uses the variational posterior inference instead of the exact posterior to compute the expectation of the complete log-likelihood expression (Dempster et al., 1977; Fahrmeir, 1992; Shumway and Stoffer, 1982) to learn these hyperparameters from the M/EEG recording. One requires the following averages computed w.r.t. the approximate posterior $q(\mathbf{X}\mid M)$,
to update the hyperparameters ${\mathit{\theta}}^{(m)}=\left({f}^{(m)},\text{}{a}^{(m)},\text{}{({\sigma}^{2})}^{(m)}\right)$ as follows:
On the other hand, the $\alpha$ update takes a relatively simple form:
In brief, we start with an initial set of hyperparameters $\mathit{\theta}$ and $\alpha$, and keep alternating between variational posterior inference (E-step) and hyperparameter update (M-step) until the hyperparameters or the variational free energy stabilize. Once suitable hyperparameters are obtained, the variational inference is run one last time, providing the final oscillation components.
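The E-step/M-step alternation can be illustrated on a deliberately simple toy model (not the paper's state-space model): estimating the latent signal variance in a scalar signal-plus-noise observation, $y_t = x_t + v_t$ with $x_t \sim \mathcal{N}(0,\sigma^2)$ and known noise variance $r$. Here the EM fixed point can be checked analytically: it is the maximum-likelihood estimate $\overline{y^2} - r$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma2_true, r = 5000, 2.0, 0.5
y = rng.normal(0.0, np.sqrt(sigma2_true), T) + rng.normal(0.0, np.sqrt(r), T)

# EM for the latent signal variance sigma2 in  y_t = x_t + v_t,
# x_t ~ N(0, sigma2), v_t ~ N(0, r), with r known.
sigma2 = 1.0                               # initial guess
for _ in range(200):
    # E-step: posterior of x_t given y_t is Gaussian.
    gain = sigma2 / (sigma2 + r)
    post_mean = gain * y
    post_var = gain * r
    # M-step: maximize the expected complete-data log-likelihood,
    # i.e., set sigma2 to the average posterior second moment of x_t.
    sigma2 = np.mean(post_mean**2) + post_var

# The EM fixed point equals the ML estimate  mean(y^2) - r.
assert np.isclose(sigma2, np.mean(y**2) - r, rtol=1e-3)
```

The generalized EM of the paper follows the same alternation, with the E-step replaced by the full variational inference of Appendix 3 and the M-step by the hyperparameter updates above.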
Evaluation of variational free energy
Considering the update rules of Equations A3.7, A3.13, and A3.10, we can simplify the second term in Equation A3.2 as
After noting the following property of Gaussian random variable, $\mathbf{Z}\sim {\mathcal{N}}_{p}({\mathit{\mu}}_{z},{\mathbf{\Sigma}}_{z})$:
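The equation stating this property did not survive extraction; the property typically invoked at this point is the expectation of a quadratic form under a Gaussian, $\mathbb{E}\left[{\mathbf{Z}}^{\top}\mathbf{A}\mathbf{Z}\right]={\mathit{\mu}}_{z}^{\top}\mathbf{A}{\mathit{\mu}}_{z}+\mathrm{tr}\left(\mathbf{A}{\mathbf{\Sigma}}_{z}\right)$ (an assumption on our part, stated here so the simplification below can be followed). It is easy to verify numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
mu = rng.standard_normal(p)
L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)            # a valid covariance matrix
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                    # symmetric PD, so E[Z^T A Z] > 0

# Monte-Carlo estimate of E[Z^T A Z] for Z ~ N(mu, Sigma).
Z = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.einsum("ni,ij,nj->n", Z, A, Z))

# Closed form: E[Z^T A Z] = mu^T A mu + tr(A Sigma).
exact = mu @ A @ mu + np.trace(A @ Sigma)
assert np.isclose(mc, exact, rtol=0.05)
```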
Equation A3.14 can be further simplified as:
We note here that the log-determinant of ${\mathbf{\Sigma}}_{x}$ can be computed while solving Equation A3.8 by exploiting the tridiagonal structure of ${\mathbf{\Sigma}}_{x}^{-1}$. Similarly, the log-determinant of ${\mathbf{\Sigma}}_{c}$ can be computed while solving Equation A3.9 using the properties of Kronecker products. The matrices $\mathbf{\Phi}$ and $\mathbf{\Omega}$ have the sensor space dimension, so their log-determinant computation is relatively cheap. The remaining matrix $\mathbf{Q}$ is diagonal, so its log-determinant can be computed easily. As a whole, the variational free energy for model structure $M$ can be computed as a byproduct of the iterative updates.
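As a concrete illustration of the banded log-determinant trick (a sketch using SciPy's banded Cholesky routine, not the paper's code): for a symmetric positive-definite tridiagonal matrix, the Cholesky factor can be computed in the banded storage format in $O(n)$, and the log-determinant is twice the sum of the log of its diagonal.

```python
import numpy as np
from scipy.linalg import cholesky_banded

n = 500
# Tridiagonal SPD matrix (a discrete second difference plus a diagonal
# shift), standing in for a precision matrix like Sigma_x^{-1}.
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)

# Dense reference computation.
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
sign, logdet_dense = np.linalg.slogdet(A)

# Banded Cholesky: O(n) instead of O(n^3).
ab = np.zeros((2, n))
ab[0, 1:] = off          # superdiagonal (LAPACK upper banded storage)
ab[1, :] = main          # main diagonal
U = cholesky_banded(ab)  # upper banded factor, A = U^T U
logdet_banded = 2.0 * np.sum(np.log(U[-1]))

assert sign > 0 and np.allclose(logdet_dense, logdet_banded)
```

The block-tridiagonal case used in the paper works the same way with block Cholesky factors in place of scalars.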
Initialization of parameters and hyperparameters
Because the ensemble log-likelihood (and hence the negative variational free energy) is not concave in general, initial values for the parameters and hyperparameters must be chosen carefully. We use the interpretation of MVAR processes as a superposition of multiple oscillatory components (Matsuda and Komaki, 2017b; Quinn et al., 2021; Neumaier and Schneider, 2001).
First of all, we estimate $\mathbf{\Psi}$ using the following fact about a single-channel noisy recording, ${y}^{(l)}$, where ${S}_{{y}^{(l)}{y}^{(l)}}(f)$ denotes the PSD of the signal:
We generalize Equation A3.16 to multichannel recordings by high-pass filtering the M/EEG recording at ${f}_{s}/2-\mathrm{\Delta}f$ ($\mathrm{\Delta}f>0$, i.e., just below the Nyquist frequency) and estimating the inverse-Wishart parameters from the sample covariance of the high-pass filtered data, ${\stackrel{~}{\mathbf{y}}}_{t}$. We set $\nu =T$ and $\mathbf{\Psi}=\sum _{t=1}^{T}{\stackrel{~}{\mathbf{y}}}_{t}{\stackrel{~}{\mathbf{y}}}_{t}^{\mathrm{\top}}$. We also initialize the noise covariance matrix as $\mathbf{R}=\sum _{t=1}^{T}{\stackrel{~}{\mathbf{y}}}_{t}{\stackrel{~}{\mathbf{y}}}_{t}^{\mathrm{\top}}/T$.
We then fit an MVAR model of order $p$ on the multichannel data, with $p$ chosen according to the Akaike information criterion (Cavanaugh and Neath, 2019). We perform an eigendecomposition of the companion form of the MVAR parameter matrix (Neumaier and Schneider, 2001) to yield the eigenvalues $\mathit{\lambda}$ and the right and left eigenvectors $\mathbf{V}$ and $\mathbf{W}$. We choose the eigenvalues with $\mathrm{\Im}\{{\lambda}^{(m)}\}>0$ and the corresponding eigenvectors ${\mathbf{w}}^{(m)}$, and collect them as $\hat{\mathit{\lambda}}$ and $\hat{\mathbf{W}}$ (assume there are $M$ such eigenvalues). The frequency and damping parameters are initialized as
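This initialization can be sketched for a single damped oscillation (a univariate toy example; the multichannel case replaces scalars with matrices). The mapping from an eigenvalue to frequency and damping, $f={f}_{s}\,\mathrm{arg}(\lambda)/(2\pi)$ and $a=|\lambda|$, is the standard eigenmode parameterization (cf. Neumaier and Schneider, 2001) and is assumed here to correspond to Equation A3.17.

```python
import numpy as np

rng = np.random.default_rng(3)
fs, T, f_true, a_true = 100.0, 20_000, 10.0, 0.98

# Simulate one damped oscillation as an AR(2) process with complex roots
# a * exp(+/- i * 2*pi*f/fs).
theta = 2 * np.pi * f_true / fs
x = np.zeros(T)
x[1] = rng.standard_normal()
for t in range(2, T):
    x[t] = (2 * a_true * np.cos(theta) * x[t - 1]
            - a_true**2 * x[t - 2] + rng.standard_normal())

# Least-squares fit of an AR(p) model.
p = 2
X = np.column_stack([x[p - k - 1:T - k - 1] for k in range(p)])
coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)

# Companion matrix and its eigenmodes.
comp = np.vstack([coef, np.eye(p - 1, p)])
lam, W = np.linalg.eig(comp)

# Keep eigenvalues with positive imaginary part; map to frequency/damping.
lam_pos = lam[lam.imag > 0]
f_hat = fs * np.angle(lam_pos) / (2 * np.pi)
a_hat = np.abs(lam_pos)

assert abs(f_hat[0] - f_true) < 0.3
assert abs(a_hat[0] - a_true) < 0.02
```

Each retained conjugate eigenvalue pair seeds one oscillator's frequency and damping in the state-space model.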
For ${{\sigma}^{(1)}}^{2},{{\sigma}^{(2)}}^{2},\cdots ,{{\sigma}^{(M)}}^{2}$, we use the linear combination of the theoretical PSDs of the oscillatory components that best approximates the multitaper PSD (Babadi and Brown, 2014) of the leading time series. In short, we compute the multitaper PSD of the leading time series and the theoretical PSDs of the oscillatory components (Soulat et al., 2022, Supplementary Information) on the same frequency grid of spacing ${f}_{s}/(2K)$. The multitaper PSD, $\mathit{\rho}$, is a $1\times K$ vector, whereas the theoretical PSDs, ${\mathit{\varphi}}^{(m)}$, form an $M\times K$ matrix $\mathbf{\Phi}$. We initialize ${{\sigma}^{(1)}}^{2},{{\sigma}^{(2)}}^{2},\cdots ,{{\sigma}^{(M)}}^{2}$ as the solution of the following linear equation:
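One simple way to realize this fit, if the weights are additionally constrained to be nonnegative (an assumption; the paper's exact solver is not specified here), is nonnegative least squares:

```python
import numpy as np
from scipy.optimize import nnls

# Frequency grid and two "theoretical" component spectra (Lorentzian-like
# shapes standing in for the oscillators' theoretical PSDs).
f = np.linspace(0.0, 50.0, 512)
phi = np.vstack([1.0 / (1.0 + ((f - fc) / 2.0) ** 2) for fc in (1.0, 10.0)])

# The observed PSD is a nonnegative combination of the component spectra.
w_true = np.array([3.0, 1.5])
rho = w_true @ phi

# Initialize the sigma^2 values as the nonnegative least-squares solution
# of  Phi^T w ~= rho.
w_hat, _ = nnls(phi.T, rho)
assert np.allclose(w_hat, w_true, atol=1e-6)
```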
To select a subset of the discovered $M$ oscillatory components, we sort the oscillatory components (and the columns of $\hat{\mathbf{W}}$) according to their theoretical contribution (most to least):
and select the leading oscillatory parameters (and columns of the $\hat{\mathbf{W}})$.
With Equation A3.17, the spatial mixing components for the leading time series become ${\mathbf{c}}_{l,m}=\left[\begin{array}{cc}1& 0\end{array}\right]$. The remaining elements of the spatial mixing matrix, $\mathbf{C}$, are initialized from $\hat{\mathbf{W}}$ as
Lastly, $\alpha$ is initialized by $2\phantom{\rule{thinmathspace}{0ex}}(L-1)M/{\Vert {\mathbf{C}}_{2:L}\Vert}_{\mathrm{Fro}}$ with the abovementioned $\mathbf{C}$.
Model structure posterior
Finally, it is easy to verify that the following discrete distribution maximizes Equation A3.2:
which can be computed readily once the final variational inferences are made for $M=1,\cdots ,{M}_{max}$. The negative variational free energy of the modeling framework is given by
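Assuming a flat prior over model structures, $q(M)$ is a softmax over the per-order free energies; computing it through a log-sum-exp keeps the normalization numerically stable. The free-energy values below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

# Negative variational free energies F_M for candidate model orders
# M = 1..5 (illustrative values only).
F = np.array([-1520.3, -1496.8, -1489.1, -1490.5, -1497.2])

# q(M) proportional to exp(F_M); normalize in log space for stability.
log_q = F - logsumexp(F)
q = np.exp(log_q)

assert np.isclose(q.sum(), 1.0)
assert q.argmax() == F.argmax()   # the MAP model order maximizes F_M
```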
Empirical Bayes inference and model selection
Since the model order is not known, we estimate it by maximizing the approximate posterior distribution of the model order, $q(M)$, and treat $M$ as if it were known to equal this estimate. The same procedure is followed for the hyperparameters, which we set to the values that maximize the negative variational free energy. This way of determining the model order and estimating unknown parameters falls under the umbrella of the empirical Bayes framework (Robbins, 1964; Efron and Morris, 1973; Efron and Morris, 1975; Morris, 1983). Briefly, it provides a first-order approximation to the posterior mean of these quantities and neglects the uncertainty (i.e., second-order statistics) of the estimated quantities.
Appendix 4
Comparison of OCA to traditional approaches in experimental EEG data
We compare OCA with an oscillation-finding approach based on channel-wise PSD visualization and with ICA on a subset of a real human EEG recording, taken from Figure 4, to provide empirical justification for OCA. For Appendix 4—figure 1, we used a 3.5 s-long EEG segment recorded during maintenance of a 2 µg/ml propofol effect-site concentration. Based on the existing literature, we expect strong frontal alpha activity and an overall increase in slow activity (Purdon et al., 2013; Cimenser et al., 2011).
The frequency-domain approach examined here is based on the popular multitaper method (de Cheveigné and Parra, 2014): given a bandwidth parameter, the power spectral density is computed for individual channels and displayed as a butterfly plot (see Appendix 4—figure 1A, left panels). The oscillatory alpha activities are then identified visually (red overlay) as peaks in these power spectra, and their scalp distributions are obtained by averaging channel-wise power within given frequency bands around the peaks, as shown in Appendix 4—figure 1A, right panels. We chose three different bandwidths, 1 Hz, 2 Hz, and 5 Hz, to demonstrate the inconsistency of frequency-domain methods in ‘producing’ peaks in the line plots of the spectrum. The spectrum with 1 Hz bandwidth (top row) exhibits many spurious peaks, whereas the spectrum with 5 Hz bandwidth (bottom row) exhibits wide peaks, possibly encompassing multiple underlying peaks. This inconsistency makes ad hoc visual identification of peaks subjective. The topographic plots in Appendix 4—figure 1A demonstrate a similar inconsistency: as the estimation bandwidth increases (i.e., spectral resolution decreases), the topographic maps become less susceptible to spurious power leakage until the estimation bandwidth matches the bandwidth of the underlying processes (compare the middle row to the top row). As the estimation bandwidth increases further, severe spectral leakage renders the topographic maps largely uninformative about the underlying separable oscillatory sources (bottom row).
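The bandwidth dependence described above can be reproduced with a minimal DPSS multitaper estimator (a sketch; conventions for ‘bandwidth’ vary, and here it is treated as the half-bandwidth in Hz). The peak of a clean oscillation stays put, but its apparent width tracks the analysis bandwidth rather than the signal.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, bandwidth):
    """Multitaper PSD with a user-chosen half-bandwidth (Hz)."""
    n = len(x)
    nw = bandwidth * n / fs          # time-half-bandwidth product
    k = max(int(2 * nw - 1), 1)      # conventional number of tapers
    tapers = dpss(n, nw, Kmax=k)
    # Average the eigenspectra of the tapered data.
    spectra = np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2
    psd = spectra.mean(axis=0) / fs
    freqs = np.fft.rfftfreq(n, 1 / fs)
    return freqs, psd

# A 10 Hz sinusoid in noise, 3.5 s long as in the EEG example: the peak
# stays at 10 Hz, but its width follows the chosen bandwidth.
fs, dur = 250.0, 3.5
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(t.size)

for bw in (1.0, 2.0, 5.0):
    freqs, psd = multitaper_psd(x, fs, bw)
    assert abs(freqs[np.argmax(psd)] - 10.0) <= bw   # peak within bandwidth
```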
An ICA decomposition of this multichannel recording identifies components that are neither slow nor alpha oscillations but combinations of multiple oscillations, as one would expect given that the cumulative histograms of EEG recordings are approximately Gaussian (Brookes et al., 2011). We used the ‘extended infomax’ implementation (Lee et al., 1999) provided by MNE-Python 1.2 (Gramfort et al., 2014); the leading four ICA components are shown in Appendix 4—figure 1B.
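The Gaussianity argument can be made concrete in a few lines (a toy demonstration, not the EEG analysis itself): for Gaussian sources, every orthogonal rotation of the whitened data has identical second-order statistics, and all higher-order cumulants vanish, so ICA has no contrast left to optimize and returns an arbitrary rotation, i.e., a mixture of the underlying oscillations.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100_000
# Two independent *Gaussian* sources (as EEG oscillations approximately are).
S = rng.standard_normal((2, T))

# Any orthogonal rotation of the sources...
ang = 0.7
R = np.array([[np.cos(ang), -np.sin(ang)],
              [np.sin(ang),  np.cos(ang)]])
S_rot = R @ S

# ...leaves the covariance (and, for Gaussian data, the full distribution)
# unchanged, so the rotation is unidentifiable to ICA.
assert np.allclose(np.cov(S), np.eye(2), atol=0.02)
assert np.allclose(np.cov(S_rot), np.eye(2), atol=0.02)
```

OCA sidesteps this degeneracy by identifying components through their temporal dynamics rather than through non-Gaussianity.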
Lastly, OCA is able to identify several oscillatory components (in both the slow and alpha bands) from this multichannel data and to extract their individual time series and topographic distributions. OCA can do so because of its explicit modeling of temporal dynamics in the form of the structured state-space representation. Appendix 4—figure 1B shows the leading four components whose estimated center frequencies lie within the alpha band (9–13 Hz). This also demonstrates that OCA is bandwidth-free, that is, it adjusts the estimation bandwidth to match the bandwidths of the underlying oscillations in a data-driven way, as is evident from the spectra of the estimated oscillation time courses.
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. An implementation of the state inference and parameter learning algorithms described in this article is available as part of the purdonlabmeeg Python library on GitHub at https://github.com/proloyd/purdonlabmeeg (copy archived at Das, 2024) under the MIT license. The simulation script that generates Figure 2, included in the repository, demonstrates how to run OCA on preprocessed EEG/MEG data and visualize the results. The code for analyzing the experimental EEG and MEG recordings follows the simulation script exactly: it is essentially a single function call with the experimental data (preprocessed as described in the previous section) as an argument, with the results visualized as in the simulation script. It was therefore not included as a separate repository, but can be made available to individuals upon request.

ConnectomeDB ID 104012MEG. MEG data in section ‘Resting state MEG recording’.
References
Inversion of block matrices with block banded inverses: application to Kalman–Bucy filtering. 2000 International Conference on Acoustics, Speech and Signal Processing, pp. 608–611. https://doi.org/10.1109/ICASSP.2000.862055
Inferring parameters and structure of latent variable models by variational Bayes. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 21–30.
A review of multitaper spectral analysis. IEEE Transactions on Biomedical Engineering 61:1555–1564. https://doi.org/10.1109/TBME.2014.2311996
State space oscillator models for neural data analysis. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4740–4743. https://doi.org/10.1109/EMBC.2018.8513215
Numerical methods for computing angles between linear subspaces. Mathematics of Computation 27:579. https://doi.org/10.2307/2005662
In: Bracewell RN, editor. The Fourier Transform and Its Applications (McGraw-Hill Series in Electrical and Computer Engineering). McGraw Hill, pp. 361–363.
Neuronal oscillations in cortical networks. Science 304:1926–1929. https://doi.org/10.1126/science.1099745
Rhythms of the Brain. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195301069.001.0001
High-order contrasts for independent component analysis. Neural Computation 11:157–192. https://doi.org/10.1162/089976699300016863
The Akaike information criterion: background, derivation, properties, application, interpretation, and refinements. WIREs Computational Statistics 11:e1460. https://doi.org/10.1002/wics.1460
Assessing transient cross-frequency coupling in EEG data. Journal of Neuroscience Methods 168:494–499. https://doi.org/10.1016/j.jneumeth.2007.10.012
Using spatiotemporal source separation to identify prominent features in multichannel data without sinusoidal filters. The European Journal of Neuroscience 48:2454–2465. https://doi.org/10.1111/ejn.13727
Extracting common oscillatory time-courses from multichannel recordings: oscillation component analysis. 56th Asilomar Conference on Signals, Systems, and Computers, pp. 602–606. https://doi.org/10.1109/IEEECONF56349.2022.10052084
purdonlabmeeg, version swh:1:rev:c7e69979d99174348bac7cdd42815ff5815bee52. Software Heritage.
Scanning for oscillations. Journal of Neural Engineering 12:066020. https://doi.org/10.1088/1741-2560/12/6/066020
Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39:1–22. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Parameterizing neural power spectra into periodic and aperiodic components. Nature Neuroscience 23:1655–1665. https://doi.org/10.1038/s41593-020-00744-x
Stein’s estimation rule and its competitors—an empirical Bayes approach. Journal of the American Statistical Association 68:117–130. https://doi.org/10.1080/01621459.1973.10481350
Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association 70:311–319. https://doi.org/10.1080/01621459.1975.10479864
Posterior mode estimation by extended Kalman filtering for multivariate dynamic generalized linear models. Journal of the American Statistical Association 87:501–509. https://doi.org/10.1080/01621459.1992.10475232
Characterizing endogenous delta oscillations in human MEG. Scientific Reports 13:11031. https://doi.org/10.1038/s41598-023-37514-1
Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press. https://doi.org/10.1017/CBO9781107049994
General model-based filters for extracting cycles and trends in economic time series. Review of Economics and Statistics 85:244–255. https://doi.org/10.1162/003465303765299774
Oscillatory dynamics of prefrontal cognitive control. Trends in Cognitive Sciences 20:916–930. https://doi.org/10.1016/j.tics.2016.09.007
Keeping the neural networks simple by minimizing the description length of the weights. Proceedings of the Sixth Annual Conference on Computational Learning Theory (COLT ’93), pp. 5–13. https://doi.org/10.1145/168304.168306
Independent component analysis: algorithms and applications. Neural Networks 13:411–430. https://doi.org/10.1016/S0893-6080(00)00026-5
Independent component analysis: recent advances. Philosophical Transactions of the Royal Society A 371:20110534. https://doi.org/10.1098/rsta.2011.0534
Numerically Stable Algorithms for Inversion of Block Tridiagonal and Banded Matrices. Department of Electrical and Computer Engineering Technical Reports.
Time series decomposition into oscillation components and phase estimation. Neural Computation 29:332–367. https://doi.org/10.1162/NECO_a_00916
Multivariate time series decomposition into oscillation components. Neural Computation 29:2055–2075. https://doi.org/10.1162/NECO_a_00981
Observed Brain Dynamics. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195178081.001.0001
Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association 78:47–55. https://doi.org/10.1080/01621459.1983.10477920
A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI, editor. Learning in Graphical Models. Springer Netherlands, pp. 355–368. https://doi.org/10.1007/978-94-011-5014-9_12
Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software 27:27–57. https://doi.org/10.1145/382043.382304
Blind source separation via generalized eigenvalue decomposition. Journal of Machine Learning Research 4:1261–1269. https://doi.org/10.5555/945365.964305
Maximum likelihood blind source separation: a context-sensitive generalization of ICA. Advances in Neural Information Processing Systems, pp. 613–619.
A mutual information measure of phase-amplitude coupling using high dimensional sparse models. 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 21–24. https://doi.org/10.1109/EMBC48229.2022.9871816
Multivariate empirical mode decomposition. Proceedings of the Royal Society A 466:1291–1302. https://doi.org/10.1098/rspa.2009.0502
The empirical Bayes approach to statistical decision problems. The Annals of Mathematical Statistics 35:1–20. https://doi.org/10.1214/aoms/1177703729
Canonical correlation analysis of EEG for classification of motor imagery. 2017 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 2317–2321. https://doi.org/10.1109/SMC.2017.8122967
Phase synchronization in driven and coupled chaotic oscillators. IEEE Transactions on Circuits and Systems I 44:874–881. https://doi.org/10.1109/81.633876
An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3:253–264. https://doi.org/10.1111/j.1467-9892.1982.tb00349.x
State space methods for phase amplitude coupling analysis. Scientific Reports 12:15940. https://doi.org/10.1038/s41598-022-18475-3
How advances in neural recording affect data analysis. Nature Neuroscience 14:139–142. https://doi.org/10.1038/nn.2731
Signal-space projection method for separating MEG or EEG into components. Medical & Biological Engineering & Computing 35:135–140. https://doi.org/10.1007/BF02534144
Neurophysiological and computational principles of cortical rhythms in cognition. Physiological Reviews 90:1195–1268. https://doi.org/10.1152/physrev.00035.2008
Knee point detection on Bayesian information criterion. 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 431–438. https://doi.org/10.1109/ICTAI.2008.154
Article and author information
Author details
Funding
National Institutes of Health (R01AG05408101A1)
 Patrick L Purdon
Tiny Blue Dot Foundation
 Patrick L Purdon
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by the National Institutes of Health (grant no. R01AG05408101A1) and the Tiny Blue Dot Foundation. As requested by the Human Connectome Project, the following text is copied verbatim: MEG data in section ‘Resting state MEG recording’ ‘were provided by the Human Connectome Project, WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.’
Version history
 Preprint posted:
 Sent for peer review:
 Reviewed Preprint version 1:
 Reviewed Preprint version 2:
 Version of Record published:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.97107. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2024, Das et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.